<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en-gb">
	<link rel="self" type="application/atom+xml" href="https://forum.eggheads.org/app.php/feed/topic/11799" />

	<title>egghelp/eggheads community</title>
	<subtitle>Discussion of eggdrop bots, shell accounts and tcl scripts.</subtitle>
	<link href="https://forum.eggheads.org/index.php" />
	<updated>2006-05-10T20:26:36-04:00</updated>

	<author><name><![CDATA[egghelp/eggheads community]]></name></author>
	<id>https://forum.eggheads.org/app.php/feed/topic/11799</id>

		<entry>
		<author><name><![CDATA[De Kus]]></name></author>
		<updated>2006-05-10T20:26:36-04:00</updated>

		<published>2006-05-10T20:26:36-04:00</published>
		<id>https://forum.eggheads.org/viewtopic.php?p=62912#p62912</id>
		<link href="https://forum.eggheads.org/viewtopic.php?p=62912#p62912"/>
		<title type="html"><![CDATA[wondering]]></title>

		
		<content type="html" xml:base="https://forum.eggheads.org/viewtopic.php?p=62912#p62912"><![CDATA[
The more I understand your question, the less I understand what you are actually asking <img class="smilies" src="https://forum.eggheads.org/images/smilies/icon_biggrin.gif" width="15" height="15" alt=":D" title="Very Happy">.<br><br>So you want to know whether anyone has used XPath via tDOM in a script and managed to write a more flexible statement than the example in your FAQ?<p>Statistics: Posted by <a href="https://forum.eggheads.org/memberlist.php?mode=viewprofile&amp;u=2382">De Kus</a> — Wed May 10, 2006 8:26 pm</p><hr />
]]></content>
	</entry>
		<entry>
		<author><name><![CDATA[demond]]></name></author>
		<updated>2006-05-10T12:13:46-04:00</updated>

		<published>2006-05-10T12:13:46-04:00</published>
		<id>https://forum.eggheads.org/viewtopic.php?p=62882#p62882</id>
		<link href="https://forum.eggheads.org/viewtopic.php?p=62882#p62882"/>
		<title type="html"><![CDATA[wondering]]></title>

		
		<content type="html" xml:base="https://forum.eggheads.org/viewtopic.php?p=62882#p62882"><![CDATA[
dude, do you actually <em class="text-italics">read</em> my posts???<br><br>I am NOT asking you - or anyone else for that matter - to show how to parse webpages via regexps or otherwise, or to elaborate on that; I am simply advocating XPath - which you obviously don't know and haven't used - as a superior tool for the job<br><br>I was asking if anyone knows a script using XPath - and I already gathered you don't - so let's put this to rest and move on<p>Statistics: Posted by <a href="https://forum.eggheads.org/memberlist.php?mode=viewprofile&amp;u=5056">demond</a> — Wed May 10, 2006 12:13 pm</p><hr />
]]></content>
	</entry>
		<entry>
		<author><name><![CDATA[De Kus]]></name></author>
		<updated>2006-05-10T05:25:48-04:00</updated>

		<published>2006-05-10T05:25:48-04:00</published>
		<id>https://forum.eggheads.org/viewtopic.php?p=62877#p62877</id>
		<link href="https://forum.eggheads.org/viewtopic.php?p=62877#p62877"/>
		<title type="html"><![CDATA[wondering]]></title>

		
		<content type="html" xml:base="https://forum.eggheads.org/viewtopic.php?p=62877#p62877"><![CDATA[
so you are looking for something like<div class="codebox"><p>Code: </p><pre><code>set goal 5
set id 0
set num 0
# walk forward until the $goal-th &lt;foo ...&gt; tag carrying bar='id' is found
while {$num &lt; $goal} {
  # string first takes the needle first, then the haystack, then the start index
  if {[set id [string first "&lt;foo " $body $id]] == -1} {
    return -1
  } else {
    # the attribute only counts if it sits before the closing "&gt;" of this tag
    if {[string first "&gt;" $body $id] &gt; [set t [string first "bar='id'" $body $id]] &amp;&amp; $t != -1} {
      set id $t
      incr num
    } elseif {$t == -1} {
      return -1
    } else {
      incr id [string length "&lt;foo "]
    }
  }
}
...</code></pre></div>continue like this with each condition... to find the end of "stuff", keep looking for the index (you can also count the &lt;foo opens to know which &lt;/foo&gt; belongs to the wanted open tag) and string range the stuff between the &gt; and &lt;. Though I REALLY doubt this is any easier than regexp (however, I am sure it would be faster).<br>However, you will now run into trouble with case sensitivity and will have trouble treating bar=id, bar='id' and bar="id" as equal. You could of course temporarily convert all " to ' and check for ' (according to the W3C, bar=id is wrong syntax anyway).<br><br>or do you want to split the XML tree into a multidimensional array? But then I wonder how to *match* parameters. I don't know if an endless sublist with tag and data would be possible. Maybe parents would just contain a list of their direct children as "data". And even then, searching would be difficult in this non-linear list tree.<p>Statistics: Posted by <a href="https://forum.eggheads.org/memberlist.php?mode=viewprofile&amp;u=2382">De Kus</a> — Wed May 10, 2006 5:25 am</p><hr />
]]></content>
	</entry>
		<entry>
		<author><name><![CDATA[demond]]></name></author>
		<updated>2006-05-10T00:04:30-04:00</updated>

		<published>2006-05-10T00:04:30-04:00</published>
		<id>https://forum.eggheads.org/viewtopic.php?p=62861#p62861</id>
		<link href="https://forum.eggheads.org/viewtopic.php?p=62861#p62861"/>
		<title type="html"><![CDATA[wondering]]></title>

		
		<content type="html" xml:base="https://forum.eggheads.org/viewtopic.php?p=62861#p62861"><![CDATA[
what you seem to be unable to comprehend is that any regexp emulation of XPath's predicates would be ridiculously complicated and hard to read/understand<br><br>it's like doing numerical analysis in Roman numerals - if you know what I mean<p>Statistics: Posted by <a href="https://forum.eggheads.org/memberlist.php?mode=viewprofile&amp;u=5056">demond</a> — Wed May 10, 2006 12:04 am</p><hr />
]]></content>
	</entry>
		<entry>
		<author><name><![CDATA[De Kus]]></name></author>
		<updated>2006-05-09T12:52:37-04:00</updated>

		<published>2006-05-09T12:52:37-04:00</published>
		<id>https://forum.eggheads.org/viewtopic.php?p=62847#p62847</id>
		<link href="https://forum.eggheads.org/viewtopic.php?p=62847#p62847"/>
		<title type="html"><![CDATA[wondering]]></title>

		
		<content type="html" xml:base="https://forum.eggheads.org/viewtopic.php?p=62847#p62847"><![CDATA[
Well, you are asking for XPath without using XPath. XPath has been in development for almost 10 years now (going by the links you gave). Do you believe you can write some fast Tcl script to emulate it? I am offering alternative ways to achieve similar results without developing a module that would take years of work.<p>Statistics: Posted by <a href="https://forum.eggheads.org/memberlist.php?mode=viewprofile&amp;u=2382">De Kus</a> — Tue May 09, 2006 12:52 pm</p><hr />
]]></content>
	</entry>
		<entry>
		<author><name><![CDATA[demond]]></name></author>
		<updated>2006-05-09T11:59:59-04:00</updated>

		<published>2006-05-09T11:59:59-04:00</published>
		<id>https://forum.eggheads.org/viewtopic.php?p=62840#p62840</id>
		<link href="https://forum.eggheads.org/viewtopic.php?p=62840#p62840"/>
		<title type="html"><![CDATA[wondering]]></title>

		
		<content type="html" xml:base="https://forum.eggheads.org/viewtopic.php?p=62840#p62840"><![CDATA[
either you are a regexp fanatic, or you don't get my point since you don't know XPath<p>Statistics: Posted by <a href="https://forum.eggheads.org/memberlist.php?mode=viewprofile&amp;u=5056">demond</a> — Tue May 09, 2006 11:59 am</p><hr />
]]></content>
	</entry>
		<entry>
		<author><name><![CDATA[De Kus]]></name></author>
		<updated>2006-05-09T11:07:11-04:00</updated>

		<published>2006-05-09T11:07:11-04:00</published>
		<id>https://forum.eggheads.org/viewtopic.php?p=62839#p62839</id>
		<link href="https://forum.eggheads.org/viewtopic.php?p=62839#p62839"/>
		<title type="html"><![CDATA[wondering]]></title>

		
		<content type="html" xml:base="https://forum.eggheads.org/viewtopic.php?p=62839#p62839"><![CDATA[
<blockquote class="uncited"><div>no no, you misunderstood that; perhaps my example was bad<br>basically, if you locate the info you need using XPath positional predicates like for example <em class="text-italics">//foo[@bar='moo'][5]</em>, your script will continue to work even if they add tons of stuff under nodes #1 to #4; you can't do that with regexps - there is no notion of expression position </div></blockquote>So you want the regexp to "intelligently" skip the first 4 uninteresting &lt;foo bar='moo'&gt; and only begin really parsing the whole regexp from there on? Depending on the complexity of the expressions, you could use my suggestion of string first and either give the index of the 4th match to regexp, or string range the piece out by using string first to find a logical end. However, this will never be exactly the same as XPath.<br>You could also create it as a module, but then again people who are unable to compile the bot might not be able to use it.<p>Statistics: Posted by <a href="https://forum.eggheads.org/memberlist.php?mode=viewprofile&amp;u=2382">De Kus</a> — Tue May 09, 2006 11:07 am</p><hr />
]]></content>
	</entry>
		<entry>
		<author><name><![CDATA[Kappa007]]></name></author>
		<updated>2006-05-09T07:26:30-04:00</updated>

		<published>2006-05-09T07:26:30-04:00</published>
		<id>https://forum.eggheads.org/viewtopic.php?p=62832#p62832</id>
		<link href="https://forum.eggheads.org/viewtopic.php?p=62832#p62832"/>
		<title type="html"><![CDATA[wondering]]></title>

		
		<content type="html" xml:base="https://forum.eggheads.org/viewtopic.php?p=62832#p62832"><![CDATA[
Just an idea but maybe using tclperl with XML::XPath works?<p>Statistics: Posted by <a href="https://forum.eggheads.org/memberlist.php?mode=viewprofile&amp;u=6524">Kappa007</a> — Tue May 09, 2006 7:26 am</p><hr />
]]></content>
	</entry>
		<entry>
		<author><name><![CDATA[demond]]></name></author>
		<updated>2006-05-09T03:00:10-04:00</updated>

		<published>2006-05-09T03:00:10-04:00</published>
		<id>https://forum.eggheads.org/viewtopic.php?p=62827#p62827</id>
		<link href="https://forum.eggheads.org/viewtopic.php?p=62827#p62827"/>
		<title type="html"><![CDATA[wondering]]></title>

		
		<content type="html" xml:base="https://forum.eggheads.org/viewtopic.php?p=62827#p62827"><![CDATA[
no no, you misunderstood that; perhaps my example was bad<br><br>basically, if you locate the info you need using XPath positional predicates like for example <em class="text-italics">//foo[@bar='moo'][5]</em>, your script will continue to work even if they add tons of stuff under nodes #1 to #4; you can't do that with regexps - there is no notion of expression position <br><br>in general, XPath is vastly superior to using regexps for parsing webpages; the problem is, users need to install some XML parser extension for it to work, and most eggdrop users are too lame to do that<p>Statistics: Posted by <a href="https://forum.eggheads.org/memberlist.php?mode=viewprofile&amp;u=5056">demond</a> — Tue May 09, 2006 3:00 am</p><hr />
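<p>A minimal sketch of the positional predicate mentioned above, assuming tDOM is installed and the page is well-formed enough to parse as XML. The sample document and the variable names are made up for illustration. One hedge on the expression itself: in XPath 1.0, <em class="text-italics">//foo[@bar='moo'][5]</em> counts positions among the matching children of each parent, while <em class="text-italics">(//foo[@bar='moo'])[5]</em> picks the fifth match in document order, which is what the sketch uses.</p><div class="codebox"><p>Code: </p><pre><code>package require tdom

# Hypothetical page: five &lt;foo bar="moo"&gt; matches, with unrelated
# markup padded into the first one and a non-matching foo in between.
set xml {&lt;root&gt;
  &lt;foo bar="moo"&gt;one&lt;extra&gt;&lt;foo bar="zoo"&gt;noise&lt;/foo&gt;&lt;/extra&gt;&lt;/foo&gt;
  &lt;foo bar="moo"&gt;two&lt;/foo&gt;
  &lt;foo bar="other"&gt;skip me&lt;/foo&gt;
  &lt;foo bar="moo"&gt;three&lt;/foo&gt;
  &lt;foo bar="moo"&gt;four&lt;/foo&gt;
  &lt;foo bar="moo"&gt;five&lt;/foo&gt;
&lt;/root&gt;}

set doc  [dom parse $xml]
set root [$doc documentElement]

# Fifth foo with bar='moo', counted in document order
set node  [lindex [$root selectNodes {(//foo[@bar='moo'])[5]}] 0]
set fifth [$node text]
puts $fifth

# Free the DOM tree once the value has been copied out
$doc delete</code></pre></div><p>With this sample the sketch should print "five". Extra markup inside the first four matches does not shift the result, since only the matching foo elements themselves are counted. For real-world HTML, <em class="text-italics">dom parse -html</em> is the more forgiving entry point.</p>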
]]></content>
	</entry>
		<entry>
		<author><name><![CDATA[De Kus]]></name></author>
		<updated>2006-05-06T07:15:34-04:00</updated>

		<published>2006-05-06T07:15:34-04:00</published>
		<id>https://forum.eggheads.org/viewtopic.php?p=62694#p62694</id>
		<link href="https://forum.eggheads.org/viewtopic.php?p=62694#p62694"/>
		<title type="html"><![CDATA[wondering]]></title>

		
		<content type="html" xml:base="https://forum.eggheads.org/viewtopic.php?p=62694#p62694"><![CDATA[
well, doesn't tDOM, which you mentioned yourself in your link, support that flexibility with something like this?<br>set node [$root selectNodes {//tr[@id='foo']/td[@id='bar']}]<br><br>though I am still confident about the regexp <img class="smilies" src="https://forum.eggheads.org/images/smilies/icon_razz.gif" width="15" height="15" alt=":P" title="Razz">.<br>Give us an example where<div class="codebox"><p>Code: </p><pre><code>regexp {(?i)&lt;tr .*?id=foo.*?&gt;.*?&lt;td .*?id=bar.*?&gt;(?:\n\s*|)(.+?)(?:\n\s*|)&lt;/td&gt;} $body {} stuff</code></pre></div>doesn't find the wanted piece from your example (though I don't want to talk about the speed of such an expression on a 50kb html file; however, I have successfully used string first and string range to limit the string that regexp actually parses). The given example should still work even if all the \n and \t are removed or spaces are used instead of \t. (?:\n\s*|) should match as much as possible (or nothing) and thereby "eat" that part of the input. Alternatively (and probably faster), you could use string trim $stuff " \t\n" on stuff <img class="smilies" src="https://forum.eggheads.org/images/smilies/icon_biggrin.gif" width="15" height="15" alt=":D" title="Very Happy">.<br><br>You could even go so far as to "regsub -all {&lt;!--.*?--&gt;} $stuff {} stuff" to remove any comments (or run it on body, to remove them before looking for matches) <img class="smilies" src="https://forum.eggheads.org/images/smilies/icon_biggrin.gif" width="15" height="15" alt=":D" title="Very Happy">.<p>Statistics: Posted by <a href="https://forum.eggheads.org/memberlist.php?mode=viewprofile&amp;u=2382">De Kus</a> — Sat May 06, 2006 7:15 am</p><hr />
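<p>A minimal sketch of the selectNodes idea above, assuming tDOM is installed. The proc name getStuff and the two sample pages are hypothetical, and the pages are kept well-formed with quoted attributes so they parse in tDOM's XML mode. The sketch swaps the child step /td for the descendant step //td, which is an assumption about what "flexible" should mean here: the cell is still found after it has been moved into a nested table.</p><div class="codebox"><p>Code: </p><pre><code>package require tdom

# Return the trimmed text of the first td id="bar" found anywhere
# below a tr id="foo", or an empty string if there is no such cell.
proc getStuff {page} {
    set doc  [dom parse $page]
    set root [$doc documentElement]
    set node [lindex [$root selectNodes {//tr[@id='foo']//td[@id='bar']}] 0]
    if {$node eq ""} {
        set text ""
    } else {
        set text [string trim [$node asText]]
    }
    $doc delete
    return $text
}

# Page layout before and after the hypothetical redesign
set before {&lt;table&gt;&lt;tr id="foo"&gt;&lt;td id="bar"&gt; stuff &lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;}
set after  {&lt;table&gt;&lt;tr id="foo"&gt;&lt;td id="moo"&gt;&lt;table&gt;&lt;tr&gt;&lt;td id="bar"&gt; stuff &lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;}

puts [getStuff $before]
puts [getStuff $after]</code></pre></div><p>Both calls should return "stuff". With the child step /td only the first layout would match, so the choice of axis is what buys the tolerance, not XPath by itself.</p>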
]]></content>
	</entry>
		<entry>
		<author><name><![CDATA[Sir_Fz]]></name></author>
		<updated>2006-05-06T06:32:15-04:00</updated>

		<published>2006-05-06T06:32:15-04:00</published>
		<id>https://forum.eggheads.org/viewtopic.php?p=62689#p62689</id>
		<link href="https://forum.eggheads.org/viewtopic.php?p=62689#p62689"/>
		<title type="html"><![CDATA[wondering]]></title>

		
		<content type="html" xml:base="https://forum.eggheads.org/viewtopic.php?p=62689#p62689"><![CDATA[
I don't think there is any script (at least no public one) which features flexible HTML parsing, but it would be nice to see it implemented in future scripts; it would also be easier for the scripter, since he wouldn't have to keep following the website's changes.<p>Statistics: Posted by <a href="https://forum.eggheads.org/memberlist.php?mode=viewprofile&amp;u=3085">Sir_Fz</a> — Sat May 06, 2006 6:32 am</p><hr />
]]></content>
	</entry>
		<entry>
		<author><name><![CDATA[demond]]></name></author>
		<updated>2006-05-06T06:28:03-04:00</updated>

		<published>2006-05-06T06:28:03-04:00</published>
		<id>https://forum.eggheads.org/viewtopic.php?p=62688#p62688</id>
		<link href="https://forum.eggheads.org/viewtopic.php?p=62688#p62688"/>
		<title type="html"><![CDATA[wondering]]></title>

		
		<content type="html" xml:base="https://forum.eggheads.org/viewtopic.php?p=62688#p62688"><![CDATA[
we aren't on a PHP forum, so:<br><br>nope, I meant eggdrop scripts that fetch info from webpages<br><br>and no, you can't compensate for webpage changes with regexps alone; it's nowhere near XPath's ability to do that<p>Statistics: Posted by <a href="https://forum.eggheads.org/memberlist.php?mode=viewprofile&amp;u=5056">demond</a> — Sat May 06, 2006 6:28 am</p><hr />
]]></content>
	</entry>
		<entry>
		<author><name><![CDATA[De Kus]]></name></author>
		<updated>2006-05-06T06:04:05-04:00</updated>

		<published>2006-05-06T06:04:05-04:00</published>
		<id>https://forum.eggheads.org/viewtopic.php?p=62686#p62686</id>
		<link href="https://forum.eggheads.org/viewtopic.php?p=62686#p62686"/>
		<title type="html"><![CDATA[wondering]]></title>

		
		<content type="html" xml:base="https://forum.eggheads.org/viewtopic.php?p=62686#p62686"><![CDATA[
are you talking about web scripts as in PHP? There are tDOM and XML (or was tDOM the XML module?!) modules for PHP, but I have no idea how to use them. Maybe their manuals will give you some hints on whether they support your intended way of manipulation.<br><br>PS: the advantage of a regular expression over a scan(f) expression is its flexibility. Using \t+ or \s+ instead of a specific number of chars should give a certain flexibility.<br>I mean something like "&lt;td .*?id=bar.*?&gt;\n\t+(.+?)\n\t*&lt;/td&gt;" should be flexible in the way you just showed.<p>Statistics: Posted by <a href="https://forum.eggheads.org/memberlist.php?mode=viewprofile&amp;u=2382">De Kus</a> — Sat May 06, 2006 6:04 am</p><hr />
]]></content>
	</entry>
		<entry>
		<author><name><![CDATA[demond]]></name></author>
		<updated>2006-05-06T02:01:57-04:00</updated>

		<published>2006-05-06T02:01:57-04:00</published>
		<id>https://forum.eggheads.org/viewtopic.php?p=62682#p62682</id>
		<link href="https://forum.eggheads.org/viewtopic.php?p=62682#p62682"/>
		<title type="html"><![CDATA[wondering]]></title>

		
		<content type="html" xml:base="https://forum.eggheads.org/viewtopic.php?p=62682#p62682"><![CDATA[
just out of curiosity:<br><br>does anyone know of a webscript which features flexible HTML parsing, utilizing <a href="http://forum.egghelp.org/viewtopic.php?t=9972" class="postlink">XPath</a> or a similar technique? <br><br>e.g. as soon as the following page code:<div class="codebox"><p>Code: </p><pre><code>...
&lt;tr id=foo ...
   &lt;td id=bar ...
      stuff
   ...
   &lt;/td&gt;
&lt;/tr&gt;</code></pre></div>is changed to:<div class="codebox"><p>Code: </p><pre><code>...
&lt;tr id=foo ...
   &lt;td id=moo ...
      &lt;table ...
         &lt;td id=bar ...
            stuff ...</code></pre></div>you are screwed if you use a script that gets to "stuff" using regexp/regsub to locate the &lt;td&gt; tag with id=bar (which is pretty much every script known to me)<br><br>naturally, XPath is not a panacea against web page changes, but I'd imagine it could provide a far greater degree of flexibility<p>Statistics: Posted by <a href="https://forum.eggheads.org/memberlist.php?mode=viewprofile&amp;u=5056">demond</a> — Sat May 06, 2006 2:01 am</p><hr />
]]></content>
	</entry>
	</feed>
