<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en-gb">
	<link rel="self" type="application/atom+xml" href="https://forum.eggheads.org/app.php/feed/topic/12049" />

	<title>egghelp/eggheads community</title>
	<subtitle>Discussion of eggdrop bots, shell accounts and tcl scripts.</subtitle>
	<link href="https://forum.eggheads.org/index.php" />
	<updated>2006-06-22T07:56:25-04:00</updated>

	<author><name><![CDATA[egghelp/eggheads community]]></name></author>
	<id>https://forum.eggheads.org/app.php/feed/topic/12049</id>

		<entry>
		<author><name><![CDATA[cerberus_gr]]></name></author>
		<updated>2006-06-22T07:56:25-04:00</updated>

		<published>2006-06-22T07:56:25-04:00</published>
		<id>https://forum.eggheads.org/viewtopic.php?p=64262#p64262</id>
		<link href="https://forum.eggheads.org/viewtopic.php?p=64262#p64262"/>
		<title type="html"><![CDATA[Help about a link]]></title>

		
		<content type="html" xml:base="https://forum.eggheads.org/viewtopic.php?p=64262#p64262"><![CDATA[
Most of the time <a href="http://www.domain" class="postlink">www.domain</a> is the same as domain, as www is the default subdomain.<br><br>You are correct about 2, I hadn't thought of it that way.<br><br><br>I'll describe exactly what I want to do:<br><br>I want to create a package which extracts data from webpages. I'm going to give it an initial webpage, and the script is going to follow every page it links to and check for data inside. I'll keep a list of all the links the script has found, and I'm going to visit every one.<br><br>My problem is that a lot of pages have links in different formats. A page could contain two links to the same file ("<a href="http://domain/hello.htm" class="postlink">http://domain/hello.htm</a>" and "/hello.htm"), and I want my code to be clever enough to understand that these links are the same.<br><br>That's why I want to add links to the list in the format "http://(subdomain.)domain/file.htm", so that I can check whether a link already exists in the list and not lose time parsing it again.<br><br>So, I need a procedure which returns a link in this format (like a web browser does with links).<p>Statistics: Posted by <a href="https://forum.eggheads.org/memberlist.php?mode=viewprofile&amp;u=2661">cerberus_gr</a> — Thu Jun 22, 2006 7:56 am</p><hr />
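<br>The "check if a link already exists in the list" step could be sketched as follows. This is only an illustration: it assumes every link has already been turned into an absolute URL, and queue_link, seen and pending are hypothetical names, not part of any existing script.<br><div class="codebox"><p>Code: </p><pre><code># Sketch only: remember which links were queued so each page is parsed once.
# queue_link and the globals seen/pending are hypothetical names.
array set seen {}
set pending [list]

proc queue_link {url} {
    global seen pending
    # Ignore the fragment: page.htm#top and page.htm are the same document.
    regsub {#.*$} $url "" url
    # Scheme and host are case-insensitive, the path is not.
    if {[regexp {^([^/]+//[^/]+)(.*)$} $url match root rest]} {
        set url "[string tolower $root]$rest"
    }
    # Already queued: report 0 so the caller can skip it.
    if {[info exists seen($url)]} {
        return 0
    }
    set seen($url) 1
    lappend pending $url
    return 1
}</code></pre></div><br>An array is used as the "seen" set so that the lookup does not have to scan the whole list each time. queue_link returns 1 the first time a URL is seen and 0 for every repeat.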
]]></content>
	</entry>
		<entry>
		<author><name><![CDATA[user]]></name></author>
		<updated>2006-06-22T07:19:04-04:00</updated>

		<published>2006-06-22T07:19:04-04:00</published>
		<id>https://forum.eggheads.org/viewtopic.php?p=64260#p64260</id>
		<link href="https://forum.eggheads.org/viewtopic.php?p=64260#p64260"/>
		<title type="html"><![CDATA[Help about a link]]></title>

		
		<content type="html" xml:base="https://forum.eggheads.org/viewtopic.php?p=64260#p64260"><![CDATA[
Your request is weird.<br><br>1) "<a href="http://www.domain" class="postlink">www.domain</a>" != "domain"<br>2) links not starting with a protocol are relative, so the absolute version of "<a href="http://www.domain" class="postlink">www.domain</a>" would be "<a href="http://base.href/www.domain" class="postlink">http://base.href/www.domain</a>"<br>3) your last example doesn't make any sense to me at all<p>Statistics: Posted by <a href="https://forum.eggheads.org/memberlist.php?mode=viewprofile&amp;u=2878">user</a> — Thu Jun 22, 2006 7:19 am</p><hr />
]]></content>
	</entry>
		<entry>
		<author><name><![CDATA[cerberus_gr]]></name></author>
		<updated>2006-06-22T07:46:29-04:00</updated>

		<published>2006-06-21T10:30:56-04:00</published>
		<id>https://forum.eggheads.org/viewtopic.php?p=64248#p64248</id>
		<link href="https://forum.eggheads.org/viewtopic.php?p=64248#p64248"/>
		<title type="html"><![CDATA[Help about a link]]></title>

		
		<content type="html" xml:base="https://forum.eggheads.org/viewtopic.php?p=64248#p64248"><![CDATA[
Let's try again <img class="smilies" src="https://forum.eggheads.org/images/smilies/icon_smile.gif" width="15" height="15" alt=":)" title="Smile"><br><br>I have a webpage in HTML format with 100 links inside. The links don't all have the same format. The possible formats of a link to the file file.htm are:<br><br>1) &lt;a href="<a href="http://www.domain/folder/file.htm" class="postlink">http://www.domain/folder/file.htm</a>"&gt;<br>2) &lt;a href="<a href="http://www.domain/folder/file.htm" class="postlink">www.domain/folder/file.htm</a>"&gt;<br>3) &lt;a href="<a href="http://domain/folder/file.htm" class="postlink">http://domain/folder/file.htm</a>"&gt;<br>4) &lt;a href="/folder/file.htm"&gt;<br>5) &lt;a href="file.htm"&gt; (relative)<br><br><br>I have written code which extracts all the links from the webpage and adds them to a list. So, I have a list like the following:<br><div class="codebox"><p>Code: </p><pre><code>(bin) 49 % echo $links
{http://www.domain/folder/file.htm www.domain/folder/file.htm http://domain/folder/file.htm /folder/file.htm file.htm}</code></pre></div><br>Now, I want to create a procedure which takes each one of the links and returns it in the format:<br><br><a href="http://www.domain/folder/file.htm" class="postlink">http://www.domain/folder/file.htm</a> or<br><a href="http://domain/folder/file.htm" class="postlink">http://domain/folder/file.htm</a><br><br>Example:<div class="codebox"><p>Code: </p><pre><code>proc format_url { link_found parent_link } {}</code></pre></div><br>(bin) 50 % set a [format_url "<a href="http://www.domain/folder/file.htm" class="postlink">http://www.domain/folder/file.htm</a>" "<a href="http://www.domain/lala" class="postlink">http://www.domain/lala</a>"]<br><a href="http://www.domain/folder/file.htm" class="postlink">http://www.domain/folder/file.htm</a><br><br>(bin) 51 % set a [format_url "<a href="http://www.domain/folder/file.htm" class="postlink">www.domain/folder/file.htm</a>" "<a href="http://www.domain/lala" class="postlink">http://www.domain/lala</a>"]<br><a href="http://www.domain/folder/file.htm" class="postlink">http://www.domain/folder/file.htm</a><br><br>(bin) 52 % set a [format_url "<a href="http://domain/folder/file.htm" class="postlink">http://domain/folder/file.htm</a>" "<a href="http://www.domain/lala" class="postlink">http://www.domain/lala</a>"]<br><a href="http://www.domain/folder/file.htm" class="postlink">http://www.domain/folder/file.htm</a><br><br>(bin) 53 % set a [format_url "/folder/file.htm" "<a href="http://www.domain/lala" class="postlink">http://www.domain/lala</a>"]<br><a href="http://www.domain/folder/file.htm" class="postlink">http://www.domain/folder/file.htm</a><br><br>(bin) 54 % set a [format_url "file.htm" "<a href="http://www.domain/lala" class="postlink">http://www.domain/lala</a>"]<br><a href="http://www.domain/lala/file.htm" class="postlink">http://www.domain/lala/file.htm</a><br><br><br>I'm not so good with regular expressions, so I need some help with this.<p>Statistics: Posted by <a href="https://forum.eggheads.org/memberlist.php?mode=viewprofile&amp;u=2661">cerberus_gr</a> — Wed Jun 21, 2006 10:30 am</p><hr />
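<br>A minimal sketch of such a procedure is shown below, assuming the parent is always an absolute URL. It is one possible reading of the request, not a finished implementation: relative links are resolved the way a browser does it (against the directory of the parent), and the rule for links starting with "www." is a heuristic for format 2 rather than standard behaviour.<br><div class="codebox"><p>Code: </p><pre><code># Sketch only: resolve a link found on a page against that page's URL.
# Assumes parent is an absolute URL such as http://www.domain/folder/page.htm
proc format_url {link parent} {
    # Drop any query string or fragment from the parent before splitting it.
    regsub {[?#].*$} $parent "" parent
    # root = scheme://host, path = everything from the first "/" after the host.
    if {![regexp -nocase {^([a-z][a-z0-9+.-]*://[^/]+)(/.*)?$} $parent match root path]} {
        error "parent is not an absolute URL: $parent"
    }
    # Case 1: the link already has a scheme, so it is absolute.
    if {[regexp -nocase {^[a-z][a-z0-9+.-]*://} $link]} {
        return $link
    }
    # Case 2 (heuristic, not standard): a bare "www." link is taken as a host name.
    if {[string match -nocase "www.*" $link]} {
        return "http://$link"
    }
    # Case 3: root-relative link, keep the scheme and host of the parent.
    if {[string index $link 0] eq "/"} {
        return "$root$link"
    }
    # Case 4: relative link, resolved against the directory of the parent.
    set dir [string range $path 0 [string last "/" $path]]
    if {$dir eq ""} { set dir "/" }
    return "$root$dir$link"
}</code></pre></div><br>With this sketch, format_url "file.htm" "http://www.domain/lala/" returns http://www.domain/lala/file.htm, while the same call with the parent "http://www.domain/lala" (no trailing slash) returns http://www.domain/file.htm, because lala is then treated as a file name. Segments such as "./" and "../" and links starting with "//" are not handled, and domain and www.domain are left as distinct hosts.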
]]></content>
	</entry>
		<entry>
		<author><name><![CDATA[SaPrOuZy]]></name></author>
		<updated>2006-06-21T08:55:57-04:00</updated>

		<published>2006-06-21T08:55:57-04:00</published>
		<id>https://forum.eggheads.org/viewtopic.php?p=64244#p64244</id>
		<link href="https://forum.eggheads.org/viewtopic.php?p=64244#p64244"/>
		<title type="html"><![CDATA[Help about a link]]></title>

		
		<content type="html" xml:base="https://forum.eggheads.org/viewtopic.php?p=64244#p64244"><![CDATA[
try to be clearer...<p>Statistics: Posted by <a href="https://forum.eggheads.org/memberlist.php?mode=viewprofile&amp;u=4727">SaPrOuZy</a> — Wed Jun 21, 2006 8:55 am</p><hr />
]]></content>
	</entry>
		<entry>
		<author><name><![CDATA[cerberus_gr]]></name></author>
		<updated>2006-06-20T21:50:55-04:00</updated>

		<published>2006-06-20T21:50:55-04:00</published>
		<id>https://forum.eggheads.org/viewtopic.php?p=64233#p64233</id>
		<link href="https://forum.eggheads.org/viewtopic.php?p=64233#p64233"/>
		<title type="html"><![CDATA[Help about a link]]></title>

		
		<content type="html" xml:base="https://forum.eggheads.org/viewtopic.php?p=64233#p64233"><![CDATA[
Hello,<br><br>I have code which gets all the links from a webpage. The formats could be:<br><br>1) <a href="http://www.domain/folder/file.htm" class="postlink">http://www.domain/folder/file.htm</a><br>2) <a href="http://www.domain/folder/file.htm" class="postlink">www.domain/folder/file.htm</a><br>3) <a href="http://domain/folder/file.htm" class="postlink">http://domain/folder/file.htm</a><br>4) /folder/file.htm<br>5) file.htm (relative)<br><br>I want to create a procedure which takes as parameters the link and the URL of the HTML page it was parsed from, and returns the link in the format:<br><br>1) <a href="http://domain/folder/file.htm" class="postlink">http://domain/folder/file.htm</a> or<br>2) <a href="http://www.domain/folder/file.htm" class="postlink">http://www.domain/folder/file.htm</a><br><br><br>Example:<div class="codebox"><p>Code: </p><pre><code>proc format_url { link parent } {   ...}</code></pre></div><br>Thanks<p>Statistics: Posted by <a href="https://forum.eggheads.org/memberlist.php?mode=viewprofile&amp;u=2661">cerberus_gr</a> — Tue Jun 20, 2006 9:50 pm</p><hr />
]]></content>
	</entry>
	</feed>
