clum

How to get rid of the lounge from Active Topics!

How would I make a filter in Privoxy to remove Lounge entries from Active Topics? It seems to use Perl regular expressions, which I am not too familiar with. I tried various things, but either nothing happened or most of the web page's content disappeared. What should I add to my .filter file?

The official zorx website [edited by - clum on November 24, 2003 10:36:26 PM]

  <tr>
<td BGCOLOR="#2A2C3D">

<a HREF="topic.asp?topic_id=191258"><img SRC='new_t.gif' alt='New Topic' border=0></a></td>

</td>
<td WIDTH="50%" CLASS=forumcell>
<a HREF="topic.asp?topic_id=191258">Open 'AL' ...co..co...coo...compile?</a>

</td>
<td CLASS="altforumcell" NOWRAP><a HREF="/profile/profile.asp?id=51435" TARGET="_new">
Sean_Berry</span></a></td>
<td WIDTH="15%" NOWRAP ALIGN=center CLASS=ForumCell><span CLASS=smallfont>
November 15, 2003 <span CLASS=time>7:16:24 PM</span>
</span></td>
<td WIDTH="2%" ALIGN=center CLASS=AltForumCell>3</td>

<td WIDTH="3%" ALIGN=center CLASS=ForumCell>89</td>
<td WIDTH="30%" BGCOLOR="#2A2C3D"><span CLASS=smallfont><a href="forum.asp?forum_id=27">NeHe Productions</a></span></td>
</tr>


That's a row from Active Topics, for reference.

Maybe:

<tr>.*(GDnet Lounge).*</tr>

I tried something like that, but it kept removing the whole website (I guess it removed everything from the first <tr> to the last </tr> because it found Lounge in the middle). I don't know how to limit it to just one entry. I also tried something like s|<tr>.{100}(\n.{100}){17}.{100}Lounge.{100}\n.{100}</td>|<!-- Remove lounge -->|g, but that doesn't do much. What flags can I put at the end besides g?
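
To see why that first attempt wipes out nearly the whole page, here is a minimal Python sketch. It assumes a made-up three-row table (not the real gamedev.net markup) and uses Python's re module with re.DOTALL standing in for Privoxy's regex engine, so it only illustrates the greedy behaviour and is not a reproduction of the actual filter run.

```python
import re

# Hypothetical, heavily simplified page: three one-line table rows.
PAGE = "\n".join([
    "<table>",
    "<tr><td>Graphics Programming and Theory</td></tr>",
    "<tr><td>GDNet Lounge</td></tr>",
    "<tr><td>NeHe Productions</td></tr>",
    "</table>",
])

# Greedy ".*" with DOTALL: the match starts at the first <tr> in the page
# and stretches to the last </tr>, as long as "GDNet Lounge" is in between.
greedy = re.compile(r"<tr>.*(GDNet Lounge).*</tr>", re.DOTALL)
result = greedy.sub("<!-- Lounge entry -->", PAGE)

print(result)
```

All three rows are replaced by the single comment, which matches the "removes the whole website" symptom described above.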

Someone really nice from the Privoxy mailing list solved it for me! Here's his message:

Shalom Naumann wrote:
>
> Thank you, I didn't notice that typo. The problem I have now is that it
> removes the entire web site except for the very top and bottom! I'm assuming
> that this is because it contains a <tr> near the top and a </tr> near the
> end, and it contains the words GDNet Lounge in the middle. How do I get it
> to look for only the innermost <tr></tr> tags?

Hmmm. The problem you're having is a classic issue with using regexps
to modify things like HTML that have a markup structure. You've
specified the 'U' flag to make everything ungreedy, which you would
think would limit it to only match an individual table row. The problem
is that you have no control over -which- <tr> it anchors to. For
example, it will match:

<tr>
...stuff...GDNet Lounge...stuff...
</tr>

Which is what you want. But it will also match (and hence remove) the
following block:

<tr>
...stuff...Graphics Programming and Theory...stuff...
</tr>

<tr>
...stuff...GDNet Lounge...stuff...
</tr>

In other words, the <tr> in your match can be any <tr>, even the one of
a previous non-matching row, since it scans from beginning to end. The
"ungreedyness" only applies to the stuff between "<tr>" and "GDNet
Lounge". It does not have the power to tell the regex engine to stop
even though it has a match on the first "<tr>" because there's another
"<tr>" coming up later that also matches but with less included. What
you need is a way to specify that the <tr> must be in the same logical
block as the "GDNet Lounge".
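
The anchoring problem described above can be reproduced with a small Python sketch. The three-row page is invented for illustration, and Python's lazy quantifier ".*?" is used as a stand-in for the 'U' flag (Python's re module has no such flag), so treat this as an approximation of what Privoxy does.

```python
import re

# Hypothetical, heavily simplified page: three one-line table rows.
PAGE = "\n".join([
    "<table>",
    "<tr><td>Graphics Programming and Theory</td></tr>",
    "<tr><td>GDNet Lounge</td></tr>",
    "<tr><td>NeHe Productions</td></tr>",
    "</table>",
])

# Lazy quantifiers keep the match as short as possible *from its start*,
# but the start is still the leftmost <tr> that can lead to a match.
lazy = re.compile(r"<tr>.*?GDNet Lounge.*?</tr>", re.DOTALL)

match = lazy.search(PAGE)
rows_swallowed = match.group(0).count("<tr>")
result = lazy.sub("<!-- Lounge entry -->", PAGE)

print(rows_swallowed)
print(result)
```

The match begins at the Graphics row's <tr>, so that innocent row is removed along with the Lounge row; only the row after the Lounge entry survives.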

After playing with it a little I think this will do the trick:

s@<tr([^\n]+\n){5,20}[^\n]+GDNet Lounge.*</tr>@<!-- Lounge entry-->@gisU

This takes advantage of the structure of the file, and limits the amount
of matching stuff between "<tr>" and "GDNet Lounge" to be at least 5
lines and no more than 20 lines. I chose 20 because in the html source
each row contains approximately 20 newlines, so this should prevent it
from matching across several rows.

HTH,
Brian
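
As a rough check of the idea in that message, the sketch below ports the pattern to Python. The row layout produced by make_row is hypothetical (13 lines, no blank lines) and only loosely modelled on the sample row earlier in the thread; the 'U' flag is emulated by writing every quantifier in its lazy form, which is an assumption about equivalence and not Privoxy's own code path.

```python
import re


def make_row(topic: str, forum: str) -> str:
    """Build a hypothetical 13-line Active Topics row (no blank lines)."""
    return "\n".join([
        "<tr>",
        "<td>icon</td>",
        "<td>",
        f'<a href="topic.asp">{topic}</a>',
        "</td>",
        "<td>author</td>",
        "<td>",
        "date",
        "</td>",
        "<td>3</td>",
        "<td>89</td>",
        f'<td><a href="forum.asp">{forum}</a></td>',
        "</tr>",
    ])


ROW_A = make_row("Shader question", "Graphics Programming and Theory")
ROW_B = make_row("Off-topic chat", "GDNet Lounge")
ROW_C = make_row("OpenAL compile", "NeHe Productions")
PAGE = "\n".join([ROW_A, ROW_B, ROW_C])

# Between "<tr" and the line naming the forum, allow 5 to 20 complete lines.
# Starting from ROW_A's <tr>, the Lounge line is 24 lines away, so that
# start position fails and the match can only begin at ROW_B's own <tr>.
LOUNGE_ROW = re.compile(
    r"<tr(?:[^\n]+?\n){5,20}?[^\n]+?GDNet Lounge.*?</tr>",
    re.IGNORECASE | re.DOTALL,
)

result = LOUNGE_ROW.sub("<!-- Lounge entry-->", PAGE)
print(result)
```

Note that the line limit only protects neighbouring rows when each row is long enough: if rows were much shorter than the upper bound of 20, a match could again start in the previous row, so the bounds would need adjusting to the actual page source.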

BTW, for people that want to know how to do this in short:

Find your Privoxy configuration files (on Unix, probably in /usr/local/etc/privoxy).

In default.filter, add:
FILTER: lounge Remove lounge entries from Active-Topics at gamedev.net
s@<tr([^\n]+\n){5,20}[^\n]+GDNet Lounge.*</tr>@<!-- Lounge entry-->@gisU

In default.action, add the following alias (Privoxy expects aliases in the {{alias}} section near the top of the file):
activetopics = +filter{lounge}

and the following lines somewhere:
{activetopics}
.gamedev.net/community/forums/activetopics*

It works for me! Now I never have to look at the lounge again (unless I want to).
