egghelp/eggheads community

SpamScan

2005-07-08T14:12:02-04:00

well, arrays in Tcl are associative per se, i.e. they store key/value pairs and aren't limited to integer keys/indices as in most compiled languages

here's what I had in mind:

Code:

[demond@whitepine demond]$ cat test.tcl
proc countwords str {
  global count
  set idx 0
  while {1} {
    set buf [string range $str $idx end]
    # scan %s skips leading whitespace and reads the next word
    if {[scan $buf %s word] > 0} {
      # advance past the word just read
      incr idx [expr {[string first $word $buf] + [string length $word]}]
      if {[info exists count($word)]} {incr count($word)} else {set count($word) 1}
    } else {break}
  }
}
[demond@whitepine demond]$ tclsh8.4
% source test.tcl
% set a "abc def   \t123\n  xyz\tabc\nabc     123\t"
abc def         123
  xyz   abc
abc     123
% countwords $a
% array get count
123 2 abc 3 xyz 1 def 1

Statistics: Posted by demond — Fri Jul 08, 2005 2:12 pm


2005-07-08T04:56:18-04:00

Hehe, okay, that wasn't so difficult, just using normal arrays. Increment the count if the same word is found again; otherwise it's a new word.

Sometimes the easiest things take a lot of time to come to mind.
No, I am not good with scan; I use split and lindex to avoid it.

NOTE: I was trying to develop a way to deter flood bots that send messages with repeating words and text. Giving varying scores to the words in a string can be handy for that.
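A minimal sketch of that idea: the hypothetical proc `repeatRatio` counts words with an associative array and returns the share of words that repeat an earlier one, so a line made of one word pasted ten times scores close to 1. The name and the metric are illustrative only, not part of SpamScan.

```tcl
# Hypothetical helper: fraction of words in a line that repeat an earlier word.
proc repeatRatio {line} {
    # Words are runs of non-whitespace, so extra spaces or tabs add nothing.
    set words [regexp -all -inline {\S+} $line]
    set total [llength $words]
    if {$total == 0} {
        return 0.0
    }
    set repeats 0
    foreach word $words {
        # Compare case-insensitively so "Spam" and "spam" count as the same word.
        set key [string tolower $word]
        if {[info exists seen($key)]} {
            incr repeats
        } else {
            set seen($key) 1
        }
    }
    return [expr {double($repeats) / $total}]
}

puts [repeatRatio "buy now buy now buy now"]   ;# 4 of the 6 words are repeats
```

A score like this could be added to a per-user total alongside the other checks; the threshold at which it counts as flooding is up to you.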

Statistics: Posted by awyeah — Fri Jul 08, 2005 4:56 am


2005-07-08T04:46:06-04:00

Code:

foreach word $words {
   if {[info exists count($word)]} {
      incr count($word)
   } else {
      set count($word) 1
   }
}

on second thought, you can do that without splitting into a list, just by using [scan], but I'm going to bed already
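The loop above assumes `$words` already holds the word list. A self-contained sketch (the proc name `countWords` is hypothetical) that builds the list with `regexp -all -inline {\S+}`, so runs of whitespace never produce empty "words", keeps the array local instead of global, and hands the counts back with `array get`:

```tcl
# Hypothetical proc: count how often each word occurs in a string.
# Returns a flat key/value list (word count word count ...).
proc countWords {text} {
    # Runs of non-whitespace only, so repeated spaces never yield empty "words".
    set words [regexp -all -inline {\S+} $text]
    foreach word $words {
        if {[info exists count($word)]} {
            incr count($word)
        } else {
            set count($word) 1
        }
    }
    # array get on an array that was never created returns an empty list.
    return [array get count]
}

array set result [countWords "abc def abc 123 abc"]
puts $result(abc)   ;# prints 3
```

Returning the list instead of writing to a global means two calls never mix their counts.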

Statistics: Posted by demond — Fri Jul 08, 2005 4:46 am


2005-07-08T04:28:03-04:00

First of all:
set words [split [string trim $text]]

Then:
Umm, Tcl associative arrays. Actually, I can't currently work out in my head the logic for counting repeated words with arrays. Do you have a rough example?
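Why the splitting method matters, as a quick sketch: `split` breaks on every single whitespace character, so runs of spaces or tabs leave empty elements in the list, and trimming does not remove the empty elements in between. Matching runs of non-whitespace avoids that.

```tcl
set text "  abc def   123  "

# split on the trimmed string: the run of spaces inside still gives empty elements
set viaSplit [split [string trim $text]]
puts [llength $viaSplit]    ;# 5: abc def {} {} 123

# regexp -all -inline returns only the runs of non-whitespace
set viaRegexp [regexp -all -inline {\S+} $text]
puts [llength $viaRegexp]   ;# 3: abc def 123
```

Either list can then be fed to the array-counting loop, but with `split` the empty elements would be counted as a "word" too.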

Statistics: Posted by awyeah — Fri Jul 08, 2005 4:28 am


2005-07-08T03:50:58-04:00

if by "word" you mean a sequence of non-whitespace characters, you need to split, trim and then count repeating list elements (be careful to avoid redundant string comparisons); hint: use Tcl's associative arrays

Statistics: Posted by demond — Fri Jul 08, 2005 3:50 am


2005-07-08T03:35:20-04:00

So about that:

Also, I was wondering how I can detect repeated words in a string with regexp, and then count how many there are, like I can with the -all switch. The main thing is to detect repeated words, not characters.

Anyone?

Statistics: Posted by awyeah — Fri Jul 08, 2005 3:35 am


2005-07-07T11:02:14-04:00

I suggest adding a check for server advertising and those lame decode messages.
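A sketch of what such checks could look like, assuming server ads usually contain a `/server` command and the decode spam carries mIRC's `$decode(` identifier. Both patterns and the proc name `suspiciousLine` are illustrative guesses, not taken from SpamScan.

```tcl
# Hypothetical extra checks; the patterns are guesses, tune them for your channel.
proc suspiciousLine {line} {
    # "/server irc.example.net" style server advertising
    if {[regexp -nocase {(^|\s)/server\s+\S+} $line]} {
        return 1
    }
    # $decode( text pasted into the channel ($ and ( are literal in string match)
    if {[string match -nocase {*$decode(*} $line]} {
        return 1
    }
    return 0
}
```

A hit could add points the same way the advertisement check does, rather than punishing on a single line.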

Statistics: Posted by Sir_Fz — Thu Jul 07, 2005 11:02 am


2005-07-07T10:05:28-04:00

or you can use this if you want to find a URL inside a string and don't want to use a regexp longer than your arm:

Code:

regexp -nocase {((https?|ftp)://[^\s]+)} $text url
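If you want every URL in the line rather than just the first, `regexp -all -inline` returns all matches. A sketch (the proc name `findUrls` is hypothetical); the group is made non-capturing with `(?:...)` so the result holds only whole URLs, and `https?` also covers https links:

```tcl
# Hypothetical helper: list every http/https/ftp URL found in a line.
proc findUrls {text} {
    # (?:...) does not capture, so -inline returns only the full matches.
    return [regexp -all -inline -nocase {(?:https?|ftp)://\S+} $text]
}

puts [findUrls "see http://a.example/x and FTP://b.example"]
```

`llength [findUrls $line]` then gives the number of links in the line, which could feed a per-link score.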

Statistics: Posted by greenbear — Thu Jul 07, 2005 10:05 am


2005-07-07T07:37:59-04:00

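Here is a good regexp pattern for detecting the common URL types (http, https, ftp, file). Its character classes only list uppercase letters, so use it with regexp -nocase: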

Code:

(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[-A-Z0-9+&@#/%=~_|]

Also, I was wondering how I can detect repeated words in a string with regexp or even string match, and then count how many there are. For regexp I found this:

Code:

(\w+)\s+\1

I can do that with 2 foreach loops over the same string: take one word, match it against the others, then move on to the next word and do it all again. But is there a simpler and easier way?
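The pattern `(\w+)\s+\1` only catches a word repeated immediately after itself, and without word boundaries it can also match inside a word (in "this is", `(\w+)` can match the "is" at the end of "this"). A sketch adding Tcl's word-boundary escapes `\m` and `\M` and counting with `-all` (the proc name `adjacentRepeats` is hypothetical); for repeats anywhere in the line, counting words with an associative array is the simpler route.

```tcl
# Hypothetical helper: count places where a word is immediately repeated.
# \m and \M are Tcl regexp escapes for the start and end of a word.
proc adjacentRepeats {line} {
    return [regexp -all {\m(\w+)\s+\1\M} $line]
}

puts [adjacentRepeats "this is fine"]        ;# 0: no false match on "is"
puts [adjacentRepeats "hello hello world"]   ;# 1
```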

Statistics: Posted by awyeah — Thu Jul 07, 2005 7:37 am


2005-07-06T11:55:07-04:00

I've tested the script and so far I like it; no bugs or weird things happened. But one question, metroid.

Is there a way I can put in a line that the script doesn't react to?

In my channel people can request relays (based on your forum script),

but people get warned or even kicked when they use !request 01 #chan.

Is there a way I can make the script ignore ONLY this?

greetz Dizzle
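One way to do it, as a sketch only: check the line against an exemption list before any points are added, and skip scoring if it matches. The proc `spamscan::exempt`, the `exemptPatterns` variable and the call site are assumptions about how the script's line handler is structured.

```tcl
namespace eval spamscan {
    # Hypothetical list of glob patterns for lines that should never be scored.
    variable exemptPatterns [list "!request *"]
}

# Returns 1 if the line matches one of the exempt patterns.
proc spamscan::exempt {line} {
    variable exemptPatterns
    foreach pattern $exemptPatterns {
        if {[string match -nocase $pattern $line]} {
            return 1
        }
    }
    return 0
}

# In the scoring proc (assumed structure), bail out early:
#   if {[spamscan::exempt $arguments]} { return 0 }
```

Anchoring the pattern at the start of the line keeps someone from dodging the check by writing "!request" somewhere in the middle of an ad.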

Statistics: Posted by Dizzle — Wed Jul 06, 2005 11:55 am


2005-07-05T09:37:26-04:00

That proc detects whether a channel name or website is being advertised, and that's all it does, hence the proc name.

Statistics: Posted by metroid — Tue Jul 05, 2005 9:37 am


2005-07-05T03:19:24-04:00

I see it's reacting to these advertising lines:

Code:

proc spamscan::advertisement {line} {
 if {[regexp -- {\x23\S+|[a-zA-Z0-9]+://\S+\.[a-zA-Z0-9]+|www[0-9]*\.\S+\.[a-zA-Z]+} $line]} {
  return 1
 }
 return 0
}

Does this detect most of the advertising?

I'm going to test it out on my channel; if I find something, I'll post it here.
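A quick way to check what that regexp catches is to load the proc quoted above into tclsh and run it on a few sample lines (the samples are made up). Note that `\x23\S+` matches any `#` followed by non-space characters, so a line like `!request 01 #chan` counts as advertising too:

```tcl
namespace eval spamscan {}

# Same pattern as the advertisement proc quoted above.
proc spamscan::advertisement {line} {
    if {[regexp -- {\x23\S+|[a-zA-Z0-9]+://\S+\.[a-zA-Z0-9]+|www[0-9]*\.\S+\.[a-zA-Z]+} $line]} {
        return 1
    }
    return 0
}

foreach line {
    "come to #mychannel"
    "see http://example.com"
    "www.example.org rocks"
    "!request 01 #chan"
    "hello everyone"
} {
    puts "[spamscan::advertisement $line] $line"
}
```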

Statistics: Posted by Dizzle — Tue Jul 05, 2005 3:19 am


2005-07-02T03:18:17-04:00

hmm, I was actually more interested in the spam detection method: what your definition of spam is and how it gets applied to channel traffic

I presume it's in the [advertisement] proc

Statistics: Posted by demond — Sat Jul 02, 2005 3:18 am


2005-07-02T02:54:50-04:00

Here are parts of the script; that should help you understand, I guess.

Code:

# /* Configuration
# * In seconds, If the last time they spammed was 20 seconds they will get 15 points added (default settings)
variable time      "10"
variable time2     "15"
variable time3     "20"
# * Points we give for spamming/advertising
variable advert    "40"
variable advert2   "50"
variable advert3   "60"
# * Points we give out for flood/spam
variable flood     "15"
variable flood2    "10"
variable flood3    "5"
# * Points they need to get punished.
variable warn      "80"
variable kick      "100"
variable ban       "120"

Code:

  if {$spamscan::spamscan} {
   if {[advertisement $arguments]} {
    if {$checktime >= $spamscan::time3} {
     incr db($channel,$ident) $spamscan::advert3
    } elseif {$checktime >= $spamscan::time2} {
     incr db($channel,$ident) $spamscan::advert2
    } else {
     incr db($channel,$ident) $spamscan::advert
    }
   }
  }

Code:

  if {$db($channel,$ident) >= $ban} {
    putlog "$nickname was \0034banned\003 in $channel. \(score: $db($channel,$ident)\)"
    set banmask [banmask $hostname]
    putquick "MODE $channel +b $banmask"
    putquick "KICK $channel $nickname :[string map "%nickname $nickname %channel $channel %id [expr [channel get $channel spamkicked] + 1]" [join $kickmsg]]"
    channel set $channel spamkicked "[expr [channel get $channel spamkicked] + 1]"
    if {$bantime != "0"} {
     timer $bantime [list pushmode $channel -b $banmask]
    }
  } elseif {$db($channel,$ident) >= $kick} {
    putlog "$nickname was \0038kicked\003 in $channel. \(score: $db($channel,$ident)\)"
    putquick "KICK $channel $nickname :[string map "%nickname $nickname %channel $channel %id [expr [channel get $channel spamkicked] + 1]" [join $kickmsg]]"
    channel set $channel spamkicked "[expr [channel get $channel spamkicked] + 1]"
  }

It just adds a score for each line said and decreases the score over time. If the score reaches a certain point, the script does something, like warn or kick that person.

It's purely for text, nothing else.

(This is how the spamscan service on Quakenet works)
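The scoring model described above can be sketched as follows, using the default thresholds from the configuration posted above (warn 80, kick 100, ban 120). The decay rule (one point per elapsed second) and the proc names are assumptions for illustration; the post doesn't say how fast SpamScan actually lowers scores.

```tcl
# Thresholds taken from the default config above.
set warn 80
set kick 100
set ban  120

# Hypothetical decay: subtract one point per second since the last line, never below 0.
proc decayScore {score elapsed} {
    set score [expr {$score - $elapsed}]
    return [expr {$score < 0 ? 0 : $score}]
}

# Pick the harshest action whose threshold the score has reached.
proc actionFor {score} {
    global warn kick ban
    if {$score >= $ban} {
        return ban
    } elseif {$score >= $kick} {
        return kick
    } elseif {$score >= $warn} {
        return warn
    }
    return none
}

# Example: 70 points, 5 seconds of quiet, then a 40-point advertisement.
set score [expr {[decayScore 70 5] + 40}]
puts "$score -> [actionFor $score]"   ;# prints "105 -> kick"
```

Checking the thresholds from highest to lowest, as the script's `if`/`elseif` chain does, ensures a score past the ban line is never handled as a mere kick.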

Statistics: Posted by metroid — Sat Jul 02, 2005 2:54 am


2005-07-01T21:22:14-04:00

care to explain how it does its job, maybe paste some relevant code?

I'm curious, but can't be bothered to download and examine it, hehe

Statistics: Posted by demond — Fri Jul 01, 2005 9:22 pm