
Swear filter

This topic is 4514 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.


OK, I'm writing a replacement for X3 (for my own channels, on other servers, etc.). As one of the extra features, I want it to do swear checking. For example, "F****", "F**K", "F6CK" or "FCK" should all trigger the filter. The problem I'm having is how to set it up. Does anybody have any ideas on how I should do it? From, Nice coder

Quote:
Original post by Fruny
Regular expressions are a good place to start. Then you'll have to build substitution tables. Good luck.


Unfortunately, my regex (VBScript 5.0) decided that any message that includes the letter K is a swear word. This is from a single entry, 'F*K'.

From,
Nice coder

Quote:
Original post by ToohrVyk
fsck is a Unix tool. Now you're annoying users by preventing them from saying useful stuff.

If you don't mind annoying users, a very simple solution is to mark all words as swear words.


!

That's a little extreme.

From,
Nice coder

I suggest you read up on 'fuzzy matching'.

But as ToohrVyk pointed out, the hard part will be the tweaking. Make your filter too sensitive and you annoy people by having normal language trigger it.
However, a filter that matches too narrowly will let people trick it with simple substitutions.
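To make the sensitivity trade-off concrete, here is a minimal Python sketch of one form of fuzzy matching, using the standard library's `difflib.SequenceMatcher`. The word list (`darn` as a placeholder), the function name and the cutoff values are all assumptions for illustration, not something from this thread.

```python
from difflib import SequenceMatcher

# Placeholder word list (an assumption for illustration only).
BANNED = ["darn"]

def is_flagged(word, cutoff):
    """Return True if `word` is at least `cutoff` similar to a banned word."""
    word = word.lower()
    return any(
        SequenceMatcher(None, word, banned).ratio() >= cutoff
        for banned in BANNED
    )

# "dsrn" (a substitution) and "barn" (an innocent word) both score 0.75
# against "darn", so no cutoff can separate them.
print(is_flagged("dsrn", 0.7), is_flagged("barn", 0.7))  # True True
print(is_flagged("dsrn", 0.8), is_flagged("barn", 0.8))  # False False
```

A loose cutoff catches the substitution but also the innocent word; a strict cutoff lets both through. That is the tweaking problem in miniature.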

Just an afterthought: you can probably never make this perfect. Imagine someone talking about his new-born kittens, should the word 'pussy' be considered swearing or not? Am I being offensive talking about the prick of a needle? What about guys named Peter?

Quote:
Original post by Nice Coder
Quote:
Original post by Fruny
Regular expressions are a good place to start. Then you'll have to build substitution tables. Good luck.


Unfortunately, my regex (VBScript 5.0) decided that any message that includes the letter K is a swear word. This is from a single entry, 'F*K'.

From,
Nice coder


That's because your regular expression is wrong. F*K means: match 0 or more occurrences of F, followed by K.

I think F[^K]*K might be what you're looking for.
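A quick way to see the difference, sketched in Python's `re` module rather than VBScript (an assumption; the helper name `hits` is hypothetical and the two engines agree on this basic syntax as far as I know):

```python
import re

# 'F*K' read as a regex: zero or more F's, then K, so any K matches.
naive = re.compile(r"F*K", re.IGNORECASE)
# The suggested pattern: F, then any run of non-K characters, then K.
suggested = re.compile(r"F[^K]*K", re.IGNORECASE)

def hits(pattern, message):
    """Return True if the pattern matches anywhere in the message."""
    return pattern.search(message) is not None

print(hits(naive, "ok thanks"))      # True: the lone 'k' is enough
print(hits(suggested, "ok thanks"))  # False: there is no F to start from
print(hits(suggested, "F**K"))       # True
print(hits(suggested, "fine, ok"))   # True: [^K]* also runs across spaces
```

The last line shows that the suggested pattern still over-matches, because nothing limits how far the non-K run can stretch.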

Quote:
Original post by DaBono
Just an afterthought: you can probably never make this perfect. Imagine someone talking about his new-born kittens, should the word 'pussy' be considered swearing or not? Am I being offensive talking about the prick of a needle? What about guys named Peter?


To be fair though, those aren't really extreme swearwords. A swear filter is really just to catch the worst of it. Even with the best swear filter in the world, there's nothing to stop users coming up with euphemisms.

I feel what you're trying to do is going to give you more trouble than results.

I've been in chat rooms with filters and it always seems pointless. Instead of a good old F**K U, I'd see extremely colorful remarks about each other's mothers, including some vegetables and farm animals. I'd see people use several ways of avoiding the filters: putting spaces or dashes between the letters, color codes, text format codes (bold, italic, underline, ...), other letters, reversing the swear word, using the ever-famous l33t sp34k, you name it.

Although you could probably figure out all of these yourself and add them to the filter, the fact is that after checking for all those words, eventually a few innocent words would end up being included in the filter and really annoy everyone.

I wouldn't like to be in your place, good luck though.

Quote:
Original post by Montbrun
Remember that you'll have to check not only ON:TEXT: but ON:ACTION as well, for those persnickety /me commands.


They are all treated the same by the bot, and are searched the same way.

From,
Nice coder

Quote:
Original post by skittleo
IMO, you should really only filter the pure words themselves. You cannot make assumptions on what the user may be saying. For example, did you know that there is a famous clothing company called fcuk? Yes, well if you filtered that out you may have a mob of angry girls at your door.


For some reason I don't think that any of the girls who shop at that store (located in places like Newbury St., Boston) would ever be in a chatroom of Nice Coder's. He doesn't have to worry about pissing them off because they aren't his target audience.

How about them "swear filters" that filter the word "tit"?

Chatter: I Love the Titans
becomes
Chatter: I Love the %^&#ans

(I've actually had this happen in a chat room. Word for word. I actually had to private message the chatter to ask what he said.)

Be careful. Please.
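The Titans mishap is what plain substring replacement does. A minimal Python sketch of the difference between substring and whole-word matching follows; the function names and the mask string are assumptions for illustration.

```python
import re

MASK = "%^&#"

def censor_substring(message, word):
    """Mask every occurrence of `word`, even inside longer words."""
    return re.sub(re.escape(word), lambda m: MASK, message,
                  flags=re.IGNORECASE)

def censor_whole_word(message, word):
    """Mask `word` only where it stands alone (between \\b boundaries)."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.sub(pattern, lambda m: MASK, message, flags=re.IGNORECASE)

print(censor_substring("I Love the Titans", "tit"))   # I Love the %^&#ans
print(censor_whole_word("I Love the Titans", "tit"))  # I Love the Titans
```

Whole-word matching avoids this particular false positive, though it is then easier to dodge by gluing extra characters onto the word.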

Also,

Quote:
From bash.org:
<Abstruse> !kjv numbers 22:21
<Word_of_God> Numbers 22:21 -- And Balaam rose up in the morning, and saddled his ass, and went with the princes of Moab. - (KJV)
*** Word_of_God was kicked from #christian by SageRider (Please dont Swear)


What you could do, however, instead of acting immediately to censor the speaker, is log the detected infraction to a file to be reviewed by moderators (preferably sorted by name/address of the speaker). That results in human review of the facts, possibly followed by a ban if the matter is severe.
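A rough Python sketch of that log-and-review idea, assuming the filter hands over (speaker, message) pairs. The function names and the tab-separated layout are hypothetical, and the resulting text could be written to a file for the moderators.

```python
from collections import defaultdict

def build_review_queue(infractions):
    """Group (speaker, message) pairs by speaker, sorted by speaker name."""
    by_speaker = defaultdict(list)
    for speaker, message in infractions:
        by_speaker[speaker].append(message)
    return sorted(by_speaker.items())

def format_queue(queue):
    """Render the queue as one tab-separated line per flagged message."""
    lines = []
    for speaker, messages in queue:
        for message in messages:
            lines.append(f"{speaker}\t{message}")
    return "\n".join(lines)

queue = build_review_queue([("zed", "a"), ("amy", "b"), ("zed", "c")])
print(format_queue(queue))
```

Nothing here kicks or bans anyone; the point is only to collect detections per speaker so that a human makes the call.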

Quote:
Original post by Dean Harding
Quote:
Original post by Oxyacetylene
I think F[^K]*K might be what you're looking for.


Actually, I think he wants "F\*+K", which is F followed by one or more occurrences of '*' followed by K.


That would be fine, but wouldn't handle "FCK" or "FVCK" ...

f.{1,2}[k\*]

should work.
That is f followed by any 1 or 2 characters, followed by k or *. It works against
fck, f***, f**, f**k, fvk, etc.
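The pattern above can be tried out with a short Python sketch (Python's `re` is an assumption here, since the thread is about VBScript, and the helper name `flagged` is hypothetical):

```python
import re

# f, then any 1 or 2 characters, then k or a literal '*'.
PATTERN = re.compile(r"f.{1,2}[k\*]", re.IGNORECASE)

def flagged(words):
    """Return the words in which the pattern matches somewhere."""
    return [w for w in words if PATTERN.search(w)]

print(flagged(["fck", "f***", "f**", "f**k", "fvk"]))  # all five match
print(flagged(["fork", "folk", "fsck"]))               # these match too
print(flagged(["hello", "fine"]))                      # []
```

So it catches the listed variants, but it also flags ordinary words such as "fork" and "folk", and the Unix tool name "fsck" mentioned earlier in the thread.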


I find profanity filters to be kind of useless. Anyone can look at your sentence and figure out what you actually said.

What I was thinking comes in two parts.

The first part does a find & replace.

Things like 0 -> o, etc.

The second uses a Bayesian network to figure out a score.

For example:

You suck teh c00ck!

replace teh -> the

replace 0 -> o
replace double letters: oo -> o

Now, check.

You = +0.5
Suck = +10
cock = +10

I then find the highest-scoring words (suck, etc.), and if their scores are above the threshold (2 in this example), the user gets kicked.

From,
Nice coder

[Edited by - Nice Coder on August 3, 2005 4:54:08 AM]
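A minimal Python sketch of the two-part idea above, under stated assumptions: the replacement tables and scores contain only the values from the example, a plain per-word lookup table stands in for the Bayesian network (which is not implemented here), and all names are hypothetical.

```python
import re

WORD_FIXES = {"teh": "the"}             # whole-word replacements
CHAR_FIXES = str.maketrans({"0": "o"})  # character substitutions
# Scores from the example; a real system would learn these.
SCORES = {"you": 0.5, "suck": 10.0, "cock": 10.0}
THRESHOLD = 2.0

def normalize(message):
    """Part one: lower-case, split into words, apply the replacements."""
    words = re.findall(r"[a-z0-9]+", message.lower())
    cleaned = []
    for word in words:
        word = WORD_FIXES.get(word, word)
        word = word.translate(CHAR_FIXES)
        # Collapse any run of a repeated character: "coock" -> "cock".
        word = re.sub(r"(.)\1+", r"\1", word)
        cleaned.append(word)
    return cleaned

def top_score(message):
    """Part two: score each word and return the highest score."""
    return max((SCORES.get(w, 0.0) for w in normalize(message)), default=0.0)

def should_kick(message):
    return top_score(message) > THRESHOLD

print(normalize("You suck teh c00ck!"))  # ['you', 'suck', 'the', 'cock']
print(should_kick("You suck teh c00ck!"))  # True
```

Note that collapsing every doubled letter also changes innocent words ("hello" becomes "helo"), so the score table has to be keyed on the collapsed spellings.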
