Writing a spellchecker

Nice Coder    366
Currently, I have my chatbot (as seen in the AI forum). It has:
1. Tags (scriptlets embedded in responses which can perform calculations, access input, etc.)
2. Question/response + learning (like Jabberwacky)
3. Keyword output
4. A semantic network

But the thing is, all of these require accurate spelling in order to work... (it is very hard to debug something which requires accurate spelling if you can't spell very well!)

What I need is something which can:
1. correct most/all spelling errors (grammar is not important at this stage), and
2. do #1 without needing any user input (i.e. no questions like "Did you spell that correctly?")

What I am thinking of doing is:
1. translate common abbreviations and 1337 speak into proper English
2. change each "bad" two-letter pair (phoneme? things like ch, la, oo, ee, etc.) into a "good" two-letter pair
3. look up the now-changed words one by one in a dictionary of words
4a. if the word is "close enough", change the input word to the matched word
4b. if the word is not "close enough", make a new word

For example:
original: r u a bot
stage 1: are you a bop
stage 2: aree youe ae bope
stage 3: aree youe ae bote
stage 4: bope changed to bete

Another example:
original: hou du u spel book
stage 1: hoe du you spell book
stage 2: hoee due youe spell book
stage 3: howe doe youe spell book
stage 4: hou changed to how, du changed to do.

Would this method be particularly effective? Would it be worth the effort?

From,
Nice coder
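
Steps 3, 4a and 4b above could be sketched roughly as follows. This is only an illustration, not the bot's actual code: the word list, the function names and the 0.6 cutoff are all assumptions, and "close enough" is approximated here with the similarity ratio from Python's standard difflib module rather than with the two-letter-pair scheme described in the post.

```python
import difflib

# Hypothetical toy dictionary; a real one would be loaded from a word list.
DICTIONARY = ["a", "are", "book", "bot", "do", "how", "spell", "you"]


def correct_word(word, dictionary=DICTIONARY, cutoff=0.6):
    """Return the closest dictionary word, or the word unchanged if nothing is close enough."""
    lowered = word.lower()
    if lowered in dictionary:
        return lowered  # step 3: exact hit, nothing to fix
    # Step 4a: accept the best candidate whose similarity is at least `cutoff`.
    matches = difflib.get_close_matches(lowered, dictionary, n=1, cutoff=cutoff)
    # Step 4b: no candidate is close enough, so keep the input as a "new" word.
    return matches[0] if matches else lowered


def correct_sentence(sentence):
    """Correct each whitespace-separated word independently (no context is used)."""
    return " ".join(correct_word(word) for word in sentence.split())


print(correct_sentence("how do you spel bok"))  # how do you spell book
```

Very short words are where this gets shaky: under this measure "hou" is exactly as similar to "you" as it is to "how", so which one comes back is essentially arbitrary unless something else breaks the tie.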

Quote:
Original post by Nice Coder
what I need is something which can:
1. correct most/all spelling errors (grammar is not important at this stage) and
2. do #1 without needing any user input (i.e. no questions like "Did you spell that correctly?")


Hello,

(2) cannot be implemented correctly: the right correction may depend on the sentence context, and you don't want to dig into the vast area of natural language processing, do you? Of course, you can choose a specific word based on the context you are expecting (making it the "default", in effect).

As for the algorithm, I encourage you to look at the GNU ispell program.

Quote:
Original post by Nice Coder
What I am thinking of doing is:
1. translating common abbreviations and 1337 speak into proper English

From,
Nice coder


Hey ! y4 VV4nN4 VVr173 s0m3 1337 2 3n6L1s|-| 7r4nsL470r ? :)

Nice Coder    366
Emmanual - Thanks for the ispell link, but maybe linking to here would be better? And I do not have a (working) Linux box since I reformatted #2 into a fileserver [sad].

Polly's probably going to turn out to be an FAQ bot (answering questions, with a little bit of general chat).
So 1337 3|o33k like that isn't going to be most of the problem; it's more the "ur a bot" sort of input that it will have trouble with.

From,
Nice coder
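
For the "ur a bot" kind of input, a plain lookup table applied to whole words may already go a long way. A minimal sketch, assuming a hand-written table (the entries and the function name below are made up for illustration, not taken from Polly):

```python
import re

# Hypothetical lookup table; extend it with whatever shows up in real chat logs.
# Note that some entries are ambiguous ("ur" can also mean "your").
ABBREVIATIONS = {
    "r": "are",
    "u": "you",
    "ur": "you are",
    "plz": "please",
    "thx": "thanks",
}


def expand_abbreviations(text, table=ABBREVIATIONS):
    """Replace whole-word chat abbreviations and leave every other token untouched."""
    def replace(match):
        word = match.group(0)
        return table.get(word.lower(), word)

    # Match complete runs of letters so that "our" is never rewritten via "ur".
    return re.sub(r"[A-Za-z']+", replace, text)


print(expand_abbreviations("ur a bot"))   # you are a bot
print(expand_abbreviations("r u a bot"))  # are you a bot
```

Running this before the dictionary lookup keeps one-letter and two-letter abbreviations away from the fuzzy matcher, where they would match almost anything.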

Extrarius    1412
There are a few good ways to find 'matches' in a dictionary for misspelled words:
1) Calculate the number of letter manipulations (swapping nearby letters, replacing a letter, adding a letter, removing a letter) that must be done to get to a valid word, and pick the word that requires the fewest changes.
2) Calculate an approximate phonetic version, so two pseudowords that are spelled completely differently but could be pronounced the same end up with the same value.
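
Approach 1 is usually called an edit distance (Levenshtein distance, or Damerau-Levenshtein when swapping two adjacent letters counts as a single change). The sketch below implements the restricted "optimal string alignment" variant, which is an assumption about what is meant here; the function names are illustrative.

```python
def edit_distance(a, b):
    """Count the inserts, deletes, replacements and adjacent swaps needed to turn a into b."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every letter of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert every letter of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # remove a letter
                d[i][j - 1] + 1,         # add a letter
                d[i - 1][j - 1] + cost,  # replace a letter (or keep it)
            )
            # Swap two adjacent letters, e.g. "teh" -> "the".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


def closest_word(word, dictionary):
    """Pick the dictionary word that needs the fewest changes (first one wins on ties)."""
    return min(dictionary, key=lambda candidate: edit_distance(word, candidate))


print(edit_distance("teh", "the"))                     # 1
print(closest_word("spel", ["book", "bot", "spell"]))  # spell
```

Comparing against every dictionary word is fine for a small word list; for a large one, a common trick is to only compare against words of similar length.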

I can't remember the name of either algorithm, but they both have one. Also, I think PHP4 implements a function for each of those kinds of comparisons (and actually I think it has several different ones for #2).
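
One well-known algorithm of the second kind is Soundex; PHP does ship soundex() and metaphone() for phonetic keys and levenshtein() for the first kind of comparison. A minimal Python sketch of American Soundex, offered as an illustration rather than a port of PHP's implementation:

```python
# Letter groups that share a Soundex digit; vowels, h, w and y have no digit.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(word):
    """Return the 4-character Soundex key: first letter plus up to three digits."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    key = letters[0].upper()
    previous = CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "hw":
            continue  # h and w do not separate letters that share a digit
        code = CODES.get(c, "")
        if code and code != previous:
            key += code
        previous = code  # a vowel resets this, so a repeated digit can follow it
    return (key + "000")[:4]


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
print(soundex("there"), soundex("their"))    # T600 T600
```

Soundex is crude: the first letter is kept as written, so "fone" and "phone" get different keys. Metaphone-style algorithms handle such cases better, at the cost of many more rules.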
