Spell checker - how does it work?

Hi, I'm trying to implement spell-checker-style functionality in a program whose purpose is to automatically fix small errors in the source text. (The source text comes from character recognition input, which works great except for the odd misread letter, so this "spell checking" pass is meant as a final step to smooth over those little bumps.) Has anyone done this before? I've discovered suffix trees (http://en.wikipedia.org/wiki/Suffix_tree) as a way of storing a dictionary of words for fast matching, and also Levenshtein distance (http://en.wikipedia.org/wiki/Levenshtein_distance), which can determine which word is the closer match, but I haven't yet seen anything that brings it all together into a working spell checker. I would have thought that kind of information would be more readily available by now.
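
For anyone reading along, here is a rough sketch of how the Levenshtein part could be tied to a word list. It is only an illustration, not a tuned implementation: the function names `levenshtein` and `closest_word`, the `max_distance` cutoff of 2, and the plain linear scan over the dictionary are all assumptions made for this example. A real dictionary would want a smarter index than a linear scan.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def closest_word(word, dictionary, max_distance=2):
    """Return the nearest dictionary entry, or the word unchanged if nothing is close.

    max_distance is an assumed cutoff chosen for this sketch.
    """
    if word in dictionary:
        return word
    best = min(dictionary, key=lambda candidate: levenshtein(word, candidate))
    return best if levenshtein(word, best) <= max_distance else word


# Hypothetical word list, just for demonstration.
WORDS = ["spell", "checker", "dictionary", "distance"]
print(closest_word("spe1l", WORDS))
```

The cutoff matters for OCR clean-up: without it, every unknown token (names, numbers, abbreviations) would be forced onto some dictionary word.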

Or you can simply use a spell checker that already exists, or just read the source code if you want to learn how things work:

Look into GNU Aspell.

http://norvig.com/spell-correct.html has a good explanation of one approach to this, along with some code.

The way it's set up, you should be able to adjust it to correct for just the types of errors the OCR produces. For example, I'd think OCR is unlikely to transpose two letters the way a human typist often does.
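
To make that adjustment concrete, here is a rough sketch in the spirit of the linked page: generate candidate corrections, keep the ones that are known words, and pick the most frequent. Instead of the generic deletes, transposes, replaces and inserts, it only undoes look-alike substitutions. Everything named here is an assumption for illustration: the `OCR_CONFUSIONS` table is made up and would need to be built from the mistakes your OCR engine actually makes, and `ocr_edits` and `correct` are hypothetical helper names.

```python
from collections import Counter

# Hypothetical confusion table: misread text -> what it may really have been.
# These pairs are guesses for illustration, not measured OCR behaviour.
OCR_CONFUSIONS = {
    "1": ["l", "i"],
    "0": ["o"],
    "5": ["s"],
    "rn": ["m"],
    "vv": ["w"],
    "l": ["i"],
}


def ocr_edits(word):
    """All strings produced by undoing exactly one assumed OCR confusion in word."""
    results = set()
    for misread, intended in OCR_CONFUSIONS.items():
        start = word.find(misread)
        while start != -1:
            for replacement in intended:
                results.add(word[:start] + replacement + word[start + len(misread):])
            start = word.find(misread, start + 1)
    return results


def correct(word, word_counts):
    """Return the most frequent known word one confusion away, else the word unchanged."""
    if word in word_counts:
        return word
    candidates = [c for c in ocr_edits(word) if c in word_counts]
    if not candidates:
        return word
    # sorted() makes tie-breaking deterministic; the count decides otherwise.
    return max(sorted(candidates), key=lambda c: word_counts[c])


# Tiny made-up frequency table; in practice this would be counted from a large corpus.
COUNTS = Counter({"modern": 5, "hello": 3, "spell": 10, "world": 4})
print(correct("rnodern", COUNTS))
```

Restricting the edits this way keeps the candidate set small and avoids "fixing" words in ways the OCR would never have broken them, at the cost of missing any error not listed in the table.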

Thanks guys :)

d000hg, that's confirming a lot of what I'm finding in my research.

Thanks for the link to GNU Aspell, fpsgamer. I was thinking something like this should already exist out there :)

Adam_42, that explanation page looks great! I think I will get a lot from that.
