Domarius

Spell checker - how does it work?

Hi, I'm trying to implement "spell checker" functionality in a program whose purpose is to automatically fix small errors in the source text. (The source text comes from character recognition input, which works great except for the odd wrong letter, so this "spell checking" step is meant as a final pass to smooth over those little bumps.) Has anyone done this before? I've discovered suffix trees (http://en.wikipedia.org/wiki/Suffix_tree) as a way of storing a dictionary of words for fast matching, and also Levenshtein distance (http://en.wikipedia.org/wiki/Levenshtein_distance), which can determine which word is the closer match, but I haven't yet seen anything that brings it all together into a spell checker. I would have thought that kind of info would be more readily available by now?
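
For reference, here is a rough sketch of how Levenshtein distance could be used to pick the closest dictionary word. The word list and the function names (`levenshtein`, `closest_word`) are made up for illustration, and the lookup is a plain linear scan rather than anything tree-based, so treat it as a starting point only.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character insertions, deletions, and substitutions."""
    # prev[j] is the distance between the already-processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute, or match for free
        prev = curr
    return prev[-1]


def closest_word(word, dictionary):
    """Return the dictionary entry with the smallest edit distance to word."""
    return min(dictionary, key=lambda w: levenshtein(word, w))


# Hypothetical dictionary, just for the example.
WORDS = ["spell", "shell", "smell", "checker"]
print(closest_word("spe11", WORDS))
```

Scanning every word like this gets slow for a large dictionary, which is where a smarter index would presumably come in.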

http://norvig.com/spell-correct.html has a good explanation of one approach to this, along with some code.

The way it's set up, you should be able to adjust it to correct only the kinds of errors the OCR actually produces. For example, I'd expect OCR to be unlikely to transpose two letters the way a human typist often does, so you could leave transpositions out of the candidate edits.
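
As a rough sketch of that kind of adjustment (this is not the code from the linked page, just the same general idea): generate candidates using only substitutions the OCR is known to get wrong, then keep the most frequent known word. The confusion table and the tiny word-count corpus below are invented for illustration; you would build both from your own OCR output and text.

```python
from collections import Counter

# Hypothetical confusion table: what the OCR printed -> what it may have meant.
OCR_CONFUSIONS = {
    "0": ["o", "O"],
    "1": ["l", "I"],
    "l": ["1", "I"],
    "5": ["s", "S"],
    "rn": ["m"],
}


def ocr_edits(word):
    """All strings reachable by applying exactly one confusion-table replacement."""
    results = set()
    for i in range(len(word)):
        for seen, intended in OCR_CONFUSIONS.items():
            if word.startswith(seen, i):
                for repl in intended:
                    results.add(word[:i] + repl + word[i + len(seen):])
    return results


def correct(word, word_counts):
    """Return the most frequent known word within two OCR-style edits, else word unchanged."""
    if word in word_counts:
        return word
    one_away = ocr_edits(word)
    candidates = [w for w in one_away if w in word_counts]
    if not candidates:
        # Nothing known at one edit, so try two edits.
        candidates = [w2 for w1 in one_away for w2 in ocr_edits(w1) if w2 in word_counts]
    return max(candidates, key=word_counts.get) if candidates else word


# Toy corpus standing in for real word frequencies.
WORD_COUNTS = Counter("the modern spell checker corrects the odd letter".split())
print(correct("spe11", WORD_COUNTS))
print(correct("rnodern", WORD_COUNTS))
```

Because the candidate set only contains OCR-plausible substitutions, it stays much smaller than a general "any insert, delete, replace, or transpose" set, and it should be less likely to "fix" a word into something the scanner could never have confused it with.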

Thanks guys :)

d000hg, that's confirming a lot of what I'm finding in my research.

Thanks for the link to GNU Spell, fpsgamer. I was thinking something like this should already exist out there :)

Adam_42, that explanation page looks great! I think I will get a lot from that.
