Spell checker - how does it work?
Hi, I'm basically trying to implement "spell checker" type functionality, in a program who's purpose is to automatically fix small errors in the source text.
(The source text is coming from a character recognition input, which works great except for the odd letter, so this "spell checking" functionality is meant as a final pass to smooth over those little bumps).
Has anyone done this before?
I've discovered Suffix Trees
http://en.wikipedia.org/wiki/Suffix_tree
as a way of storing a dictionary of words for fast matching,
and also Levenshtein distance, which can determine which word is a closer match
http://en.wikipedia.org/wiki/Levenshtein_distance
but I haven't yet seen anything close to bringing it all together to make a spell checker kind of thing. I would have thought that kind of info would be more available by now?
Or you can simply use a spell checker that already exists, or just read the source code if you want to learn how things work:
Look into GNU Aspell.
Look into GNU Aspell.
http://norvig.com/spell-correct.html has a good explanation of one approach to this, along with some code.
The way it's set up you should be able to adjust it to correct for just the types of errors the OCR produces. For example I'd think an OCR is unlikely to transpose two letters like a human often does.
The way it's set up you should be able to adjust it to correct for just the types of errors the OCR produces. For example I'd think an OCR is unlikely to transpose two letters like a human often does.
Thanks guys :)
d000hg, that's confirming a lot of what I'm finding in my research.
Thanks for the link to GNU Spell fpsgamer, I was thinking something like this should already exist out there :)
Adam_42 that explanation page looks great! I think I will get a lot from that.
d000hg, that's confirming a lot of what I'm finding in my research.
Thanks for the link to GNU Spell fpsgamer, I was thinking something like this should already exist out there :)
Adam_42 that explanation page looks great! I think I will get a lot from that.
This topic is closed to new replies.
Advertisement
Popular Topics
Advertisement