bronxbomber92

Word Lists

Hi, I can't seem to find a decent word list. I'm looking for one that also accounts for different tenses and plurals of words (e.g. punch, punched, puncher, punchers, etc.), something like the list this game might use: http://www.lumosity.com/brain-games/flexibility-games/word-bubbles. Do you guys have anything?

I found these while searching for the canonical "dictionary.txt" file:
http://wordlist.sourceforge.net/
http://www.outpost9.com/files/WordLists.html

Early search engines used lemmatization and stemming algorithms to generate English word variants automatically.
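
To make the stemming idea concrete, here is a minimal Python sketch. It is a deliberately crude suffix stripper, not the Porter or Porter2 algorithm, and the names `crude_stem` and `group_by_stem` and the suffix list are hypothetical. It works in the recognition direction: variants such as punched, puncher, and punchers collapse onto a shared stem, so a game could accept an inflected word when its stem matches an entry in the list.

```python
from collections import defaultdict

# Hypothetical suffix list, tried longest-first. This is NOT Porter/Porter2.
SUFFIXES = sorted(("ers", "er", "ed", "ing", "es", "s"), key=len, reverse=True)
MIN_STEM = 3  # never strip a word down to fewer than 3 letters


def crude_stem(word):
    """Strip the first matching suffix (longest first), keeping at least MIN_STEM letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word


def group_by_stem(words):
    """Map each stem to the list of input words that reduce to it."""
    groups = defaultdict(list)
    for w in words:
        groups[crude_stem(w)].append(w)
    return dict(groups)
```

A rule set this crude both over- and under-stems (for example, "running" becomes "runn"), which is why real stemmers like Porter2 rely on carefully ordered, condition-checked rules.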

Thanks guys! I *think* I found a decent word list, and the Porter2 stemming algorithm seems perfect (or close enough)!
Now the next task is loading the word list efficiently at startup. Do you guys think writing the dictionary to a binary file offline and then loading that file at runtime would be the way to go?
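
One possible shape for the offline-binary approach, sketched in Python under an assumed file layout: a 4-byte little-endian word count, then each word as a 1-byte length followed by its UTF-8 bytes. The layout and the function names are hypothetical, not an established format.

```python
import struct
from pathlib import Path


def encode_wordlist(words):
    """Hypothetical layout: uint32 LE word count, then (uint8 length, UTF-8 bytes) per word."""
    out = bytearray(struct.pack("<I", len(words)))
    for w in words:
        data = w.encode("utf-8")
        if len(data) > 255:
            raise ValueError(f"word too long for a 1-byte length prefix: {w!r}")
        out.append(len(data))
        out += data
    return bytes(out)


def decode_wordlist(blob):
    """Inverse of encode_wordlist: walk the buffer using the length prefixes."""
    (count,) = struct.unpack_from("<I", blob, 0)
    offset = 4
    words = []
    for _ in range(count):
        n = blob[offset]  # indexing bytes yields an int
        offset += 1
        words.append(blob[offset:offset + n].decode("utf-8"))
        offset += n
    return words


def save_wordlist(words, path):
    """Offline step: write the encoded list to disk."""
    Path(path).write_bytes(encode_wordlist(words))


def load_wordlist(path):
    """Runtime step: one bulk read, then decode in memory."""
    return decode_wordlist(Path(path).read_bytes())
```

In an interpreted language the per-word decode loop may cost about as much as splitting a plain text file, so any gain is more likely to show up in a compiled language such as C or C++; timing both versions is the reliable way to find out.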

How big is the list? I can't imagine that reading it in the naïve way would be all that slow. Have you tested it? If it takes < 1 second to load, there's probably no point optimizing it...
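
A quick way to answer "have you tested it?" is to time the naive load directly. This Python sketch assumes one word per line; `parse_words`, `load_words_naive`, `time_call`, and the file name in the usage line below are hypothetical.

```python
import time
from pathlib import Path


def parse_words(text):
    """Naive parse: one word per line, ignoring blank lines and stray whitespace."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def load_words_naive(path):
    """Read the whole file as text and parse it in one go."""
    return parse_words(Path(path).read_text(encoding="utf-8"))


def time_call(fn, *args, repeats=3):
    """Run fn(*args) several times; return (last result, best wall time in seconds)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args)
        best = min(best, time.perf_counter() - start)
    return result, best
```

For example, `words, seconds = time_call(load_words_naive, "words.txt")`. Taking the best of several runs reduces noise, but later runs will usually hit the OS file cache, so the first run is the better estimate of a cold start.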

The txt file is 2.2 MB, and I've been able to transform it into a 1.5 MB binary file, but parsing that still freezes my computer for a while. This word list is huge.

What I'm thinking about is separating the word list into one file per letter, loading the files concurrently, and merging the results back into a single list.
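
The per-letter idea could be sketched like this in Python: an offline step that shards the list by first letter, and a loader that reads the shards on a thread pool and concatenates the results. File names such as `words_a.txt` and all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def split_by_first_letter(words):
    """Offline step: bucket words by lowercase first letter (blank entries skipped)."""
    shards = {}
    for w in words:
        if w:
            shards.setdefault(w[0].lower(), []).append(w)
    return shards


def write_shards(shards, directory):
    """Write each bucket to a hypothetical words_<letter>.txt file, one word per line."""
    for letter, words in shards.items():
        Path(directory, f"words_{letter}.txt").write_text("\n".join(words), encoding="utf-8")


def load_shard(path):
    """Read one shard and split it into words."""
    return Path(path).read_text(encoding="utf-8").split()


def load_concurrently(paths, loader=load_shard, max_workers=8):
    """Load every shard on a thread pool and concatenate the results."""
    merged = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the order of `paths`, so the merge is deterministic.
        for words in pool.map(loader, paths):
            merged.extend(words)
    return merged
```

In CPython the global interpreter lock means these threads mainly overlap file I/O, not parsing, so for a list of a few megabytes this is unlikely to help much unless the disk really is the bottleneck.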

What exactly is the bottleneck? Disk I/O? Tree construction? Memory allocations?
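
Answering that question means timing each phase separately. This Python sketch assumes the "tree" is a character trie built from nested dicts (the thread doesn't say which structure is used) and splits a load into read, decode, split, and build phases; `build_trie`, `trie_contains`, `profile_phases`, and `profile_file` are hypothetical names.

```python
import time

_END = object()  # sentinel key marking "a word ends at this node"


def build_trie(words):
    """Insert every word into a nested-dict trie."""
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})  # allocates one new dict per new node
        node[_END] = True
    return root


def trie_contains(root, word):
    """True only if `word` was inserted as a whole word, not merely as a prefix."""
    node = root
    for ch in word:
        node = node.get(ch)
        if node is None:
            return False
    return _END in node


def profile_phases(raw):
    """Time decode, split, and trie construction for raw file bytes."""
    t0 = time.perf_counter()
    text = raw.decode("utf-8")
    t1 = time.perf_counter()
    words = text.split()
    t2 = time.perf_counter()
    trie = build_trie(words)
    t3 = time.perf_counter()
    return trie, {"decode": t1 - t0, "split": t2 - t1, "build_trie": t3 - t2}


def profile_file(path):
    """Add the disk-read phase in front of profile_phases."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        raw = f.read()
    read_time = time.perf_counter() - start
    trie, timings = profile_phases(raw)
    timings["read"] = read_time
    return trie, timings
```

A nested-dict trie allocates one dict per node, so if `build_trie` dominates the timings, allocation is a likely culprit, and a flatter structure such as a hash set or a sorted list may load faster.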
