bronxbomber92

Word Lists


Hi, I can't seem to find a decent word list. I'm looking for one that also accounts for different tenses and plurals of each word (e.g. punch, punched, puncher, punchers, etc.), something like the word list this game might use: http://www.lumosity.com/brain-games/flexibility-games/word-bubbles. Do you guys have anything?

I found these while searching for the canonical "dictionary.txt" file
http://wordlist.sourceforge.net/
http://www.outpost9.com/files/WordLists.html

Early search engines used lemmatization and stemming algorithms to handle English word variants automatically.
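
To give a rough feel for what stemming does, here is a minimal Python sketch. It is a toy suffix stripper, not the Porter algorithm or any real search engine's code; the suffix list and the names `toy_stem` and `group_by_stem` are made up for illustration.

```python
# Toy suffix-stripping stemmer: an illustration only, NOT the Porter algorithm.
# SUFFIXES is a hypothetical, deliberately tiny rule set; "es" must be tried
# before "s" so that "punches" loses both letters.
SUFFIXES = ("ers", "er", "ed", "ing", "es", "s")


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three letters of stem."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def group_by_stem(words):
    """Map each stem to the sorted list of input words that reduce to it."""
    groups = {}
    for w in words:
        groups.setdefault(toy_stem(w), []).append(w)
    return {stem: sorted(ws) for stem, ws in groups.items()}
```

With this sketch, `punch`, `punched`, `puncher` and `punchers` all reduce to the stem `punch`, so a game could accept any of them by looking up the stem. A real stemmer has many more rules and exceptions.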

Thanks guys! I *think* I found a decent word list, and the Porter2 stemming algorithm seems perfect(-enough)!
Now the next task becomes efficiently loading the word list at startup. Do you guys think writing the dictionary to a binary file offline and then loading that file at runtime would be the way to go?
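
For reference, the offline-write / runtime-load idea could look something like the Python sketch below. The file layout (a 4-byte little-endian word count, then each word as a 1-byte length followed by its UTF-8 bytes) and the names `write_binary` and `read_binary` are assumptions chosen for illustration, not a format anyone in this thread actually used.

```python
import struct


def write_binary(words, path):
    """Offline step: write a count header, then length-prefixed UTF-8 words."""
    with open(path, "wb") as f:
        f.write(struct.pack("<I", len(words)))      # 4-byte little-endian count
        for w in words:
            data = w.encode("utf-8")
            if len(data) > 255:
                raise ValueError("word too long for a 1-byte length prefix")
            f.write(struct.pack("<B", len(data)))   # 1-byte length
            f.write(data)


def read_binary(path):
    """Runtime step: read the whole file once, then slice the words out of it."""
    with open(path, "rb") as f:
        buf = f.read()
    (count,) = struct.unpack_from("<I", buf, 0)
    offset = 4
    words = []
    for _ in range(count):
        length = buf[offset]                        # indexing bytes yields an int
        offset += 1
        words.append(buf[offset:offset + length].decode("utf-8"))
        offset += length
    return words
```

Whether this beats reading a plain text file depends on where the time actually goes, so it is worth measuring both before committing to a custom format.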

How big is the list? I can't imagine that reading it in the naïve way would be all that slow. Have you tested it? If it takes < 1 second to load, there's probably no point optimizing it...

The txt file is 2.2 MB, and I've been able to transform it into a 1.5 MB binary file, which still freezes my computer for a while when parsing it. This word list is huge.

What I'm thinking about is separating the word list into a file for each letter, then concurrently loading the files, and merging the results back into a list.
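
The per-letter idea could be sketched in Python roughly as follows. The shard layout (one `*.txt` file per letter in a single directory, one word per line) and the names `load_file` and `load_sharded` are assumptions for illustration. Note that in Python, threads mainly help when the load is IO-bound, so this may or may not be faster than a single read.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def load_file(path):
    """Read one shard: one word per line, blank lines skipped."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def load_sharded(directory):
    """Load every *.txt shard in `directory` concurrently and merge the results."""
    paths = sorted(Path(directory).glob("*.txt"))    # a.txt, b.txt, ...
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(load_file, paths))     # map preserves input order
    # Concatenate in file-name order; if each shard is sorted, so is the result.
    return [word for part in parts for word in part]
```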

Quote:
Original post by bronxbomber92
The txt file is 2.2 MB, and I've been able to transform it into a 1.5 MB binary file, which still freezes my computer for a while when parsing it. This word list is huge.

What I'm thinking about is separating the word list into a file for each letter, then concurrently loading the files, and merging the results back into a list.


What exactly is the bottleneck? Disk IO? Tree construction? Memory allocations?
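
One way to answer that is to time each phase of the load separately. The Python sketch below assumes a plain text file with whitespace-separated words and uses a `set` as a stand-in for whatever container (tree, trie, hash table) the game really builds; the name `profile_load` and the phase labels are made up for illustration.

```python
import time


def profile_load(path):
    """Return (words, timings), where timings holds seconds spent per phase."""
    t0 = time.perf_counter()
    with open(path, "rb") as f:
        raw = f.read()                       # disk IO only
    t1 = time.perf_counter()
    tokens = raw.decode("utf-8").split()     # parsing: decode, split on whitespace
    t2 = time.perf_counter()
    words = set(tokens)                      # container construction (stand-in)
    t3 = time.perf_counter()
    timings = {"read": t1 - t0, "parse": t2 - t1, "build": t3 - t2}
    return words, timings
```

If `read` dominates, splitting or shrinking the file may help; if `parse` or `build` dominates, changing the file format alone probably won't.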
