Jaqen

Working with huge text files...


I have a text file and I'm using it to make a word game. The thing is, though, the file is 5 MB and 179k lines. What would be the best way to go about working with this? Should I load it all into a vector/array? I figure looping through the whole file each time I'm trying to check for a word would be a lot slower than loading it all into memory first. Will looping through 179k lines go slow? I haven't gotten a chance to try it yet; I'm just trying to think this through before I tackle it.

Here are two lines from the file as an example:

PROGRAM to arrange in a plan of proceedings [v -GRAMED, -GRAMING, -GRAMS or -GRAMMED, -GRAMMING, -GRAMS]
PROJECT to extend outward [v -ED, -ING, -S]

Thanks for any advice. (The source for the text file is zyzzyva.com, if anyone's curious. It's a Scrabble dictionary or something like that.)

Maybe load the words into a separate vector/map based on the starting letter? So all A's in one, all B's in another. Then when you need to search for a word, you can check its starting letter and search only the matching map.
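A rough sketch of that per-letter split, in case it helps. The class name LetterBuckets and its members are made up for illustration, and it assumes every word is uppercase A-Z like the sample lines from the file.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Words split into 26 sets by first letter. Assumes uppercase A-Z words,
// as in the sample lines; the class and member names are made up here.
class LetterBuckets {
public:
    // Stores the word; returns false if it does not start with A-Z.
    bool add(const std::string& word) {
        if (!starts_with_capital(word)) return false;
        buckets_[index_for(word[0])].insert(word);
        return true;
    }

    // Looks only in the set that matches the word's first letter.
    bool contains(const std::string& word) const {
        if (!starts_with_capital(word)) return false;
        return buckets_[index_for(word[0])].count(word) != 0;
    }

    // Number of words stored under one starting letter.
    std::size_t bucket_size(char first) const {
        return buckets_[index_for(first)].size();
    }

private:
    static bool starts_with_capital(const std::string& word) {
        return !word.empty() && word[0] >= 'A' && word[0] <= 'Z';
    }

    static std::size_t index_for(char first) {
        assert(first >= 'A' && first <= 'Z');  // callers must pass A-Z
        return static_cast<std::size_t>(first - 'A');
    }

    std::array<std::set<std::string>, 26> buckets_;
};
```

Whether the split actually beats one sorted container or one hash table is something to measure rather than assume, since those containers already narrow the search on their own.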

Quote:
Original post by Jaqen
I have a text file and I'm using it to make a word game. The thing is, though, the file is 5 MB and 179k lines.


This is not "huge".

Quote:
What would be the best way to go about working with this? Should I load it all into a vector/array? I figure looping through the whole file each time I'm trying to check for a word would be a lot slower than loading it all into memory first.


What does "check for a word" mean for you?

Quote:
Will looping through 179k lines go slow?


It might go much slower than is called for, depending on what you are trying to do. What are you trying to do?

Oh, well I just finished writing up code to print out every line in the file and the program took 43.859 seconds. I mean, that seems too long to me so I'm betting someone knows a way that makes more sense, which is what I'm looking for.

The main part of it is you're going to be given an anagram, and then you type in words you can get out of it. It'll check the list to see if it exists.
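Besides the dictionary lookup, a game like this probably also needs to check that the typed word only uses letters from the puzzle. Here is a minimal sketch of that half, assuming uppercase A-Z input; the function name can_form and the example letters are my own, not from the game.

```cpp
#include <array>
#include <cassert>
#include <string>

// Returns true if `word` can be spelled using only the letters in `rack`,
// using each rack letter at most once. Assumes uppercase A-Z input, like
// the words in the file; any other character makes the check fail.
bool can_form(const std::string& rack, const std::string& word) {
    std::array<int, 26> available{};  // zero-initialised letter counts
    for (char c : rack) {
        if (c < 'A' || c > 'Z') return false;
        ++available[c - 'A'];
    }
    for (char c : word) {
        if (c < 'A' || c > 'Z') return false;
        if (--available[c - 'A'] < 0) return false;  // letter used too often
    }
    return true;
}

// Hypothetical usage: the scrambled letters of PROGRAM as the puzzle.
void example_can_form() {
    assert(can_form("MARGORP", "PROGRAM"));
    assert(can_form("MARGORP", "ROAM"));
    assert(!can_form("MARGORP", "MAMA"));  // only one M and one A available
}
```

This only says the letters fit; whether the result is a real word is still the job of the word list lookup.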

The big question is what platform you are developing for. I'm going to assume that you are working with the PC as your platform, in which case loading everything into memory is not a big deal. If you need to look this information up frequently, don't use a plain vector or array; use a data structure called a hash table, which is quite efficient at retrieving information from large data sets.
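In C++ the standard hash table for this job is std::unordered_set. A minimal sketch, assuming the word is the first whitespace-separated token on each line, as in the two sample lines; load_word_set is a name I made up.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_set>

// Builds a hash set from the first whitespace-delimited token of each line,
// which in the sample lines is the word itself (PROGRAM, PROJECT).
std::unordered_set<std::string> load_word_set(std::istream& in) {
    std::unordered_set<std::string> words;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string word;
        if (fields >> word) words.insert(word);  // blank lines yield no token
    }
    return words;
}

// Usage with the two sample lines from the first post.
void example_word_set() {
    std::istringstream sample(
        "PROGRAM to arrange in a plan of proceedings "
        "[v -GRAMED, -GRAMING, -GRAMS or -GRAMMED, -GRAMMING, -GRAMS]\n"
        "PROJECT to extend outward [v -ED, -ING, -S]\n");
    const std::unordered_set<std::string> words = load_word_set(sample);
    assert(words.count("PROJECT") == 1);
    assert(words.count("PROJECTS") == 0);  // bracketed inflections are not expanded
}
```

For the real file you would pass a std::ifstream instead of the string stream. Note that this sketch ignores the inflections in the square brackets, so PROJECTS would not be found unless those are expanded separately.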

Yeah, I am using PC though it'll probably be cross platform. I know nothing about this area of programming so I'll go look up hash tables and see what I can come up with...

Quote:
Original post by Jaqen
Oh, well I just finished writing up code to print out every line in the file and the program took 43.859 seconds. I mean, that seems too long to me so I'm betting someone knows a way that makes more sense, which is what I'm looking for.


The question is, was reading the file the slow part, or was printing out each line the slow part? If this was inside a Windows console (e.g. run from cmd.exe) then I would guess the printing was significantly slower than the reading part. Simply reading a file line by line should take a fraction of a second for a relatively small file like you've described (5MB is not big).

Try taking out the bit where you actually print each line and see what the difference is.
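Something along these lines would time just the reading part. It is only a sketch: ReadResult and time_read_only are names I invented, and it uses std::chrono::steady_clock from the standard library for the measurement.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Line count plus elapsed wall-clock time for one pass over a stream.
struct ReadResult {
    std::size_t lines;
    double seconds;
};

// Reads every line from `in` without printing anything and reports how
// long the pass took.
ReadResult time_read_only(std::istream& in) {
    const auto start = std::chrono::steady_clock::now();
    std::size_t count = 0;
    std::string line;
    while (std::getline(in, line)) ++count;
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return {count, elapsed.count()};
}

// Quick check on an in-memory stream; a real run would pass a std::ifstream.
void example_time_read() {
    std::istringstream sample(
        "PROGRAM to arrange in a plan of proceedings\n"
        "PROJECT to extend outward\n");
    const ReadResult result = time_read_only(sample);
    assert(result.lines == 2);
}
```

Run it once on the file as is, then add the print back inside the loop and compare the two numbers.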

When you say it took nearly 50 secs to write the data to screen, to be honest this doesn't seem too outlandish at all if you're dumping it to standard out (console). Writing to console isn't exactly a performance-friendly operation. For example, iterating over and assigning to every value in a std::vector of 180000 integers takes a blink of an eye, whereas printing them to std::cout takes about 20 secs. I know you're not using ints, but the point is don't start optimising until you're sure you need to.

ninja'd, dammit!

You guys are right... I commented out the cout line and it took all of 0.406 s to execute. Wondering how much longer it'll take to put it into a vector on top of that... according to you, not much time at all.
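For reference, putting the words into a vector could look roughly like this. It is a sketch under the assumption that the word is everything before the first space on a line; load_words and its expected_lines parameter are names I picked.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Reads the word at the start of each line into a vector. `expected_lines`
// is an optional hint (for example 179000) so the vector allocates once.
std::vector<std::string> load_words(std::istream& in,
                                    std::size_t expected_lines = 0) {
    std::vector<std::string> words;
    words.reserve(expected_lines);
    std::string line;
    while (std::getline(in, line)) {
        // Everything before the first space; the whole line if there is none.
        std::string word = line.substr(0, line.find(' '));
        if (!word.empty()) words.push_back(std::move(word));
    }
    return words;
}

// Usage with the two sample lines from the first post.
void example_load_words() {
    std::istringstream sample(
        "PROGRAM to arrange in a plan of proceedings "
        "[v -GRAMED, -GRAMING, -GRAMS or -GRAMMED, -GRAMMING, -GRAMS]\n"
        "PROJECT to extend outward [v -ED, -ING, -S]\n");
    const std::vector<std::string> words = load_words(sample, 2);
    assert(words.size() == 2);
    assert(words[0] == "PROGRAM");
    assert(words[1] == "PROJECT");
}
```

The definitions are dropped here; if the game needs to show them, the vector could hold the whole line or a small struct instead.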

Sorry, I guess I got ahead of myself if I really don't need all this stuff.

Put it in a vector and do a binary search on it. As long as the lines are sorted in alphabetical order, it will take fewer than 19 checks to see whether one of the 179k lines has the word you are looking for, since 2^18 = 262,144 is more than 179,000.
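To make the number of checks visible, here is a hand-written binary search that counts them. It is a sketch only: Lookup and find_word are invented names, and in real code std::binary_search from <algorithm> does the same job without the counter.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Result of one lookup: whether the word was found and how many entries
// were compared along the way.
struct Lookup {
    bool found;
    int checks;
};

// Classic binary search. Precondition: `sorted_words` is in ascending
// order, which is what std::sort produces by default.
Lookup find_word(const std::vector<std::string>& sorted_words,
                 const std::string& word) {
    std::size_t lo = 0;
    std::size_t hi = sorted_words.size();
    int checks = 0;
    while (lo < hi) {
        const std::size_t mid = lo + (hi - lo) / 2;
        ++checks;
        const int order = sorted_words[mid].compare(word);
        if (order == 0) return {true, checks};
        if (order < 0) {
            lo = mid + 1;  // the word, if present, is in the upper half
        } else {
            hi = mid;      // the word, if present, is in the lower half
        }
    }
    return {false, checks};
}

// Tiny usage example with made-up contents.
void example_find_word() {
    const std::vector<std::string> words = {"PROGRAM", "PROJECT", "PROM"};
    assert(std::is_sorted(words.begin(), words.end()));
    assert(find_word(words, "PROJECT").found);
    assert(!find_word(words, "PROJECTS").found);
}
```

If the file is not already in sorted order, one call to std::sort right after loading takes care of it.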
