mapPhil

Fast Searching/Modifying data from Binary Files

7 posts in this topic

hi all,

Basically I want to find the quickest way to search for data in a binary file.

For example my bin file will contain data such as:
0,Hello World,0
0,Another World,1

Now I would like to be able to fetch any given line of a 10,000-line (possibly more) .bin file. Currently I am doing it this way:

[CODE]
void FetchData(const unsigned int& line, int column, std::string& data)
{
    if (file)
    {
        // reset the file pointer to the beginning
        file.clear();
        file.seekg(0, std::ios::beg);
        if (file.good())
        {
            // read lines until the desired line has been reached
            for (unsigned int i = 0; i < (line + 1) && !file.eof(); i++)
            {
                std::getline(file, data);
            }
        }
    }
}
[/CODE]


This does work, but are there any more efficient approaches I could take?

I am also required to modify such data. Currently I store the contents of the .bin file in a vector temporarily, modify particular elements of the vector,
and then overwrite the existing .bin file with the latest data. If there is a more efficient approach to this, please let me know.
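Roughly, what I do at the moment looks like the sketch below (just an illustration; the file name, row index and helper name are placeholders):

[CODE]
#include <fstream>
#include <string>
#include <vector>

// rough sketch of the vector round-trip described above
void ModifyLine(const char* path, std::size_t row, const std::string& newValue)
{
    std::vector<std::string> lines;
    {
        std::ifstream in(path, std::ios::binary);
        std::string line;
        while (std::getline(in, line))
            lines.push_back(line);          // load the whole file into memory
    }

    if (row < lines.size())
        lines[row] = newValue;              // modify the element in memory

    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    for (std::size_t i = 0; i < lines.size(); ++i)
        out << lines[i] << '\n';            // overwrite the file with the new data
}
[/CODE]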

Also, on a side note, how would performance differ between using an .xml file instead of a .bin file?
Thank you for any help offered.

From what I can tell, you are working with a text file, not a binary file. Just because the extension is '.bin' does not make it a binary file.

I'm also not clear what exactly it is you are asking. If you really want to detect new lines in your text file, then yes, you have to read each line, interpret it and compare it with the data you already know you have. For something as tiny as 10000 (int, string, int) I see no problem with keeping the data in memory.
If the content of the data is critical, I would not just overwrite the file like that. Rename the original file, write out the new file, and delete the backup only after it has been ensured that the new file has been written completely to disk.
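Something along these lines, for example (only a sketch; the helper name and the '.bak' suffix are made up, and error handling is kept minimal):

[CODE]
#include <cstdio>    // std::rename, std::remove
#include <fstream>
#include <string>
#include <vector>

// sketch of the backup-then-rewrite idea described above
bool SafeOverwrite(const std::string& path, const std::vector<std::string>& lines)
{
    const std::string backupPath = path + ".bak";

    // 1. Rename the original file so it serves as a backup.
    if (std::rename(path.c_str(), backupPath.c_str()) != 0)
        return false;

    // 2. Write out the new file under the original name.
    {
        std::ofstream out(path.c_str(), std::ios::binary);
        if (!out)
            return false;
        for (std::size_t i = 0; i < lines.size(); ++i)
            out << lines[i] << '\n';
        out.flush();
        if (!out)
            return false;   // writing failed; the .bak file still holds the old data
    }

    // 3. Only after the new file has been written completely, delete the backup.
    std::remove(backupPath.c_str());
    return true;
}
[/CODE]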

There are probably lots of ways to do things more efficiently but there is too little information about what you really need and which assumptions can be relaxed.

Reading an XML file makes things even worse.

The constraints to the problem are not clear. There are of course possibilities to make things more efficient, but they may introduce some constraints that may or may not be okay here.

Possible things are
* making the data packets equal in size (see the sketch after this list),
* using a kind of TOC ("table of contents"),
* appending data instead of overwriting,
* using prefetched hash tables (or something) if looking up names is required,
* ...
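For example, the first point could look roughly like this (RECORD_SIZE and ReadRecord are made-up names, and it assumes every line has been padded to the same length):

[CODE]
#include <fstream>
#include <string>

// if every record occupies exactly RECORD_SIZE bytes, line N starts at
// N * RECORD_SIZE and can be read with a single seek instead of a full scan
const std::streamsize RECORD_SIZE = 32;   // assumed fixed record size

bool ReadRecord(std::ifstream& file, unsigned int line, std::string& data)
{
    file.clear();
    file.seekg(static_cast<std::streamoff>(line) * RECORD_SIZE, std::ios::beg);

    std::string buffer(RECORD_SIZE, '\0');
    if (!file.read(&buffer[0], RECORD_SIZE))
        return false;

    data = buffer;   // trailing padding could be trimmed here
    return true;
}
[/CODE]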

Apologies for the lack of detail.


[quote name='BitMaster' timestamp='1354269477' post='5005645']
For something as tiny as 10000 (int, string, int) I see no problem with keeping the data in memory.
[/quote]

I am creating a list for a GUI, and the amount of data in the list could grow considerably, even larger than 10,000 rows - though it's unlikely, it is certainly possible. On top of this,
my list can handle a varied number of columns.

All of the data will actually be strings, even in the example in my first post.

I have not used the file libraries much, but this is the code I wrote to create the file:

[CODE]
std::ofstream file;
file.open("data.bin", std::ios::out | std::ios::binary | std::ios::app);

std::string str = "0,Hello World,0";
std::string::size_type sz = str.size();
if (file)
{
    for (int i = 0; i < 10000; i++)
    {
        // write the raw characters of the string followed by a newline
        // (the length is already available in sz, so strlen() is not needed)
        file.write(str.c_str(), sz);
        file << '\n';   // std::endl would also flush on every line
    }
}
file.close();
[/CODE]

If this is incorrect please let me know.

Note: I actually won't be creating the binary file myself, only receiving the file and reading the data in the FetchData method.

The main problem is that my list works perfectly fine up until I require around the 4000th row; then it becomes noticeably slower, because I am iterating through the file until I reach the required index. Edited by mapPhil

If the structure of the file is out of your responsibility and you want fast "random" read access without loading the entire file into memory, then you can do a pre-pass where the file is scanned through once, and for each line the file offset of that line as well as any necessary identifiers (e.g. index, name, hash, whatever) are remembered in working memory. Then you can identify a line of interest in memory and use the corresponding file offset to seek into the file and read just that line. This is a kind of table-of-contents as mentioned earlier, but kept outside the file itself.
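A rough sketch of what I mean (the class and member names are made up, and it assumes the comma-separated line format from the first post):

[CODE]
#include <fstream>
#include <string>
#include <vector>

// pre-pass: scan the file once, remember where each line starts,
// then seek straight to a line when it is requested
class LineIndex
{
public:
    explicit LineIndex(const std::string& path)
        : file(path.c_str(), std::ios::binary)
    {
        std::streampos pos = file.tellg();
        std::string line;
        while (std::getline(file, line))
        {
            offsets.push_back(pos);   // offset where this line begins
            pos = file.tellg();
        }
        file.clear();                 // clear eof so the stream can be reused
    }

    // fetch a single line by index without rescanning the whole file
    bool FetchLine(std::size_t index, std::string& data)
    {
        if (index >= offsets.size())
            return false;
        file.clear();                 // clear any eof flag from a previous read
        file.seekg(offsets[index]);
        return static_cast<bool>(std::getline(file, data));
    }

private:
    std::ifstream file;
    std::vector<std::streampos> offsets;
};
[/CODE]

Which identifiers you remember alongside the offsets (index, name, hash, ...) depends entirely on how you need to look the lines up.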

Thanks haegarr,

that sounds like a great solution to my problem. Do you know of any tutorials/examples of how to do this?

[quote name='mapPhil' timestamp='1354277654' post='5005666']
I am creating a list for a GUI, and the amount of data in the list could grow considerably, even larger than 10,000 rows - though it's unlikely, it is certainly possible. On top of this,
my list can handle a varied number of columns.
[/quote]
So you have to display a portion of the list? If so, the number of rows you'll display is most likely limited to a small amount. You can use this to your advantage by mapping out the starting and ending positions of blocks of lines, and then referencing this map when you want to display the relevant portion.
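Roughly like this (BLOCK_SIZE and the helper names are made up for illustration):

[CODE]
#include <fstream>
#include <string>
#include <vector>

// remember the file offset of every BLOCK_SIZE-th line only; reaching any
// row then costs one seek plus at most BLOCK_SIZE - 1 skipped lines
const std::size_t BLOCK_SIZE = 100;

std::vector<std::streampos> BuildBlockOffsets(std::ifstream& file)
{
    std::vector<std::streampos> blockOffsets;
    file.clear();
    file.seekg(0, std::ios::beg);

    std::string line;
    std::size_t lineNumber = 0;
    std::streampos pos = file.tellg();
    while (std::getline(file, line))
    {
        if (lineNumber % BLOCK_SIZE == 0)
            blockOffsets.push_back(pos);   // start of a new block
        pos = file.tellg();
        ++lineNumber;
    }
    file.clear();
    return blockOffsets;
}

// fetch one visible row: seek to the start of its block, then skip ahead
bool FetchRow(std::ifstream& file, const std::vector<std::streampos>& blockOffsets,
              std::size_t row, std::string& data)
{
    const std::size_t block = row / BLOCK_SIZE;
    if (block >= blockOffsets.size())
        return false;

    file.clear();
    file.seekg(blockOffsets[block]);
    for (std::size_t i = 0; i <= row % BLOCK_SIZE; ++i)
    {
        if (!std::getline(file, data))
            return false;
    }
    return true;
}
[/CODE]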

Keeping 10,000 or even 100,000 strings in memory shouldn't be a problem though, unless you've got some insanely long strings. Edited by Mussi

