Interleaving files

Started by Gagyi. 5 comments, last by Gagyi 16 years, 8 months ago
I want to interleave 9 MB of files (MSN Messenger log XML files, merging their entries by "time sent"), and I want to do it as fast as I can. How? I guess opening all 50 files and continuously reading from them is not too optimal. Copying all the files into RAM and opening them there is also not an option, since I don't have much RAM in the target computer (Windows XP with 128 MB of RAM). So I'm confused... What to do? Some intermediate technique?
-----------------------------------
"After you finish the first 90% of a project, you have to finish the other 90%." - Michael Abrash
stickman.hu <= my game (please tell me your opinion about it)
How about trying to load them into memory anyway, despite the target PC's small memory? There's always virtual memory to fall back on (and no doubt it's already in use to quite some extent), and loading all the files at once should be more convenient for you. If the files you load end up in virtual memory, they're read from the hard drive much as they would be if you streamed them yourself anyway.
I'd try the simple way first with a dummy test app; it should be quite fast to write one that loads the files and does some reading from them.
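For what it's worth, a minimal sketch of such a test app in C++ could look like the following (assuming C++17 for <filesystem>, and that all the logs sit in one folder; the "logs" directory name is just a placeholder):

// Minimal test app sketch: slurp every log file in a folder into memory.
// Requires C++17; "logs" is a placeholder directory name.
#include <filesystem>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

int main() {
    std::vector<std::string> contents;
    for (const auto& entry : std::filesystem::directory_iterator("logs")) {
        std::ifstream in(entry.path(), std::ios::binary);
        std::ostringstream buffer;
        buffer << in.rdbuf();              // read the whole file at once
        contents.push_back(buffer.str());
    }
    std::cout << "loaded " << contents.size() << " files\n";
}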
If this really is a problem, you could use algorithms from the merge family. Here's a link to the Wikipedia article on the merge algorithm. You are perhaps interested in the sections "Utility in online sorting" and "Merge sorting tape drives", and then in merging those different files with a merge pass.

There is also some sample code available but, as Pzc already noted, this approach may be a bit too heavy. Reading all the data into memory, sorting it and then doing a merge seems to be what you are trying to achieve anyway.
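To give the tape-drive idea some shape (my own rough sketch in C++, not the sample code mentioned above), here is a k-way merge over runs that are assumed to be already sorted, using a min-heap to pick the globally smallest element:

// k-way merge of already-sorted runs, the idea behind merging sorted
// tape/file runs: a min-heap always yields the globally smallest element.
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

std::vector<std::string> kWayMerge(const std::vector<std::vector<std::string>>& runs) {
    // Item = (value, (run index, position within run))
    using Item = std::pair<std::string, std::pair<std::size_t, std::size_t>>;
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> heap;
    for (std::size_t r = 0; r < runs.size(); ++r)
        if (!runs[r].empty())
            heap.push({runs[r][0], {r, 0}});
    std::vector<std::string> out;
    while (!heap.empty()) {
        auto [value, where] = heap.top();
        heap.pop();
        out.push_back(value);
        auto [r, i] = where;
        if (i + 1 < runs[r].size())             // advance within that run
            heap.push({runs[r][i + 1], {r, i + 1}});
    }
    return out;
}

With 50 files, the heap holds only one element per run at a time, which is what makes the tape-merging approach so memory-friendly.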
--- Sudet ulvovat - karavaani kulkee ("The wolves howl - the caravan moves on")
Actually, yes. And I have made that kind of algorithm before.

And I have a 3 GHz computer with 1 GB of RAM here; this is my development machine. But it has to work on low-end computers too, because I'm making this for my friends.

Sure, I could write the program and test it at my friends' places, but I don't want to write the whole thing three times with different algorithms because of memory or speed problems. So I thought you could help me a bit with your experience, so I don't have to reinvent the wheel. But if I have to, I will ;)
-----------------------------------
"After you finish the first 90% of a project, you have to finish the other 90%." - Michael Abrash
stickman.hu <= my game (please tell me your opinion about it)
Well, my advice is partially the same as what Pzc already gave. :-) Windows XP with 128 MiB of memory strikes me as a bit low to begin with, but reading 9 MiB of data into memory shouldn't be a problem, since plenty of software uses that much memory without trouble. So let the operating system worry about memory and swap unneeded data out to disk, so that your 9 MiB and the program itself fit in there to do the job.

Doing an on-line merge sort generates about the same amount of disk I/O as reading everything into memory at once in this case. In addition, writing an on-line merge sort is somewhat more involved, and you may well find that the operating system reads all the data into memory anyway (at least all of the data in one file, since you probably end up opening and reading the files one by one). Your algorithm will be much simpler if you just read all the files into memory, sort them and merge them. You could also read the first two files, sort them, merge them, then read the next file, sort it, merge it with the master list and so forth until all the files are processed, but I don't see any real benefit in that.
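That incremental variant might look roughly like this, as a self-contained sketch with plain ints standing in for timestamped log entries:

// Merge one "file" at a time into a growing master list.
// Ints stand in for entries keyed by their "time sent" stamp.
#include <algorithm>
#include <iterator>
#include <utility>
#include <vector>

int main() {
    std::vector<std::vector<int>> files = {{9, 1, 4}, {3, 8, 2}, {5, 7, 6}};
    std::vector<int> master;
    for (auto& f : files) {
        std::sort(f.begin(), f.end());      // sort one file's entries
        std::vector<int> merged;
        merged.reserve(master.size() + f.size());
        std::merge(master.begin(), master.end(), f.begin(), f.end(),
                   std::back_inserter(merged));
        master = std::move(merged);         // the growing master list
    }
    // master now holds 1..9 in order.
}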

If you know something about the data already, for example that the "time sent" stamps are already in sorted order within each file, you can skip the sorting and run a plain merge instead once you have read the data into memory. There are also other possible heuristics to speed things up, but with 9 MiB I don't think they would be of any use. By the way, if you read the Wikipedia article regarding merge, you may notice that the STL already has an implementation of merge (I assume you write C++, but other languages have similar algorithms available too).
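For example, assuming two files' entries are already sorted by their stamps, std::merge combines them in a single pass; the Entry layout below is made up for the sketch:

// Hypothetical log entry: a parsed "time sent" stamp plus the raw XML line.
#include <algorithm>
#include <ctime>
#include <iterator>
#include <string>
#include <vector>

struct Entry {
    std::time_t sent;
    std::string line;
};

std::vector<Entry> mergeTwo(const std::vector<Entry>& a, const std::vector<Entry>& b) {
    std::vector<Entry> out;
    out.reserve(a.size() + b.size());
    std::merge(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out),
               [](const Entry& x, const Entry& y) { return x.sent < y.sent; });
    return out;
}

Folding mergeTwo over all 50 in-memory files then yields the interleaved log.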

P.S.
If you want to experiment with some I/O while working on your algorithm, there's a nice two-part article series by Jesus de Santos Garcia on Gamasutra called Fast File Loading. Just click the author's name in the upper left corner to see the rest of the articles.

Just remember that, as a general rule, the algorithms you write should above all do the correct thing, be easy to understand and maintain (depending on the context, advanced algorithms are fine as long as you or the maintainer understand them), and be as efficient as possible, in that order.
--- Sudet ulvovat - karavaani kulkee ("The wolves howl - the caravan moves on")
The sequential file loading and merging looks fine to me; I was only worried about having 50 file pointers open at once, and this approach seems to have the smallest memory footprint. Thanks for the help :) And, of course, the "time sent" values are already sorted.

I write it in Delphi, because it's much more FUN (and so I have to write my own Merge...).

P.S.: I was quite surprised that XP runs on 128 MB of RAM...
-----------------------------------
"After you finish the first 90% of a project, you have to finish the other 90%." - Michael Abrash
stickman.hu <= my game (please tell me your opinion about it)

This topic is closed to new replies.
