String Building on a Large Scale

I am currently working on a utility to turn a binary model format into a text-based one. The problem is that the models are extremely detailed, and the engine architecture means I need to build the entire file in memory and then write it into an archive file. At the moment I am using std::stringstream (all the objects being converted have << and >> overloaded for streaming, so a stringstream makes sense), but I was wondering if there is a faster way of handling this. The only string operations that need to be performed are appending and writing (no searching, replacing substrings or any fancy stuff like that). So, if anyone knows of a free library that can handle processing vast amounts of text data (in the region of 2-3 MB) I would like to hear about it; if not, I'll stick with std::stringstream :'(. Thanks in advance.
Text-based model files can be very slow. Stick with binary formats; they have a lot going for them when it comes to avoiding performance hits.
-----"Master! Apprentice! Heartborne, 7th Seeker Warrior! Disciple! In me the Wishmaster..." Wishmaster - Nightwish
Actually, that's not the issue - I am converting binary files to Maya ASCII ones in bulk and automatically putting them into an archive (just... don't ask).
Is the std::stringstream actually a performance bottleneck that you've measured?

Anyway, some things to try:
Do you really need to keep the entire thing in memory while you build it up, before dumping it out to a file? If there's a known-length header that has to go at the beginning of the file, you could write out a dud header first so you can start streaming the main data immediately, then overwrite the dud header with the real information afterwards. Whether this would be faster in practice, I don't know.
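
Something like this rough sketch of the dud-header idea, assuming a fixed-length header; the size, the header contents and the function name are all made up for illustration:

#include <cstddef>
#include <fstream>
#include <string>

// Write a placeholder header, stream the body, then seek back and
// overwrite the placeholder with the real header.
void convertModel(const char* path)
{
    std::ofstream out(path, std::ios::binary);

    const std::size_t headerSize = 64;              // assumed fixed and known in advance
    out << std::string(headerSize, ' ') << '\n';    // dud header: just padding

    std::size_t vertexCount = 0;
    // ... stream the bulk of the model here, counting vertices as you go ...

    // Go back and overwrite the padding with the real header.
    std::string header = "vertexCount " + std::to_string(vertexCount);
    out.seekp(0);
    out.write(header.data(), static_cast<std::streamsize>(header.size()));
}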

I don't know if it makes any difference, but use a std::ostringstream, not a std::stringstream, since you're only performing output operations on it.

Check the documentation for a way to make the stringstream reserve a large chunk of memory at the start (should help if the bottleneck is caused by the stringstream having to repeatedly allocate a larger chunk of memory and copy its contents).

If you're feeling really ambitious (and you've tried the easier things, and you've profiled and you're sure that the stringstream is the problem), then write your own string builder class that will allocate fairly large chunks of memory (e.g. a couple of kilobytes) and keep them in a linked list. That way you don't have to copy anything when you need to add another chunk. Might be a good idea to take a peek at the stringstream implementation that you're using to make sure it's not doing something similar already though.
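
A minimal sketch of what I mean (the class name, chunk size and method names are all made up, not any particular library):

#include <cstddef>
#include <list>
#include <ostream>
#include <string>

// Chunked string builder: appends never reallocate or copy existing data,
// they just start a new fixed-size chunk when the current one fills up.
class ChunkedBuilder
{
public:
    explicit ChunkedBuilder(std::size_t chunkSize = 64 * 1024)
        : chunkSize_(chunkSize) {}

    void append(const char* data, std::size_t len)
    {
        while (len > 0)
        {
            if (chunks_.empty() || chunks_.back().size() == chunkSize_)
            {
                chunks_.emplace_back();
                chunks_.back().reserve(chunkSize_);
            }
            std::string& back = chunks_.back();
            std::size_t space = chunkSize_ - back.size();
            std::size_t take  = len < space ? len : space;
            back.append(data, take);
            data += take;
            len  -= take;
        }
    }

    void append(const std::string& s) { append(s.data(), s.size()); }

    // Write everything out chunk by chunk; no single giant copy is needed.
    void writeTo(std::ostream& out) const
    {
        for (const std::string& chunk : chunks_)
            out.write(chunk.data(), static_cast<std::streamsize>(chunk.size()));
    }

private:
    std::size_t chunkSize_;
    std::list<std::string> chunks_;
};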

John B
The best thing about the internet is the way people with no experience or qualifications can pretend to be completely superior to other people who have no experience or qualifications.
A string stream is not exactly the fastest way to deal with large strings. One simple solution to avoid unnecessary buffer management would be to fill a custom buffer with a chunk of vertices at a time and reserve enough space in advance (i.e. make sure there's 64 kB of free buffer space and then output a few dozen vertices).
Writing a simple stream class for this should be fairly easy, especially since you probably only need to output a small set of datatypes.

You really have to profile to be sure, but I suspect that encoding/parsing floating point values will consume a lot of CPU time. I suggest googling for optimized converters, or cheating with a fixed-point representation.
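
To illustrate the idea of a pre-reserved buffer plus hand-rolled float formatting, here's a rough sketch; appendVertex and the buffer size are made up, and snprintf just stands in for whatever optimized converter you find:

#include <cstddef>
#include <cstdio>
#include <string>

// Format one vertex straight into a pre-reserved string buffer, skipping
// the iostream machinery entirely.
void appendVertex(std::string& out, float x, float y, float z)
{
    char line[96];
    int n = std::snprintf(line, sizeof(line), "%.6f %.6f %.6f\n", x, y, z);
    if (n > 0)
        out.append(line, static_cast<std::size_t>(n));
}

// usage (Vertex is whatever your model format uses):
//   std::string buffer;
//   buffer.reserve(4 * 1024 * 1024);   // make sure there's plenty of free space up front
//   for (const Vertex& v : vertices)
//       appendVertex(buffer, v.x, v.y, v.z);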

Still, encoding a single 2-3 meg file (as opposed to loading dozens at game startup) shouldn't be that slow.
Hehe, why not just write the binary files and your conversion program to the archive instead :) Just in case there isn't a platform that can run the executable in 2432 when they restore the files, you can write the source for the program to the archive as well.
Keys to success: Ability, ambition and opportunity.
A deque might be a better choice in this case. The biggest problem I see with using std::string is the memory allocation. There's not really much you can do about anything else. A deque would still need the same amount of memory, of course, but doesn't need to do nearly as much copying.

If you really want to get it going as fast as possible, you have a couple of other options. Allocate a huge block of memory right away - use std::string but reserve 4 MB of space - and then append to it. (I don't know if stringstream can reserve. If not, just use boost::lexical_cast and the string's += operator.) Or switch to a streamed model, and write the file out as you go along. The ultimate model would involve two threads. Thread A would read data X; thread B will process data X while thread A reads data Y; thread B will process data Y while thread A writes data X and reads data Z. It doesn't do you any good to have separate read and write threads unless the source and destination are on different disks. They would just stomp on each other.
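
A minimal sketch of the reserve-and-append approach; std::to_string just stands in here for the boost::lexical_cast I mentioned, and the loop is a placeholder for walking the actual model data:

#include <string>

std::string buildFile(/* model data */)
{
    std::string out;
    out.reserve(4 * 1024 * 1024);   // 4 MB up front, so appends rarely reallocate

    out += "header\n";
    for (int i = 0; i < 1000; ++i)  // placeholder loop over the model
    {
        out += std::to_string(i);
        out += '\n';
    }
    return out;
}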
Quote:Original post by Deyja
A deque might be a better choice in this case. The biggest problem I see with using std::string is the memory allocation. There's not really much you can do about anything else. A deque would still need the same amount of memory, of course, but doesn't need to do nearly as much copying.
A deque has lots of other overhead. It's got to check when to move on to the next block (often per character), it can't be used together with many string functions (such as atof), it can't be sent as a single chunk to the low-level IO functions, it's got huge iterators, etc.
Since we're not dealing with any complicated copy-constructors, it's probably better to use a vector and reserve a lot of memory ahead of time.

Another possibility would be to use memory mapped files and get it written transparently by the OS, or even reserve a huge block of memory with VirtualAlloc (or mmap) and commit memory as needed.
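
A rough, POSIX-only sketch of the memory-mapped variant (Windows would use CreateFileMapping/MapViewOfFile, and a VirtualAlloc commit-as-needed scheme is a different approach again). The awkward part for text output is that the final size has to be known, or overestimated, up front; the function name is made up:

#include <cstddef>
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Write the output through a memory mapping so the OS flushes it to disk
// in the background.
bool writeMapped(const char* path, const char* text, std::size_t len)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return false;

    // Size the file first; mmap needs the backing file to be big enough.
    if (ftruncate(fd, static_cast<off_t>(len)) != 0) { close(fd); return false; }

    void* mem = mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (mem == MAP_FAILED) { close(fd); return false; }

    std::memcpy(mem, text, len);   // in practice you'd format directly into the mapping

    munmap(mem, len);
    close(fd);
    return true;
}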

Still, I kinda doubt that it's worth the effort. Is a stringstream really so inefficient that it can't deal with 3 megs of data gracefully on modern machines?
thanks for the replies all - I will be doing some stress testing this weekend to see what works best.

To the people asking if stringstream is really that inefficient: it's not for a few models. However, this tool will be responsible for doing batch conversions of nigh-on one thousand models, hence me wanting to put the extra effort into making it go as fast as possible.
No need to worry, batch conversions aren't usually time sensitive. Just start the thing and leave it running overnight while you go home, if that's what it takes... As long as it's done when you need the converted models, everything will be OK. I suspect that if you start fiddling around with a bunch of optimizations, you will waste much more time than you'll ever save by making the program faster... It's all about priorities in the end!
