Portable C++ for a binary file loader

4 comments, last by B_old 13 years, 5 months ago
Hi,

to load a binary file I often use code like this
stream.read((char*)&s, sizeof(Struct));                    // read one Struct into s
stream.read((char*)structs, numStructs * sizeof(Struct));  // read numStructs Structs into the array structs

For the data I use, this code works on both x86 and x64. Is it true that this is a mere coincidence, and that I should instead read only one element of the struct at a time and loop when there are several structs?
If the file representation is the same as the struct's in-memory layout and the struct is POD, then this is well-defined; a similar example (though not reading from disk) is given in the standard. That does not make it portable, though: one compiler may add padding where another does not (padding is implementation-specific), and sizeof(int) may be 4 on one platform and 8 on another, for example. So writing on one platform and reading on another can result in undefined behaviour.
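One way to catch the padding problem early (just a sketch; the Header struct and its fields here are made up for illustration) is to write the expected on-disk size into the code and check it at compile time, e.g. with C++11's static_assert or a compile-time assert macro on older compilers:

#include <cstdint>

// Hypothetical on-disk record using fixed-width types.
struct Header
{
    std::uint32_t magic;
    std::uint16_t version;
    std::uint16_t flags;
    std::uint32_t numEntries;
};

// If a compiler/settings combination inserts different padding, the build
// fails here instead of producing a loader that silently misreads the file.
static_assert(sizeof(Header) == 12, "unexpected padding in Header");

This does not make the raw read portable by itself, but it turns silent data corruption into a compile error.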

I try to avoid sizeof() issues by using the fixed-width types defined in <cstdint> in such cases. Is that valid?
What about the padding issues? Is reading element by element the only way to avoid them?
What is recommended in such a scenario?
I started thinking about those things because of this.
Size and padding are different issues. Using fixed-size data types does not guarantee uniform padding across platforms (or even across compiler settings on the same platform). Raw binary I/O is inherently non-portable. Size is not your only problem; you also have, for example, endianness and the signed-integer representation. These issues may also be something you need to handle, depending on which platforms you intend to support.

If you only want to support different operating systems on an x86 architecture, then it may be enough to cover size variations. If you want to extend to other common architectures, then you may also need to consider big- vs. little-endian byte order. And to take it even further, if you intend to support some exotic platforms, perhaps you need to consider one's vs. two's complement signed integers as well. The number of issues grows with the size of your intended set of target platforms.

Basically, you need to define a set of rules that describes how your values are stored so that they can be read on all intended platforms, size and endianness being the most important. Then write functions for each platform that load entries from binary sources, reading them one by one. Perhaps not the best solution, but it will handle all the cases covered by the rules you define.
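To illustrate that approach (a rough sketch; Entry, readU32LE and readEntry are hypothetical names, and it assumes the file format is defined as little-endian 32-bit fields):

#include <cstdint>
#include <istream>

struct Entry            // hypothetical in-memory record
{
    std::uint32_t id;
    std::int32_t  value;
};

// Assemble a little-endian 32-bit value byte by byte, so the result is
// independent of host endianness and of any struct padding.
std::uint32_t readU32LE(std::istream& in)
{
    unsigned char b[4];
    in.read(reinterpret_cast<char*>(b), 4);
    return  static_cast<std::uint32_t>(b[0])
         | (static_cast<std::uint32_t>(b[1]) << 8)
         | (static_cast<std::uint32_t>(b[2]) << 16)
         | (static_cast<std::uint32_t>(b[3]) << 24);
}

bool readEntry(std::istream& in, Entry& e)
{
    e.id    = readU32LE(in);
    e.value = static_cast<std::int32_t>(readU32LE(in));   // reinterpret the bits as signed
    return !in.fail();                                     // false if either read failed
}

Reading a whole array then becomes a loop over readEntry, and the host's padding, endianness and integer sizes no longer matter as long as the file sticks to the agreed rules.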
In addition to the very important points raised by Brother Bob, you should check out the FAQs on serialization.

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

Thanks for the explanation and links!

