C++: Cross-platform binary file writing

To store various data, my program reads/writes a binary file. I use the standard fstream library to do this. It works great, but I like to keep my code cross-platform, so I'm wondering how to make sure that it is. Currently I have every piece of data cast to chars, and I write the chars. I use sizeof() to write the appropriate number of bytes/chars for each item, and similarly use sizeof() to read them back. My concerns:

1.) If I create my datafiles on a platform where sizeof() returns a different value for certain types (such as int, etc.) than the platform my program is later compiled and run on, it will read the incorrect number of bytes for certain items. Possible solution: remake the datafiles on each platform I compile on?

2.) Endianness: obviously this is an issue. If I cast an item, (char*) &myInt, and write its sizeof(myInt) bytes to the file, when they are later read and cast back to an (int*), the bytes will be backwards on a machine with different endianness.

3.) Are there others I'm missing?

I've searched the net but most places simply say: don't do this. But is there a way? I know these casts to bytes seem messy, but it's limited to one place in my code that handles it for everything else, and it works just great for me. (I just haven't tried it on different platforms yet. [wink]) Any thoughts?
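To give an idea, the pattern I'm describing looks roughly like this (writeValue/readValue are just stand-in names for my helpers):

#include <fstream>

// Cast the object's address to char* and write/read sizeof() bytes.
template <typename T>
void writeValue( std::ofstream& out, const T& value )
{
    out.write( reinterpret_cast<const char*>( &value ), sizeof( value ) );
}

template <typename T>
void readValue( std::ifstream& in, T& value )
{
    in.read( reinterpret_cast<char*>( &value ), sizeof( value ) );
}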
This is a perfectly reasonable thing to do. I'd recommend you define some types that you guarantee will be of a certain size regardless of platform; define these in some Platform.h kind of header, e.g. int32, uint16, etc. When you store binary files, use these typedefs, and then you're guaranteed that sizeof( int32 ) is always 4 regardless of platform.
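A minimal sketch of what I mean by Platform.h (the typedef names and the compiler check are just illustrative; fill in branches for the compilers you actually target and verify the sizes on each):

// Platform.h -- illustrative sketch only
#if defined(_MSC_VER)
    typedef signed   __int8   int8;
    typedef unsigned __int8   uint8;
    typedef signed   __int16  int16;
    typedef unsigned __int16  uint16;
    typedef signed   __int32  int32;
    typedef unsigned __int32  uint32;
#else
    typedef signed   char     int8;
    typedef unsigned char     uint8;
    typedef signed   short    int16;
    typedef unsigned short    uint16;
    typedef signed   int      int32;
    typedef unsigned int      uint32;
#endif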

As far as the endian issues, if you want to re-use the exact same file between platforms, you'll need to store in the file which endian type the file was written in, and then endian-swap when loading if necessary. Alternatively, you could write different binary files out for all your target platforms, and do the endian-swapping before you write out the file; this latter option gives faster run-time loading of course.
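The swap itself is just byte shuffling; a sketch for 32-bit values (the function name is mine, not from any library):

inline uint32 swapBytes32( uint32 v )
{
    return ( ( v & 0x000000FFu ) << 24 ) |
           ( ( v & 0x0000FF00u ) <<  8 ) |
           ( ( v & 0x00FF0000u ) >>  8 ) |
           ( ( v & 0xFF000000u ) >> 24 );
}

At load time, compare the endian flag stored in the file with the running platform's, and run each multi-byte field through a swap like this only when they differ.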

The only other issues you might run into are if you're reading/writing entire structures whose alignment/padding might be defined differently on different compilers. Normally this isn't too much of an issue, but you can use things like #pragma pack, or just be explicit about adding your own padding member variables. Other issues that could conceivably come up are when you actually want to write different structures for different platforms, but it doesn't seem like that's within the scope of what you're dealing with.
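If you do end up writing whole structures, the pragma route looks roughly like this (FileHeader is just a made-up example; #pragma pack(push/pop) is understood by MSVC and GCC, but check your compiler's documentation):

#pragma pack( push, 1 )  // no padding between members while this is active
struct FileHeader
{
    uint32 magic;
    uint16 version;
    uint16 flags;
};
#pragma pack( pop )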
Keep in mind there are a lot of libraries that already provide platform headers for things like integer widths. Ex: boost::cstdint and SDL both contain integer width typedefs. SDL also has functions for dealing with endianness. However, one thing you'll have to deal with is that there is no reliable way to serialize floating point values as binary between platforms. While IEEE 754 specifies things like bit order and so forth, it doesn't specify byte order, so two platforms with the same endianness and both using IEEE 754 floating point types may have incompatible binary representations. Unfortunately, the only portable way to serialize floats is to do so by converting them into text.
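A rough sketch of the text route for floats (nine significant digits is enough for a 32-bit IEEE float to round-trip; that assumption is baked into the setprecision call):

#include <sstream>
#include <iomanip>
#include <string>

// Convert a float to text with enough digits to round-trip a 32-bit IEEE value.
std::string floatToText( float f )
{
    std::ostringstream out;
    out << std::setprecision( 9 ) << f;
    return out.str();
}

float textToFloat( const std::string& s )
{
    std::istringstream in( s );
    float f = 0.0f;
    in >> f;
    return f;
}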

Also, rather than relying on packing pragmas to serialize structures, you can just serialize and deserialize the individual members one at a time. You'd probably need to do this anyway for endian issues.
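In other words, something along these lines (Vertex and writeVertex are made-up names, and writeValue is a helper like the one sketched earlier in the thread):

struct Vertex
{
    int32  id;
    uint16 flags;
    uint8  r, g, b, a;
};

void writeVertex( std::ofstream& out, const Vertex& v )
{
    writeValue( out, v.id );     // endian-swap each multi-byte field here
    writeValue( out, v.flags );  // if the file's byte order differs from the host's
    writeValue( out, v.r );
    writeValue( out, v.g );
    writeValue( out, v.b );
    writeValue( out, v.a );
}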
Thanks for all the help!

I'm already serializing the individual members rather than the structures as a whole, so padding won't be an issue.

I've just looked into the boost cstdint.hpp header and it looks like it will work great, so I'm going to go ahead and use that.

Thanks for the heads up on floating point values. I'll try to work around needing to save any.

One more question... So can I assume regular old "unsigned char" types are safe across platforms?
Quote: 3.) I know these casts to bytes seem messy, but it's limited to one place in my code that handles it for everything else, and it works just great for me. (I just haven't tried it on different platforms yet.)


Memory alignment. Individual values in a binary stream may not be properly aligned for the processor.

Unless you assume an Intel architecture, you should rely on memcpy to copy data to/from the buffer. On other architectures, simply type-casting memory offsets into the types you want to read can fail. Note that this is a processor-level exception.

Example:
//              short | int
char buf[6] = { 0x0A, 0x00, 0x02, 0x00, 0x00, 0x00 };
char *int_ptr = &buf[2];
int value = *( (int *)int_ptr ); // boom, attempting to read an improperly aligned int
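The memcpy version sidesteps this, because it never dereferences a misaligned pointer (quick sketch):

#include <cstring>

int value;
std::memcpy( &value, &buf[2], sizeof( value ) ); // safe: copies byte by byte, no aligned load needed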
Quote:Original post by BeauMN
So can I assume regular old "unsigned char" types are safe across platforms?


No. A char is not guaranteed to be 8 bits on every platform; some platforms use a 32-bit char. However, in game programming you can get away with just using a BOOST_STATIC_ASSERT(sizeof(int8_t) == 1), which will fail to compile on those platforms, and otherwise ignoring non 8-bit chars. This is a different story in non-game programming disciplines.
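Something like this in a common header (a sketch; the CHAR_BIT check is an extra belt-and-braces assumption of mine):

#include <boost/static_assert.hpp>
#include <boost/cstdint.hpp>
#include <climits>

// Refuse to compile on platforms where the 8-bit-char assumption doesn't hold.
BOOST_STATIC_ASSERT( CHAR_BIT == 8 );
BOOST_STATIC_ASSERT( sizeof( boost::int8_t ) == 1 );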
Quote:Original post by Antheus
Unless you assume Intel architecture, you should rely on memcpy to copy data to/from buffer.


That's not an issue as long as you use fstream's read()/write() directly on your data members.
Hmm. so in the case of a system with 32-bit chars, does it even work to use read() and write() with uint8_t? Such as:
myStream.write( myBuffer, sizeof( uint8_t ) );

I'm guessing no, since the definition of write is:
ostream& write( const char* buffer, streamsize num );

where num is the number of bytes in the buffer. And in this case, a byte would be at least 32 bits. So num would end up being 0.25... can streamsize be a floating point value?


This discussion has raised a new question for me, though perhaps it belongs on the OpenGL forum, but it concerns this topic so I'll ask it here for now:

One of the things I want to store in my binary files are OpenGL color values. These eventually get mapped between 0.0 (zero intensity) and 1.0 (full intensity) for each of the color's components. Now I don't want to use floats directly, due to there not being a reliable way to serialize floats across platforms as discussed earlier. But this isn't a problem as the glColor function can alternatively take bytes, shorts, and ints in addition to floats.

So if I pass unsigned ints into glColor, the documentation states that the largest-representable value gets mapped to 1.0 and the smallest to 0.0. But what does OpenGL consider the "largest-representable value"?

For example, if I call:
boost::uint8_t red   = 255;
boost::uint8_t green = 255;
boost::uint8_t blue  = 255;
boost::uint8_t alpha = 255;
glColor4ui( red, green, blue, alpha );

Will OpenGL know that the "largest-representable value" of these uint8_t is 255 and map the components to 1.0 accordingly? Or will it assume the "largest-representable value" is that of an unsigned 32-bit int on a platform where that is the default? The function prototype looks like:
void glColor4ui( GLuint red, GLuint green, GLuint blue, GLuint alpha )

simply taking in GLuints... but what are these GLuints? Are they cross-platform?

I could just use bytes with glColor4ub() if I want a max of 255, but would that still work on the 32-bit byte platform mentioned earlier?
I see. They are just typedefs. As far as I know, you must handle the portability and endianness of the types you use yourself.

I really can't believe there isn't a standard library for portable serialization... there's gotta be one in Boost.
I like the Walrus best.
Quote:Original post by BeauMN
Hmm. so in the case of a system with 32-bit chars, does it even work to use read() and write() with uint8_t? Such as:
myStream.write( myBuffer, sizeof( uint8_t ) );


That will fail to compile, since uint8_t won't be defined on such a platform.

Quote:So num would end up being 0.25... can streamsize be a floating point value?

No.

Quote:
simply taking in GLuints... but what are these GLuints? Are they cross-platform?

GLuint is a GL typedef for a four-byte unsigned integer; the underlying type can vary by platform. If you want the maximum value you can use std::numeric_limits<GLuint>::max().
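So scaling an 8-bit channel up to the full GLuint range could look something like this (a sketch; it assumes the stored channel maxes out at 255, and since 255 divides numeric_limits<GLuint>::max() evenly, full intensity lands exactly on the maximum):

#include <limits>

// Expand an 8-bit channel into the range that glColor4ui maps onto [0.0, 1.0].
// (Assumes the GL headers are included for GLuint.)
GLuint toGLuintChannel( boost::uint8_t c )
{
    return static_cast<GLuint>( c ) * ( std::numeric_limits<GLuint>::max() / 255 );
}

// e.g. glColor4ui( toGLuintChannel( red ), toGLuintChannel( green ),
//                  toGLuintChannel( blue ), toGLuintChannel( alpha ) );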

Quote:I could just use bytes with glColor4ub() if I want a max of 255, but would that still work on the 32-bit byte platform mentioned earlier?

I wouldn't worry about 32-bit byte platforms if you're doing game programming.
