File Formats

5 comments, last by alnite 16 years, 6 months ago
I need a way to write files that will contain my mesh data. The mesh data itself is just an example. In C++ there is serialization, but binary files are said to have compatibility issues across platforms, so is there any other way besides this and a basic text file to hold this information? What I am trying to find out is the best way to do this. For example, if several meshes are imported, what would be a good way of saving them into a single file (like the mesh packages commonly used by most games)?
Plain text is more portable than binary serialization, but also requires more effort to load and save.

Boost has a serialization library, but I've never used it, so I can't vouch for or against it.

One approach you might consider would be to manually write out all your data to a binary format and manually read it in, being sure to normalize it (i.e. byte-swapping) when doing so.
Quote:Original post by cistar
I need a way to write files that will contain my mesh data. The mesh data its self is just an example. In c++ there is serialization however binary files are said to have some compatibility issues across platforms so is there any other way besides this and a basic text file to hold this information? What I am trying to find out is what is the best way to do this.

For example if several meshes are imported then what would be a good way of saving them into a single file. (like a mesh package commonly used for most games)
The compatibility issues you speak of are little-endian vs. big-endian byte order. I think (although I'm really not sure) that the Mac is the only big-endian consumer machine, and in any case it's easy to change your packing/converter program to output either endian type, or to compile and run the packer/converter on the other machine type.

Personally, I'd just use a binary format and a pack file-like format (I use my own file archive which is just a standard pack file with compression).
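
For the "several meshes in one file" part of the question, here is a rough sketch of what a minimal pack file could look like. The layout (a "MPAK" magic, a 32-byte name field, little-endian 32-bit offsets and sizes) and the names PackEntry, build_pack and find_in_pack are all made up for illustration, and there is no compression; it is not the archive format mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical layout (an assumption, not from the thread):
//   "MPAK" | uint32 count | count x { char name[32]; uint32 offset; uint32 size } | blobs
// All integers are stored little-endian, whatever the host byte order is.
struct PackEntry {
    std::string name;                // truncated to 31 characters on write
    std::vector<std::uint8_t> data;  // already-serialized mesh bytes
};

const std::size_t kNameSize = 32;
const std::size_t kHeaderSize = 8;             // magic + count
const std::size_t kEntrySize = kNameSize + 8;  // name + offset + size

static void put_u32(std::vector<std::uint8_t>& out, std::uint32_t v) {
    for (int shift = 0; shift < 32; shift += 8)
        out.push_back(static_cast<std::uint8_t>((v >> shift) & 0xffu));
}

static std::uint32_t get_u32(const std::vector<std::uint8_t>& in, std::size_t pos) {
    assert(pos + 4 <= in.size());
    return static_cast<std::uint32_t>(in[pos])
         | (static_cast<std::uint32_t>(in[pos + 1]) << 8)
         | (static_cast<std::uint32_t>(in[pos + 2]) << 16)
         | (static_cast<std::uint32_t>(in[pos + 3]) << 24);
}

std::vector<std::uint8_t> build_pack(const std::vector<PackEntry>& entries) {
    std::vector<std::uint8_t> out;
    const char magic[4] = {'M', 'P', 'A', 'K'};
    out.insert(out.end(), magic, magic + 4);
    put_u32(out, static_cast<std::uint32_t>(entries.size()));

    // Directory first; blob offsets are measured from the start of the file.
    std::uint32_t offset =
        static_cast<std::uint32_t>(kHeaderSize + kEntrySize * entries.size());
    for (const PackEntry& e : entries) {
        char name[kNameSize] = {};         // zero-filled, so always NUL-terminated
        e.name.copy(name, kNameSize - 1);
        out.insert(out.end(), name, name + kNameSize);
        put_u32(out, offset);
        put_u32(out, static_cast<std::uint32_t>(e.data.size()));
        offset += static_cast<std::uint32_t>(e.data.size());
    }
    // Then the blobs, in directory order.
    for (const PackEntry& e : entries)
        out.insert(out.end(), e.data.begin(), e.data.end());
    return out;
}

// Returns true and fills `blob` if an entry called `wanted` is in the pack.
bool find_in_pack(const std::vector<std::uint8_t>& pack, const std::string& wanted,
                  std::vector<std::uint8_t>& blob) {
    if (pack.size() < kHeaderSize || std::memcmp(pack.data(), "MPAK", 4) != 0)
        return false;
    const std::uint32_t count = get_u32(pack, 4);
    for (std::uint32_t i = 0; i < count; ++i) {
        const std::size_t entry = kHeaderSize + kEntrySize * i;
        if (entry + kEntrySize > pack.size()) return false;  // truncated directory
        std::string name(reinterpret_cast<const char*>(&pack[entry]), kNameSize);
        name = name.substr(0, name.find('\0'));
        if (name != wanted) continue;
        const std::uint32_t offset = get_u32(pack, entry + kNameSize);
        const std::uint32_t size = get_u32(pack, entry + kNameSize + 4);
        if (static_cast<std::uint64_t>(offset) + size > pack.size()) return false;
        blob.assign(pack.begin() + offset, pack.begin() + offset + size);
        return true;
    }
    return false;
}
```

The buffer returned by build_pack can be written to disk in one fwrite call. Because the directory sits at the front, a loader only needs to read the header and directory to locate any single mesh.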
Well, someone had said that it wouldn't work on 64-bit, or that if you changed the processor it was on, the binary would no longer work. However, if it only fails on a Mac, then it's usable for most Windows situations, I would think.
Quote:Original post by cistar
Well someone had said something about it wouldn't work on a 64 bit or if you change what processor it was on the binary would no longer work. However if it only doesn't work on a Mac then its usable for most windows situations I would think.
It depends. If you read in types like "int" and "long", then you're likely to have issues. If your file format has fixed sized types like 32-bit signed int, 16-bit unsigned int, byte, etc, you'll be fine. But you really should be using fixed sized types anyway.
1. Endian-ness. Steve's already covered this. Older PPC-based Macs (the newer Intel-based ones are little-endian), PS3, Xbox 360, GameCube and Wii are all big-endian. PC, PS2 and Xbox are little-endian. The usual solution is simply to have two versions of your binary data: big-endian and little-endian. Converting endian-ness at load time is something you might want to do to simplify development, but the final release media should already be in the correct byte order for the target platform.
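
As a rough illustration of converting at load time, something like the following could work. The function names are invented for this sketch, and load_float_be assumes a 32-bit IEEE 754 float, which the thread does not state.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reverse the byte order of a 16-bit value.
std::uint16_t swap16(std::uint16_t v) {
    return static_cast<std::uint16_t>((v << 8) | (v >> 8));
}

// Reverse the byte order of a 32-bit value.
std::uint32_t swap32(std::uint32_t v) {
    return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
           ((v & 0x00ff0000u) >> 8)  | ((v & 0xff000000u) >> 24);
}

// Detect the host byte order at run time by looking at the first byte of 1.
bool host_is_little_endian() {
    const std::uint32_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Convert a value that was read raw from a big-endian file into host order.
std::uint32_t from_big_endian32(std::uint32_t stored) {
    return host_is_little_endian() ? swap32(stored) : stored;
}

// Floats are safest to rebuild from their bytes rather than swapped in place.
// Assumption: float is a 32-bit IEEE 754 value on the target platform.
float load_float_be(const unsigned char* bytes) {
    assert(bytes != nullptr);
    static_assert(sizeof(float) == 4, "sketch assumes a 32-bit float");
    const std::uint32_t bits = (static_cast<std::uint32_t>(bytes[0]) << 24) |
                               (static_cast<std::uint32_t>(bytes[1]) << 16) |
                               (static_cast<std::uint32_t>(bytes[2]) << 8)  |
                                static_cast<std::uint32_t>(bytes[3]);
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

A development build could call from_big_endian32 on every field as it loads, while the release pipeline bakes the data in the target's byte order so that no swapping is needed.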


2. Type sizes. Steve's touched on this. How many bytes does an 'int' or a 'long' take? The answer is it depends on the compiler for the platform. So for each platform make some known size types (e.g. int32, uint64) and ensure that ALL on disk binary media uses types that have known sizes.
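
A small sketch of the idea, assuming a compiler that provides <cstdint> and static_assert (C++11); on older compilers you would write the typedefs yourself per platform. MeshHeader and its fields are hypothetical.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

static_assert(CHAR_BIT == 8, "on-disk format assumes 8-bit bytes");

// Hypothetical on-disk mesh header: every member has a known, fixed size.
struct MeshHeader {
    std::uint32_t vertex_count;
    std::uint32_t index_count;
    std::uint16_t vertex_stride;  // bytes per vertex
    std::uint16_t flags;
};

// Fail the build, not the load, if a platform lays this out differently.
static_assert(sizeof(MeshHeader) == 12, "unexpected padding in MeshHeader");

// Use a 64-bit result so large meshes cannot overflow the multiplication.
std::uint64_t vertex_buffer_bytes(const MeshHeader& h) {
    assert(h.vertex_stride != 0);
    return static_cast<std::uint64_t>(h.vertex_count) * h.vertex_stride;
}
```

The static_assert turns a silent layout mismatch into a compile error on whichever platform disagrees.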


3. 64-bit. 64-bit issues shouldn't really affect or be affected by your file format. The biggest issue with 64-bit is how your app uses pointers in memory; if you're ever casting a pointer to or from some other type then your code is probably broken for 64-bit. If you store pointers in your file format that will be fixed up and used "in place" on load, then they should be 64-bit (see above about fixed size types - make a fixed size pointer type where the upper 32-bits are ignored on other platforms).
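
One way the fixed-size pointer idea might look, as a sketch only. PointerSlot, fix_up and as_pointer are invented names, and the sketch assumes the slot holds a byte offset from the start of the loaded blob that has already been converted to host byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical 64-bit slot: on disk it holds a byte offset from the start of
// the blob; after fix-up it holds a real pointer. It is 8 bytes on every
// platform, so the surrounding structure has the same layout everywhere.
struct PointerSlot {
    std::uint64_t bits;
};

static_assert(sizeof(void*) <= sizeof(PointerSlot), "pointer must fit in the slot");

// Turn the stored offset into an address, in place.
void fix_up(PointerSlot& slot, unsigned char* base) {
    assert(base != nullptr);
    unsigned char* p = base + static_cast<std::size_t>(slot.bits);
    slot.bits = 0;                          // unused bytes stay zero on 32-bit
    std::memcpy(&slot.bits, &p, sizeof p);  // copy, rather than cast, the pointer
}

// Read the pointer back out of a slot that has been fixed up.
unsigned char* as_pointer(const PointerSlot& slot) {
    unsigned char* p = nullptr;
    std::memcpy(&p, &slot.bits, sizeof p);
    return p;
}
```

On a 32-bit build only four of the eight bytes are meaningful, which matches the point about the remaining 32 bits being ignored.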


4a. Alignment and padding. Some CPUs have strict rules regarding the address that a certain type should be at; most commonly the address should be a multiple of the size of the type in bytes. Compilers for those platforms insert padding into structures so that each member starts at a correctly aligned address. In addition, some compilers will insert padding at the end of a structure to ensure that all elements in arrays of that structure are also correctly aligned.

4b. An example:
struct Blah
{
    unsigned char a;
    float b;
    unsigned char c;
};

Compilers for many platforms will insert padding between members a and b so that the float is at an address that is a multiple of sizeof(float), and padding at the end of the struct so that the size of the whole struct is a multiple of sizeof(float). With a 4-byte float, the structure actually ends up as:
struct Blah
{
    unsigned char a;
    unsigned char hidden_padding1[3];
    float b;
    unsigned char c;
    unsigned char hidden_padding2[3];
};


4c. The problem is, different CPUs have different rules, and different compilers have different options regarding structure alignment/padding.
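
If you want to see what a particular compiler actually does with Blah, a quick check along these lines can help. Layout, blah_layout and print_blah_layout are names made up for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

struct Blah {
    unsigned char a;
    float b;
    unsigned char c;
};

// Where each member really lives, as chosen by this compiler and its settings.
struct Layout {
    std::size_t offset_a, offset_b, offset_c, total;
};

Layout blah_layout() {
    Layout l = { offsetof(Blah, a), offsetof(Blah, b), offsetof(Blah, c), sizeof(Blah) };
    // The compiler guarantees b is aligned; how it gets there is up to it.
    assert(l.offset_b % alignof(float) == 0);
    return l;
}

void print_blah_layout() {
    const Layout l = blah_layout();
    std::printf("a@%zu b@%zu c@%zu sizeof=%zu\n",
                l.offset_a, l.offset_b, l.offset_c, l.total);
}
```

Running print_blah_layout on each target, and with each packing option you care about, shows whether the layouts agree before you commit to writing the struct to disk.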

4d. The solution, for any on-disk structure, is to sort all structure members in decreasing order of size, use known-size types (int8 etc.), and insert your own padding into the structure so that its layout conforms to the alignment rules for ALL the platforms you're targeting. #pragma pack and similar can be useful, but don't use them unless you've done the above first (compilers can generate quite hideous code for carelessly packed structures).
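
Applied to the Blah example, that might look like the sketch below. DiskBlah and make_disk_blah are hypothetical names, and the static_asserts assume a C++11 compiler and a 4-byte float.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical on-disk version of Blah: largest member first, fixed-size
// types, and the padding written out explicitly instead of left to the compiler.
struct DiskBlah {
    float        b;       // offset 0, 4 bytes
    std::uint8_t a;       // offset 4
    std::uint8_t c;       // offset 5
    std::uint8_t pad[2];  // offsets 6-7, rounds the size up to a multiple of 4
};

// Compile-time checks that every target agrees on the layout.
static_assert(sizeof(float) == 4, "sketch assumes a 4-byte float");
static_assert(offsetof(DiskBlah, b) == 0, "b moved");
static_assert(offsetof(DiskBlah, a) == 4, "a moved");
static_assert(offsetof(DiskBlah, c) == 5, "c moved");
static_assert(sizeof(DiskBlah) == 8, "compiler added hidden padding");

DiskBlah make_disk_blah(std::uint8_t a, float b, std::uint8_t c) {
    DiskBlah d = {};  // zero everything, including pad, so files are deterministic
    d.b = b;
    d.a = a;
    d.c = c;
    return d;
}
```

With the members sorted and the padding explicit, the struct is 8 bytes instead of the 12 that the compiler-padded Blah above typically takes.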


5. All of the above said, for graphics data, in the commercial games world it's actually more common to have a final binary file for each platform because there are usually too many differences between the GPUs (indexing isn't supported on platform X, platform Y has a really cool texture compression format, platform Z doesn't support signed packed DWORDs). Of course almost all non-graphics, non-sound data should be completely platform neutral.

Simon O'Connor | Technical Director (Newcastle) Lockwood Publishing

You can, of course, just pick one endian format and use it for your file. That way the file is independent of the processor. You need to create helper functions that put each type into a byte array and read it back out again. This means converting everything to a byte array yourself rather than relying on the standard library to write your structures for you.

For example:
struct x {
   char a;
   short b;
   int c;
};

Using some standard library:
x my_x;
fwrite( &my_x, sizeof(x), 1, fp );

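This writes the struct exactly as it sits in memory, including any padding and in the machine's native byte order, so it can cause packing and endian issues depending on what compiler you are using and what machine you are running it on.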

Using your own:
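struct x {
   char a;
   short b;   // assumed to be 16 bits
   int c;     // assumed to be 32 bits

   enum { SERIALIZED_SIZE = 7 };   // 1 + 2 + 4 bytes, no padding

   // Returns a new[]-allocated buffer; the caller must delete[] it.
   unsigned char* get_bytes() const {
      // use BIG ENDIAN
      const unsigned short ub = (unsigned short)b;
      const unsigned int uc = (unsigned int)c;
      unsigned char* buffer = new unsigned char[SERIALIZED_SIZE];
      buffer[0] = (unsigned char)a;
      buffer[1] = (ub >> 8) & 0xff;
      buffer[2] = ub & 0xff;
      buffer[3] = (uc >> 24) & 0xff;
      buffer[4] = (uc >> 16) & 0xff;
      buffer[5] = (uc >> 8) & 0xff;
      buffer[6] = uc & 0xff;
      return buffer;
   }

   // sizeof(buffer) would only give the size of the pointer, so the
   // caller passes the buffer length explicitly.
   void read_bytes(const unsigned char* buffer, size_t length) {
      if ( (buffer != 0) && (length >= SERIALIZED_SIZE) ) {
         // unsigned bytes avoid sign extension when shifting
         a = (char)buffer[0];
         b = (short)((buffer[1] << 8) | buffer[2]);
         c = (int)( ((unsigned int)buffer[3] << 24) | ((unsigned int)buffer[4] << 16) |
                    ((unsigned int)buffer[5] << 8) | (unsigned int)buffer[6] );
      }
   }
};

x my_x;
unsigned char* data = my_x.get_bytes();
fwrite( data, 1, x::SERIALIZED_SIZE, fp );   // sizeof(data) would be the pointer size
delete [] data;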


Man, my C/C++ skills are getting rusty. If anyone spots a problem in the above example, please let me know.

