File IO question

8 comments, last by Zahlman 14 years, 7 months ago
I've always wondered about this and never really asked about it, and have never seen any mention of it in any articles or books, so here goes. Does the order of the members of a structure in code affect how it's read from a file, assuming the whole structure is read with a single fread call? For example:


typedef struct
{
 int id;
 float data;
} mystruct;

mystruct m;
fread(&m, 1, sizeof(mystruct), f);

// will the above result in the same as below:
typedef struct
{
 float data;
 int id;
} mystruct;

mystruct m;
fread(&m, 1, sizeof(mystruct), f);
Fairly simple, I'm assuming, but I've never really asked about it.
The order of a structure does not affect how the file is read, but the order of the variables in a structure does of course affect which variables get which values assigned.

In your first example, the first four bytes will be used for the 'int id', and the second set of four bytes will be used for the 'float data'. In your second example, the first four bytes are for 'float data', and the last four bytes for 'int id'.

That being said, sometimes a structure combined with compiler-added padding can screw up your loading/saving of data (if my memory serves me well, I think this was the problem with bitmap files). The compiler may align a header's members by adding a couple of 'useless' bytes, so you cannot do something like:
fread(&structure, 1, sizeof(structure), f);

But you have to do something like:
fread(&structure.var1, 1, sizeof(structure.var1), f);
fread(&structure.var2, 1, sizeof(structure.var2), f); //etc


Keep that in mind when encountering strange problems with loading/saving files ;)
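A minimal sketch of how those extra bytes can be made visible. The struct padded_example and both helper functions are made up for illustration; how much padding you actually get depends on the compiler, its settings, and the platform.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct for illustration: a 1-byte member followed by an int.
   Most compilers insert padding after `tag` so that `value` is aligned. */
typedef struct {
    char tag;
    int  value;
} padded_example;

/* Members are laid out in declaration order, so `value` can never start
   before the end of `tag`. */
static_assert(offsetof(padded_example, value) >= sizeof(char),
              "value must come after tag");

/* Number of bytes in the struct that belong to no member (the padding). */
size_t padding_bytes(void)
{
    return sizeof(padded_example) - (sizeof(char) + sizeof(int));
}

/* Byte offset at which `value` really starts inside the struct. */
size_t value_offset(void)
{
    return offsetof(padded_example, value);
}
```

If padding_bytes() returns anything other than 0, a single fread or fwrite of the whole struct moves those padding bytes to or from the file as well.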
It matters. Writing a full object will have the members in the order that they're declared, plus you may have some extra bytes of compiler-added padding which will vary by platform, compiler, and compiler settings.

You should pretty much always serialize the members of an object individually.
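A minimal sketch of member-by-member serialization, reusing the mystruct from the original question. The function names write_mystruct and read_mystruct are made up for this example. Note that this only keeps padding out of the file; it still assumes the reader and writer agree on the size and byte order of int and float.

```c
#include <assert.h>
#include <stdio.h>

typedef struct {
    int   id;
    float data;
} mystruct;

/* Write each member separately so that compiler padding never reaches
   the file. Returns 1 on success, 0 on failure. */
int write_mystruct(FILE *f, const mystruct *m)
{
    assert(f != NULL && m != NULL);
    return fwrite(&m->id, sizeof m->id, 1, f) == 1
        && fwrite(&m->data, sizeof m->data, 1, f) == 1;
}

/* Read the members back in the same order they were written.
   Returns 1 on success, 0 on failure or a short read. */
int read_mystruct(FILE *f, mystruct *m)
{
    assert(f != NULL && m != NULL);
    return fread(&m->id, sizeof m->id, 1, f) == 1
        && fread(&m->data, sizeof m->data, 1, f) == 1;
}
```

With this approach the order of the fields in the file is fixed by the order of the calls, not by the order of the declarations in the struct.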
This is just (logically based) speculation, but I wouldn't imagine you can just make up a struct containing the data types you want to read and have it magically put the values where they belong. What if you have multiple floats, or multiple ints, or even a complex of nested structs? I would imagine it reads the struct in the same way it was written out. You could easily write a few tests to investigate further for yourself.
I didn't think so but I wasn't sure. I've noticed that a lot of MD2 loading examples simply call fread(header, 1, sizeof(MD2Header), fileHandle), and I assumed that with the compiler padding and what not being added I would need to see how the modeling program itself was writing the information in order to know what order to read it back.

Thanks for the responses.
I think the order is mandated by the appropriate language standard, but not the size of the objects or the space between them. So if your program reads and writes a struct from and to a file, and you later recompile it with a different compiler (or a different version of the same compiler), you might not be able to read the data back in correctly or write it out the same way. That doesn't mean you can't do what you want, but you should probably typedef your data types so that you can be sure they are the same size across compilers, and disable struct padding (see your compiler's documentation for details; Visual C++ has its own method). Also, if you want a searchable concept, try 'serialization'.
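A minimal sketch of both suggestions, assuming a compiler that supports the #pragma pack(push, 1) / #pragma pack(pop) form (MSVC, GCC and Clang do; it is not part of the C standard). The struct packed_record and the helper packed_record_size are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed-width types from <stdint.h> have the same size on every compiler
   that provides them, unlike plain int or long. */
#pragma pack(push, 1)   /* compiler-specific: no padding between members */
typedef struct {
    uint8_t tag;
    int32_t value;
} packed_record;
#pragma pack(pop)       /* restore the previous packing setting */

/* Fail the build if the layout is not what the file format expects. */
static_assert(sizeof(packed_record) == 5, "packed_record must be 5 bytes");
static_assert(offsetof(packed_record, value) == 1, "value must follow tag");

/* Size of the record as it would appear in a file. */
size_t packed_record_size(void)
{
    return sizeof(packed_record);
}
```

This pins down size and padding, but not byte order, so files written this way are still tied to the endianness of the machine that wrote them.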


Quote:This is just (logically based) speculation, but I wouldn't imagine you can just make up a struct containing the data types you want to read and have it magically put the values where they belong. What if you have multiple floats, or multiple ints, or even a complex of nested structs?


As long as you're dealing only with non-pointer POD types, it *will* work, but it's 100% unportable.
It's technically not portable, but it's portable enough (works in enough cases) that you see it all the time. For example, a .bmp file begins with a BITMAPFILEHEADER structure followed by a BITMAPINFOHEADER structure, each simply written out in one chunk*.

In my game, my internal model file format is just a series of POD structures written out to disk. The advantage is that it's very easy - one fwrite/fread rather than many - and it's extremely efficient. The disadvantage is that it makes versioning quite difficult (for my game, I just have an integer "version" field in the header and refuse to load anything but the most recent version).
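A minimal sketch of the version check described above. The header layout, the magic value and the version number are all made up for this example; they are not the actual format of the game mentioned in this post.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define MODEL_MAGIC   0x4C444F4Du  /* hypothetical file signature */
#define MODEL_VERSION 3u           /* hypothetical current format version */

/* Hypothetical POD header written at the start of the file in one chunk. */
typedef struct {
    uint32_t magic;
    uint32_t version;
} model_header;

/* Read the header with a single fread and refuse anything that is not
   the current version. Returns 1 if the file may be loaded, 0 otherwise. */
int read_model_header(FILE *f, model_header *h)
{
    assert(f != NULL && h != NULL);
    if (fread(h, sizeof *h, 1, f) != 1) {
        return 0;  /* short read: file truncated or empty */
    }
    return h->magic == MODEL_MAGIC && h->version == MODEL_VERSION;
}
```

Rejecting old versions outright is the simplest policy; supporting them would mean keeping one reader per version around.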

In reality, I guess, the efficiency argument is not all that important, since you're almost always going to be limited by the speed of your hard disk, rather than the speed of your RAM/CPU and manually putting fields into the correct slot. The other problem with portability is that, for example, if your game runs on both Windows and the Xbox 360 you have a problem, because everything that Windows runs on is little-endian, whereas the Xbox 360 is big-endian. So you either have to export all your assets separately for Xbox 360 and Windows, or you have to manually twiddle bytes around when you load on one of those platforms.

* bmp files are a little more complicated than that, because of the multiple versions that exist, but that's the idea anyway.
In content-heavy games you should always be concerned about optimizing your file IO. Although it is true that the HDD, not the CPU, is typically the slow part of loading data, that doesn't mean there aren't better things for the CPU to be doing.

In an ideal world file reads will occur asynchronously (especially due to the relative slowness of IO devices) and the CPU/GPU will be busy doing other processing during these reads. In an absolutely ideal world the data you read in is ready to go with little or no processing at all after loading. This includes avoiding virtual functions (or specifically handling them) and avoiding pointers by using offsets, which can be fixed up after loading at minimal cost.
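A minimal sketch of the offsets-instead-of-pointers idea, assuming the whole file has been read into one suitably aligned buffer (for example, one returned by malloc). The header layout and the name blob_vertices are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical on-disk header: the vertex data is located by a byte offset
   from the start of the blob, because a pointer saved to disk would be
   meaningless the next time the file is loaded. */
typedef struct {
    uint32_t vertex_count;   /* number of floats in the array */
    uint32_t vertex_offset;  /* byte offset of the array from blob start */
} blob_header;

/* "Fix up" the offset into a usable pointer after loading. Returns NULL if
   the header is missing or the array would fall outside the blob. */
const float *blob_vertices(const void *blob, size_t blob_size)
{
    blob_header h;
    if (blob == NULL || blob_size < sizeof h) {
        return NULL;
    }
    memcpy(&h, blob, sizeof h);
    if (h.vertex_offset > blob_size || h.vertex_offset % sizeof(float) != 0) {
        return NULL;  /* out of range or misaligned */
    }
    if (h.vertex_count > (blob_size - h.vertex_offset) / sizeof(float)) {
        return NULL;  /* array would run past the end of the blob */
    }
    return (const float *)((const unsigned char *)blob + h.vertex_offset);
}
```

The fix-up is a single addition per pointer, and the vertex data itself is used in place without being copied.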

This may mean that you do save out your assets in multiple formats for little-endian and big-endian platforms. It may mean that you do everything you can to keep the layout of your classes identical to the saved-out binary data. It is not terribly uncommon to see explicit padding in headers to make the layout more obvious, such as:

struct StructName
{
  int  m_nInt;
  char m_cChar;
  char __cPadding[3];
};


As mentioned, things can vary a little bit between compilers and target platforms. Especially in common libraries, great lengths are taken to know all of the differences and to ensure that a structure's size and alignment are always the same.
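One hedged way to enforce that is a compile-time check next to the struct. This sketch is a variant of the struct above that swaps in int32_t for int and renames the padding member (identifiers containing a double underscore are reserved in C); the expected size of 8 is an assumption about the intended file layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Variant of the struct above with a fixed-width integer and the padding
   spelled out, so the in-memory layout matches the layout in the file. */
typedef struct {
    int32_t m_nInt;
    char    m_cChar;
    char    m_cPadding[3];  /* explicit padding up to a multiple of 4 */
} StructName;

/* Refuse to compile if a compiler or setting changes the layout. */
static_assert(sizeof(StructName) == 8, "StructName must be 8 bytes");
static_assert(offsetof(StructName, m_cChar) == 4, "m_cChar must be at 4");
static_assert(offsetof(StructName, m_cPadding) == 5, "padding must be at 5");
```

A failed check shows up as a build error on the offending platform rather than as corrupt data at load time.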
Don't try to rely on these kinds of things. It's just not worth it. It's 2009 now. Processors are fast - really, incredibly fast. Hard drives are also much faster than they used to be, but there's really no comparison.
