"This does assume some consistency across platforms, though"

Yeah, that's exactly what cost our team about a month of coder time to fix up. Not only do you have to consider type sizes, but you also have to consider alignment and padding. And changing compiler defaults. And new platforms.
Take a simple struct of `int a; char b; vector3 c;`. This will give you a single unambiguous CRC hash. But, given any arbitrary platform or compiler setting, do you really know what sizeof(a), offsetof(b), or offsetof(c) are?
You can smash some of this with #pragma pack. Just hope you don't have any objects that have picky alignment requirements.
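To make the padding problem concrete, here is a small sketch comparing the natural layout of that struct with a packed one. The `vector3` definition and the names `natural_t`, `packed_t`, `natural_gap` and `packed_gap` are made up for illustration, and `#pragma pack(push, 1)` is a compiler extension (GCC, Clang and MSVC accept it), not standard C.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the vector3 type mentioned above. */
typedef struct { float x, y, z; } vector3;

/* Natural layout: the compiler is free to pad between b and c. */
typedef struct {
    int     a;
    char    b;
    vector3 c;
} natural_t;

/* Packed layout: no padding, but c may now sit on an unaligned address. */
#pragma pack(push, 1)
typedef struct {
    int     a;
    char    b;
    vector3 c;
} packed_t;
#pragma pack(pop)

/* Padding bytes between the end of b and the start of c (platform dependent). */
static size_t natural_gap(void)
{
    return offsetof(natural_t, c) - (offsetof(natural_t, b) + sizeof(char));
}

/* Same measurement for the packed struct. */
static size_t packed_gap(void)
{
    size_t gap = offsetof(packed_t, c) - (offsetof(packed_t, b) + sizeof(char));
    assert(gap == 0); /* holds wherever #pragma pack(1) is honored */
    return gap;
}
```

The value of `natural_gap()` is exactly the thing you cannot rely on across compilers, while the packed version trades that uncertainty for possibly misaligned access to `c`.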
Actually, this is where I'm feeling moderately smart right now, as my solution is to offsetof()-map each member of a serialized struct during initialization, so even if the alignment changes, I can still fall back to reading data one field at a time. E.g., the following is the complete set of information needed to map any ordering to any other ordering, and it only assumes the programmer does not screw up calling one single macro:
// the order of the fields (f1, f2 and f3) determines the order in which they are written to disk
DEF_SERIALIZED_TYPE(mystruct_t, f1, f2, f3);
The macro uses offsetof() on each member to determine the position of the field for this architecture and compiler.
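A rough sketch of how such a macro could record offsets, not the actual implementation. To keep it short it is fixed at three fields and named `DEF_SERIALIZED_TYPE3`; the `field_info_t` table, `FIELD_INFO`, `write_fields` and the member types of `mystruct_t` are all assumptions for illustration. A real version would presumably be variadic and would also deal with endianness, which this ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One entry per serialized member: where it lives locally and how big it is. */
typedef struct {
    size_t      offset;
    size_t      size;
    const char *name;
} field_info_t;

#define FIELD_INFO(type, f) { offsetof(type, f), sizeof(((type *)0)->f), #f }

/* Hypothetical three-field variant of the macro described above. */
#define DEF_SERIALIZED_TYPE3(type, f1, f2, f3)          \
    static const field_info_t type##_fields[] = {       \
        FIELD_INFO(type, f1),                           \
        FIELD_INFO(type, f2),                           \
        FIELD_INFO(type, f3)                            \
    }

/* Member types are assumed; the original post does not give them. */
typedef struct {
    int32_t f1;
    char    f2;
    float   f3;
} mystruct_t;

DEF_SERIALIZED_TYPE3(mystruct_t, f1, f2, f3);

/* Copy the fields back to back, in macro order, skipping any padding.
   Returns the number of bytes written to out. */
static size_t write_fields(const void *obj, const field_info_t *fields,
                           size_t count, unsigned char *out)
{
    size_t pos = 0;
    assert(obj != NULL && fields != NULL && out != NULL);
    for (size_t i = 0; i < count; ++i) {
        memcpy(out + pos, (const unsigned char *)obj + fields[i].offset,
               fields[i].size);
        pos += fields[i].size;
    }
    return pos;
}
```

Because the output never contains padding, the on-disk size depends only on the field sizes, not on how the local compiler laid out the struct.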
It also generates a hash for the entire structure, which encodes field types and optionally order. This hash is then written to disk in the header of the data file. It would make sense to split this hash into separate type and field order monikers, though.
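One way the split into a type hash and an order hash might look, purely as a sketch. It uses FNV-1a (32-bit) over per-field descriptor strings such as "f1:int32"; both the hash choice and the descriptor format are assumptions, as are the names `type_hash` and `order_hash`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FNV_OFFSET 2166136261u
#define FNV_PRIME  16777619u

/* FNV-1a over a NUL-terminated string, continuing from state h. */
static uint32_t fnv1a(const char *s, uint32_t h)
{
    assert(s != NULL);
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= FNV_PRIME;
    }
    return h;
}

/* Order-independent: each field is hashed on its own and the results are
   added, so any permutation of the same fields gives the same value. */
static uint32_t type_hash(const char *const *descs, size_t n)
{
    uint32_t h = 0;
    for (size_t i = 0; i < n; ++i)
        h += fnv1a(descs[i], FNV_OFFSET);
    return h;
}

/* Order-dependent: the state is chained from one field to the next,
   with a separator so adjacent descriptors cannot run together. */
static uint32_t order_hash(const char *const *descs, size_t n)
{
    uint32_t h = FNV_OFFSET;
    for (size_t i = 0; i < n; ++i) {
        h = fnv1a(descs[i], h);
        h = fnv1a(";", h);
    }
    return h;
}
```

With both values in the file header, a loader could tell "same fields, different order" (type hash matches, order hash differs) apart from a genuine version mismatch (type hash differs).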
When data is loaded, the hash is used to detect a version mismatch. If all fields are accounted for but are not in the order in which they are stored on disk, the local packing data extracted from offsetof() is used to find ranges that are contiguous and can be read in one go; otherwise, as the order on disk is known, each field is mapped to its new offset.
This pretty much automatically takes care of field ordering issues.
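The field-by-field fallback on load could look something like this. It is a sketch under assumptions: `disk_order` is a hypothetical array telling which local field each on-disk slot holds, the struct and table are illustrative, and the contiguous-range optimization and any byte swapping are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    size_t offset;
    size_t size;
} field_info_t;

/* Illustrative struct; member types are assumed. */
typedef struct {
    int32_t f1;
    char    f2;
    float   f3;
} mystruct_t;

static const field_info_t mystruct_fields[] = {
    { offsetof(mystruct_t, f1), sizeof(int32_t) },
    { offsetof(mystruct_t, f2), sizeof(char)    },
    { offsetof(mystruct_t, f3), sizeof(float)   }
};

/* disk_order[i] is the index (into fields) of the i-th field stored on disk.
   Each field is copied from its on-disk position to its local offset.
   Returns the number of bytes consumed from in. */
static size_t read_fields(void *obj, const field_info_t *fields,
                          const size_t *disk_order, size_t count,
                          const unsigned char *in)
{
    size_t pos = 0;
    assert(obj != NULL && fields != NULL && disk_order != NULL && in != NULL);
    for (size_t i = 0; i < count; ++i) {
        const field_info_t *f = &fields[disk_order[i]];
        memcpy((unsigned char *)obj + f->offset, in + pos, f->size);
        pos += f->size;
    }
    return pos;
}
```

For example, a file written with the fields in the order f3, f1, f2 would be loaded with `disk_order = {2, 0, 1}`, and the struct ends up filled correctly regardless of local padding.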
As for type mismatches: my solution is to avoid using stuff like size_t in structures that require serialization and instead convert architecture-dependent types to something more robust, e.g. int32.
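As an illustration of that conversion, here is a sketch that keeps the architecture-dependent types in the runtime struct and narrows them to `<stdint.h>` fixed-width types for the on-disk struct. The struct names, members and `to_disk` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Runtime struct: size_t and long change width between platforms. */
typedef struct {
    size_t count;
    long   id;
} runtime_t;

/* On-disk struct: every member has the same width everywhere. */
typedef struct {
    uint32_t count;
    int32_t  id;
} disk_t;

/* Narrow to the on-disk types. Returns 1 on success, or 0 if a value
   does not fit, in which case out is left untouched. */
static int to_disk(const runtime_t *in, disk_t *out)
{
    assert(in != NULL && out != NULL);
    if (in->count > UINT32_MAX)
        return 0;
    if (in->id < INT32_MIN || in->id > INT32_MAX)
        return 0;
    out->count = (uint32_t)in->count;
    out->id    = (int32_t)in->id;
    return 1;
}
```

The range check matters on 64-bit targets, where a `size_t` can hold values that would silently truncate when stored as 32 bits.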
For extended types that have private members or require extra care, I'm using specialized callbacks that can access and properly encode/decode the data.
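The callback idea might be sketched like this: each field entry optionally carries an encode function, and plain fields fall back to a raw copy. Everything here (`encode_fn`, `name_t`, `encode_name`, `write_field`, the length-prefixed string format) is an assumption for illustration, and only the encode side is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encodes one field into out and returns the number of bytes written. */
typedef size_t (*encode_fn)(const void *field, unsigned char *out);

typedef struct {
    size_t    offset;
    size_t    size;
    encode_fn encode;   /* NULL means "raw memcpy is fine" */
} field_info_t;

/* A member that cannot be copied raw: it only holds a pointer. */
typedef struct { const char *text; } name_t;

typedef struct {
    int32_t id;
    name_t  name;
} record_t;

/* Hypothetical callback: writes a 32-bit length followed by the characters. */
static size_t encode_name(const void *field, unsigned char *out)
{
    const name_t *n = field;
    uint32_t len = (uint32_t)strlen(n->text);
    memcpy(out, &len, sizeof len);
    memcpy(out + sizeof len, n->text, len);
    return sizeof len + len;
}

static const field_info_t record_fields[] = {
    { offsetof(record_t, id),   sizeof(int32_t), NULL        },
    { offsetof(record_t, name), sizeof(name_t),  encode_name }
};

/* Use the callback when one is registered, otherwise copy the bytes. */
static size_t write_field(const void *obj, const field_info_t *f,
                          unsigned char *out)
{
    const unsigned char *src = (const unsigned char *)obj + f->offset;
    assert(obj != NULL && f != NULL && out != NULL);
    if (f->encode != NULL)
        return f->encode(src, out);
    memcpy(out, src, f->size);
    return f->size;
}
```

A matching decode callback would read the length first and then allocate or copy the characters; that half is omitted to keep the sketch short.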