Optimizing memory layout (and reducing load times)

This does assume some consistency across platforms, though

Yeah, that's exactly the kind of thing that cost our team about a month of coder time to fix up. Not only do you have to consider type sizes, but you also have to consider alignment and padding. And changing compiler defaults. And new platforms.

Take a simple struct of `int a; char b; vector3 c;`. This will give you a single unambiguous CRC hash. But, given any arbitrary platform or compiler setting, do you really know what sizeof(a), offsetof(b), or offsetof(c) are?

You can smash some of this with #pragma pack. Just hope you don't have any objects that have picky alignment requirements.
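To make that concrete, here is a small sketch (assuming vector3 is just three plain floats) that prints the offsets with and without packing; the commented numbers are what a typical x86/x64 compiler produces, which is exactly the assumption you can't make in general:

#include <cstddef>
#include <cstdio>

struct vector3 { float x, y, z; };

struct S        { int a; char b; vector3 c; };   // default packing

#pragma pack(push, 1)
struct S_packed { int a; char b; vector3 c; };   // no padding at all
#pragma pack(pop)

int main()
{
    // Typical x86/x64 defaults: b at 4, c at 8, sizeof(S) == 20.
    // With pack(1):             b at 4, c at 5, sizeof(S_packed) == 17.
    // A platform that aligns vector3 to 16 bytes would give different numbers again.
    std::printf("default: offsetof(b)=%zu offsetof(c)=%zu sizeof=%zu\n",
                offsetof(S, b), offsetof(S, c), sizeof(S));
    std::printf("packed:  offsetof(b)=%zu offsetof(c)=%zu sizeof=%zu\n",
                offsetof(S_packed, b), offsetof(S_packed, c), sizeof(S_packed));
    return 0;
}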

Actually, this is where I'm feeling moderately smart right now, as my solution is to offsetof()-map each member of a serialized struct during initialization, so even if the alignment changes, I can still fall back to reading the data one field at a time. E.g. the following is a complete set of the information needed to map any ordering to any other ordering, and it only assumes the programmer doesn't screw up calling one single macro:

// the order of the fields (f1, f2 and f3) determines the order in which they are written to disk
DEF_SERIALIZED_TYPE(mystruct_t, f1, f2, f3);

The macro uses offsetof() on each member to determine the position of the field for this architecture and compiler.

It also generates a hash for the entire structure, which encodes field types and optionally order. This hash is then written to disk in the header of the data file. It would make sense to split this hash into separate type and field order monikers, though.

When data is loaded, the hash is used to detect a version mismatch. If all fields are accounted for but do not match the order in which they are stored on disk, the local packing data extracted from offsetof() is used to determine which ranges are contiguous; otherwise, since the order on disk is known, each field is mapped to its new offset.

This pretty much automatically takes care of field ordering issues.
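The original macro isn't shown here, but a minimal sketch of the idea - a per-type table built from offsetof()/sizeof() that lets the loader fall back to copying one field at a time - might look like this (all names and the member types of mystruct_t are hypothetical):

#include <cstddef>
#include <cstdint>
#include <cstring>

// One entry per serialized member: where it lives in memory and how big it is.
struct FieldDesc
{
    const char* name;
    size_t      offset;   // offsetof() result for this build
    size_t      size;     // sizeof() the member
};

// Hypothetical macro in the spirit of DEF_SERIALIZED_TYPE: records offsetof()/sizeof()
// for one member so the loader can copy one field at a time when the on-disk
// layout doesn't match the in-memory layout.
#define SERIALIZED_FIELD(Type, member) { #member, offsetof(Type, member), sizeof(Type::member) }

struct mystruct_t
{
    int32_t f1;
    int8_t  f2;
    float   f3;
};

// The order of this table is the order the fields are written to disk.
static const FieldDesc kMystructFields[] =
{
    SERIALIZED_FIELD(mystruct_t, f1),
    SERIALIZED_FIELD(mystruct_t, f2),
    SERIALIZED_FIELD(mystruct_t, f3),
};

// Field-at-a-time load: 'packed' is the on-disk byte stream with the fields
// stored back to back in table order, regardless of local padding.
void LoadMystruct(mystruct_t& out, const uint8_t* packed)
{
    size_t src = 0;
    for (const FieldDesc& f : kMystructFields)
    {
        std::memcpy(reinterpret_cast<uint8_t*>(&out) + f.offset, packed + src, f.size);
        src += f.size;
    }
}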

As for type mismatches: my solution is to avoid using stuff like size_t in structures that require serialization, and instead convert architecture-dependent types to something more robust, e.g. int32.
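For example (a trivial illustration, not the poster's code):

#include <cstddef>
#include <cstdint>

// Architecture-dependent types: the sizes vary between 32-bit and 64-bit builds.
struct SaveHeader_bad  { size_t   entryCount; long    timestamp; };
// Fixed-width replacements: the same size on every build before serialization.
struct SaveHeader_good { uint32_t entryCount; int64_t timestamp; };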

For extended types that have private members or require extra care, I'm using specialized callbacks that can access and properly encode/decode the data.

If you're not encoding the offset data into your hash, how will you know that the data on-disk doesn't match the in-memory layout? I'm not talking about changing the ordering of anything, just literally saying that the same objects can be at different offsets even with the order remaining invariant.

It would make sense to split this hash into separate type and field order monikers, though.

My mistake - the above statement was meant to imply that both the order and the offsets are hashed.

If there's a hash mismatch, though, then the only safe thing to do is to read all the data one field at a time.

That being said, I'm not in a position to judge how much of an issue this really is IF:

a) the manual ordering of fields in a given struct is sensible
b) all data is either stored in packed arrays or is aligned to start with

Yeah, that's exactly the kind of thing that cost our team about a month of coder time to fix up. Not only do you have to consider type sizes, but you also have to consider alignment and padding. And changing compiler defaults. And new platforms.
Given any arbitrary platform or compiler setting, do you really know what sizeof(a), offsetof(b), or offsetof(c) are?

I advocate always accompanying the definition of these structures with static assertions that document the assumptions that you're making about how the compiler aligns and pads the structure. If your assumptions are wrong, the code stops compiling. Those assertions then also become the file format specification ("the code is the documentation") for the tool that produces your data files. We were supporting something like 6 platforms at the time pretty easily. Most of them actually had the same packing rules :)
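For the struct from earlier in the thread, that might look something like this (a sketch; the offsets shown are the typical x86/x64 packing, and those numbers are precisely the assumptions being documented):

#include <cstddef>
#include <cstdint>

struct vector3 { float x, y, z; };

// This struct is written to and read from data files as raw bytes, so its
// exact layout is part of the file format.
struct Record
{
    int32_t a;
    int8_t  b;
    vector3 c;
};

// Document and enforce the layout assumptions: if a compiler or platform packs
// this differently, the build breaks instead of the data files.
static_assert(offsetof(Record, a) == 0,  "Record layout changed: a");
static_assert(offsetof(Record, b) == 4,  "Record layout changed: b");
static_assert(offsetof(Record, c) == 8,  "Record layout changed: c");
static_assert(sizeof(Record)     == 20,  "Record size changed");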

An alternative would be to define your data structures in a different language altogether (some kind of JSON description, etc) and then use that information to automatically generate serialization/deserialization and/or struct definitions (+static assertions) for particular compilers.

The trickiest bit we hit was that we wanted the 32-bit and 64-bit Windows builds to load the same data files, so we used some template type-selection stuff like pad64<Foo*>::type instead of Foo* to force a consistent pointer size on both builds inside these structures, and used the same static assertions across both builds to ensure they matched... but using pointers in these kinds of structures was pretty rare anyway; we usually used offsets that acted like pointers:


// Self-relative offset used in place of a raw pointer inside file-format structures:
// it stores the signed distance from this member to its target, so a loaded blob
// works at any address without pointer fixups. (u8/s32/uint are engine typedefs.)
template<class T, class Y=s32> struct Offset
{
	const T* NullablePtr() const { return offset ? Ptr() : 0; }
	      T* NullablePtr()       { return offset ? Ptr() : 0; }
	const T* Ptr() const { return (T*)(((u8*)&offset) + offset); }
	      T* Ptr()       { return (T*)(((u8*)&offset) + offset); }
	const T* operator->() const { return Ptr(); }
	      T* operator->()       { return Ptr(); }
	const T& operator *() const { return *Ptr(); }
	      T& operator *()       { return *Ptr(); }
	static uint DataSize() { return sizeof(T); }
	bool operator!() const { return !offset; }
	Offset& operator=( void* ptr ) { offset = ptr ? (Y)((u8*)ptr - (u8*)&offset) : 0; return *this; }
private:
	Y offset;
};
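The pad64<T>::type selection mentioned above wasn't posted; a minimal sketch of that idea, under the assumption that it simply pads 32-bit pointers out to 8 bytes, might be:

#include <cstddef>
#include <cstdint>

// Force pointer members to occupy 8 bytes on both 32-bit and 64-bit builds so
// the surrounding structure can keep the same field offsets. (Sketch only; the
// original implementation wasn't shown, and member alignment still has to be
// checked with the same static assertions on both builds.)
template<class T, size_t PtrSize = sizeof(void*)> struct pad64;

// 64-bit build: the pointer is already 8 bytes, use it directly.
template<class T> struct pad64<T*, 8> { typedef T* type; };

// 32-bit build: wrap the 4-byte pointer together with 4 bytes of explicit padding.
template<class T> struct pad64<T*, 4>
{
    struct type
    {
        T*       ptr;
        uint32_t pad;                                 // unused, keeps sizeof(type) == 8
        operator T*() const   { return ptr; }
        type& operator=(T* p) { ptr = p; return *this; }
    };
};

struct Foo;
static_assert(sizeof(pad64<Foo*>::type) == 8, "pointer field must be 8 bytes on every build");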

I'm in a position to tell you that it's a big deal if you expect to run portably. This was an actual problem I've dealt with in commercial game code.

The problem with using a hash is that it's a hash. Hashes are good at telling you that two things are different, but they can't tell you how they differ. How do you read in a field at a time, if the only information you have about the offset of each field within that data is a single hash, which doesn't match the hash for your current in-memory layout? If I load in that struct above - int a; char b; vector3 c; - which byte in the file is 'b'? Is it byte 5, byte 8, byte 9, or byte 16? Even changing from int to uint32_t doesn't necessarily save you here.

The 'solution' I've seen to this problem, is that if the hash doesn't match, you need to rebuild the data for your platform. Each platform has its own set of data. It's fast to load. It takes up a lot of space and time. Developers get angry when they pull an innocuous code change and find out they need to run the data tools pipeline before they can test their local changes. You win some, you lose some. :)

I advocate always accompanying the definition of these structures with static assertions that document the assumptions that you're making about how the compiler aligns and pads the structure. If your assumptions are wrong, the code stops compiling.


The experience I have is that there were no assumptions about the padding and alignment, because the person who wrote the data serialiser was never the person writing or editing the structures being serialised. The context is usually something like "this object gets serialised, so only use serialisable types, and the serialiser will do the rest".

The trickiest bit we hit was that we wanted the 32-bit and 64-bit Windows builds to load the same data files, so we used some template type-selection stuff like pad64<Foo*>::type instead of Foo* to force a consistent pointer size on both builds inside these structures


Yeah, we had something similar... it broke most of the time when a pointer was hidden inside a struct and nobody noticed... or when someone had typedefed a pointer as something else so nobody spotted it was a pointer.

Occasionally people tried to do 'cute' things such as adding explicit padding - e.g. a spare u32 after a used u32, to make the next object align the same on 64-bit platforms. At least until someone else later adds another u32 earlier in the struct. This is one of the examples that can be avoided with static_asserts, but that's a mess, IMHO.
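A sketch of that failure mode (hypothetical field names), and the static_assert that at least makes it fail loudly:

#include <cstddef>
#include <cstdint>

struct Thing
{
    uint32_t flags;
    uint32_t pad_;        // "cute" manual padding so that 'data' lands at offset 8
                          // on both 32-bit and 64-bit builds...
    //uint32_t newField;  // ...until someone inserts a field here and silently
                          // shifts 'data' to a different, build-dependent offset.
    uint64_t data;
};

// Fragile, but at least the breakage is caught at compile time rather than by
// corrupt data files.
static_assert(offsetof(Thing, data) == 8, "Thing::data moved; existing data files no longer match");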

Hm, this was quite interesting to read. So what is the actual benefit compared to using something like https://github.com/USCiLab/cereal to handle serialization? It's fast to get going, yet leaves room to optimize special classes/structs/stuff for size or speed, and it handles both loose JSON/XML and packed binary. To my rather uninformed self it seemed so fast and efficient that rolling my own wasn't worth the effort. But maybe I am missing out and should? irreversible, what was your reason to write your own and not go for some library?

We are doing streaming in our games as well. The problem is not just the data itself but how it comes off the storage medium. Loading huge chunks of data from disk is a pain in the ass when your data isn't structured well. As I described in another post, what I do in my game engine is a combination of a properly aligned packed format that consists of 64 KB chunks inside the package - so reading a single chunk is exactly one full cache read on the HDD - and memory-mapped files with threaded IO to speed things up when loading a single chunk or a range of chunks.
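As an illustration of the chunked approach (POSIX mmap here purely for the sketch; the actual engine, package file name and threading layer aren't shown in the thread):

#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Map one 64 KB chunk out of a package file. mmap offsets must be page-aligned;
// 64 KB is a multiple of the usual 4 KB page size, so chunk boundaries always
// land on mappable offsets.
static const size_t kChunkSize = 64 * 1024;

const uint8_t* MapChunk(int fd, size_t chunkIndex, size_t& mappedSize)
{
    const off_t offset = static_cast<off_t>(chunkIndex * kChunkSize);
    void* p = mmap(nullptr, kChunkSize, PROT_READ, MAP_PRIVATE, fd, offset);
    if (p == MAP_FAILED)
        return nullptr;
    mappedSize = kChunkSize;
    return static_cast<const uint8_t*>(p);
}

int main()
{
    int fd = open("assets.pak", O_RDONLY);        // hypothetical package file
    if (fd < 0) return 1;
    size_t size = 0;
    const uint8_t* chunk = MapChunk(fd, 3, size); // map the 4th 64 KB chunk
    if (chunk)
    {
        std::printf("first byte of chunk: %d\n", chunk[0]);
        munmap(const_cast<uint8_t*>(chunk), size);
    }
    close(fd);
    return 0;
}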

irreversible, what was your reason to write your own and not go for some library?

Actually, it wasn't my initial impulse to write a serializer, but rather to figure out how the heck I was going to structure my code so I could minimize its memory footprint and store my data in a save file while keeping load times under control once the game world starts getting larger. Serialization happened to be a very tightly knit extension of that problem.

The other reason has to do with what most people here would probably call masochism - I try to write as much of my own stuff as possible. I don't recommend it if you're in a hurry to get things done, but writing these kinds of subsystems has kept me quite busy in terms of learning how to structure code in my head and trying to figure out modular, easily extensible solutions. A generic serializer is ultimately an exercise in writing robust templated code and also partially extends into template metaprogramming. The fun part is figuring out how to get away with automating as much of that code as possible. So, in short, it's just good exercise, even if the end result isn't some industry-strength hyper-generic super-optimized multi-platform cross-compiler beast. Besides, without worrying about alignment to boot, it's not that big of a task - it took me less than two work days to get the serializer to a state where I can put it into practical use. It would have taken me the same amount of time to find, assess and interface with an external library, which I would have had far less control over.

Why make the on-disk layouts exactly match the in-memory layouts? What are you gaining by bulk-copying entire structures instead of individual fields? Why place the burden of managing and maintaining the subtleties of portability and implementation-defined behavior (which are easy to overlook and often difficult to debug) on the developer when they can be largely avoided? If you keep your on-disk layouts compact and express all data in terms of simple primitive types, then you can copy/assign one field at a time and the compiler will generate the correct code for you in all cases.
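A sketch of that field-at-a-time approach (hypothetical types; fixed-width fields written back to back so no padding ever reaches the disk):

#include <cstddef>
#include <cstdint>
#include <cstring>

struct Vec3    { float x, y, z; };
struct Monster { int32_t hp; uint8_t team; Vec3 pos; };   // in-memory layout is the compiler's business

// Write each field individually into a compact buffer: 17 bytes on disk,
// regardless of sizeof(Monster) on this platform.
size_t WriteMonster(uint8_t* out, const Monster& m)
{
    size_t o = 0;
    std::memcpy(out + o, &m.hp,   sizeof(m.hp));   o += sizeof(m.hp);
    std::memcpy(out + o, &m.team, sizeof(m.team)); o += sizeof(m.team);
    std::memcpy(out + o, &m.pos,  sizeof(m.pos));  o += sizeof(m.pos);
    return o;
}

// Read it back the same way; the in-memory layout is free to change per
// platform or compiler without touching the file format.
size_t ReadMonster(Monster& m, const uint8_t* in)
{
    size_t o = 0;
    std::memcpy(&m.hp,   in + o, sizeof(m.hp));   o += sizeof(m.hp);
    std::memcpy(&m.team, in + o, sizeof(m.team)); o += sizeof(m.team);
    std::memcpy(&m.pos,  in + o, sizeof(m.pos));  o += sizeof(m.pos);
    return o;
}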

That being said, once you're at a point you can profile and determine if this is an actual bottleneck, then you can go back to exposing these responsibilities to the developer if the performance improvements are worth it. But I wouldn't start down this road initially, when you don't have any actual performance data to support the up-front developer cost and additional overhead.

For dealing with data schemas that can vary at run-time for whatever reason (e.g. savegames), I've found it's largely an issue of finding and using an appropriate model. JSON, for instance, works quite well for this because you can easily query and manipulate in-memory objects and use that meta-information to transform the data however you please, especially when working from a holistic view of the data. I've had to work with serialization systems that tried to manage versioned data using static structures and limited/partial access to the entire dataset, and it was an absolute nightmare.
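For example, with a JSON library (nlohmann/json here purely as an illustration; the thread doesn't prescribe one, and the fields and migration shown are hypothetical):

#include <nlohmann/json.hpp>
#include <string>

using json = nlohmann::json;

// Upgrade an old save file by inspecting and transforming the document before
// it ever touches the game's static structures.
std::string UpgradeSave(const std::string& text)
{
    json save = json::parse(text);
    int version = save.value("version", 1);
    if (version < 2)
    {
        // Hypothetical v2 migration: introduce a "stamina" field seeded from "health".
        save["stamina"] = save.value("health", 100);
        save["version"] = 2;
    }
    return save.dump();
}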
