Low level serialisation strategies


Often we don't even fix up pointers on load. I've a template library containing pointer-as-relative-offset, pointer-as-absolute-offset, fixed array, fixed hash table, fixed string (with optional length prepended to the chars, and the hash appearing next to the offset to the chars), etc. Instead of doing a pass over the data to deserialize these, they're just left as is, and pointers are computed on the fly in operator->, etc. This can be a big win where you know a 2byte offset is enough, but a pointer would be 8bytes.
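As a rough illustration of the pointer-as-relative-offset idea described above, here is a minimal sketch (hypothetical RelOffset name and layout, not the poster's actual library):

#include <cstdint>

// Minimal sketch: a 2-byte self-relative "pointer". The stored value is the
// distance in bytes from this field to the target object; 0 is treated as null.
template<class T, class StorageT = uint16_t>
struct RelOffset
{
    StorageT offset;

    const T* get() const
    {
        return offset ? reinterpret_cast<const T*>(
                            reinterpret_cast<const char*>(this) + offset)
                      : nullptr;
    }
    const T* operator->() const { return get(); }
    const T& operator*()  const { return *get(); }
};

Because the stored value is relative to the field's own address, the bytes stay valid wherever the blob happens to be loaded, so no fix-up pass is needed.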


This sounds fascinating but I have no idea how that would work. Say you have this object:


class Xyz
{
    std::string name;
    u32 value1;
};

What does the writing code look like? Do you write each field out individually, telling it that you want 'name' to be written as a fixed string? Because I'm encountering serialisation strategies that basically mandate removal of that string from the class, replacement with a string hash or some other POD type, then an fwrite (or equivalent).
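For illustration, a minimal sketch of that "replace the string with a hash, then fwrite" approach being described (the hashString helper and field names here are hypothetical):

#include <cstdint>
#include <cstdio>

// The std::string is replaced by a 32-bit hash so the struct becomes trivially copyable.
struct XyzPod
{
    uint32_t nameHash;  // hash of the original 'name' string
    uint32_t value1;
};

// FNV-1a, just as a stand-in for whatever hash the tools actually use.
inline uint32_t hashString(const char* s)
{
    uint32_t h = 2166136261u;
    for (; *s; ++s) { h ^= static_cast<uint8_t>(*s); h *= 16777619u; }
    return h;
}

void writeXyz(FILE* f, const char* name, uint32_t value1)
{
    XyzPod pod = { hashString(name), value1 };
    fwrite(&pod, sizeof(pod), 1, f);   // single blob write, no per-field code
}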

(Or, worse, I've seen a system that expected to read objects back in a single chunk, but required each field to be written individually, so that it could mangle pointers on the way out. This kind of gives you the worst of both worlds. But hey, fast loads and you can still use pointers!)


I just finished writing an article on this exact topic - it's currently pending approval - hopefully some of the members interested in this topic will participate in the peer review!

I just skimmed it - what you have there is fairly close to what I would use myself. The pros are that it's all explicit; the cons are that you're having to serialise each field individually, and it doesn't really do anything to handle pointers or weak references, which is a big part of fast serialisation for toolchains. (e.g. The many-to-many relationship between models and textures - you don't want each model to write out its own copy of each texture.)

1. Because 99% of the objects in the engines I've worked with are not Plain Old Data
2. Because the data is padded or aligned differently on different platforms

You only use this for plain old data. If you're building an engine system and want to be able to load it from disc with a memcpy, then you're able to make the choice to design it as POD.
The thing about KISS is that it's hindered by complex engineering, so you end up with a war in your code-base when you try to mix the brutally simple with the deceivingly complex :)
For 2, your build system needs to generate binaries for each platform - meaning you can't load your PS3 assets on a PS4. Your serializer can know some things about the platform it's generating for, such as its endianness and the padding rules of its compiler.
This is why we only use it for the kind of static assets that you ship on a disc, and not more portable/flexible/changing things, such as save-games or user-facing editable content.
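A minimal sketch of the "design it as POD and load it with a memcpy" idea, with a compile-time check that the layout matches what the tools wrote (the struct and sizes here are illustrative only):

#include <cstdint>
#include <cstring>

// Plain-old-data asset header: explicitly sized fields, no pointers,
// padding made explicit so every platform/compiler agrees on the layout.
struct MeshHeader
{
    uint32_t vertexCount;
    uint32_t indexCount;
    uint16_t materialId;
    uint16_t _pad;        // explicit padding instead of compiler-inserted padding
    uint32_t flags;
};
static_assert(sizeof(MeshHeader) == 16, "tools and runtime disagree on layout");

// "Deserialization" is just a copy (or a cast, if the source buffer outlives the use).
MeshHeader loadHeader(const void* fileData)
{
    MeshHeader h;
    std::memcpy(&h, fileData, sizeof(h));
    return h;
}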

I can appreciate the hypothetical speed benefits of this, but given how error-prone these approaches are, I wonder whether there is any real benefit.

Well it depends what you're comparing it to. There's a lot of games that peg a CPU core at 100% usage for 30 seconds when loading a level :D
We handle data structures of any complexity, references between different assets (model loads a material loads a texture, etc.) and do next to no CPU work during loading -- it's all just waiting for the OS to map our data into address-space. For graphics assets, we support loading an asset as several "blobs" which can be allocated in different ways, which allows us to stream vertex/pixel data directly into GPU memory instead of, e.g., loading an image file into memory, deserializing it, creating the GPU resource, and then copying the data across.
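The multi-blob idea might look something like the following on disc (entirely hypothetical names and layout - the thread doesn't show the real format):

#include <cstdint>

// Hypothetical on-disc header: the asset is split into several blobs, each
// tagged with where it should live, so vertex/pixel data can be read straight
// into a GPU-visible allocation instead of a temporary CPU copy.
enum class BlobDestination : uint32_t
{
    CpuMain,      // ordinary heap / mapped memory
    GpuVertex,    // read directly into a vertex-buffer allocation
    GpuTexture,   // read directly into a texture upload heap
};

struct BlobDesc
{
    BlobDestination destination;
    uint32_t fileOffset;   // where the blob starts in the file
    uint32_t size;         // size in bytes
    uint32_t alignment;    // required alignment of the destination allocation
};

struct AssetHeader
{
    uint32_t blobCount;
    // Followed in the file by blobCount BlobDesc entries, then the blobs themselves.
};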

Often we don't even fix up pointers on load. I've a template library containing pointer-as-relative-offset, pointer-as-absolute-offset, fixed array, fixed hash table, fixed string (with optional length prepended to the chars, and the hash appearing next to the offset to the chars), etc. Instead of doing a pass over the data to deserialize these, they're just left as is, and pointers are computed on the fly in operator->, etc. This can be a big win where you know a 2byte offset is enough, but a pointer would be 8bytes.

This sounds fascinating but I have no idea how that would work. Say you have this object:
class Xyz
{
    std::string name;
    u32 value1;
};
What does the writing code look like? Do you write each field out individually, telling it that you want 'name' to be written as a fixed string? Because I'm encountering serialisation strategies that basically mandate removal of that string from the class, replacement with a string hash or some other POD type, then an fwrite (or equivalent).

(Or, worse, I've seen a system that expected to read objects back in a single chunk, but required each field to be written individually, so that it could mangle pointers on the way out. This kind of gives you the worst of both worlds. But hey, fast loads and you can still use pointers!)

If I'm using this deserialization technique, I'm almost certainly using it for some kind of immutable asset data. It's very rare to have mutable assets. That means that std::string is overkill. Though it's hard to think of a good use for std::string in any part of a game engine IMHO :wink:

For an example system, I'd write some C++ code of the data that I need, and the algorithms I need to consume it:
struct Bar
{
  u32 value1;
  u32 value2;
};
struct Foo
{
  StringOffset name;
  Offset<Bar> bar;
};
struct MyBlob
{
  List<Foo> foos;
};

void Test( const MyBlob& blob )
{
  for( uint i=0, end=blob.foos.count; i != end; ++i )
  {
    const Foo& foo = blob.foos[i];
    const Bar& bar = *foo.bar;
    const char* name = foo.name;
    printf( "name %s, %d, %d", name, bar.value1, foo.bar->value2 );
  }
}
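The example uses Offset, List, and StringOffset without showing them; a minimal sketch of how such types might be implemented (hypothetical - the poster links to the real implementations later in the thread), using 32-bit self-relative offsets:

#include <cstdint>

// 32-bit self-relative "pointer": 0 means null, anything else is a byte
// distance from this field to the target object.
template<class T>
struct Offset
{
    uint32_t offset;
    const T* get() const
    {
        return offset ? reinterpret_cast<const T*>(
                            reinterpret_cast<const char*>(this) + offset)
                      : nullptr;
    }
    const T* operator->() const { return get(); }
    const T& operator*()  const { return *get(); }
};

// A count followed immediately by that many T's, laid out inline in the blob.
template<class T>
struct List
{
    uint32_t count;
    const T* begin() const { return reinterpret_cast<const T*>(this + 1); }
    const T& operator[](uint32_t i) const { return begin()[i]; }
};

// Offset to some zero-terminated chars; converts straight to const char*.
struct StringOffset
{
    uint32_t offset;
    operator const char*() const
    {
        return offset ? reinterpret_cast<const char*>(this) + offset : nullptr;
    }
};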
I could write a better serialization system for the C# tool / generator side, but it's all explicit ATM:
class FooBar
{
  public string name;
  public int v1, v2;
};
...
List<FooBar> data = ...
using(var chunk0 = new MemoryStream())
using(var w = new BinaryWriter(chunk0))
{
  StringTable strings = new StringTable(StringTable.Encoding.Pascal, true);

  //struct MyBlob
  w.Write32(data.Count);//MyBlob.foos.count
  long[] tocBars = new long[data.Count];
  for( int i=0, end=data.Count; i!=end; ++i )
  {
    //struct Foo
    strings.WriteOffset(w, data[i].name); // Foo.name
    tocBars[i] = w.WriteTemp32();         // Foo.bar
  }
  for( int i=0, end=data.Count; i!=end; ++i )
  {
    w.OverwriteTemp32(tocBars[i], w.RelativeOffset(tocBars[i])); // Fix up Foo.bar
    //struct Bar
    w.Write32(data[i].v1);// Bar.value1
    w.Write32(data[i].v2);// Bar.value2
  }
  strings.WriteChars(w);
}
You only use this for plain old data. If you're building an engine system and want to be able to load it from disc with a memcpy, then you're able to make the choice to design it as POD.

I guess I find it hard to imagine a situation where I could do that and still find the code convenient to work with. Some objects are intrinsically hierarchical, and some have varying length contents; trying to flatten all that just seems to move the complexity out of the serialisation and into everywhere else in the engine.

your build system needs to generate binaries for each platform

Sure, but it's easier said than done, especially when you have pointers and you're going for a memcpy/fwrite type of approach.

Re: the stringoffset stuff - what I understand of it looks much like some of the pointer serialisation stuff I've dealt with in the past, but the code is essentially back into writing out each field one by one, right? Which isn't a bad thing, just that I thought you were trying to avoid that. And the reading code would appear to have to do the same.

I guess I find it hard to imagine a situation where I could [use POD] and still find the code convenient to work with. Some objects are intrinsically hierarchical, and some have varying length contents; trying to flatten all that just seems to move the complexity out of the serialisation and into everywhere else in the engine.
Sure, but it's easier said than done, especially when you have pointers and you're going for a memcpy/fwrite type of approach.

In the example I gave, I deliberately used hierarchy and variable-length data :)
The MyBlob struct is of varying size - List<Foo> is a u32 count, followed by that many Foo instances. Foo contains an Offset<Bar> (which is a u32 that acts like a pointer to a Bar) - a has-a/owns-a hierarchical relationship.
In my example, all the complexity is in my manually written C# serialization routine - the C++ data structures themselves and the algorithms that operate on them are extremely clean.

Sure, you can't put a mutable, complex, under-specified std::map into one of these structures, but we can put our own immutable map class into one just fine.

Re: the stringoffset stuff - what I understand of it looks much like some of the pointer serialisation stuff I've dealt with in the past, but the code is essentially back into writing out each field one by one, right? Which isn't a bad thing, just that I thought you were trying to avoid that. And the reading code would appear to have to do the same.

There is no reading code -- you declare those structures and then you use them without a deserialization step. If they're in memory, they're usable - just cast the pointer to their data to the right type and you're done. You can memory map an asset file and start using them immediately, without even having loaded them into RAM first - the OS would page fault and load them on demand in that case!
[edit] See Offset, List, Array, StringOffset, Address in this file.
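As a sketch of what that looks like at load time, using POSIX mmap as one example (error handling kept minimal; MyBlob is the struct from the earlier post):

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Map the asset file and treat the bytes as the blob type directly --
// no parse, no fix-up pass; pages are faulted in by the OS on first touch.
const MyBlob* openBlob(const char* path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) return nullptr;

    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return nullptr; }

    void* mem = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping keeps the file contents accessible
    if (mem == MAP_FAILED) return nullptr;

    return static_cast<const MyBlob*>(mem);  // "deserialization" is a cast
}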

Yes, the serialization code in my C# tool writes out each field one by one, though that's because I don't have a serialization framework - it's just a plain old binary writer. I planned on writing one that would automatically go from C# structs to bytes in a file, just as the C++ "reading" side doesn't need to implement any deserialization code. This would mean that all I would have to do is declare the same structure layout in my C++ engine and my C# tools, and that would be that... but I've actually found that I like using the plain old binary writer so far, so it hasn't been a priority :D
i.e. If you're not a fan of my messy C# serialization code, it is a solvable problem.

The C# code that I posted will work across all our platforms, and does have "pointers" embedded in it (as offsets that are never deserialized). It is easy and done! We don't just do this to get better loading times, but also because it actually is good for our productivity.

There is no reading code -- you declare those structures and then you use them without a deserialization step.

Right, I guess I just can't mentally work out how that operates given the code that you show, because (for example) the writing code has appeared to add a stringtable at the end that didn't exist prior to serialisation. No doubt there's something tied up in the StringOffset and Offset types that does the magic!

Some code I've worked with was exactly like that - per-field serialization, load-blob-and-cast deserialization. And I think that was the approach that caused me the most problems, mostly because there was no real attempt at making the types work well with it, just a mixture of POD types (which work fine, when you stick to explicitly-sized types), structs (which align differently on different platforms, breaking stuff), and pointers (which change size based on architecture, breaking stuff, and requiring a fix-up stage, and some sort of composition vs. association strategy). I expect that can be mitigated a lot if the pointers are all replaced with smart pointers (and ideally ones that understand association vs composition).

Personally I wouldn't really want to change a lot of the types involved in the data just to facilitate quicker serialisation; but I can see that it's definitely something people can make a case for (and especially for the GPU data like you said).

So there are two basic types of deserialization. One is the hard case that you're describing: where the final in-memory representation of data is complex, richly-typed, and dynamic.

That is a separate problem.

The other type of deserialization is for largely-static data (in terms of layout if not actual contents) that can be baked into a known layout at build time (or serialization time, more accurately). That's the sort I referred to and I believe Hodgman is talking about as well.

For dynamic or richly-typed data, your best bet is to hoist a statically laid-out copy of the data into memory and then post-process it into the runtime format. This is how we save character and account state, for example.
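A rough illustration of that two-step approach (all types, fields and the lookupName helper are hypothetical): the statically laid-out record is loaded as a flat blob, then post-processed into the "rich" runtime object.

#include <cstdint>
#include <string>
#include <vector>

// Flat, statically laid-out record as it exists in the saved blob.
struct SavedCharacter
{
    uint32_t nameHash;
    uint32_t level;
    uint32_t itemCount;
    // followed in the blob by itemCount uint32_t item ids
};

// Rich runtime representation, built in a post-processing step.
struct Character
{
    std::string name;
    uint32_t level;
    std::vector<uint32_t> items;
};

// Step 2: turn the static copy into the dynamic runtime format.
// lookupName() is a hypothetical hash -> string resolver.
Character postProcess(const SavedCharacter& saved, const char* (*lookupName)(uint32_t))
{
    Character c;
    c.name  = lookupName(saved.nameHash);
    c.level = saved.level;
    // The item ids sit immediately after the fixed-size record in the blob.
    const uint32_t* ids = reinterpret_cast<const uint32_t*>(&saved + 1);
    c.items.assign(ids, ids + saved.itemCount);
    return c;
}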


What's that old saying about adding layers of indirection? ;-)

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

You say this is 2 separate problems, but I don't see it that way, because even "largely-static data" can have a "complex, richly-typed, and dynamic [..] in-memory representation" (forgive the quote-juggling). If I have a 3D model, with vertices/indices/materials/shaders/textures/animations, some of which are obviously composed of the others, others merely associating with the others, that's all static data (as far as running the game is concerned), but is going to have a relatively complex in-memory representation. No?

I think what I've found interesting is that some developers have exactly that - fairly complex objects in memory - that they're reading in via fread and some fancy pointer twisting. I guess that is probably facilitated by something in the build process ensuring these objects are all contiguous - or just writing them out in such a way and going back to patch up pointers later. Since I've never worked on the build pipeline I've not been exposed to these methods before; it's only after hitting a couple of deserialisation bugs that I realised this sort of thing was common.

Let me steer this towards a practical question then; for those using these low-overhead methods, what are you doing to ensure that the layout of data is exactly what you expect - endianness, platform dependent types, 32bit vs 64bit, alignment (inc. SIMD requirements), padding, etc?

Right, I guess I just can't mentally work out how that operates given the code that you show, because (for example) the writing code has appeared to add a stringtable at the end that didn't exist prior to serialisation. No doubt there's something tied up in the StringOffset and Offset types that does the magic!

Yes when serializing, I had a C# string object that basically had to become a char* into the Foo structure (or my equivalent: StringOffset). The runtime Foo structure has a StringOffset, which is basically a u32 -- with 0 being "null" and any other value being "there are some char's at (((char*)this) + *this)". The serialization code initially writes a zero for this field into the stream, but retains the stream cursor inside the StringTable strings helper object.
After writing out all the structures, strings.WriteChars(w); writes the actual char contents of the C# string objects to the stream, and patches up all the offsets (which were written as placeholder 0's earlier) to point to the start of each string's char data. The runtime isn't aware of the existence of a string table, it just knows that it has an offset to some chars, and the tool has made sure that when the runtime follows this offset it will find some zero-terminated chars.

Likewise, an array of Foo structures is written out initially, each of which must contain a link to a Bar structure. The Bar structures can't be written out on demand, as we're in the process of writing a contiguous Foo array. So, the Foo::bar member is written out with a placeholder value of zero, but the stream cursor is retained in tocBars (with tocBars[i] = w.WriteTemp32();). After writing the contiguous Foo array, all the Bar structures are written out, and the Foo::bar members are patched up with the appropriate offsets.
The string table could be written before the Bar structures, making the two arrays non-contiguous with each other (i.e. strings.WriteChars(w); could be moved in between the two for loops) and the runtime would continue working unaffected, since as long as the offsets are correct, the runtime will interpret the data correctly.
This code is basically acting as a hand-written malloc, where the order in which structures go into the stream decides their memory address at runtime... which can be a good thing, as the serialization code can be written with knowledge of the data access patterns, e.g. tweaked to place "hot" data close together.
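The WriteTemp32 / OverwriteTemp32 helpers aren't shown in the thread; the same placeholder-and-patch idea sketched in C++ over a byte vector (a hypothetical helper, not the poster's actual C# writer):

#include <cstdint>
#include <cstring>
#include <vector>

struct BlobWriter
{
    std::vector<uint8_t> bytes;

    size_t cursor() const { return bytes.size(); }

    void write32(uint32_t v)
    {
        const uint8_t* p = reinterpret_cast<const uint8_t*>(&v);
        bytes.insert(bytes.end(), p, p + 4);  // little-endian host assumed
    }

    // Write a placeholder and remember where it lives so it can be patched later.
    size_t writeTemp32() { size_t at = cursor(); write32(0); return at; }

    // Patch a placeholder with a self-relative offset to the current cursor,
    // i.e. "the thing I'm about to write is this many bytes after the field".
    void patchRelative32(size_t placeholder)
    {
        uint32_t rel = static_cast<uint32_t>(cursor() - placeholder);
        std::memcpy(&bytes[placeholder], &rel, 4);
    }
};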

You say this is 2 separate problems, but I don't see it that way, because even "largely-static data" can have a "complex, richly-typed, and dynamic [..] in-memory representation" (forgive the quote-juggling). If I have a 3D model, with vertices/indices/materials/shaders/textures/animations, some of which are obviously composed of the others, others merely associating with the others, that's all static data (as far as running the game is concerned), but is going to have a relatively complex in-memory representation. No?

I believe when ApochPiQ says "richly typed and dynamic", he means, regular, mutable C++ objects full of pointers and vtables and moving parts and things that grow and shrink and change. This stuff obviously isn't applicable to a static data format, as typically the whole file is loaded with one (or a handful of) malloc's, so moving parts aren't very feasible. That said, I know some middleware that contains a nice big blob of 0x00000000 values in their files (which compress to nothing), so that after your engine loads their data file, they've already got a memory area that they can give to their internal allocator to placement-new complex C++ objects within.
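That zeroed-region trick might be sketched like this (hypothetical layout; real middleware would also worry about alignment and destruction):

#include <cstddef>
#include <new>

// Hypothetical file layout: static data followed by a zeroed scratch region
// that the runtime reuses to placement-new its "live" C++ objects.
struct LoadedFile
{
    unsigned char* staticData;    // the baked, offset-based structures
    unsigned char* scratch;       // the big run of zeroes from the file
    std::size_t    scratchSize;
};

struct RuntimeThing
{
    int state = 0;
    // ... vtables, growing containers, etc. could live here
};

RuntimeThing* createInScratch(LoadedFile& file)
{
    // Carve the runtime object out of memory that arrived with the asset,
    // so no extra allocation happens after the file is in memory.
    if (file.scratchSize < sizeof(RuntimeThing))
        return nullptr;
    void* where = file.scratch;   // assumed suitably aligned for RuntimeThing
    file.scratch     += sizeof(RuntimeThing);
    file.scratchSize -= sizeof(RuntimeThing);
    return new (where) RuntimeThing();
}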

As to your example though: an asset that contains vertices/indices/materials/shaders/textures/animations is static, and though it can have a complex in-memory representation, it is simple to handle such complex formats using this low-level serialization technique. That kind of asset data is the perfect candidate for this kind of system.

Let me steer this towards a practical question then; for those using these low-overhead methods, what are you doing to ensure that the layout of data is exactly what you expect - endianness, platform dependent types, 32bit vs 64bit, alignment (inc. SIMD requirements), padding, etc?

There's a whole load of approaches.

  • Do it manually, like I posted earlier. This is actually way simpler than it sounds. For each field in a struct, you have a matching Write call. You know the padding/alignment rules and explicitly account for them (Write the invisible padding fields when required). When you call something like Write32 (which matches u32 member;) it can internally endian swap if required (hint: it's not, the PPC consoles are dead :) ). Same for pointers, it can write 32 or 64 bits depending on the platform. If you're writing an asset that will be used on Win32 and Win64 (hint: you're not, nobody supports Win32 any more), then you can just write 8 byte pointers and use PadTo64<T*>::Type m_ptr; instead of T* m_ptr; in your structures and suffer the waste in your Win32 build. In my case though, I prefer to not serialize pointers if possible, and leave them as offsets. Include static_assert(sizeof(T)==42) in the runtime and assert(writer.cursor == cursor_at_start_of_struct + 42) in the tools to act as reminders to fix both code-bases when the data format changes. (A sketch of the PadTo64 idea and these layout asserts follows this list.)
  • Use the metadata from your compiler. If you're compiling your code with clang, then also use clang to parse your code and generate a description of your structures automatically. Write a serialization library that can automagically write data into the memory format that it's discovered by parsing your code.
  • Use the metadata from your binding system. If you've got your structures bound to a scripting language like Lua, etc, then you've probably already annotated your structures somehow, as a DIY reflection system for C++. Give that binding data to a serialization library that can automagically write data into the right memory format.
  • Don't manually write the C++ structures at all. Author your structures in a different data definition language / schema, and have it generate the C++ headers that will be used by the runtime, and also generate the serialization code for your tools.
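A sketch of the PadTo64 idea and the layout-assert reminders from the first bullet (the exact helper isn't shown in the thread, so this is one plausible shape):

#include <cstdint>

// Hypothetical stand-in for the PadTo64 helper mentioned above: the member is
// always 8 bytes, so 32-bit and 64-bit builds agree on the struct layout.
template<class T>
union Padded64
{
    T        value;
    uint64_t pad;            // forces the field up to 8 bytes on 32-bit builds
    T operator->() const { return value; }   // usable like the raw pointer
};

template<class T>
struct PadTo64
{
    using Type = Padded64<T>;
};

struct Material;
struct MeshInstance
{
    PadTo64<Material*>::Type material;  // 8 bytes whether pointers are 4 or 8
    uint32_t                 flags;
    uint32_t                 _pad;      // explicit tail padding
};

// The "reminder" asserts described above: if the layout drifts, the build breaks.
static_assert(sizeof(MeshInstance) == 16, "update the tool-side writer too");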

You know the padding/alignment rules and explicitly account for them (Write the invisible padding fields when required)

This requires a bit more insight from all developers than I have seen demonstrated in practice. :) I've seen cases where people added just enough padding to shut up the asserts, without being aware that the location of the padding was important. Their new feature still worked, as did any feature relying on data earlier in the struct, just not features relying on data that came after it. (It was remembering exactly this sort of problem that inspired me to start this thread.)

Obviously there's no cure for coders that don't stop to wonder why these asserts exist at all, but it is still pretty brittle.

Author your structures in a different data definition language / schema

I guess that brings us back to Flatbuffers. :)

