Kylotan

Low-level serialisation strategies


I've been surprised to see that quite a few developers are still using serialisation strategies that are equivalent to fread/fwriting structs. Sometimes they do fancy things to change pointers into offsets at save time and then fix them up at load time, but the emphasis is still on minimising memory allocations and being able to write directly into the final structure with no extra copying. As someone out in Unity/C# land for a few years, this came as a surprise to me when I got back to working with C++ code.

 

My main question (especially for people who've worked on shipped games) is - do you see this often? Or are you using safer (and easier to debug) methods, whether a full serialisation system (e.g. with each field getting read or written individually), or a 3rd party serialisation system like Protocol Buffers, FlatBuffers, Cap'n Proto, etc? The latter seem to have their own limitations, such as including the schema in the data being transmitted, or expecting you to use their generated data structures rather than working from your own. Are people optimising for debuggability, or deserialisation speed, or size on disk, or compatibility with 3rd party tools, or...?


I use a simple set of functions that I hand-wrote in C for writing all kinds of things to a buffer called a 'message'.

 

I have ones for writing 32-bit integers and floats, and others for writing a float as a single byte, with a min/max range specified for the value. So if I have a float that's expected to be between -1 and 1, it will pack it down into one byte; then on the other end a corresponding read function reconstitutes the original -1 to 1 float from the single byte.

 

I also have read/write functions for quaternions and angle-axis rotation velocities, at different bit counts, so that in cases where I want to trade some accuracy for a smaller amount of data, I can. A quat, for example, can be conveyed as either 24, 32 or 64 bits. An angle-axis velocity can be conveyed in 40 bits or 64 bits, etc.

 

I can also read/write 3D vectors at varying bit depths, specifying min/max values to map each vector component to all the bits allowed.

 

I don't think I will ever just read/write full on precision data structures, unless there's a case where that's absolutely necessary.

Being able to block copy something from disk directly into memory was pretty much the only thing I missed when transitioning from C++ to C#. You miss out on things like forward/backward compatibility, but the loading/saving code is so much simpler and load times are really fast. There's usually no debugging - either your entire set of structs comes in perfectly, or you have the wrong endianness (which was only really relevant when building data on PC with little endian, then consoles reading that data using big endian), or the pragma pack is different.
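A minimal sketch of this block-copy approach in C++ (the struct and function names are illustrative, not from any particular engine) - note the explicit packing and the size assertion guarding against exactly the layout mismatches mentioned above:

```cpp
#include <cstdint>
#include <cstdio>

#pragma pack(push, 1)   // force the same layout on every compiler/platform
struct SaveHeader
{
    uint32_t version;
    uint32_t entityCount;
    float    playTimeSeconds;
};
#pragma pack(pop)
static_assert(sizeof(SaveHeader) == 12, "serialized layout changed - update readers");

// Write the struct's bytes directly; no per-field serialization code.
bool SaveToFile(FILE* f, const SaveHeader& h)
{
    return fwrite(&h, sizeof(h), 1, f) == 1;
}

// Read the bytes straight back into the final structure, no extra copying.
bool LoadFromFile(FILE* f, SaveHeader& h)
{
    return fread(&h, sizeof(h), 1, f) == 1;
}
```

This only works when writer and reader agree on endianness, padding, and type sizes - which is why, as the thread notes, it suits per-platform baked assets far better than portable save games.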

We only used that technique for data loaded from disk, and not for network serialization. Edited by Nypyren


My main question (especially for people who've worked on shipped games) is - do you see this often?


Yes. Every title I've shipped has ultimately relied on a fairly low-level serialization strategy, even if it's only for "release" builds.

...a full serialisation system (e.g. with each field getting read or written individually)...


Never found a need for this personally. If you have multiple fields that need to be read/written, what's the argument against doing it in a single pass?


...or a 3rd party serialisation system like Protocol Buffers, FlatBuffers, Cap'n Proto, etc? The latter seem to have their own limitations, such as including the schema in the data being transmitted, or expecting you to use their generated data structures rather than working from your own.


I've researched similar systems several times over the years. They never manage to really live up to our needs. Packing schema data into a message is totally inappropriate for realtime binary communication protocols, for instance. It's also often a no-go for shipping assets because of desires to keep the size down and the format obscure.

Are people optimising for debuggability, or deserialisation speed, or size on disk, or compatibility with 3rd party tools, or...?


All of the above, in varying combinations. Debugging is important for tools and pipelines. Speed is important for network protocols. Size is important for network traffic as well as disk storage. Sometimes we need to interop with things which usually means foregoing binary serialization and just using an interchange format.

I haven't switched from simple serialization methods since I learnt them long ago. It's basic. It's understood. Both in C++ and C#.

You understand exactly what you are writing out. Block transfer methods are quick, but you really need to be clear on what is in the block of data, as the other guys attest. It pays to have a tried and true method you know works and to stick with it - but there are always other ways.

I use the most basic of serialisation to write my objects out. I try to keep this simple, because I like to keep it in atomic steps to see what I'm writing.

The drawback shows up when you are writing out 50 MB or so. Having said that, it's only a problem when serialising files back in: finding a specific problem element can be an issue.

I've my own engine for my Indie game, and I contract on another game engine at the moment. Both make use of this strategy extensively for data files that are compiled by the asset pipeline (so are automatically rebuilt if the format changes).

Often we don't even fix up pointers on load. I've a template library containing pointer-as-relative-offset, pointer-as-absolute-offset, fixed array, fixed hash table, fixed string (with optional length prepended to the chars, and the hash appearing next to the offset to the chars), etc.
Instead of doing a pass over the data to deserialize these, they're just left as is, and pointers are computed on the fly in operator->, etc. This can be a big win where you know a 2-byte offset is enough, but a pointer would be 8 bytes.
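A minimal sketch of what such an offset-as-pointer type might look like (the names Offset and rel are my illustration, not Hodgman's actual library). The stored value is an offset relative to the field itself, so the blob can be loaded at any address with no fix-up pass:

```cpp
#include <cstdint>

// Pointer-as-relative-offset: dereferencing computes the address on the fly.
template<typename T>
struct Offset
{
    int32_t rel; // 0 means null; a 16-bit offset would work too if ranges allow

    const T* get() const
    {
        if (rel == 0) return nullptr;
        return reinterpret_cast<const T*>(
            reinterpret_cast<const char*>(this) + rel);
    }
    const T* operator->() const { return get(); }
    const T& operator*()  const { return *get(); }
};
```

Because the offset is relative to `this`, the same bytes work whether the blob was fread into heap memory or memory-mapped straight from disk - there is no load-time pointer patching at all.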

As above, I find that this KISS solution is often less stress and time than the more over-engineered solutions.
Also as above, I don't spend too much time debugging this stuff at all, and it either works fine or breaks spectacularly. Leaving a few unnecessary offsets to strings in the data can be useful if you do have to debug something.
I usually generate my data with just a C# BinaryWriter (and extension classes to make writing things like offsets and fixed size primitives clearer), and use assertions when writing structures that the number of bytes written equals some hard-coded magic number. The C++ code also contains static assertions that sizeof(struct) equals a magic number. If you upgrade a structure and forget to update these assertions, the compiler reminds you very quickly.

Save game files, user generated content, and online data tend to use more heavyweight serialisation systems/databases that can deal with version/schema changes, as these don't go through the asset compiler.

I have to agree that a KISS strategy is best. A well-organized suite of serialization methods reads nearly as cleanly as any of the declarative frameworks like Protocol Buffers, reduces dependencies, simplifies the build process (again, protocol buffers), and is far simpler to debug. Serialization isn't rocket science; there's no reason to make it that way with some opaque abstraction.


...a full serialisation system (e.g. with each field getting read or written individually)...


Never found a need for this personally. If you have multiple fields that need to be read/written, what's the argument against doing it in a single pass?


1. Because 99% of the objects in the engines I've worked with are not Plain Old Data
2. Because the data is padded or aligned differently on different platforms

Issue 1 I've seen approached with wacky pointer-mangling tricks. Then if one bit of data is wrong on the way in, everything's completely wrong. It seems to be complicated by requiring all the data to be coalesced into one contiguous chunk, or perhaps using several chunks each marked with their former location. Messy. I also have no idea if anyone ever got this to work on standard library objects.

Issue 2 I've seen approached with a variety of brittle attempts at manual padding, switching member ordering around, macro'd types that add padding depending on platform, etc. This seems to be a massive source of bugs because you don't always notice data that's getting splashed over the wrong fields, offset 4 bytes earlier (for example).

I can appreciate the hypothetical speed benefits of this but given how error-prone they are, I wonder whether there is any real benefit.
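One defence against the "data splashed over the wrong fields" failure mode (echoing the magic-number assertions mentioned elsewhere in this thread) is to pin both the size and the member offsets at compile time, so layout drift becomes a build error instead of silent corruption. A hypothetical example:

```cpp
#include <cstdint>
#include <cstddef> // offsetof

struct PackedEntity
{
    uint32_t id;          // bytes 0..3
    float    position[3]; // bytes 4..15
    uint16_t flags;       // bytes 16..17
    uint16_t _pad;        // explicit padding - part of the on-disk format
};

// If anyone reorders a member, or a compiler pads differently, the build
// breaks here rather than mis-reading data at runtime.
static_assert(sizeof(PackedEntity) == 20,              "serialized size changed");
static_assert(offsetof(PackedEntity, position) == 4,   "layout changed");
static_assert(offsetof(PackedEntity, flags)    == 16,  "layout changed");
```

This doesn't stop someone adding "just enough padding to shut up the asserts" in the wrong place, but asserting individual offsets (not just total size) catches the case where a field after the insertion point silently moves.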


Often we don't even fix up pointers on load. I've a template library containing pointer-as-relative-offset, pointer-as-absolute-offset, fixed array, fixed hash table, fixed string (with optional length prepended to the chars, and the hash appearing next to the offset to the chars), etc. Instead of doing a pass over the data to deserialize these, they're just left as is, and pointers are computed on the fly in operator->, etc. This can be a big win where you know a 2byte offset is enough, but a pointer would be 8bytes.


This sounds fascinating but I have no idea how that would work. Say you have this object:
 

class Xyz
{
    std::string name;
    u32 value1;
};

What does the writing code look like? Do you write each field out individually, telling it that you want 'name' to be written as a fixed string? Because I'm encountering serialisation strategies that basically mandate removal of that string from the class, replacement with a string hash or some other POD type, then an fwrite (or equivalent).

 

(Or, worse, I've seen a system that expected to read objects back in a single chunk, but required each field to be written individually, so that it could mangle pointers on the way out. This kind of gives you the worst of both worlds. But hey, fast loads and you can still use pointers!)


I just finished writing an article on this exact topic - its currently pending approval - hopefully some of the members interested in this topic will participate in the peer review!

 

I just skimmed it - what you have there is fairly close to what I would use myself. The pros are that it's all explicit; the cons are that you're having to serialise each field individually, and it doesn't really do anything to handle pointers or weak references, which is a big part of fast serialisation for toolchains. (e.g. The many-to-many relationship between models and textures - you don't want each model to write out its own copy of each texture.)


1. Because 99% of the objects in the engines I've worked with are not Plain Old Data
2. Because the data is padded or aligned differently on different platforms

You only use this for plain old data. If you're building an engine system and want to be able to load it from disc with a memcpy, then you're able to make the choice to design it as POD.
The thing about KISS is that it's hindered by complex engineering, so you end up with a war in your code-base when you try to mix the brutally simple with the deceptively complex :)
For 2, your build system needs to generate binaries for each platform - meaning you can't load your PS3 assets on a PS4. Your serializer can know some things about the platform it's generating for, such as its endianness and the padding rules of its compiler.
This is why we only use it for the kind of static assets that you ship on a disc, and not more portable/flexible/changing things, such as save-games or user-facing editable content.
 

I can appreciate the hypothetical speed benefits of this but given how error-prone they are, I wonder whether there is any real benefit.

Well it depends what you're comparing it to. There are a lot of games that peg a CPU core at 100% usage for 30 seconds when loading a level :D
We handle data structures of any complexity, references between different assets (model loads a material loads a texture, etc) and do next to no CPU work during loading -- it's all just waiting for the OS to map our data into address-space. For graphics assets, we support loading an asset as several "blobs" which can be allocated in different ways, which allows us to stream vertex/pixel data directly into GPU memory instead of, e.g. loading an image file into memory, deserializing it, then creating the GPU resource, and then copying the data across.
 

Often we don't even fix up pointers on load. I've a template library containing pointer-as-relative-offset, pointer-as-absolute-offset, fixed array, fixed hash table, fixed string (with optional length prepended to the chars, and the hash appearing next to the offset to the chars), etc. Instead of doing a pass over the data to deserialize these, they're just left as is, and pointers are computed on the fly in operator->, etc. This can be a big win where you know a 2byte offset is enough, but a pointer would be 8bytes.

This sounds fascinating but I have no idea how that would work. Say you have this object:
class Xyz
{
    std::string name;
    u32 value1;
};
What does the writing code look like? Do you write each field out individually, telling it that you want 'name' to be written as a fixed string? Because I'm encountering serialisation strategies that basically mandate removal of that string from the class, replacement with a string hash or some other POD type, then an fwrite (or equivalent).
 
(Or, worse, I've seen a system that expected to read objects back in a single chunk, but required each field to be written individually, so that it could mangle pointers on the way out. This kind of gives you the worst of both worlds. But hey, fast loads and you can still use pointers!)

If I'm using this deserialization technique, I'm almost certainly using it for some kind of immutable asset data. It's very rare to have mutable assets. That means that std::string is overkill. Though it's hard to think of a good use for std::string in any part of a game engine IMHO  :wink:

For an example system, I'd write some C++ code of the data that I need, and the algorithms I need to consume it:
struct Bar
{
  u32 value1;
  u32 value2;
};
struct Foo
{
  StringOffset name;
  Offset<Bar> bar;
};
struct MyBlob
{
  List<Foo> foos;
};

void Test( const MyBlob& blob )
{
  for( uint i=0, end=blob.foos.count; i!=end; ++i )
  {
    const Foo& foo = blob.foos[i];
    const Bar& bar = *foo.bar;
    const char* name = foo.name;
    printf( "name %s, %d, %d", name, bar.value1, foo.bar->value2 );
  }
}
I could write a better serialization system for the C# tool / generator side, but it's all explicit ATM:
class FooBar
{
  public string name;
  public int v1, v2;
};
...
List<FooBar> data = ...
using(var chunk0 = new MemoryStream())
using(var w = new BinaryWriter(chunk0))
{
  StringTable strings = new StringTable(StringTable.Encoding.Pascal, true);

  //struct MyBlob
  w.Write32(data.Count);//MyBlob.foos.count
  long[] tocBars = new long[data.Count];
  for( int i=0, end=data.Count; i!=end; ++i )
  {
    //struct Foo
    strings.WriteOffset(w, data[i].name); // Foo.name
    tocBars[i] = w.WriteTemp32();         // Foo.bar
  }
  for( int i=0, end=data.Count; i!=end; ++i )
  {
    w.OverwriteTemp32(tocBars[i], w.RelativeOffset(tocBars[i])); // Fix up Foo.bar
    //struct Bar
    w.Write32(data[i].v1);// Bar.value1
    w.Write32(data[i].v2);// Bar.value2
  }
  strings.WriteChars(w);
}
Edited by Hodgman

You only use this for plain old data. If you're building an engine system and want to be able to load it from disc with a memcpy, then you're able to make the choice to design it as POD.

 

I guess I find it hard to imagine a situation where I could do that and still find the code convenient to work with. Some objects are intrinsically hierarchical, and some have varying length contents; trying to flatten all that just seems to move the complexity out of the serialisation and into everywhere else in the engine.

 

 

 

your build system needs to generate binaries for each platform

 

Sure, but it's easier said than done, especially when you have pointers and you're going for a memcpy/fwrite type of approach.

 

Re: the stringoffset stuff - what I understand of it looks much like some of the pointer serialisation stuff I've dealt with in the past, but the code is essentially back into writing out each field one by one, right? Which isn't a bad thing, just that I thought you were trying to avoid that. And the reading code would appear to have to do the same.

Edited by Kylotan


I guess I find it hard to imagine a situation where I could [use POD] and still find the code convenient to work with. Some objects are intrinsically hierarchical, and some have varying length contents; trying to flatten all that just seems to move the complexity out of the serialisation and into everywhere else in the engine.
Sure, but it's easier said than done, especially when you have pointers and you're going for a memcpy/fwrite type of approach.

In the example I gave, I deliberately used hierarchy and variable-length data :)
The MyBlob struct is of varying size - List<Foo> is a u32 count, followed by that many Foo instances. Foo contains an Offset<Bar> (which is a u32 that acts like a pointer to a Bar) - a has-a/owns-a hierarchical relationship.
In my example, all the complexity is in my manually written C# serialization routine - the C++ data structures themselves and the algorithms that operate on them are extremely clean.
 
Sure, you can't put a mutable, complex, under-specified std::map into one of these structures, but we can put our own immutable map class into one just fine.
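For illustration, the List<Foo> layout described above (a u32 count followed immediately by the elements) might be sketched like this - my own sketch, assuming the element type only needs 4-byte alignment:

```cpp
#include <cstdint>

// Variable-length data used in place: a count followed by 'count' elements.
// No parsing step; the struct is just an overlay on the blob's bytes.
template<typename T>
struct List
{
    uint32_t count;

    const T* begin() const
    {
        // elements start right after the count in the blob
        return reinterpret_cast<const T*>(
            reinterpret_cast<const char*>(this) + sizeof(uint32_t));
    }
    const T* end() const { return begin() + count; }
    const T& operator[](uint32_t i) const { return begin()[i]; }
};
```

A real implementation would need to care about element alignment (e.g. padding after the count for 8-byte types), but the principle stands: the "deserializer" is just pointer arithmetic in the accessors.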

Re: the stringoffset stuff - what I understand of it looks much like some of the pointer serialisation stuff I've dealt with in the past, but the code is essentially back into writing out each field one by one, right? Which isn't a bad thing, just that I thought you were trying to avoid that. And the reading code would appear to have to do the same.

There is no reading code -- you declare those structures and then you use them without a deserialization step. If they're in memory, they're usable - just cast the pointer to their data to the right type and you're done. You can memory map an asset file and start using them immediately, without even having loaded them into RAM first - the OS would page fault and load them on demand in that case!
 [edit] See Offset, List, Array, StringOffset, Address in this file.

Yes, the serialization code in my C# tool writes out each field one by one, though that's because I don't have a serialization framework - it's just a plain old binary writer. I planned on writing one that would automatically go from C# structs to bytes in a file, just as the C++ "reading" side doesn't need to implement any deserialization code. This would mean that all I would have to do is declare the same structure layout in my C++ engine and my C# tools and that would be that... but I actually found that I like using the plain old binary writer so far, so it hasn't been a priority :D
i.e. If you're not a fan of my messy C# serialization code, it is a solvable problem.
 
The C# code that I posted will work across all our platforms, and does have "pointers" embedded in it (as offsets that are never deserialized). It is easy and done! We don't just do this to get better loading times, but also because it actually is good for our productivity.

Edited by Hodgman


There is no reading code -- you declare those structures and then you use them without a deserialization step.

 

Right, I guess I just can't mentally work out how that operates given the code that you show, because (for example) the writing code has appeared to add a stringtable at the end that didn't exist prior to serialisation. No doubt there's something tied up in the StringOffset and Offset types that does the magic!

 

Some code I've worked with was exactly like that - per-field serialization, load-blob-and-cast deserialization. And I think that was the approach that caused me the most problems, mostly because there was no real attempt at making the types work well with it, just a mixture of POD types (which work fine, when you stick to explicitly-sized types), structs (which align differently on different platforms, breaking stuff), and pointers (which change size based on architecture, breaking stuff, and requiring a fix-up stage, and some sort of composition vs. association strategy). I expect that can be mitigated a lot if the pointers are all replaced with smart pointers (and ideally ones that understand association vs composition).

 

Personally I wouldn't really want to change a lot of the types involved in the data just to facilitate quicker serialisation; but I can see that it's definitely something people can make a case for (and especially for the GPU data like you said).

So there are two basic types of deserialization. One is the hard case that you're describing: where the final in-memory representation of data is complex, richly-typed, and dynamic.

That is a separate problem.

The other type of deserialization is for largely-static data (in terms of layout if not actual contents) that can be baked into a known layout at build time (or serialization time, more accurately). That's the sort I referred to and I believe Hodgman is talking about as well.

For dynamic or richly-typed data, your best bet is to hoist a statically laid-out copy of the data into memory and then post-process it into the runtime format. This is how we save character and account state, for example.


What's that old saying about adding layers of indirection? ;-)


You say this is 2 separate problems, but I don't see it that way, because even "largely-static data" can have a "complex, richly-typed, and dynamic [..] in-memory representation" (forgive the quote-juggling). If I have a 3D model, with vertices/indices/materials/shaders/textures/animations, some of which are obviously composed of the others, others merely associating with the others, that's all static data (as far as running the game is concerned), but is going to have a relatively complex in-memory representation. No?

 

I think what I've found interesting is that some developers have exactly that - fairly complex objects in memory - that they're reading in via fread and some fancy pointer twisting. I guess that is probably facilitated by something in the build process ensuring these objects are all contiguous - or just writing them out in such a way and going back to patch up pointers later. Since I've never worked on the build pipeline I've not been exposed to these methods before; it's only after hitting a couple of deserialisation bugs that I realised this sort of thing was common.

 

Let me steer this towards a practical question then; for those using these low-overhead methods, what are you doing to ensure that the layout of data is exactly what you expect - endianness, platform dependent types, 32bit vs 64bit, alignment (inc. SIMD requirements), padding, etc?


Right, I guess I just can't mentally work out how that operates given the code that you show, because (for example) the writing code has appeared to add a stringtable at the end that didn't exist prior to serialisation. No doubt there's something tied up in the StringOffset and Offset types that does the magic!

Yes when serializing, I had a C# string object that basically had to become a char* into the Foo structure (or my equivalent: StringOffset). The runtime Foo structure has a StringOffset, which is basically a u32 -- with 0 being "null" and any other value being "there are some char's at (((char*)this) + *this)". The serialization code initially writes a zero for this field into the stream, but retains the stream cursor inside the StringTable strings helper object. 
After writing out all the structures, strings.WriteChars(w); writes the actual char contents of the C# string objects to the stream, and patches up all the offsets (which were written as placeholder 0's earlier) to point to the start of each string's char data. The runtime isn't aware of the existence of a string table; it just knows that it has an offset to some chars, and the tool has made sure that when the runtime follows this offset it will find some zero-terminated chars.

Likewise, an array of Foo structures is written out initially, each of which must contain a link to a Bar structure. The Bar structures can't be written out on demand, as we're in the process of writing a contiguous Foo array. So, the Foo::bar member is written out with a placeholder value of zero, but the stream cursor is retained in tocBars (with tocBars[i] = w.WriteTemp32();). After writing the contiguous Foo array, all the Bar structures are written out, and the Foo::bar members are patched up with the appropriate offsets.
The string table could be written before the Bar structures, making the two arrays non-contiguous with each other (i.e. strings.WriteChars(w); could be moved in between the two for loops) and the runtime would continue working unaffected, since as long as the offsets are correct, the runtime will interpret the data correctly.
This code is basically acting as a hand-written malloc, where the order that they go into the stream is deciding their memory address at runtime... which can be a good thing, as the serialization code can be written with knowledge of the data access patterns, e.g. tweaked to place "hot" data close together.
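The placeholder-then-patch scheme described above can be sketched as a tiny writer. This is my own illustration in C++ (the names WriteTemp32/PatchRelative echo the C# example earlier, but the implementation is hypothetical):

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Minimal blob writer: append fields; remember placeholder positions;
// patch them once the target data's final position is known.
struct BlobWriter
{
    std::vector<uint8_t> bytes;

    // Append a 32-bit value; return the position it was written at.
    size_t Write32(uint32_t v)
    {
        size_t at = bytes.size();
        const uint8_t* p = reinterpret_cast<const uint8_t*>(&v);
        bytes.insert(bytes.end(), p, p + 4);
        return at;
    }

    // Reserve space for an offset we can't know yet.
    size_t WriteTemp32() { return Write32(0); }

    // Patch the placeholder at 'at' with an offset relative to 'at' itself,
    // matching the "offset from this field" convention of Offset<T>.
    void PatchRelative(size_t at, size_t target)
    {
        uint32_t rel = static_cast<uint32_t>(target - at);
        std::memcpy(&bytes[at], &rel, 4);
    }
};
```

As the post says, the writer is effectively a hand-rolled malloc: whatever order you append things in becomes their runtime memory layout, so you can deliberately place "hot" data together.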
 

You say this is 2 separate problems, but I don't see it that way, because even "largely-static data" can have a "complex, richly-typed, and dynamic [..] in-memory representation" (forgive the quote-juggling). If I have a 3D model, with vertices/indices/materials/shaders/textures/animations, some of which are obviously composed of the others, others merely associating with the others, that's all static data (as far as running the game is concerned), but is going to have a relatively complex in-memory representation. No?

I believe when ApochPiQ says "richly typed and dynamic", he means, regular, mutable C++ objects full of pointers and vtables and moving parts and things that grow and shrink and change. This stuff obviously isn't applicable to a static data format, as typically the whole file is loaded with one (or a handful of) malloc's, so moving parts aren't very feasible. That said, I know some middleware that contains a nice big blob of 0x00000000 values in their files (which compress to nothing), so that after your engine loads their data file, they've already got a memory area that they can give to their internal allocator to placement-new complex C++ objects within.

As to your example though: an asset that contains vertices/indices/materials/shaders/textures/animations is static, and though it can have a complex in-memory representation, it is simple to handle such complex formats using this low-level serialization technique. That kind of asset data is the perfect candidate for this kind of system.
 

Let me steer this towards a practical question then; for those using these low-overhead methods, what are you doing to ensure that the layout of data is exactly what you expect - endianness, platform dependent types, 32bit vs 64bit, alignment (inc. SIMD requirements), padding, etc?

There's a whole load of approaches.

  • Do it manually, like I posted earlier. This is actually way simpler than it sounds. For each field in a struct, you have a matching Write call. You know the padding/alignment rules and explicitly account for them (Write the invisible padding fields when required). When you call something like Write32 (which matches u32 member;) it can internally endian swap if required (hint: it's not, the PPC consoles are dead :) ). Same for pointers, it can write 32 or 64 bits depending on the platform. If you're writing an asset that will be used on Win32 and Win64 (hint: you're not, nobody supports Win32 any more), then you can just write 8 byte pointers and use PadTo64<T*>::Type m_ptr; instead of T* m_ptr; in your structures and suffer the waste in your Win32 build. In my case though, I prefer to not serialize pointers if possible, and leave them as offsets. Include static_assert(sizeof(T)==42) in the runtime and assert(writer.cursor == cursor_at_start_of_struct + 42) in the tools to act as reminders to fix both code-bases when the data format changes.
  • Use the metadata from your compiler. If you're compiling your code with clang, then also use clang to parse your code and generate a description of your structures automatically. Write a serialization library that can automagically write data into the memory format that it's discovered by parsing your code.
  • Use the metadata from your binding system. If you've got your structures bound to a scripting language like Lua, etc, then you've probably already annotated your structures somehow, as a DIY reflection system for C++. Give that binding data to a serialization library that can automagically write data into the right memory format.
  • Don't manually write the C++ structures at all. Author your structures in a different data definition language / schema, and have it generate the C++ headers that will be used by the runtime, and also generate the serialization code for your tools.
Edited by Hodgman


You know the padding/alignment rules and explicitly account for them (Write the invisible padding fields when required)

 

This requires a bit more insight from all developers than I have seen demonstrated in practice. :) I've seen cases where people added just enough padding to shut up the asserts, without being aware that the location of the padding was important. Their new feature still worked, as did any feature relying on data earlier in the struct - just not features relying on data that came after it. (Remembering exactly that sort of problem was what inspired me to start this thread.)

 

Obviously there's no cure for coders that don't stop to wonder why these asserts exist at all, but it is still pretty brittle.

 

Author your structures in a different data definition language / schema

 

I guess that brings us back to Flatbuffers. :)

