1. Because 99% of the objects in the engines I've worked with are not Plain Old Data
2. Because the data is padded or aligned differently on different platforms
You only use this for plain old data. If you're building an engine system and want to be able to load it from disc with a memcpy, then you can make the choice to design it as POD.
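As a concrete (and entirely hypothetical) sketch of what that choice buys you: if the asset is a POD struct, loading is a single fread straight into the struct, with no per-field parsing. Names here are mine, not any particular engine's:

#include <cstdio>
#include <cstdint>

struct MeshHeader // no pointers, no virtuals, no STL: plain old data
{
    uint32_t vertexCount;
    uint32_t indexCount;
    float    boundsMin[3];
    float    boundsMax[3];
};

bool LoadHeader( const char* path, MeshHeader& out )
{
    FILE* f = std::fopen( path, "rb" );
    if( !f )
        return false;
    size_t read = std::fread( &out, sizeof(MeshHeader), 1, f ); // one memcpy-style read
    std::fclose( f );
    return read == 1;
}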
The thing about KISS is that it's hindered by complex engineering, so you end up with a war in your code-base when you try to mix the brutally simple with the deceptively complex :)
For 2, your build system needs to generate binaries for each platform -- meaning you can't load your PS3 assets on a PS4. Your serializer can know some things about the platform it's generating for, such as its endianness and the padding rules of its compiler.
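A minimal sketch of what "knowing the platform" can look like on the tool side (names are mine, not a real toolchain's): the writer takes the target's endianness and emits bytes accordingly, so the runtime never byte-swaps:

#include <cstdint>
#include <vector>

enum class Endian { Little, Big };

// Emit a u32 in the *target* platform's byte order, regardless of the host's.
void WriteU32( std::vector<uint8_t>& out, uint32_t v, Endian target )
{
    if( target == Endian::Little )
    {
        out.push_back( uint8_t(v      ) );
        out.push_back( uint8_t(v >>  8) );
        out.push_back( uint8_t(v >> 16) );
        out.push_back( uint8_t(v >> 24) );
    }
    else
    {
        out.push_back( uint8_t(v >> 24) );
        out.push_back( uint8_t(v >> 16) );
        out.push_back( uint8_t(v >>  8) );
        out.push_back( uint8_t(v      ) );
    }
}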
This is why we only use it for the kind of static assets that you ship on a disc, and not more portable/flexible/changing things, such as save-games or user-facing editable content.
I can appreciate the hypothetical speed benefits of this, but given how error-prone these techniques are, I wonder whether there is any real benefit.
Well it depends what you're comparing it to. There's a lot of games that peg a CPU core at 100% usage for 30 seconds when loading a level :D
We handle data structures of any complexity, references between different assets (model loads a material loads a texture, etc.), and do next to no CPU work during loading -- it's all just waiting for the data to be mapped into our address space. For graphics assets, we support loading an asset as several "blobs" which can be allocated in different ways, which allows us to stream vertex/pixel data directly into GPU memory instead of, e.g., loading an image file into memory, deserializing it, creating the GPU resource, and then copying the data across.
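The load path this implies is roughly the following sketch (hypothetical names; assumes the blob was built for this exact platform). The destination can be ordinary heap memory or, for the graphics blobs mentioned above, a mapped GPU upload buffer:

#include <cstdio>
#include <cstddef>

// Read a blob file straight into its final destination; the "deserialization"
// is just interpreting the bytes in place. Returns null on failure.
const void* LoadBlob( const char* path, void* destination, size_t size )
{
    FILE* f = std::fopen( path, "rb" );
    if( !f )
        return nullptr;
    size_t read = std::fread( destination, 1, size, f );
    std::fclose( f );
    return read == size ? destination : nullptr;
}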
Often we don't even fix up pointers on load. I've a template library containing pointer-as-relative-offset, pointer-as-absolute-offset, fixed array, fixed hash table, fixed string (with optional length prepended to the chars, and the hash appearing next to the offset to the chars), etc. Instead of doing a pass over the data to deserialize these, they're just left as is, and pointers are computed on the fly in operator->, etc. This can be a big win where you know a 2-byte offset is enough, but a pointer would be 8 bytes.
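A minimal sketch of the pointer-as-relative-offset idea (my naming, not the actual library): the stored value is an offset from the field's own address, and operator-> turns it back into a real pointer on the fly:

#include <cstdint>

template< class T, class StorageType = int32_t > // int16_t where 2 bytes is enough
struct Offset
{
    StorageType offset; // relative to &offset itself; 0 means null

    const T* Get() const
    {
        if( !offset )
            return nullptr;
        return reinterpret_cast<const T*>( reinterpret_cast<const char*>(this) + offset );
    }
    const T* operator->() const { return Get(); }
    const T& operator*()  const { return *Get(); }
};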
This sounds fascinating but I have no idea how that would work. Say you have this object:
class Xyz
{
    std::string name;
    u32 value1;
};
What does the writing code look like? Do you write each field out individually, telling it that you want 'name' to be written as a fixed string? Because the serialisation strategies I keep encountering basically mandate removing that string from the class, replacing it with a string hash or some other POD type, then doing an fwrite (or equivalent).
(Or, worse, I've seen a system that expected to read objects back in a single chunk, but required each field to be written individually, so that it could mangle pointers on the way out. This kind of gives you the worst of both worlds. But hey, fast loads and you can still use pointers!)
If I'm using this deserialization technique, I'm almost certainly using it for some kind of immutable asset data. It's very rare to have mutable assets. That means that std::string is overkill. Though it's hard to think of a good use for std::string in any part of a game engine IMHO :wink:
For an example system, I'd write some C++ code describing the data that I need, and the algorithms I need to consume it:
struct Bar
{
    u32 value1;
    u32 value2;
};
struct Foo
{
    StringOffset name;
    Offset<Bar> bar;
};
struct MyBlob
{
    List<Foo> foos;
};
void Test( const MyBlob& blob )
{
    for( uint i=0, end=blob.foos.count; i!=end; ++i )
    {
        const Foo& foo = blob.foos[i];
        const Bar& bar = *foo.bar;
        const char* name = foo.name;
        printf( "name %s, %d, %d\n", name, bar.value1, foo.bar->value2 );
    }
}
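For the above to compile you also need the container types it assumes. Hedged sketches, consistent with the Offset idea (again my naming, not the actual library): a List is a count plus a relative offset to its elements, and a StringOffset is a relative offset to NUL-terminated chars:

#include <cstdint>

struct StringOffset
{
    int32_t offset; // relative to this field; points at NUL-terminated chars
    operator const char*() const
    {
        return reinterpret_cast<const char*>(this) + offset;
    }
};

template< class T >
struct List
{
    uint32_t count;
    int32_t  offset; // relative to this struct; points at 'count' contiguous T's

    const T& operator[]( uint32_t i ) const
    {
        const T* items = reinterpret_cast<const T*>( reinterpret_cast<const char*>(this) + offset );
        return items[i];
    }
};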
I could write a better serialization system for the C# tool / generator side, but it's all explicit ATM:
class FooBar
{
    public string name;
    public int v1, v2;
};
...
List<FooBar> data = ...
using(var chunk0 = new MemoryStream())
using(var w = new BinaryWriter(chunk0))
{
    StringTable strings = new StringTable(StringTable.Encoding.Pascal, true);
    //struct MyBlob
    w.Write32(data.Count); // MyBlob.foos.count
    long[] tocBars = new long[data.Count];
    for( int i=0, end=data.Count; i!=end; ++i )
    {
        //struct Foo
        strings.WriteOffset(w, data[i].name); // Foo.name
        tocBars[i] = w.WriteTemp32();         // Foo.bar -- placeholder, patched below
    }
    for( int i=0, end=data.Count; i!=end; ++i )
    {
        w.OverwriteTemp32(tocBars[i], w.RelativeOffset(tocBars[i])); // Fix up Foo.bar
        //struct Bar
        w.Write32(data[i].v1); // Bar.value1
        w.Write32(data[i].v2); // Bar.value2
    }
    strings.WriteChars(w);
}
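For anyone reading along: Write32, WriteTemp32, OverwriteTemp32, and RelativeOffset are the tool's own helpers, not part of .NET's BinaryWriter. From the usage you can infer the pattern, though: WriteTemp32 appears to reserve a 32-bit placeholder and return its position, and once the Bar data's location is known, OverwriteTemp32 seeks back and patches the placeholder with the relative offset -- classic two-pass back-patching, the offline mirror of the Offset<Bar> field the runtime reads.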