Serialization and pointers

6 comments, last by wcassella 8 years, 1 month ago

For a while now I've had a basic serialization system implemented in my game. While it works really well for relatively trivial types (arrays, transforms, etc.), it's been pretty much useless for anything real because it has no way of handling pointers.

If I could assume that all pointed-to objects were allocated the same way (via 'new', for example), this wouldn't be an issue: the first time you hit a pointer (owning or not) to anything in the archive, you just new that shit up and deserialize it. Unfortunately, that simply isn't the case.

After a bit of brainstorming, I've come to the conclusion that to properly deserialize pointers you need to be able to do two passes. The first would instantiate all objects and set their trivial values, the second would set the value of non-owning pointers (now that everything has been instantiated), using a mapping of archive IDs to their new addresses that was created in the first pass.
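A minimal C++17 sketch of that two-pass idea, assuming a toy archive in which each record carries an archive ID, one trivial value, and the archive ID of the object it points at (or -1). `Record`, `Object` and `LoadTwoPass` are hypothetical names, not part of any existing library. Pass 1 instantiates objects and builds the ID-to-address map; pass 2 patches the non-owning pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_map>
#include <vector>

// Hypothetical archive record: what was written for one object.
struct Record {
    int archiveId;  // ID assigned at save time
    int value;      // a "trivial" field
    int refId;      // archive ID of the referenced object, or -1 for null
};

// Hypothetical runtime object with one non-owning pointer.
struct Object {
    int value = 0;
    Object* ref = nullptr;  // non-owning
};

// Appends the loaded objects to 'owned' and returns the archive-ID -> new-address map.
std::unordered_map<int, Object*> LoadTwoPass(const std::vector<Record>& records,
                                             std::vector<std::unique_ptr<Object>>& owned) {
    std::unordered_map<int, Object*> idToAddress;
    const std::size_t base = owned.size();

    // Pass 1: instantiate every object and set its trivial values. In a real
    // system this is where each owner would apply its own allocation strategy.
    for (const Record& r : records) {
        owned.push_back(std::make_unique<Object>());
        owned.back()->value = r.value;
        idToAddress[r.archiveId] = owned.back().get();  // stays valid: unique_ptr targets never move
    }

    // Pass 2: everything exists now, so non-owning pointers can be patched,
    // including forward references and cycles.
    for (std::size_t i = 0; i < records.size(); ++i) {
        const int refId = records[i].refId;
        if (refId < 0) continue;
        auto it = idToAddress.find(refId);
        owned[base + i]->ref = (it != idToAddress.end()) ? it->second : nullptr;  // unknown ID: leave null
    }
    return idToAddress;
}
```

Object 1 below refers forward to object 2, which refers back to object 1; neither order would work in a single naive pass that resolves pointers immediately.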

At the moment my serialization system is split into two different functions: 'ToArchive' and 'FromArchive', which are both called recursively. Since 'ToArchive' is pretty much fine as it is, I've come up with two solutions for adding a second pass to the 'FromArchive' function.

A) I would split that function into 'PreFromArchive' and 'PostFromArchive' (the absence of either would indicate that the object is simply not touched during that phase; the absence of both would indicate the object cannot be deserialized). Each would be called recursively (starting with the root object), and pointers would be set in the second function. This is nice because it's very simple, and because the second pass can do anything else that needs to happen after everything has been deserialized, such as validation. The downside is that it could be inefficient, since you're visiting almost everything twice.
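Option A might look roughly like this, assuming a tree-shaped archive that mirrors the object tree. `ArchiveNode`, `LoadContext`, `Node` and `Deserialize` are hypothetical; only the `PreFromArchive`/`PostFromArchive` split comes from the description above. The first recursive walk instantiates children and records addresses; the second walk resolves non-owning pointers (and could validate).

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_map>
#include <vector>

// Hypothetical tree-shaped archive: one entry per object, owned children nested (C++17).
struct ArchiveNode {
    int id;        // archive ID of this object
    int value;     // a trivial field
    int targetId;  // archive ID of a non-owned object, or -1
    std::vector<ArchiveNode> children;
};

struct Node;
struct LoadContext {
    std::unordered_map<int, Node*> idToAddress;  // filled during the first phase
};

struct Node {
    int value = 0;
    Node* target = nullptr;                       // non-owning
    std::vector<std::unique_ptr<Node>> children;  // owning

    // Phase 1: set trivial values, record this object's address, instantiate owned children.
    void PreFromArchive(const ArchiveNode& a, LoadContext& ctx) {
        value = a.value;
        ctx.idToAddress[a.id] = this;
        for (const ArchiveNode& childArchive : a.children) {
            children.push_back(std::make_unique<Node>());  // the owner picks the allocation strategy
            children.back()->PreFromArchive(childArchive, ctx);
        }
    }

    // Phase 2: every object exists, so non-owning pointers can be resolved.
    // A natural place for post-load validation, too.
    void PostFromArchive(const ArchiveNode& a, const LoadContext& ctx) {
        if (a.targetId >= 0) target = ctx.idToAddress.at(a.targetId);  // throws if the ID is unknown
        assert(children.size() == a.children.size());
        for (std::size_t i = 0; i < children.size(); ++i)
            children[i]->PostFromArchive(a.children[i], ctx);
    }
};

// Runs both recursive phases from the root.
void Deserialize(Node& root, const ArchiveNode& archive) {
    LoadContext ctx;
    root.PreFromArchive(archive, ctx);
    root.PostFromArchive(archive, ctx);
}
```

The "visiting almost everything twice" cost is visible here: both phases recurse over the whole tree even when most nodes have no pointers to fix.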

B) Raw pointers (and anything else that needs to be touched up at the end) could add themselves to a queue that gets iterated at the very end. This may be more performant, but could also be error-prone since it creates the possibility of dangling pointers (imagine an object in an array deferring itself, and then the array being reallocated). It could also hypothetically create chicken-and-egg scenarios, where an object deferred itself to wait on the instantiation of another object, even though it in fact was responsible for instantiating that object.
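Option B reduces to a queue of pending fixups that is drained once loading is done. The sketch below assumes each fixup is stored as the raw address of the pointer slot plus the target's archive ID; `Object`, `FixupQueue`, `Register`, `Defer` and `Resolve` are hypothetical names. Storing the slot's address is exactly what makes the dangling-pointer hazard possible.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

struct Object {
    int value = 0;
    Object* ref = nullptr;  // non-owning
};

// Hypothetical loader-side queue of pointer fixups, resolved once at the very end.
class FixupQueue {
public:
    // Called when an object has been instantiated at its final address.
    void Register(int archiveId, Object* address) { idToAddress_[archiveId] = address; }

    // Remember that 'slot' must eventually point at the object with 'targetId'.
    // Hazard: 'slot' is a raw address, so the object holding it must not move
    // before Resolve() runs (e.g. because a std::vector reallocated).
    void Defer(Object*& slot, int targetId) { pending_.push_back({&slot, targetId}); }

    // Patches every queued slot; returns how many targets were never registered.
    int Resolve() {
        int unresolved = 0;
        for (const Pending& p : pending_) {
            auto it = idToAddress_.find(p.targetId);
            if (it != idToAddress_.end()) *p.slot = it->second;
            else ++unresolved;
        }
        pending_.clear();
        return unresolved;
    }

private:
    struct Pending {
        Object** slot;
        int targetId;
    };
    std::vector<Pending> pending_;
    std::unordered_map<int, Object*> idToAddress_;
};
```

Reserving capacity up front (or using a container whose push_back never moves existing elements, such as std::deque) sidesteps the reallocation case, but nothing here detects the chicken-and-egg case: an unresolved target simply shows up in the count returned by `Resolve()`.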

Neither seems terribly difficult to implement at the end of the day, but I'd like some input before I dive into this. Are there any other solutions I'm not considering?

Thanks

How much searching have you done on the topic?

There are many techniques out there that can do it in a single pass, developed primarily to cope with cycles in the data being serialised.
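A rough sketch of one such single-pass technique, assuming a toy integer token stream: each object is written in full the first time it is reached and as a back-reference ID afterwards, and the reader registers an object before reading its fields, so cycles resolve in one pass. `Node`, `Write`, `Read` and the tag values are hypothetical. Note that the reader allocates every object the same way (`std::make_unique`), which is the homogeneous-allocation assumption questioned in the reply that follows.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_map>
#include <vector>

struct Node {
    int value = 0;
    Node* next = nullptr;  // non-owning; may form cycles
};

// Hypothetical token format:
//   kNull            null pointer
//   kRef, id         back-reference to an object that was already written
//   kNew, value ...  first occurrence: the object is written inline
constexpr int kNull = -1, kRef = 0, kNew = 1;

void Write(const Node* n, std::unordered_map<const Node*, int>& ids, std::vector<int>& out) {
    if (!n) { out.push_back(kNull); return; }
    auto it = ids.find(n);
    if (it != ids.end()) { out.push_back(kRef); out.push_back(it->second); return; }
    const int id = static_cast<int>(ids.size());
    ids[n] = id;  // register before recursing so cycles terminate
    out.push_back(kNew);
    out.push_back(n->value);
    Write(n->next, ids, out);
}

// Single pass: an object is allocated the first time it is seen, and IDs are
// assigned in the same first-seen order the writer used.
Node* Read(const std::vector<int>& in, std::size_t& pos,
           std::vector<std::unique_ptr<Node>>& byId) {
    const int tag = in[pos++];
    if (tag == kNull) return nullptr;
    if (tag == kRef) return byId[static_cast<std::size_t>(in[pos++])].get();
    byId.push_back(std::make_unique<Node>());
    Node* n = byId.back().get();  // registered before its fields are read, mirroring Write
    n->value = in[pos++];
    n->next = Read(in, pos, byId);
    return n;
}
```

For a two-node cycle a → b → a, `Write` emits `1, 1, 1, 2, 0, 0`: two inline objects followed by a back-reference to ID 0.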


> There are many techniques out there that can do it in a single pass

I've looked at some solutions for that, but they all seem to make the same assumption: that referenced objects can be recreated via 'new' or some other homogeneous technique. That's fine if that's how the serialized objects were created, but if your allocation strategy is more complex than that, reconstructing a referenced object requires contextual information that most pointers simply won't have (and why should they?). In that case, only the owner of the referenced object can determine the proper way to restore it, so any other references pulled out of the archive have to wait until the owner has done so before their values can be set; hence the second pass.

Are there any particular implementations you're thinking of that I could look at?

I recommend Boost.Serialization. It has out-of-the-box support for serializing/deserializing standard containers like vector and array.

Back when I used to deal with this issue, I had everything transform pointers into pool-slot pairs. My observations were:

  • World data is not generically serialized. It is very well understood by the world structure, or by an even higher-level manager, so I can resolve all possible uses in advance. This mostly applies to native (non-scripted) data.
  • Scripted data is not a problem: save/deserialize the whole execution environment.
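The pool-slot idea can be sketched as follows, assuming objects live in a few pools (plain vectors here) that are rebuilt before any references are restored. `Entity`, `PoolSlot`, `EntityPools`, `ToHandle` and `FromHandle` are hypothetical names. On save a pointer becomes a relocatable (pool, slot) pair; on load the pair maps back to whatever address the rebuilt pool now uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

struct Entity {
    int hp = 0;
};

// Hypothetical relocatable reference: which pool, and which slot inside it.
struct PoolSlot {
    std::uint32_t pool;
    std::uint32_t slot;
};

struct EntityPools {
    std::vector<std::vector<Entity>> pools;  // e.g. one pool per entity kind

    // Save side: turn a raw pointer into a pool/slot pair. A linear search keeps
    // the sketch short; a real system would track this directly.
    std::optional<PoolSlot> ToHandle(const Entity* p) const {
        for (std::size_t i = 0; i < pools.size(); ++i)
            for (std::size_t j = 0; j < pools[i].size(); ++j)
                if (&pools[i][j] == p)
                    return PoolSlot{static_cast<std::uint32_t>(i), static_cast<std::uint32_t>(j)};
        return std::nullopt;  // not owned by any pool: cannot be saved this way
    }

    // Load side: the pools are rebuilt first, so the pair maps back to an address.
    Entity* FromHandle(PoolSlot h) { return &pools.at(h.pool).at(h.slot); }
};
```

Because the handle never encodes an address, it survives the pools being reloaded at different locations.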

In general, I found that generic serialization frameworks are just overkill. I'm currently playing with Google Protocol Buffers and I like the simplicity.

(De)serializing generic structures brings way too many issues, and every time I hear "reflection" I think something has gone wrong at some point. You only need to serialize what you need to have working.

In particular, you don't serialize raw pointers, especially not without first figuring out their ownership semantics. If a pointer isn't owning, then the object is owned by someone else... and thus you can reduce the pointer to a resource index.

Previously "Krohm"

In general terms, this is loading a relational database.

There are (at least) two possible methods:

The traditional algorithm is two-pass: pass 1 allocates, pass 2 fixes up pointers. Needless to say, you have to save off what was pointing to what (somehow) so you can fix up the pointers at load time.
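The "save off what was pointing to what" half can be sketched like this, assuming every object being saved is given an ID equal to its position in the save list. `Object`, `SavedObject` and `Save` are hypothetical names. Pointers go to disk as target IDs; the load side is then the allocate-then-fix-up loop described above.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

struct Object {
    int value = 0;
    Object* ref = nullptr;  // non-owning
};

// What goes to disk: the pointer is replaced by the ID of its target.
struct SavedObject {
    int value;
    int refId;  // index into the saved list, or -1 for null / not saved
};

std::vector<SavedObject> Save(const std::vector<const Object*>& objects) {
    // Give every object being saved an ID (here simply its position in the list).
    std::unordered_map<const Object*, int> idOf;
    for (std::size_t i = 0; i < objects.size(); ++i)
        idOf[objects[i]] = static_cast<int>(i);

    // Record "what was pointing to what" as IDs instead of addresses.
    std::vector<SavedObject> out;
    out.reserve(objects.size());
    for (const Object* o : objects) {
        int refId = -1;
        auto it = idOf.find(o->ref);
        if (o->ref && it != idOf.end()) refId = it->second;  // target outside the save set stays null
        out.push_back({o->value, refId});
    }
    return out;
}
```

Whether a reference to an object outside the save set should silently become null (as here) or be reported as an error is a design decision; this sketch picks the lenient option.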

The other option is a core dump. No fixup is required, because everything is loaded back where it was, so all pointers are still valid. It's easier to implement using ID numbers than actual memory addresses: memory address pointers are not trivially relocatable, but index values are.
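A small sketch of the index-based core dump, assuming all objects of one kind live in a single flat array of trivially copyable structs whose references are indices into that same array. `Unit`, `Dump` and `Restore` are hypothetical names. The bytes can be reloaded at any address and every reference is still valid, with no fixup pass.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// References are indices into the same flat array, never addresses.
struct Unit {
    std::int32_t hp;
    std::int32_t target;  // index of another Unit in the same array, or -1
};
static_assert(std::is_trivially_copyable_v<Unit>, "a byte-for-byte dump requires trivially copyable data");

// Save: copy the whole array out as raw bytes.
std::vector<unsigned char> Dump(const std::vector<Unit>& units) {
    std::vector<unsigned char> bytes(units.size() * sizeof(Unit));
    if (!units.empty()) std::memcpy(bytes.data(), units.data(), bytes.size());
    return bytes;
}

// Load: copy the bytes back into a fresh array, wherever it happens to live.
std::vector<Unit> Restore(const std::vector<unsigned char>& bytes) {
    std::vector<Unit> units(bytes.size() / sizeof(Unit));
    if (!units.empty()) std::memcpy(units.data(), bytes.data(), units.size() * sizeof(Unit));
    return units;
}
```

A raw dump like this is only portable between builds that share struct layout and endianness, which is usually acceptable for save games but not for interchange formats.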

Norm Barrows

Rockland Software Productions

"Building PC games since 1989"

rocklandsoftware.net

PLAY CAVEMAN NOW!

http://rocklandsoftware.net/beta.php

What are you referencing with the pointers? I'm wondering if the pointers couldn't be refactored out with an architecture change. If the referenced objects are in a pool of some kind, could you just store indices or handles to the objects instead of using raw pointers, as MaxDZ8 touches on?

As with other parts of the system, I've decided to give more responsibility to the user of the system rather than the system itself. In this case, that means the person writing a serialization function is in charge of instantiating referenced objects before references to those objects are deserialized. This makes things much simpler: rather than doing multiple passes over the objects being deserialized (a potentially complex and unknown structure), you're doing multiple passes over the serialization archive (a much simpler and well-understood structure).

This does lead to somewhat unintuitive code, but I've gotten it up and running in some simple tests. Here are the serialization and deserialization functions for my World structure, if anyone's curious (with a bit removed for simplicity):

```cpp
void World::ToArchive(ArchiveWriter& writer) const
{
    writer.SetID(this);
    writer.PushValue("TimeDilation", this->TimeDilation);
    writer.PushValue("TimeStep", this->TimeStep);

    // Save all entities/components
    writer.AddChild("GameObjects", [this](auto& child)
    {
        for (const auto& kv : this->_gameObjects)
        {
            // Push the given value into this archive, wrapping it in a Node containing its type name and address
            child.PushValueWithID(kv.Second->GetType().GetName(), *kv.Second);
        }
    });
}

void World::FromArchive(const ArchiveReader& reader)
{
    reader.MapID(this); // NOTE: Technically not a 'const' operation, still getting the API finalized
    reader.PullValue("TimeDilation", this->TimeDilation);
    reader.PullValue("TimeStep", this->TimeStep);

    // Load all entities/components
    reader.GetChild("GameObjects", [&](const auto& child)
    {
        Queue<Owned<GameObject>> unloadedObjects;

        // Do a first pass, instantiate everything
        child.EnumerateChildren([&](const auto& gameobject)
        {
            // Find the type of GameObject that this node is referring to
            auto type = Application::FindType(gameobject.GetName());

            if (!type || !type->IsCastableTo(TypeOf<GameObject>()))
            {
                // Type isn't a GameObject type, go to next object
                // But push in a null pointer so that the Queue has as many elements as this archive has nodes
                gameobject.MapID(nullptr);
                unloadedObjects.Push(nullptr);
            }
            else
            {
                // Instantiate it
                auto object = StaticPointerCast<GameObject>(DynamicNew(*type));

                // Map the address of the new instantiation to the address in the archive
                gameobject.MapID(object.GetManagedPointer());
                unloadedObjects.Push(std::move(object));
            }
        });

        // Do a second pass, deserialize everything
        child.EnumerateChildren([&](const auto& gameobject)
        {
            if (auto object = unloadedObjects.Pop())
            {
                // Deserialize it
                object->FromArchive(gameobject);

                // Add it to the scene
                _gameObjects[object->GetID()] = std::move(object);
            }
        });
    });
}
```
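The same two-passes-over-the-archive pattern can be sketched without the engine, assuming a flat archive of (type name, archive ID, reference ID) entries and a name-to-factory table. `ArchiveNode`, `Factory`, `Follower` and `LoadObjects` are hypothetical stand-ins for `ArchiveReader`, `Application::FindType`/`DynamicNew` and a real GameObject subclass, not the engine's actual API. The first pass only instantiates and maps IDs (pushing a null entry for unknown types so the queue stays aligned with the archive); the second pass deserializes, at which point every reference target already exists.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <queue>
#include <string>
#include <unordered_map>
#include <vector>

struct GameObject;
using IdMap = std::unordered_map<int, GameObject*>;

// Hypothetical flat archive entry: type name, archive ID, and one optional reference.
struct ArchiveNode {
    std::string typeName;
    int id;
    int refId;  // archive ID of another object, or -1
};

struct GameObject {
    virtual ~GameObject() = default;
    virtual void FromArchive(const ArchiveNode&, const IdMap&) {}
};

// Hypothetical GameObject type holding a non-owning reference to another object.
struct Follower : GameObject {
    GameObject* leader = nullptr;  // non-owning
    void FromArchive(const ArchiveNode& node, const IdMap& ids) override {
        auto it = ids.find(node.refId);
        leader = (it != ids.end()) ? it->second : nullptr;
    }
};

// Hypothetical stand-in for type lookup + dynamic instantiation.
using Factory = std::function<std::unique_ptr<GameObject>()>;

std::vector<std::unique_ptr<GameObject>> LoadObjects(
    const std::vector<ArchiveNode>& nodes,
    const std::unordered_map<std::string, Factory>& factories) {
    IdMap ids;
    std::queue<std::unique_ptr<GameObject>> unloaded;

    // First pass over the archive: instantiate and map IDs only.
    for (const ArchiveNode& node : nodes) {
        auto f = factories.find(node.typeName);
        std::unique_ptr<GameObject> object = (f != factories.end()) ? f->second() : nullptr;
        ids[node.id] = object.get();       // unknown types map to nullptr
        unloaded.push(std::move(object));  // one queue entry per archive node
    }

    // Second pass: every object exists, so references resolve even when they point forward.
    std::vector<std::unique_ptr<GameObject>> loaded;
    for (const ArchiveNode& node : nodes) {
        std::unique_ptr<GameObject> object = std::move(unloaded.front());
        unloaded.pop();
        if (object) {
            object->FromArchive(node, ids);
            loaded.push_back(std::move(object));
        }
    }
    return loaded;
}
```

As in the original, the cost of this design is that the author of each FromArchive must know which references are guaranteed to have been mapped by an earlier pass.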
