An academic question: serialization aesthetics.

3 comments, last by d_emmanuel 18 years, 4 months ago
Hey folks. I'm working on an academic "perfect solution" to serialization, and I don't like what I've come up with so far. I'm looking for a good way to read an object with no default constructor from a stream without having that object depend directly on streams.

Background: Let's say I have a class whose invariant is to never have an invalid or null state, meaning that it has no default constructor and no reset methods. It takes all of its needed data in a constructor; if a client needs to change the data, it has to create a new object. This isn't a problem because clients of this class store it in a std::auto_ptr, so to modify their objects, they just reset the auto_ptr with a newly constructed object. I have a group of objects that all follow this paradigm.

Now, however, I want to add a serialization layer to them. Serializing to the stream works out fine. The problem is that this paradigm doesn't lend itself well to reading from a serialized stream. Consider the following:
class CObj
{
public:
    struct SRep
    {
        ...
    };

    explicit CObj(const SRep& arRep): mRep(arRep) {}
    CObj(const CObj& arSrc): mRep(arSrc.mRep) {}

    void Swap(CObj& ar) { std::swap(mRep, ar.mRep); }

    const SRep& GetRep() const { return mRep; }

private:
    SRep mRep;
};


CSerializer& operator <<(CSerializer& arDst, const CObj::SRep& arSrc);
CSerializer& operator >>(CSerializer& arSrc, CObj::SRep& arDst);

CSerializer& operator <<(CSerializer& arDst, const CObj& arSrc);
This is the majority of the nuts and bolts needed to serialize and deserialize the data for CObj. The problem is which mechanism to use to actually read from the stream. The standard >> won't work, because it implies that you already have a created CObj to read into, but that's not possible. Chicken and egg. The first (hacky) solution that comes to mind would be something like:
CSerializer& operator >>(CSerializer& arSrc, CObj*& arpDst);
This would create a new CObj in the function and return it through the pointer parameter. The problem is that it differs from the rest of the serialization paradigm, which takes just a reference, not a reference to a pointer. Aesthetics say this is bad, especially because now there's a factory method hidden in a random operator >>. The more correct option would be to have a factory of some sort that takes a CSerializer& and spits out a CObj*. Factory methods like these:

class CObj
{
public:
    CObj(CSerializer& arSrc);
    // or maybe even
    static CObj& FromStream(CSerializer& arSrc);
};
won't work because these are invasive and force a direct dependency on CSerializer into CObj, which is also bad. At the moment I'm favoring having a global
template <typename tCObj> tCObj* Create(CSerializer&);
with a particular specialization for CObj, but even still, this does not sound optimal. What would you folks do here?
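For reference, here is a minimal sketch of what that non-intrusive Create specialization could look like. Everything beyond the names in the post is an assumption: CSerializer is a hypothetical text-stream wrapper, SRep is given a single int member because the original elides its contents, and std::unique_ptr stands in for std::auto_ptr (which was removed in C++17). Note that CObj itself never mentions CSerializer.

```cpp
#include <cassert>
#include <memory>
#include <sstream>

// Hypothetical stand-in for the thread's CSerializer: wraps a text stream.
class CSerializer
{
public:
    CSerializer& operator<<(int aValue) { mStream << aValue << ' '; return *this; }
    CSerializer& operator>>(int& arValue) { mStream >> arValue; return *this; }

private:
    std::stringstream mStream;
};

class CObj
{
public:
    struct SRep { int mValue; };  // assumed payload; the post elides it

    explicit CObj(const SRep& arRep) : mRep(arRep) {}
    const SRep& GetRep() const { return mRep; }

private:
    SRep mRep;
};

// Serialization layer, kept entirely outside CObj.
CSerializer& operator<<(CSerializer& arDst, const CObj::SRep& arSrc) { return arDst << arSrc.mValue; }
CSerializer& operator>>(CSerializer& arSrc, CObj::SRep& arDst) { return arSrc >> arDst.mValue; }
CSerializer& operator<<(CSerializer& arDst, const CObj& arSrc) { return arDst << arSrc.GetRep(); }

// Primary template is declared only; each serializable type supplies
// its own specialization.
template <typename tObj> std::unique_ptr<tObj> Create(CSerializer& arSrc);

template <>
std::unique_ptr<CObj> Create<CObj>(CSerializer& arSrc)
{
    CObj::SRep rep = {};
    arSrc >> rep;                                  // read the plain data first
    return std::unique_ptr<CObj>(new CObj(rep));   // then construct a valid object
}

// Writes a CObj to a stream and reads a fresh one back.
int RoundTrip(int aValue)
{
    CSerializer stream;
    stream << CObj(CObj::SRep{aValue});
    std::unique_ptr<CObj> copy = Create<CObj>(stream);
    assert(copy);
    return copy->GetRep().mValue;
}
```

The trade-off is the one already mentioned: the factory lives apart from the class, so every new type needs its own specialization, but the class never sees a null state.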
-- Succinct(Don't listen to me)
Before going any further with an "academic solution to serialization for C++", I suggest investigating Boost.Serialization and Boost.Iostreams.
Yes, I think the Boost serialization library is what you need; it is very complete, stable and fast.

There are other elegant solutions to serialization.
Unfortunately I can't divulge too much about the underlying implementation of my serialization library, but I can say that I took a long hard look at the Boost library (and the others suggested in its bibliography), scrapped the lot and used what I had learnt to build my own solution.

The Boost serialization lib tends to favour inheritance (using the curiously recurring template pattern); this avoids run-time overhead, which is important IMO. My attempt uses a hierarchy about half as deep as theirs since I favoured aggregation and mixins.

Basically I have a separation of the concepts of archive, serializer and stream. My library allows massive flexibility; typically it uses C++ iostreams, but thanks to templates there's no requirement for them.
I can use the library for file serialization, networking, console IO and XML archiving, all through the same simple interface, plus some nice/funny things too, like sending endian-independent binary data in XML format over a network where it would be received and stored in a human-readable ASCII text file (but why I would want to is anyone's guess).

Good luck

[Edited by - dmatter on December 3, 2005 1:20:37 PM]
This is why extraction operators are a bad model for deserialization: deserialization is a creation operation, not a mutation operation. Unfortunately, everyone seems to be on the iostreams-esque bandwagon, and C++ has ugly creation semantics anyway, so you're unlikely to get an ideal solution here. Have a named constructor which returns a null-state object, and which is only used in deserialization. You can add an "uninitialized" flag in debug mode to make this slightly more robust.
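A rough sketch of the named-constructor idea, under some assumptions: the name ForDeserialization, the IsInitialized accessor, the one-int SRep and the CSerializer wrapper are all made up for illustration. The flag is kept unconditionally here for brevity; in practice it could be wrapped in #ifndef NDEBUG so it only exists in debug builds. The extraction operator fills the object through the existing Swap, so CObj still doesn't depend on the serializer.

```cpp
#include <cassert>
#include <sstream>
#include <utility>

// Hypothetical stand-in for the thread's CSerializer.
class CSerializer
{
public:
    CSerializer& operator<<(int aValue) { mStream << aValue << ' '; return *this; }
    CSerializer& operator>>(int& arValue) { mStream >> arValue; return *this; }

private:
    std::stringstream mStream;
};

class CObj
{
public:
    struct SRep { int mValue; };  // assumed payload; the post elides it

    explicit CObj(const SRep& arRep) : mRep(arRep), mInitialized(true) {}

    // Named constructor: the only public way to get a null-state object.
    static CObj ForDeserialization() { return CObj(); }

    void Swap(CObj& ar)
    {
        std::swap(mRep, ar.mRep);
        std::swap(mInitialized, ar.mInitialized);
    }

    bool IsInitialized() const { return mInitialized; }

    const SRep& GetRep() const
    {
        // Catches use of a null-state object that was never filled in.
        assert(mInitialized && "CObj used before deserialization");
        return mRep;
    }

private:
    CObj() : mRep(), mInitialized(false) {}  // null state, not publicly reachable

    SRep mRep;
    bool mInitialized;
};

CSerializer& operator>>(CSerializer& arSrc, CObj::SRep& arDst) { return arSrc >> arDst.mValue; }

// Non-friend extraction: build a valid object, then swap it into place.
CSerializer& operator>>(CSerializer& arSrc, CObj& arDst)
{
    CObj::SRep rep = {};
    arSrc >> rep;
    CObj loaded(rep);
    arDst.Swap(loaded);
    return arSrc;
}

// Creates a null-state object, fills it from the stream, checks the result.
bool LoadsThroughNullState(int aValue)
{
    CSerializer stream;
    stream << aValue;
    CObj obj = CObj::ForDeserialization();
    const bool wasEmpty = !obj.IsInitialized();
    stream >> obj;
    return wasEmpty && obj.IsInitialized() && obj.GetRep().mValue == aValue;
}
```

This keeps the familiar >> syntax, at the cost of weakening the "never null" invariant to "never null outside deserialization".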
The simplest and most robust way of handling this in C++ is to define a constructor that constructs the object from the stream like so:

explicit CObj(CSerializer& stream);

You can minimize dependencies using forward declarations. This way you'll only need to include the header that contains CSerializer in your implementation file for CObj.
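To illustrate the forward-declaration point, here is a single-file sketch with comments marking where the header/implementation split would be. The file names, the CSerializer class, its Failed() method and the int payload are all assumptions made for the example. The constructor throws on a failed read so that a half-built CObj never exists.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// ---- CObj.h (sketch): a forward declaration is all the header needs ----
class CSerializer;

class CObj
{
public:
    explicit CObj(CSerializer& stream);  // fine with an incomplete type
    int GetValue() const { return mValue; }

private:
    int mValue;  // assumed payload
};

// ---- CSerializer.h (hypothetical): included only from CObj.cpp ----
class CSerializer
{
public:
    explicit CSerializer(const std::string& aData) : mStream(aData) {}
    CSerializer& operator>>(int& arValue) { mStream >> arValue; return *this; }
    bool Failed() const { return mStream.fail(); }

private:
    std::istringstream mStream;
};

// ---- CObj.cpp (sketch): the full CSerializer definition is visible here ----
CObj::CObj(CSerializer& stream) : mValue(0)
{
    stream >> mValue;
    if (stream.Failed())
        throw std::runtime_error("CObj: could not read from stream");
}

// Constructs a CObj directly from serialized text.
int ReadValue(const std::string& aData)
{
    CSerializer stream(aData);
    CObj obj(stream);
    return obj.GetValue();
}

// Small usage check.
void Demo()
{
    assert(ReadValue("17") == 17);
}
```

Clients that include only CObj.h never see CSerializer's definition, though CObj's interface still names the type, which is the dependency the original post wanted to avoid.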
