Jump to content

  • Log In with Google      Sign In   
  • Create Account

EWClay's Journal

Archiving C++ objects to JSON

Posted by , 06 April 2013 - - - - - - · 1,251 views

Last time I talked about serialization of C++ objects to and from an abstract archive format, using a templated interface similar to Boost Serialization. This time I want to get a little more specific and explain how objects can be converted to JSON (http://www.json.org/).

It's quite instructive to look at JSON as a format because it maps very well the the way that objects are represented in computer code. Other formats such as XML can do the same job, but in my opinion not in such an elegant and minimal way.

JSON has objects (made of key/value pairs), and a few basic types. That's it. Can that possibly be enough? Actually, yes. C++ code has many more ways of representing data, but in terms of actual content you don't need more than JSON provides.

To write and parse JSON I wrote my own code, which was not very difficult. But there are many libraries available, such as SimpleJSON. I'll link all my code, including the serialization, archive and parser in a future update.

Basic Types

JSON has numbers, strings, a boolean type and a null value. These map to C++ types quite easily. All numerical values - integers, floats and doubles become numbers. I treat std::string as a basic type and bool is written as true or false.

Objects and Classes

JSON doesn't use classes; its objects are self-describing. When writing a C++ object as a JSON object, the keys come from the member names in the class, and the values come from the data in the object. This allows objects to change over time by adding or removing keys. An extra member in the class could be loaded with a default value, or a missing member ignored. What this doesn't allow for is a change of structure, which would have to be dealt with by creating a new object, but for the most part versioning is not necessary with JSON.

Arrays and Containers

Every C++ container is a sequence of values, so I simply map every C++ container type to a JSON array, whether a vector, list or anything else. I use templated functions to read and write the different container types from the standard library because I prefer to use those to raw arrays.


As part of keeping the format readable, I don't want to be writing enums as numerical values. Besides, the value might change in the code and that would make the data invalid (which is more likely than the name changing). So enums are written out as strings, and converted back from a table when loaded. I use some macros to make this neater.


JSON does not support inheritance. This doesn't matter. In terms of data, a base class is just like another member. I write it out using the name of the base class as the key.

Pointers and Polymorphism

This is the point where something is really missing, because JSON does not have pointer or reference types. And there's no way I can get away without it because I use polymorphic types in my data, and I need to be able to archive them. In fact, I only use smart pointers so I'm not concerned about how to represent references generically in JSON, but only how to represent a specific type of data object, which is something it is well able to cope with.

To begin with I need a way to uniquely identify a type, and I get this from a string identifier which is unique to the class. I add a virtual function (using a macro) to each type that I want to serialize polymorphically. The identifier also has to be registered with the serialization system so that a handler object can be created. The handler is going to instantiate the template functions to read and write the object. (That actually amounts to two lines of extra code per polymorphic type, so nothing major in terms of interface).

These polymorphic types can only be written out using smart pointers, which have a templated serialization function. What that does is write out a JSON object containing a unique identifier for the object, (which is the memory address) and the unique identifier for the type (a string) followed by the object itself (but only once per archive in the case of a shared pointer). When reading the object back I know first of all whether it needs to be created (because there may have been a previous reference to it), then I can look up the handler from the type, and then create and read the object. There's a fair amount of factories and templates behind the scenes to make this work.

Next steps

Does data need to begin its existence in C++? What about creating types outside of the source code, building objects using those definitions, and still being able to access those inside the program. That will be my subject next time.

Game data and serialization

Posted by , 05 April 2013 - - - - - - · 1,132 views

Getting data into a game is a big subject and I want to talk about some of the systems I'm using in some detail, so it may take a few posts to get through it. I will narrow the subject a little first though: this is not going to be about asset loading - textures, meshes, sounds and so on. Rather it's the data that makes up the game itself, so that would be levels, entities, and all the associated information. Since in the game these are C++ objects, there needs to be a mechanism for interacting with such objects in order to save and load them. So the first step is serialization, which is simply the process of translating an object into a stream of data and vice versa.

Obviously this is a well studied problem, but still a subtle one. C++ does not provide serialization out of the box. One of the best known solutions is the Boost Serialization library (http://www.boost.org/doc/libs/1_53_0/libs/serialization/doc/index.html). It's not entirely suitable for my purposes, for several reasons which I hope will become clear later, but it does offer some very useful concepts.

First among these is the separation of the process of serialization from the archive format. This is one of the fundamental goals of Boost Serialization; whether it achieves it is another matter. It also puts some care into the serialization interface so that it is fairly simple (at least, for simple classes) and economical in terms of the amount of code and maintenance required.

So, points that I like about Boost Serialization (and naturally intend to steal):
  • Templated serialization functions. Virtual base classes for archives is the alternative, but this seems to be the best implementation for flexibility and efficiency.
  • Non-intrusive code, as far as possible. Serialization has a minimal impact on the original class.
  • Automatic serialization of containers and other common types.
And what I don't like:
  • The versioning system. Versioning is explicit in Boost Serialization and I prefer it to be automatic, as in, I can add or remove members without having to write special code to do that.
  • Automatic object and pointer tracking. This complicates the archive format and the serialization interface. In contrast to versioning, I prefer this to be explicit.
  • Some aspects of the interface, including operator overloading.
Taking all that into account, here's a small example of a serialization function using my system:
struct Rect 
	int16 x, y;
	uint16 w, h;

template <typename Archive>
void Serialize(Archive& ar, Rect& t)
	ar.Serialize("X", t.x);
	ar.Serialize("Y", t.y);
	ar.Serialize("W", t.w);
	ar.Serialize("H", t.h);

As you can see it is templated to receive an archive and a serializable type. The function simply runs through the members and calls Serialize on the archive, providing a unique name (within this type) for each member. This function will do for loading and saving. The Serialize function on the archive is a template too and will detect the object type to serialize it correctly (and recursively).

And the output, when I write to a JSON archive:
	"Rect": {
		"X": 32,
		"Y": 0,
		"W": 32,
		"H": 32
Now, that doesn't have to be a JSON archive. I'm using that for now because it's portable, easy to parse and human readable. But I could swich over to a binary format without changing the serialization function at all.

Next time I'll go through some more details of the implementation.

April 2013 »


Recent Entries

Recent Comments