Archiving C++ objects to JSON
Last time I talked about serialization of C++ objects to and from an abstract archive format, using a templated interface similar to Boost Serialization. This time I want to get a little more specific and explain how objects can be converted to JSON (http://www.json.org/).
It's quite instructive to look at JSON as a format because it maps very well to the way that objects are represented in computer code. Other formats such as XML can do the same job, but in my opinion not in such an elegant and minimal way.
JSON has objects (made of key/value pairs), arrays, and a few basic types. That's it. Can that possibly be enough? Actually, yes. C++ code has many more ways of representing data, but in terms of actual content you don't need more than JSON provides.
To write and parse JSON I wrote my own code, which was not very difficult. But there are many libraries available, such as SimpleJSON. I'll link all my code, including the serialization, archive and parser in a future update.
JSON has numbers, strings, a boolean type and a null value. These map to C++ types quite easily. All numerical values (integers, floats and doubles) become numbers. I treat std::string as a basic type, and bool is written as true or false.
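As a rough illustration of that mapping, here is a minimal sketch of how the basic types could be turned into JSON text. The name toJson and the set of overloads are my own assumptions for the example, not the interface of the code described in this post, and the string escaping only covers a few common characters.

```cpp
#include <sstream>
#include <string>
#include <type_traits>

// Hypothetical helpers, for illustration only.

// bool is written as the JSON literals true / false.
inline std::string toJson(bool b) { return b ? "true" : "false"; }

// std::string is treated as a basic type: quoted, with minimal escaping.
inline std::string toJson(const std::string& s) {
    std::string out = "\"";
    for (char c : s) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\t': out += "\\t";  break;
            default:   out += c;      break;
        }
    }
    out += '"';
    return out;
}

// Without this overload a string literal would decay to a pointer and
// silently pick the bool overload.
inline std::string toJson(const char* s) { return toJson(std::string(s)); }

// Every other arithmetic type (int, float, double, ...) becomes a JSON number.
template <typename T>
typename std::enable_if<std::is_arithmetic<T>::value, std::string>::type
toJson(T value) {
    std::ostringstream os;
    os << value;
    return os.str();
}
```

The overload for const char* is the one design detail worth noting: a pointer converts to bool more readily than to std::string, so leaving it out would write every string literal as true.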
Objects and Classes
JSON doesn't use classes; its objects are self-describing. When writing a C++ object as a JSON object, the keys come from the member names in the class, and the values come from the data in the object. This allows objects to change over time by adding or removing keys. A member that was added to the class after the data was written can be loaded with a default value, and a key whose member no longer exists can be ignored. What this doesn't allow for is a change of structure, which would have to be dealt with by creating a new object, but for the most part versioning is not necessary with JSON.
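To make the point about adding and removing keys concrete, here is a small sketch of loading an object from keyed data. It assumes the JSON has already been parsed, and it reduces a JSON object to a map from key to number to keep the example short. Player, readOr and loadPlayer are hypothetical names invented for this example.

```cpp
#include <map>
#include <string>

// Stand-in for a parsed JSON object: key -> numeric value (an assumption
// made to keep the sketch short; a real parser would hold any JSON value).
using JsonObject = std::map<std::string, double>;

// Hypothetical class; the member names double as the JSON keys.
struct Player {
    double health = 100.0;
    double speed = 1.0;   // imagine this member was added in a later version
};

// Return the value stored under key, or fallback if the key is absent.
double readOr(const JsonObject& obj, const std::string& key, double fallback) {
    auto it = obj.find(key);
    return it == obj.end() ? fallback : it->second;
}

// Keys that are missing keep the default; keys the class no longer knows
// about (such as "armour" in old data) are simply never looked up.
Player loadPlayer(const JsonObject& obj) {
    Player p;
    p.health = readOr(obj, "health", p.health);
    p.speed = readOr(obj, "speed", p.speed);
    return p;
}
```

Loading old data that has "health" and "armour" but no "speed" gives a Player with the stored health and the default speed, with no version number involved.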
Arrays and Containers
For my purposes every C++ container is a sequence of values, so I simply map every container type to a JSON array, whether it is a vector, a list or anything else. I use templated functions to read and write the different container types from the standard library, because I prefer those to raw arrays.
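A single templated function is enough to show the idea for writing. The sketch below assumes the elements are numbers that can be streamed directly; writeArray is a hypothetical name, and a fuller version would call the per-element serialization function instead of operator<<.

```cpp
#include <list>
#include <sstream>
#include <string>
#include <vector>

// Write any iterable container of numbers as a JSON array.
// Hypothetical helper: elements are streamed directly, so this sketch
// only suits numeric element types.
template <typename Container>
std::string writeArray(const Container& items) {
    std::ostringstream os;
    os << '[';
    bool first = true;
    for (const auto& item : items) {
        if (!first) os << ',';   // separator between elements, none at the end
        os << item;
        first = false;
    }
    os << ']';
    return os.str();
}
```

The same function works for std::vector<int> and std::list<double>, because it only relies on the container being iterable.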
As part of keeping the format readable, I don't want to be writing enums as numerical values. Besides, the value might change in the code and that would make the data invalid (which is more likely than the name changing). So enums are written out as strings, and converted back from a table when loaded. I use some macros to make this neater.
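One way such a table could be built is with an X-macro, so that the enum and its name table come from a single list. This is only a sketch of the general technique; Colour, COLOUR_LIST and the two conversion functions are made up for the example and are not the macros used in my own code.

```cpp
#include <string>

// Single list of enumerators; both the enum and the table are generated
// from it. All names here are hypothetical.
#define COLOUR_LIST(X) X(Red) X(Green) X(Blue)

enum class Colour {
#define X(name) name,
    COLOUR_LIST(X)
#undef X
};

struct ColourName {
    Colour value;
    const char* name;
};

// Table pairing each enumerator with the string written to JSON.
static const ColourName kColourNames[] = {
#define X(name) { Colour::name, #name },
    COLOUR_LIST(X)
#undef X
};

// Enum -> string, used when writing. Returns "" for an unknown value.
std::string colourToString(Colour c) {
    for (const auto& entry : kColourNames) {
        if (entry.value == c) return entry.name;
    }
    return "";
}

// String -> enum, used when loading. Returns false and leaves out
// untouched if the name is not in the table.
bool colourFromString(const std::string& s, Colour& out) {
    for (const auto& entry : kColourNames) {
        if (s == entry.name) {
            out = entry.value;
            return true;
        }
    }
    return false;
}
```

Because the data stores "Green" rather than 1, reordering or inserting enumerators in COLOUR_LIST does not invalidate existing files.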
JSON does not support inheritance. This doesn't matter. In terms of data, a base class is just like another member. I write it out using the name of the base class as the key.
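In code, that could look something like the following sketch, where the base class is written as a nested object under its own name. Entity, Monster and the two write functions are hypothetical and hand-written here; in practice the templated serialization function would generate the equivalent output.

```cpp
#include <sstream>
#include <string>

// Hypothetical base and derived classes.
struct Entity {
    int id = 0;
};

struct Monster : Entity {
    int health = 0;
};

// The base class is written like any other object.
std::string writeEntity(const Entity& e) {
    std::ostringstream os;
    os << "{\"id\":" << e.id << "}";
    return os.str();
}

// The derived class writes its base as a member keyed by the base class
// name, followed by its own members.
std::string writeMonster(const Monster& m) {
    std::ostringstream os;
    os << "{\"Entity\":" << writeEntity(m)
       << ",\"health\":" << m.health << "}";
    return os.str();
}
```

A Monster with id 7 and health 30 comes out as {"Entity":{"id":7},"health":30}.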
Pointers and Polymorphism
This is the point where something really is missing, because JSON does not have pointer or reference types. I can't get away without them, because I use polymorphic types in my data and I need to be able to archive them. In fact, I only use smart pointers, so I'm not concerned with how to represent references generically in JSON, only with how to represent a specific type of data object, which is something JSON is well able to cope with.
To begin with I need a way to uniquely identify a type, and I get this from a string identifier which is unique to the class. I add a virtual function (using a macro) to each type that I want to serialize polymorphically. The identifier also has to be registered with the serialization system so that a handler object can be created. The handler is going to instantiate the template functions to read and write the object. (That actually amounts to two lines of extra code per polymorphic type, so nothing major in terms of interface).
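A stripped-down sketch of that registration might look like this. Everything in it is an assumption made for illustration: Shape, DECLARE_TYPE_NAME, registerType and create are invented names, and the handler here only creates objects, whereas the real handler also instantiates the read and write template functions.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>

// Hypothetical polymorphic base class.
struct Shape {
    virtual ~Shape() {}
    virtual const char* typeName() const = 0;
};

// Line one of the per-type code: the macro adds the virtual function
// that returns the string identifier.
#define DECLARE_TYPE_NAME(T) \
    const char* typeName() const override { return #T; }

struct Circle : Shape { DECLARE_TYPE_NAME(Circle) };
struct Square : Shape { DECLARE_TYPE_NAME(Square) };

// Simplified handler: just a factory for the concrete type.
using Factory = std::function<std::unique_ptr<Shape>()>;

std::map<std::string, Factory>& registry() {
    static std::map<std::string, Factory> handlers;
    return handlers;
}

// Line two of the per-type code: register the identifier so a handler
// exists when the type name is met in the data.
template <typename T>
void registerType(const std::string& name) {
    registry()[name] = [] { return std::unique_ptr<Shape>(new T()); };
}

// Look up the handler by type name; returns null for an unknown type.
std::unique_ptr<Shape> create(const std::string& name) {
    auto it = registry().find(name);
    if (it == registry().end()) return nullptr;
    return it->second();
}
```

After registerType<Square>("Square"), calling create("Square") yields a Square through a Shape pointer, which is what the reader needs when all it has is the type string from the file.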
These polymorphic types can only be written out through smart pointers, which have a templated serialization function. That function writes a JSON object containing a unique identifier for the object (its memory address) and the unique identifier for the type (a string), followed by the object itself (but only once per archive in the case of a shared pointer). When reading the object back, I first find out whether it needs to be created (there may have been a previous reference to it), then I look up the handler from the type, and then I create and read the object. There's a fair number of factories and templates behind the scenes to make this work.
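The writing half of that scheme can be sketched as below. It is a simplified illustration under my own assumptions: Node, Writer and the key names "id", "type" and "object" are invented for the example, the object body is written by hand, and the address is emitted as a plain number.

```cpp
#include <cstdint>
#include <memory>
#include <set>
#include <sstream>
#include <string>

// Hypothetical polymorphic type with a string identifier.
struct Node {
    virtual ~Node() {}
    virtual const char* typeName() const { return "Node"; }
    int value = 0;
};

// One Writer per archive; it remembers which objects it has already written.
class Writer {
public:
    std::string write(const std::shared_ptr<Node>& p) {
        if (!p) return "null";
        const void* address = p.get();
        std::ostringstream os;
        // The memory address serves as the object's unique identifier.
        os << "{\"id\":" << reinterpret_cast<std::uintptr_t>(address);
        // insert().second is true only the first time this address is seen,
        // so the type and body are written once per archive.
        if (seen_.insert(address).second) {
            os << ",\"type\":\"" << p->typeName() << "\""
               << ",\"object\":{\"value\":" << p->value << "}";
        }
        os << "}";
        return os.str();
    }

private:
    std::set<const void*> seen_;
};
```

Writing the same shared pointer twice produces the full object the first time and only the identifier the second time, which is what lets the reader reconnect both references to one object.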
Does data need to begin its existence in C++? What about creating types outside of the source code, building objects using those definitions, and still being able to access them inside the program? That will be my subject next time.