Data exchange formats!
Importing and exporting data to and from your game, be it save data, resource data, or anything in between, can be tricky to get right in a flexible manner.
While developing your game you'll want to be able to make sense of the data your game is working with, so you'll want to store data in a compact, easy-to-parse, human-readable format. However, when you release your game you might also want to store this exact same data in a binary format without breaking compatibility. On top of that, you might want to store your data in such a way that the overhead of building your in-game data structures from these files becomes as small as possible, with a 1-to-1 mapping of data being the ideal case.
I've been working on solving these problems in my own implementations for a while, and while I haven't found the "perfect solution" just yet, I've come across some interesting techniques for working with data. In this journal entry I'd like to share an overview of some work I've done over the last couple of months, primarily focusing on human-readable representation of in-game data.
Before I begin I'd like to share an article posted last month on #AltDevBlogADay (and reposted on Gamasutra) about 'A formal language for data definitions'. I've drawn some inspiration from this article while developing my solutions, so it might be an interesting read.
1. The first attempt: XML and the 'Generic attribute system'
As most of you will probably know, XML (eXtensible Markup Language) is a simple and popular language for storing data in a format that is both human and machine readable. Given its popularity and widespread use, there are a lot of third-party libraries available for reading and writing XML data in most major programming languages. Because of this it might not be a surprise that my journey started off in the realm of XML.
While it's technically possible to store pretty much any data representable by text in XML, the language itself has no concept of primitive data types. To give an example of the kind of issues this can present: when writing numerical data in XML, it's up to your program to decide whether this data is actual numerical data or a string that merely looks like numerical data. This can be resolved, however, by providing a so-called schema for the data you're trying to represent, and by using a parser which can validate your XML document against this schema.
Using a schema, however, introduces some overhead, both for the actual parsing of data (you're parsing two files now) and for overall data maintenance. Seeing this added overhead, I decided not to go with schemas and went for a more "brute force" approach: the generic attribute system.
The attribute system in itself was really simple: a single attribute contained two string values, a name and a value. Attributes were stored in so-called attribute sets, which could contain other attribute sets as subsets as well. This created a very primitive data structure for hierarchically storing any data representable by text, so mapping XML data to this intermediate format was very simple.
To solve the problem of determining which datatype an attribute contained I went with a very primitive approach: let some factory system deal with it. This meant that an object factory would first check whether all attributes required for creating an object were present in the attribute set, after which it would attempt to parse each attribute's string value as the expected datatype. If the data parsed successfully, the factory could do a constraint check (e.g. checking whether a value was within acceptable ranges) and construct the requested object.
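To make the idea concrete, here's a minimal sketch of what such an attribute set and a factory-side parse might look like. The names (AttributeSet, parseIntAttribute) are made up for illustration and are not the actual implementation:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>

// An attribute is just a name/value pair of strings; a set maps names to
// raw string values and may contain nested subsets.
struct AttributeSet {
    std::map<std::string, std::string> attributes;
    std::map<std::string, AttributeSet> subsets;
};

// The factory side: look up an attribute and try to parse its string value
// as the expected type. Returns nothing if the attribute is missing or the
// string doesn't represent a valid integer.
std::optional<int> parseIntAttribute(const AttributeSet& set, const std::string& name) {
    auto it = set.attributes.find(name);
    if (it == set.attributes.end())
        return std::nullopt;
    try {
        std::size_t consumed = 0;
        int value = std::stoi(it->second, &consumed);
        if (consumed != it->second.size())
            return std::nullopt; // trailing garbage, not a pure integer
        return value;
    } catch (...) {
        return std::nullopt; // not numeric at all, or out of range
    }
}
```

A factory would run checks like this for every required attribute before doing its constraint checks and constructing the object.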
This worked, that it did, but I don't think I have to explain to anyone why this wasn't exactly an ideal system (brute force approaches seldom are). The parsing stage for getting data from attribute sets into actual objects pretty much forced me to provide a completely different code path for parsing binary files, which is something I really wanted to avoid.
So XML and attributes went into the trashcan, and I set some prerequisites for my next approach:
- The language for defining data should support some basic primitive types.
- The language should allow for a direct mapping of most types defined by the game/engine.
- The language should allow a user to structure data in such a manner that it maps almost directly to a binary representation of the same data, while still remaining readable.
2. The second attempt: JSON... or something that used to be JSON
I've always liked JSON (JavaScript Object Notation); I think of it as a clean, no-bullshit way of storing data. As opposed to XML, JSON does support a couple of primitive types: strings, numerical values, boolean values, and null values. JSON also provides the concept of objects (which are regular ol' associative arrays) and lists. On top of that, JSON syntax is ridiculously easy to parse.
I don't like everything about JSON though. The lack of a syntax for writing comments is what bothered me most, as I like to write and document some files by hand, although I understand the decision not to include comments in the language itself. Some developers write comments as elements in an object, but that means those values will be parsed and loaded in as actual data, and that's something I want to avoid.
As I mentioned above, JSON has a really easy syntax, so I decided to experiment with writing my own JSON parser just for the fun of it. I didn't have any previous experience writing parsers, except for systems for reading binary data (which don't really qualify as parsers), but after an hour or two I had a complete JSON parser built from the ground up. After throwing a bunch of huge JSON files at it to see whether it actually worked as intended (it did), I started to experiment with it.
As I said, I had no previous experience writing parsers, so I don't have a clue about best practices or about how to approach complex languages, and I don't know whether the approach I followed would make any sense to someone with more experience in these matters. What I did was create a parser system which accepts 'rule objects'. Each rule object describes the syntax of a single primitive datatype or data structure, along with a system for parsing that datatype or structure, optionally mapping it to a native (in my case C++) representation of that type or structure.
This means the parser just keeps track of where it is in a file, checks whether it can find a rule that applies at that position, and executes the parsing logic for that rule.
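As a rough illustration of the idea (with invented names and a deliberately toy interface, not my actual implementation), a rule-driven parser loop might look something like this:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// A rule knows how to recognise and consume one construct at the current
// position. A real implementation would also build a data value here.
struct ParseRule {
    // Returns true if this rule applies at input[pos].
    std::function<bool(const std::string&, std::size_t)> matches;
    // Consumes the construct, advancing pos past it.
    std::function<void(const std::string&, std::size_t&)> parse;
};

// The parser itself just walks the input, finds an applicable rule and
// lets it consume the text. Returns false if no rule applies somewhere.
bool parseWithRules(const std::string& input, const std::vector<ParseRule>& rules) {
    std::size_t pos = 0;
    while (pos < input.size()) {
        bool applied = false;
        for (const auto& rule : rules) {
            if (rule.matches(input, pos)) {
                rule.parse(input, pos);
                applied = true;
                break;
            }
        }
        if (!applied)
            return false; // no rule understands this position
    }
    return true;
}

// Two toy rules: skip whitespace, and consume bare integers.
ParseRule makeWhitespaceRule() {
    return {
        [](const std::string& in, std::size_t pos) {
            return std::isspace(static_cast<unsigned char>(in[pos])) != 0;
        },
        [](const std::string& in, std::size_t& pos) {
            while (pos < in.size() && std::isspace(static_cast<unsigned char>(in[pos]))) ++pos;
        }
    };
}

ParseRule makeNumberRule() {
    return {
        [](const std::string& in, std::size_t pos) {
            return std::isdigit(static_cast<unsigned char>(in[pos])) != 0;
        },
        [](const std::string& in, std::size_t& pos) {
            while (pos < in.size() && std::isdigit(static_cast<unsigned char>(in[pos]))) ++pos;
        }
    };
}
```

The appeal of this structure is that registering a new rule extends the language without touching the core loop.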
So my original JSON parser contained rules for objects, lists, strings, numerical values, booleans and null values. Of course, the first thing I thought was: why stop here? I also realized that a rule doesn't necessarily have to map to an internal data value, so I could write additional rules that add language features, like comments.
So, I wrote a very simple and small rule for C-style line and block comments and registered it with the parser. This worked perfectly, which meant I now had a language that regular JSON parsers no longer accept, but which supported all the features of regular JSON with the added benefit of comments.
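A comment rule along those lines might look roughly like this (the names are hypothetical; the key point is that the rule consumes text without producing any data value):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Matches the start of a C-style line ("//") or block ("/*") comment.
bool isCommentStart(const std::string& in, std::size_t pos) {
    return in.compare(pos, 2, "//") == 0 || in.compare(pos, 2, "/*") == 0;
}

// Advances pos past the comment. No value is produced, which is what
// makes comments a pure language feature rather than data.
void skipComment(const std::string& in, std::size_t& pos) {
    if (in.compare(pos, 2, "//") == 0) {
        // Line comment: consume up to (but not including) the newline.
        while (pos < in.size() && in[pos] != '\n') ++pos;
    } else {
        // Block comment: consume up to and including the closing "*/".
        pos += 2;
        while (pos + 1 < in.size() && !(in[pos] == '*' && in[pos + 1] == '/')) ++pos;
        pos = (pos + 1 < in.size()) ? pos + 2 : in.size();
    }
}
```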
Of course, additional rules followed, adding even more supported datatypes. Some examples include support for data structures like vectors, matrices, etc. Support for directly assigning binary data (found in external files) to object or list entries was added as well, together with more game-specific functionality such as resource references.
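As an example of how such an extension rule might map a directive straight onto a native type, here's a hypothetical parser for an @color(...) directive in the style of the examples below; the Color struct and function are invented for illustration:

```cpp
#include <cassert>
#include <cstdio>
#include <optional>
#include <string>

// The native type the directive maps onto.
struct Color { float r, g, b, a; };

// Parses "@color( r, g, b, a )" into a Color. Whitespace in the scanf
// format string tolerates the optional spacing used in hand-written files.
std::optional<Color> parseColorDirective(const std::string& text) {
    Color c{};
    int fields = std::sscanf(text.c_str(), " @color ( %f , %f , %f , %f )",
                             &c.r, &c.g, &c.b, &c.a);
    if (fields != 4)
        return std::nullopt; // not a well-formed @color directive
    return c;
}
```

Because the rule produces the native type directly, there's no intermediate string-typed stage like in the attribute system.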
The result looks something like this:
[source lang="jscript"]/*
 * This structure describes a material
 */
{
    // Global material info
    "name": "some_material",
    "shader_program": @resource("deferred.rsh"),

    // Material parameters
    "parameters":
    {
        "Color": @color( 0.0, 1.0, 1.0, 1.0 )
    },

    // Texture resources
    "textures":
    {
        "Diffuse": @resource("diffuse.rtex")
    }
}[/source]
[source lang="jscript"]/*
 * This structure describes a shader
 */
{
    // Global shader info
    "name": "some_shader",
    "shader_setups":
    [
        {
            // Standard shader setup info
            "name": "default_d3d11",
            "layer": "solid",
            "platform": "win_d3d11",
            "shader_target": 5.0,
            "shaders":
            [
                {
                    "shader_type": @enum("vertex"),
                    "shader_source": @file("some_shader_source.hlsl"),
                    "entry_point": "VS",
                    "flags": [ "DEBUG" ]
                },
                {
                    "shader_type": @enum("pixel"),
                    "shader_source": @file("some_other_shader_source.hlsl"),
                    "entry_point": "PS",
                    "flags": [ "DEBUG" ]
                }
            ],
            "samplers":
            [
                {
                    "name": "some_sampler",
                    "filter": @enum("anisotropic"),
                    "address_u": @enum("wrap"),
                    "address_v": @enum("wrap")
                }
            ]
        }
    ]
}[/source]
(note: These are just dummy structures written for example purposes.)
So now we have an extensible language which is easy to read, easy to parse, and which can be parsed directly into a binary representation from which we can construct objects in our game, just as if we had loaded binary files.
This is a massive improvement over our XML-based approach, but there's still work to be done.
That, however, will be for another entry.