Data exchange formats!

Posted by Radikalizm, 11 December 2012 · 1,406 views

Tags: data exchange, XML, JSON, extensible, serialization, data definition

Importing and exporting data to and from your game, be it save data, resource data, or anything in between, can be tricky to get right in a flexible manner.

While developing your game you'll want to be able to make sense of the data it works with, so you'll want to store that data in a compact, easy-to-parse, and human-readable format. However, when you release your game you might also want to be able to store this exact same data in a binary format without breaking compatibility. On top of that, you might want to store your data in such a way that the overhead of building your in-game data structures from these files becomes as small as possible, with a 1-to-1 mapping of data being the ideal case.

I've been working on solving these problems in my own implementations for a while, and while I haven't found the "perfect solution" just yet, I've come across some interesting techniques for working with data. In this journal entry I'd like to share an overview of some work I've done over the last couple of months, focusing primarily on human-readable representation of in-game data.


Before I begin I'd like to share an article which was posted last month on #AltDevBlogADay (and reposted on Gamasutra) about 'A formal language for data definitions'. I've drawn some inspiration from it while developing my own solutions, so it might be an interesting read.



1. The first attempt: XML and the 'generic attribute system'

As most of you will probably know, XML (eXtensible Markup Language) is a simple and popular language for storing data in a format that is both human and machine readable. Because of its popularity and widespread use, there are a lot of third-party libraries available for reading and writing XML data in most major programming languages. It might therefore not be a surprise that my journey started off in the realm of XML.

While it's technically possible to store pretty much any data representable by text in XML, the language itself has no concept of primitive data types. To give an example of the kind of issues this can present: when writing numerical data in XML, it is up to your program to decide whether this data is actual numerical data or a string representing numerical data. This can be resolved, however, by providing a so-called schema for the data you're trying to represent, and by using a parser which can validate your XML document against this schema.
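To illustrate with a made-up fragment: without a schema, nothing in the document itself tells a parser how to interpret the value below.

```xml
<enemy>
  <name>goblin</name>
  <health>100</health> <!-- the number 100, or the string "100"? XML itself doesn't say -->
</enemy>
```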

Using a schema, however, presents some overhead, both for the actual parsing of data (you're actually parsing two files now) and for overall data maintenance. Seeing this added overhead, I decided not to go with schemas and went for a more "brute force" approach: the generic attribute system.

The attribute system itself was really simple: a single attribute contained two string values, a name and a value. Attributes were stored in so-called attribute sets, which could contain other attribute sets as subsets. This creates a very primitive data structure for hierarchically storing any data which can be represented by text, so mapping XML data to this intermediate data format was very simple.
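A minimal sketch of what such an attribute system could look like in C++ (the type and field names here are invented for illustration, not my actual code):

```cpp
#include <map>
#include <string>

// An attribute is just a name/value pair of strings; attribute sets
// form a hierarchy by containing named subsets alongside their attributes.
struct AttributeSet {
    std::map<std::string, std::string> attributes; // name -> raw string value
    std::map<std::string, AttributeSet> subsets;   // nested attribute sets
};

// Build the kind of set that could result from parsing a material's XML node.
AttributeSet make_example_material() {
    AttributeSet material;
    material.attributes["name"] = "some_material";

    AttributeSet parameters;                        // would come from a child XML element
    parameters.attributes["shininess"] = "0.5";     // note: still just a string here
    material.subsets["parameters"] = parameters;
    return material;
}
```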

To solve the problem of determining which datatype an attribute contained I went with a very primitive approach: let some factory system deal with it. This meant that an object factory would first check whether all attributes needed for creating an object were available in the attribute set, after which it attempted to parse each attribute's string value as the expected datatype. If the data parsed successfully, the factory could do a constraint check (e.g. checking whether a value was within acceptable ranges) and construct the requested object.
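The factory approach boils down to something like this hypothetical sketch (the `Light` type, the `intensity` field, and the constraint range are all invented for illustration):

```cpp
#include <cstdlib>
#include <map>
#include <optional>
#include <string>

struct Light { float intensity; };

// A factory in the style described above: check that the required attribute
// is present, try to parse its raw string as the expected type, then apply
// a constraint check before constructing the object.
std::optional<Light> make_light(const std::map<std::string, std::string>& attrs) {
    auto it = attrs.find("intensity");
    if (it == attrs.end())
        return std::nullopt;                       // required attribute missing

    char* end = nullptr;
    float value = std::strtof(it->second.c_str(), &end);
    if (end == it->second.c_str() || *end != '\0')
        return std::nullopt;                       // value isn't a valid number

    if (value < 0.0f || value > 100.0f)
        return std::nullopt;                       // constraint check failed
    return Light{value};
}
```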

This worked, that it did, but I don't think I have to explain to anyone why this wasn't exactly an ideal system (brute force approaches seldom are). The parsing stage for getting data from attribute sets into actual objects pretty much forced me to provide a completely different code path for parsing binary files, which is exactly what I wanted to avoid.

So XML and attributes went into the trashcan, and I set some prerequisites for my next approach:
  • The language for defining data should support some basic primitive types.
  • It should also allow for a direct mapping of most types defined by the game/engine.
  • It should allow a user to structure data in such a manner that it maps almost directly to a binary representation of the same data, while still remaining readable.

2. The second attempt: JSON... or something that used to be JSON

I always liked JSON (JavaScript Object Notation); I always thought of it as a clean, no-bullshit way of storing data. As opposed to XML, JSON does have support for a couple of primitive types: strings, numerical values, boolean values, and null values. JSON also provides the concept of objects (which are regular ol' associative arrays) and lists. On top of that, JSON syntax is ridiculously easy to parse.

I don't like everything about JSON, though. The lack of a syntax for writing comments is what bothered me most, as I like to write and document some files by hand, although I understand the decision not to include one in the language itself. Some developers write comments as elements in an object, but this means those values get parsed and loaded as actual data, and that's something I want to avoid.
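For instance, the common workaround looks like this made-up snippet, where the "_comment" entry ends up in the parsed data like any other key:

```json
{
    "_comment": "tweaked for level 3 -- meant as a comment, but parsed as data anyway",
    "name": "some_material"
}
```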

As I mentioned above, JSON has a really easy syntax, so I decided to experiment with writing my own JSON parser just for the fun of it. I didn't have any previous experience writing parsers, except for systems for reading binary data (which don't really qualify as parsers), but after an hour or two I had a complete JSON parser built from the ground up. After throwing a bunch of huge JSON files at it to see whether it actually worked as intended (it did), I started to experiment with it.

As I said, I had no previous experience writing parsers, so I didn't have a clue about best practices or about how to approach complex languages; I don't know whether the approach I followed would make any sense to someone with more experience in these matters. What I did was create a parser system which accepts 'rule objects'. Each rule object describes the syntax for a single primitive datatype or data structure, and provides a system for parsing that datatype or structure, optionally mapping it to a native (in my case C++) representation of that type or structure.
This means that the parser just remembers where it is in a file, checks whether it can find a rule which applies at that position, and executes the parser for that rule.
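Roughly, the idea can be sketched like this (heavily simplified, with invented names; my actual rules also produce values rather than just consuming text):

```cpp
#include <cctype>
#include <functional>
#include <string>
#include <vector>

// A rule knows whether it applies at the current position, and how far
// it can consume the input from there.
struct Rule {
    std::function<bool(const std::string&, std::size_t)> applies;
    std::function<std::size_t(const std::string&, std::size_t)> parse; // returns new position
};

// Example rule: an unsigned integer literal.
Rule number_rule() {
    Rule r;
    r.applies = [](const std::string& text, std::size_t pos) {
        return pos < text.size() && std::isdigit(static_cast<unsigned char>(text[pos]));
    };
    r.parse = [](const std::string& text, std::size_t pos) {
        while (pos < text.size() && std::isdigit(static_cast<unsigned char>(text[pos])))
            ++pos;
        return pos;
    };
    return r;
}

// The core loop: find a rule that applies at 'pos' and run its parser.
std::size_t parse_at(const std::string& text, std::size_t pos, const std::vector<Rule>& rules) {
    for (const Rule& r : rules)
        if (r.applies(text, pos))
            return r.parse(text, pos);
    return pos; // no rule matched; a real parser would report an error here
}
```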

So my original JSON parser contained rules for objects, lists, strings, numerical values, booleans and null values. Of course, the first thing I thought was: why stop here? I also realized that a rule didn't necessarily have to map to an internal data value, so I could just write additional rules for adding language features, like comments.

So, I wrote a very simple and small rule for C-style line and block comments and registered it with the parser. This worked perfectly, which meant I now had a language incompatible with regular JSON, but one which supported all the features of regular JSON with the added benefit of comments.
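The core of such a comment rule is tiny; a sketch with an invented name, assuming the rule is only invoked where a `//` or `/*` has been spotted:

```cpp
#include <string>

// Given a position at the start of a C-style comment, return the position
// just past it. If no comment starts here, the rule simply doesn't apply
// and the position is returned unchanged.
std::size_t skip_comment(const std::string& text, std::size_t pos) {
    if (text.compare(pos, 2, "//") == 0) {                 // line comment
        std::size_t end = text.find('\n', pos);
        return end == std::string::npos ? text.size() : end + 1;
    }
    if (text.compare(pos, 2, "/*") == 0) {                 // block comment
        std::size_t end = text.find("*/", pos + 2);
        return end == std::string::npos ? text.size() : end + 2;
    }
    return pos;
}
```

Because the rule maps to no data value at all, the rest of the parser never even sees the comment.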

Of course, additional rules followed, adding even more supported datatypes. Some examples include support for data structures like vectors, matrices, etc. Support for things like directly assigning binary data (found in external files) to object or list entries was added as well, together with more game-specific functionality such as resource references.

The result looks something like this:
[source lang="jscript"]
/*
 * This structure describes a material
 */
{
    // Global material info
    "name": "some_material",
    "shader_program": @resource("deferred.rsh"),

    // Material parameters
    "parameters":
    {
        "Color": @color( 0.0, 1.0, 1.0, 1.0 )
    },

    // Texture resources
    "textures":
    {
        "Diffuse": @resource("diffuse.rtex")
    }
}
[/source]
[source lang="jscript"]
/*
 * This structure describes a shader
 */
{
    // Global shader info
    "name": "some_shader",

    "shader_setups":
    [
        {
            // Standard shader setup info
            "name": "default_d3d11",
            "layer": "solid",
            "platform": "win_d3d11",
            "shader_target": 5.0,

            "shaders":
            [
                {
                    "shader_type": @enum("vertex"),
                    "shader_source": @file("some_shader_source.hlsl"),
                    "entry_point": "VS",
                    "flags": [ "DEBUG" ]
                },
                {
                    "shader_type": @enum("pixel"),
                    "shader_source": @file("some_other_shader_source.hlsl"),
                    "entry_point": "PS",
                    "flags": [ "DEBUG" ]
                }
            ],

            "samplers":
            [
                {
                    "name": "some_sampler",
                    "filter": @enum("anisotropic"),
                    "address_u": @enum("wrap"),
                    "address_v": @enum("wrap")
                }
            ]
        }
    ]
}
[/source]
(Note: these are just dummy structures written for example purposes.)

So now we have an extensible language which is easy to read, easy to parse, and which can be parsed directly into a binary representation from which we can construct objects in our game, just as if we had loaded binary files.
This is a massive improvement over our XML-based approach, but there's still work to be done.



That, however, will be for another entry.




yaml
Yes, YAML was on my list of options as well, it's just that I kind of rolled into this JSON solution by accident and it's working out pretty well.
If you already happen to be using it for scripting, Lua works pretty well as a data-description language. And Lua scripts can be compiled.



I have a toy-implementation up and running which uses Lua to do some game-side scripting, but haven't really had the need yet to do any extensive work with it. I hadn't considered doing data definition in Lua yet, so thanks for the tip :)
Have a look at Apache Avro (http://avro.apache.org/). it's quite an interesting system.



Interesting, I had honestly never heard of this before. I skimmed their documentation, and I see they depend pretty heavily on Boost, which is something I've really been trying to avoid in my projects.

I haven't told the entire story in this journal entry; as I mentioned at the bottom, there was still work to be done after the stages I discussed here. The systems I've developed for building binary data out of this custom modular language are actually very similar to what this Avro project provides.
Just like Avro, I use JSON schemas to validate the contents of my data files, and just like Avro I'm able to build C++ classes from a schema file, or from parts of one if required. Getting binary data out of a text file becomes as simple as constructing a generic compiler object with a schema and passing a bunch of input files through it. This all happens really fast, even for very complex data structures.
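In spirit, the schema validation step boils down to something like this heavily simplified sketch (all names invented; real schemas describe nested structures and constraints, not flat field lists):

```cpp
#include <map>
#include <string>

// A "schema" here is just a map from expected field name to expected type
// tag, and a "document" maps field names to the type tags the parser
// actually produced. A real compiler would also emit the binary layout.
using Schema   = std::map<std::string, std::string>; // field -> expected type
using Document = std::map<std::string, std::string>; // field -> parsed type

bool validate(const Document& doc, const Schema& schema) {
    for (const auto& [field, type] : schema) {
        auto it = doc.find(field);
        if (it == doc.end() || it->second != type)
            return false; // missing field or type mismatch
    }
    return true;
}
```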


I also did not mention some of my project constraints, mostly to do with memory management. It's not all that simple to use just any 3rd party library in my projects, as I'm using some quite 'exotic' memory management systems, most notably scope-based linear allocation and pool-based allocation, and I assume you can imagine that it isn't always easy to get a 3rd party library to play nicely with these.
This is basically one of the major reasons I started experimenting with parsers myself, since it gives me complete control over how the system works with memory. I know this kind of wheel-reinventing is generally frowned upon, but seeing the results I'm getting out of it, I think it was worth the trouble (even though I don't really consider it 'trouble').
