YAML Basics and Parsing with yaml-cpp

By Dejaime Antônio de Oliveira Neto | Published Mar 01 2014 06:15 AM in APIs and Tools
Peer Reviewed by (jbadams, Alpha_ProgDes, jjd)

yaml yaml-cpp encoded human readable format serialization c++

YAML - YAML Ain't a Markup Language


YAML is a general-use data serialization language. Its most common uses are storing configurations, data persistence and online software messaging. As its name states, YAML is not a markup language, and this allows for more readable code.

Example XML code:

<settings>
    <graphics>
        <vsync>true</vsync>
        <quality>ultrahigh</quality>
        <resolution>
            <width>1920</width>
            <height>1080</height>
        </resolution>
    </graphics>
    
    <gameplay>
        <difficulty>hard</difficulty>
        <invert_y>false</invert_y>
    </gameplay>
</settings>

Equivalent YAML code:

settings:
    graphics:
        vsync: true
        quality: ultrahigh
        resolution:
            width: 1920
            height: 1080
    
    gameplay:
        difficulty: hard
        invert_y: false

Very Brief Introduction to YAML


This is a short, objective introduction. After reading it you should be able to create and edit simple YAML files (it's not that hard, but it takes a few minutes to get the hang of the syntax). If you want a complete overview, go to yaml.org and look at the specification; you'll be amazed by how flexible (or inflexible, sometimes) it can be.

YAML, a data serialization language, is designed to be easily readable by humans. It can store any type of data (even in binary form, if you so want).

name: Dejaime #this is a comment
#the character ":" is used for value attribution, so the line above
#    means "name = Dejaime"

YAML has three basic structures: scalars, sequences and mappings. A document is formed by several nodes (or objects, if you prefer). Each one is either a scalar node, which holds information, or a map or sequence node, which holds other nodes. In graph terms, scalars are the leaf nodes, while maps and sequences are the branch nodes.
  • Scalars: as the name hints, these are simple values, sometimes with an individual identifying name, sometimes without. They can be strings, numbers...
name: Dejaime #this is a simple string scalar
age: 22 #this is me mistyping my age in a number scalar so it makes me look younger
description: > #This is a string scalar that spans over two lines.
    needs a haircut,
    badly.
  • Sequences: these are simply a list of nodes without any special identifications. They are easily accessible by index.
meals: ['0930', '1230', '1600', '2130'] #format hhmm, this is a string sequence
  • Mappings: also referred to as hashes and dictionaries, this is a structure that allows you to relate identifiers and their information in a more direct way. These identifiers (also referred to as keys) are usually a simple string name, and the information in them can be of any type: numbers, strings or even other maps and sequences. It is comparable with a std::map or, in some cases, with std::multimap.
#same examples above, they are actually a map, where I give names to values.
person:
    name: Dejaime #mapped value named "name".
    age: 22 #mapped value named "age"
    description: > #mapped value named ... I guess you got it.
        needs a haircut,
        badly.
    meals: ['0930', '1230', '1600', '2130'] #sequence mapped to a name, "meals".
    days_skipped_gym: #this is a sequence of sequences mapped to a name (phew!)
        - [Oct2013, 31]
        - [Nov2013, 30]
        - [Dec2013, 31]
        - [Jan2014, 31]
        - [Feb2014, 14, and_growing] #they do not need to be uniform!
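
To make the three node kinds a bit more concrete, here is a rough C++ sketch of the tree they form in memory. The Node struct and the scalar, sequence, put and countScalars helpers are hypothetical names made up for this illustration; they are not part of YAML or of any library, and a real parser stores much more than this.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory model of a YAML document (illustration only).
struct Node {
    enum class Kind { Scalar, Sequence, Mapping };
    Kind kind = Kind::Scalar;
    std::string value;              // Scalar: the value itself
    std::vector<std::string> keys;  // Mapping: keys[i] names children[i]
    std::vector<Node> children;     // Sequence and Mapping: contained nodes
};

// Builds a scalar (leaf) node.
Node scalar(const std::string& v) {
    Node n;
    n.value = v;
    return n;
}

// Builds a sequence node from a list of nodes.
Node sequence(const std::vector<Node>& items) {
    Node n;
    n.kind = Node::Kind::Sequence;
    n.children = items;
    return n;
}

// Adds a key/value pair to a node, turning it into a mapping.
void put(Node& map, const std::string& key, const Node& child) {
    assert(map.kind != Node::Kind::Sequence);  // sequences have no keys
    map.kind = Node::Kind::Mapping;
    map.keys.push_back(key);
    map.children.push_back(child);
}

// Counts the scalar (leaf) nodes in the tree rooted at `n`.
std::size_t countScalars(const Node& n) {
    if (n.kind == Node::Kind::Scalar) return 1;
    std::size_t total = 0;
    for (const Node& child : n.children) total += countScalars(child);
    return total;
}
```

Building the name, age and meals entries of the person map with put and then calling countScalars on it visits two scalars directly and four more through the meals sequence.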

Of course, these can be used together to create more and more complex documents, which, in turn, can be used to store any kind of information. Just don't create a password vault with this, it wouldn't be a good idea.

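The language itself has some neat features, such as complex keys (marked by a "? "), which let a whole node, such as a sequence, act as a unique identifier, and reusable values, which are defined with '&' (an anchor) and referenced with '*' (an alias).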

?
  [sprite, zombie1] #this is a unique sequence identifier
:
  sprite_file: zombie1.png
  sprite_sheet_folder: &sprites_folder \home\dejaime\GameDev\Spritesheets\ #sprites_folder is a reusable reference
?
  [sprite, zombie2]
:
  sprite_file: zombie2.png
  sprite_sheet_folder: *sprites_folder #reusing the string above

Having GUIDs is a great way to serialize any types of asset or object data, but I personally don't like YAML's syntax for these unique keys (actually I hate it, but well, them's the breaks). In that example, I use a sequence (denoted by "[]") that allows me to use, for example [sound, zombie1] or [data, zombie1] later, when I want to store different types of data. This unique sequence key feature is extremely useful.

Using reference variables is also very handy, especially when you may need to change a variable on several items. In this example, if I change the folder where I hold my spritesheets, all I need to do is change that one single line, instead of editing every object that references that variable (or use a global variable in my engine).

YAML syntax is quite tricky to get right and there are lots of gotchas; your first 10 to 20 tries will probably contain syntax errors. Worry not!

Some syntax gotchas!:
  • No tabs allowed. The blocks in YAML are all defined by indentation, and they banned tabs. More info on this here.
  • White spaces are meaningful at the start of a line, where they are used to identify blocks (through indentation).
  • ": " isn't necessarily ":". While "name: Dejaime" means that name is a scalar node and its value is "Dejaime", name:Dejaime (no space) is read as one plain scalar, the string "name:Dejaime", with no value attached to it. This is so we can have colons inside values, as in "time_created: 19:03".
For those who want to try YAML, there's an online syntax checker called yamllint. It validates your text and then spits out a version optimized for Ruby, which is of no use to us. The important part is just checking the syntax validity.
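
The no-tabs rule is also easy to check for yourself before handing a file to a parser. Below is a minimal sketch, assuming the file contents are already in a std::string; the function name firstTabIndentedLine is made up for this example, and it only looks at indentation, so it is no replacement for a real syntax checker.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Returns the 1-based number of the first line whose indentation contains
// a tab character, or 0 if no line does.
std::size_t firstTabIndentedLine(const std::string& text) {
    std::istringstream in(text);
    std::string line;
    std::size_t lineNumber = 0;
    while (std::getline(in, line)) {
        ++lineNumber;
        // Walk over the leading whitespace only; tabs inside values are ignored.
        for (char c : line) {
            if (c == '\t') return lineNumber;
            if (c != ' ') break;
        }
    }
    return 0;
}
```

A return value of 0 means the indentation is tab-free; any other value is the line to go and fix.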

This should be enough for the article, but if you want to go deeper and into the fancy stuff, dive into the official specification.

Our Example Problem


We want to load sprites and their definitions, including all possible animations, frame duration, spritesheet location, and anything necessary, from a single data file. We'll be using the following (public domain) spritesheet:


Attached Image: duotone.png


The original is available here: http://opengameart.org/content/dutone-tileset-objects-and-character

We will assume that all frames of an animation are at the same horizontal level, with no border (just like in the sheet above) and each may have independent durations. We'll also assume a sprite has a unique name and can change between more than one animation (assuming their respective sizes).

All right. Now that we have our definitions and assumptions, we need to define our YAML file structure, so we can store the information necessary for the whole sprite. So let's list the basic information we need to store:

Sprite:
  • Unique Name
  • Spritesheet ID
  • Animation List
  • Animation Name:
    • Initial SpriteSheet Offset
    • Animation Size
    • Frame List
      • Frame Duration
Basically, that is all the information we will want in our YAML file. It is a complex map, so let's break it down into how to list it.

The first information we need is actually the identifying name of the sprite (can be a number if you prefer):

Player:

Notice how I didn't use Name: Player, but simply inserted Player: directly. This means we have something called Player, and not that we have a node called Name and valued Player.

Now we add the spritesheet reference. This can be a numeric ID, the path to the spritesheet file or something else. I will be using the spritesheet file path, as we won't be using any file loader that would handle UIDs (we newbies :D [or not]). This takes us to our next line.

SpriteSheet: /Resources/Textures/duotone.png

We now have the basic information for the sprite, and need to detail the animations themselves. As animations each have their own names, let us list these names, in our example:

Anim_Names: [run, idle, jump, die]

These are the four animations for the player sprite, all added in a sequence called Anim_Names, so we can look them up later.

Now that we know, in advance, the names of our sprite's animations, we can map them using their names with no problem!

run:
    #more code
idle:
    #more code    
jump:
    #more code
die:
    #more code

These animations also have specific information: their size and their offset in the spritesheet.

Offset: {x: 0, y: 0}
Size: {w: 32, h: 32}
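
Given our assumption that all frames of an animation sit side by side on one row with no border, Offset and Size are enough to locate any frame in the sheet. Here is a small sketch of that arithmetic; the FrameRect type and the frameRect function are hypothetical helpers for this illustration and do not appear in the example code later on.

```cpp
#include <cstdint>

// Pixel rectangle inside the spritesheet (hypothetical helper type).
struct FrameRect {
    std::uint32_t x, y, w, h;
};

// Frames share one row and have no border, so frame `index` starts
// `index * frameWidth` pixels to the right of the animation's offset.
FrameRect frameRect(std::uint32_t offsetX, std::uint32_t offsetY,
                    std::uint32_t frameWidth, std::uint32_t frameHeight,
                    std::uint32_t index) {
    return FrameRect{offsetX + index * frameWidth, offsetY, frameWidth, frameHeight};
}
```

For an animation at Offset {x: 0, y: 32} with Size {w: 32, h: 32}, the fourth frame (index 3) starts at x = 96, y = 32.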

The animations also need to know how many frames they have, as well as the duration of each frame. We'll do the same as we did to list the animation names.

Frame_Durations: [80, 80, 80, 80, 80, 80]

We have an animation with six frames where all of them have 80 ms duration.
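
To show how such a list might be consumed at runtime, here is a sketch that picks the frame to display from the time elapsed since the animation started. It assumes looping playback and durations in milliseconds; the function names totalDuration and frameAt are made up for this example, and a zero duration is simply skipped here rather than given any special meaning.

```cpp
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Sum of all frame durations, in milliseconds.
std::uint32_t totalDuration(const std::vector<std::uint32_t>& durations) {
    return std::accumulate(durations.begin(), durations.end(), std::uint32_t{0});
}

// Index of the frame shown `elapsedMs` after the animation started,
// assuming the animation loops. Returns 0 if the total duration is zero.
std::size_t frameAt(const std::vector<std::uint32_t>& durations,
                    std::uint32_t elapsedMs) {
    const std::uint32_t total = totalDuration(durations);
    if (total == 0) return 0;
    std::uint32_t t = elapsedMs % total;  // wrap around for looping playback
    for (std::size_t i = 0; i < durations.size(); ++i) {
        if (t < durations[i]) return i;
        t -= durations[i];
    }
    return durations.size() - 1;  // not reached when total > 0
}
```

With six frames of 80 ms each, the whole cycle lasts 480 ms, so frameAt returns 1 at 80 ms and wraps back to 0 at 480 ms.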

This is the last information we need to add to the animation, and to the sprite itself. Which takes us to our Player sprite configuration:

Player:
    SpriteSheet: /Resources/Textures/duotone.png
    Anim_Names: [run, idle, jump, die]
    run:
        Offset: {x: 0, y: 0}
        Size: {w: 32, h: 32}
        Frame_Durations: [80, 80, 80, 80, 80, 80]
    idle:
        Offset: {x: 0, y: 32}
        Size: {w: 32, h: 32}
        Frame_Durations: [80, 120, 80, 30, 30, 130] #Notice the different durations!
    jump:
        Offset: {x: 0, y: 64}
        Size: {w: 32, h: 32}
        Frame_Durations: [80, 80, 120, 80, 80, 0] #Can I say 0 means no skipping?
    die:
        Offset: {x: 0, y: 192} #192? Yup, it is the last row in that sheet.
        Size: {w: 32, h: 32}
        Frame_Durations: [80, 80, 80, 80, 80] #this one has only 5 frames.

And to add the remaining sprites:

Monster: #lam nam
    SpriteSheet: /Resources/Textures/duotone.png
    Anim_Names: [hover, die]
    hover:
        Offset: {x: 0, y: 128}
        Size: {w: 32, h: 32}
        Frame_Durations: [120, 80, 120, 80]
    die:
        Offset: {x: 0, y: 160}
        Size: {w: 32, h: 32}
        Frame_Durations: [80, 80, 80, 80, 80]
Gem:
    SpriteSheet: /Resources/Textures/duotone.png
    Anim_Names: [shine]
    shine:
        Offset: {x: 0, y: 96}
        Size: {w: 32, h: 32}
        Frame_Durations: [80, 80, 80, 80, 80, 80]

Now we have a small problem. As the entire file is a map, we need to know the unique names of our sprites (in this case, Player, Monster and Gem) in advance. In addition, there's no way to access them by a numeric index. It won't be a problem when we have some sort of level definition, specifying all of its objects and their respective Sprites by name, referencing our sprite definitions file. But even then, this line won't hurt:

Sprites_List: [Player, Monster, Gem]
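
One thing to keep in mind is that a list like this repeats names that also exist as keys in the file, and nothing forces the two to stay in sync. Here is a sketch of a consistency check, assuming the listed names and the keys actually defined have already been collected into standard containers; the function name missingEntries is made up for this example.

```cpp
#include <set>
#include <string>
#include <vector>

// Returns every name in `listed` that has no matching entry in `defined`,
// in the order the names were listed.
std::vector<std::string> missingEntries(const std::vector<std::string>& listed,
                                        const std::set<std::string>& defined) {
    std::vector<std::string> missing;
    for (const std::string& name : listed) {
        if (defined.count(name) == 0) missing.push_back(name);
    }
    return missing;
}
```

An empty result means every listed name has a definition. The same check would work for Anim_Names against the keys of a single sprite.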

So, this is our final Sprites.yaml file:

Sprites_List: [Player, Monster, Gem]
Player:
    SpriteSheet: /Resources/Textures/duotone.png
    Anim_Names: [run, idle, jump, die]
    run:
        Offset: {x: 0, y: 0}
        Size: {w: 32, h: 32}
        Frame_Durations: [80, 80, 80, 80, 80, 80]
    idle:
        Offset: {x: 0, y: 32}
        Size: {w: 32, h: 32}
        Frame_Durations: [80, 120, 80, 30, 30, 130] #Notice the different durations!
    jump:
        Offset: {x: 0, y: 64}
        Size: {w: 32, h: 32}
        Frame_Durations: [80, 80, 120, 80, 80, 0] #Can I say 0 means no skipping?
    die:
        Offset: {x: 0, y: 192} #192? Yup, it is the last row in that sheet.
        Size: {w: 32, h: 32}
        Frame_Durations: [80, 80, 80, 80, 80] #this one has only 5 frames.
Monster: #lol that lam nam
    SpriteSheet: /Resources/Textures/duotone.png
    Anim_Names: [hover, die]
    hover:
        Offset: {x: 0, y: 128}
        Size: {w: 32, h: 32}
        Frame_Durations: [120, 80, 120, 80]
    die:
        Offset: {x: 0, y: 160}
        Size: {w: 32, h: 32}
        Frame_Durations: [80, 80, 80, 80, 80]
Gem:
    SpriteSheet: /Resources/Textures/duotone.png
    Anim_Names: [shine]
    shine:
        Offset: {x: 0, y: 96}
        Size: {w: 32, h: 32}
        Frame_Durations: [80, 80, 80, 80, 80, 80]

yaml-cpp


The library yaml-cpp has two major versions right now, 0.3 and 0.5, both stable enough for use. The version we will be using here is 0.5 (0.5.1, as of this writing), since it has a new revamped API, one that makes a better use of C++ in my view.

It is available under this link: http://code.google.com/p/yaml-cpp/; X11 (MIT) license, so no worries here.

Building yaml-cpp


The compilation is simple, but you'll probably want to use the Boost library. On Linux, to compile Boost, you only need to run ./bootstrap.sh && ./b2 and it will be built. You may want to issue a sudo ./b2 install so Boost is installed in your system (under /usr/local). You can also install Boost with apt-get or Synaptic, but you'll get a slightly outdated version. Notice that building the entire Boost library can take a long time, but you can build only the parts you're interested in (I personally always build it all).

After you install boost on your system, yaml-cpp is also just as simple. Create a Build folder in the yaml-cpp root directory and issue cmake .. && make inside it, and it will be built. If you want, a sudo make install will install yaml-cpp in your system. If you didn't want to install boost, you may need to set the correct path in the boost variables of yaml-cpp CMakeLists file (or use ccmake, or even a cmake gui, if you prefer).

If you have problems or questions in this regard, please refer to their official building guides...

Parsing the File


As we are getting into the code part, I must put a license on it.

To the extent possible under law, Dejaime Antônio de Oliveira Neto
has waived all copyright and related or neighboring rights to
YAML-CPP C++ Example Code.
This work is published from: Brazil.


Now that we have our file, and we understand how it was created, we can go ahead and create our parser. First, we need to define our structure to hold that information inside our code.

Starting with the Sprite itself, this is what I'll use:

class Sprite {
	std::string m_sName;
	std::string m_sSpritesheetPath;
	Animation::p_vector m_pvAnimations;
	bool m_isLoaded;
public:
	typedef std::vector<Sprite*> p_vector;
	bool IsLoaded () const { return m_isLoaded; }
	bool Load (std::string file, std::string name);
	static bool LoadAll (std::string file, p_vector *target);
};

It has variables to hold the name of the sprite, the filepath to the spritesheet and a vector for the animations. The Sprite::Load(string, string) function takes a name for a sprite and a filepath to the .yaml file (not to be confused with the spritesheet). It can be used directly or by calling Sprite::LoadAll (string, vector[Sprite*]), that will create, load and push all sprites in the file into the passed sprite vector.

Our Sprites also depend on different Animations.

struct Animation {
	typedef std::vector<Animation*> p_vector;
	v2 m_Offset;
	v2 m_Size;
	std::string m_sName;
	std::vector<uint32_t> m_fvDurations;
	Animation () {}
	Animation (std::string p_name, v2 p_offset, v2 p_size) {
		m_sName = p_name;
		m_Offset = p_offset; //Overloaded = operator.
		m_Size = p_size;
	}
};

This is nothing more than a fancy struct.

As you can see, all the information necessary for any sprite can be stored in these, if it follows our initial assumptions. If you have special needs, you can just alter it to suit your needs. Maybe moving the spritesheet into the animation to allow the animations to be in independent textures, or even add more information on every frame such as size, to allow an animation to change in size on each frame. Another useful piece of info could be the origin of each frame (like the head or where to render the "poison cloud"). You get it, this example is indeed using a minimalist approach.

The Load function


//Returns false on error
bool Sprite::Load (std::string file, std::string name) {
 	if (m_isLoaded) return false; //Already loaded.

Since we have our basic structure to hold the information, we can now retrieve it from the file and store it in our objects, in order to use it. The first thing we should do is open our .yaml file. To do this, we need a YAML node to act as the root node of the file. If you don't know what a node is, you can go back to our YAML introduction or look at their specs.

The yaml-cpp library works under the YAML namespace, and has a variable type for a YAML node, the YAML::Node. I guess I didn't really need to say that... Anyway, we need to declare a YAML::Node and assign the root node of our file to it.

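	YAML::Node baseNode = YAML::LoadFile(file); //LoadFile reads from a path; YAML::Load would parse the string itself as YAML text.
	if (baseNode.IsNull()) return false; //Empty file? (A missing file may throw instead, depending on the yaml-cpp version.)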

Now that we opened the file and have our root node at baseNode, we need to find the node for our sprite. The name of our sprite was passed to us as an argument, and that's what we are going to use. Here is where the library author used C++ operator overload to give us a really nice API, as you'll see:

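	YAML::Node spriteNode = baseNode[name];
	if (!spriteNode.IsDefined()) return false; //Sprite Not Found?

We simply use our string as the index, and it will find the correct node. If it is not found, we get back an undefined node, which is why the check uses IsDefined() rather than IsNull(): a null node is one that exists in the file but holds no value.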

So, Sprite found, we can now start to load up our information, starting by the name. Of course, there's no need to look the name up, as we just received it as an argument, but we still need to look up the SpriteSheet path.

	//Set the name, that we know exists in the file.
	m_sName = name;
	//Set the SSheet path by casting the value of the SpriteSheet field
	m_sSpritesheetPath = spriteNode["SpriteSheet"].as<std::string>();
	m_isLoaded = true; //point of no return

With the sprite specifics in place, the next step is to put all animations into the animation vector. Our code will need to know how many animations there are, but that'll be no problem:

	//Now, we need to parse the info on the animations
	short int totalAnimations = spriteNode["Anim_Names"].size();
	for (unsigned short i = 0; i < totalAnimations; ++i) {

Every animation is a node, so we should now retrieve it.

		//We get the animation by looking up the string value of the
		// i-th entry on the Anim_Names sequence.
		std::string tmpName = spriteNode["Anim_Names"][i].as<std::string>();
		YAML::Node animNode = spriteNode[ tmpName ];

With the animation node retrieved, we can now create the animation and populate it with the information from the file.

		Animation *tmpAnim = new Animation();
		tmpAnim->m_sName = tmpName;
		tmpAnim->m_Offset.x = animNode["Offset"]["x"].as<unsigned int>();
		tmpAnim->m_Offset.y = animNode["Offset"]["y"].as<unsigned int>();
		tmpAnim->m_Size.x = animNode["Size"]["w"].as<unsigned int>();
		tmpAnim->m_Size.y = animNode["Size"]["h"].as<unsigned int>();

The last thing we need to do now is to get the time duration for our animation's frames.

		unsigned short totalFrames = animNode["Frame_Durations"].size();
		for (unsigned short f = 0; f < totalFrames; ++f){
			tmpAnim->m_fvDurations.push_back(
				animNode["Frame_Durations"][f].as<uint32_t>()
			);
		}//Finished!
		m_pvAnimations.push_back(tmpAnim); //Keep the loaded animation in the sprite.
	}
	return true;
}

And Voilà! All our animations are now inside our Sprite object!

Here is the complete code: https://gist.github.com/dejaime/9129611

The functions we used here were:

YAML::LoadFile(filepath); //Load a yaml file
YAML::Node.as<type>(); //Retrieve the value converted to a specific type.
YAML::Node.IsNull(); //Find out whether the node is of null type.
YAML::Node.size(); //Gets the size of a sequence Node.

Want to try?


Start by creating a valid YAML file. You can test your syntax at yamllint.com. It will probably take some tries, but after you actually learn it, you will mostly get it right in one try. A single entry like your personal info will do just fine. After that, create a simple structure to hold that information and load it manually, inside your main.cpp directly if you prefer. Then, move the loading procedure into a class that can load the information itself and, lastly, make several entries and load them independently. Move on to more complex documents and you'll master it before you know it.

Thanks to jbeder for the yaml-cpp library! It is so convenient I'm getting lazy.
Also, thanks to Gaiden for the helpful review!

Misc info:
yaml-cpp version used: 0.5.1
compiled with boost 1.55.0



License


Public Domain




Comments

What's the difference between YAML and JSON? They look very much the same?

I know YAML & JSON are the cutting-edge "technologies" atm but:

 

Why not just write the XML with attributes ?

<settings>
  <graphics vsync="true" quality="ultrahigh">
    <resolution width="1920" height="1080" />
  </graphics>
  <gameplay difficulty="hard" invert_y="false" />
</settings>

or even removing <resolution> and with minimum formatting:

<settings>
  <graphics 
    vsync="true"
    quality="ultrahigh"
    width="1920" height="1080" />
  <gameplay
    difficulty="hard"
    invert_y="false" />
</settings>

 


JSON (JavaScript Object Notation) is a minimal, human-readable format. It is used a lot on the web and in other networking technologies because it is less verbose than XML and easier to parse. YAML is similar (it is a superset of JSON) and is intended to be easier (and more natural) for humans to read.

 

Whether you choose to use one or the other depends upon your problem. They are tools to be used when appropriate. They have strengths and weaknesses. E.g. XML does not neatly support lists of items without a large number of unnecessary elements, unlike JSON or YAML. On the other hand, I have found that when the elements of your data have metadata like attributes, XML can be a better way to represent that.

 

Another thing to consider, given where JSON in particular is common, is the size of the support data of the format relative to the data being sent. XML is extremely verbose compared with JSON, so it is not generally considered a good fit for data being sent over a network.

 

-Josh

you've made a strong case for yaml's use here imo.  it's defiantly a strong contender for configuring a game, and it's data, particularly being able to easily handle arrays of objects.  but imo it's not well suited if having to handle a heavy amount of tree hierarchical structure, which is where i think xml shines a bit better.  don't get me wrong, yaml can do it, it's just i'd rather look at:

    <a attrA="0" attrB="1" attrC="2">
        <a attrA="0" attrB="1" attrC="2">
            <a attrA="0" attrB="1" attrC="2">
            </a>
       </a>
   </a>

vs(and correct me if i got the syntax wrong here):

a:
    attrA: 0
    attrB: 1
    attrC: 2
    a:
       attrA: 0
       attrB: 1
       attrC: 2
       a:
           attrA: 0
           attrB: 1
           attrC: 2

please correct me if their is a simpler way

 

edit: after seeing how yaml supports arrays of elements, i'm cosidering modifying my xml parser to support things like:

<a value=[0,1,2]>

for supporting attributes with arrays of objects in xml.

or why don't you use v-son a j-son like parsing language, but c++ like ?

here is an example :

 

/* simple configuration file
  you can use c styles comments */
 
Level1
{
 
      // important : strings do not accept whitespaces
 
     String="Strings_do_not_accept_whitespaces";
 
     FileName="dir1\\dir2\\file.ext";
 
     // a simple math expression
 
     Expression=( 10 +20 ) *2;
 
     // a numeric list , you can put expression as elements too
 
     List=[ 10 , 20*2 , (30-20)/2 ];
 
    Level2
    {
        String="dir1\\dir2\\file.ext";
 
       Level3
       {
           Expression=( 10 +20 ) /2.5f;
 
           Level4
           {
              // an heterogeneous list, you can mix numeric expressions
              // and strings too
 
             List=[ 10 , 20*2 , (30-20)/2,"file.ext" ];
 
           }
       }
   }
}
 
the parser give you the data tree along with variables in every node and cn be easily access using a c++ sdk , writing one for your favourite language isn't difficult either.

What's the difference between YAML and JSON? They look very much the same?

That's true. Actually, most valid JSON files are also valid YAML files. The only exceptions are probably when a json file has multiple entries with the same mapping keys, that's illegal in YAML, while JSON only suggest they should be unique, but doesn't really enforce it. YAML specs have a bit more here: http://www.yaml.org/spec/1.2/spec.html#id2759572

The short version is that yaml sacrifices flexibility for more readability.

you've made a strong case for yaml's use here imo. it's defiantly a strong contender for configuring a game, and it's data, particularly being able to easily handle arrays of objects. but imo it's not well suited if having to handle a heavy amount of tree hierarchical structure, which is where i think xml shines a bit better. don't get me wrong, yaml can do it, it's just i'd rather look at:

please correct me if their is a simpler way

edit: after seeing how yaml supports arrays of elements, i'm cosidering modifying my xml parser to support things like:
for supporting attributes with arrays of objects in xml.

I wouldn't recommend human readable data for anything that could be labeled as heavy. Still, if you want a counterpart to that code, if you don't mind values listed on the same line, it could be:

a: {attrA: 0, attrB: 1, attrC: 2,
    a: {attrA: 0, attrB: 1, attrC: 2,
        a: {attrA: 0, attrB: 1, attrC: 2}}}

or why don't you use v-son a j-son like parsing language, but c++ like ?
here is an example :
the parser give you the data tree along with variables in every node and cn be easily access using a c++ sdk , writing one for your favourite language isn't difficult either.

If you don't mind creating your own parser, you can use anything, you don't even have to bind it to a language. The advantages of using one of those is just that it wouldn't require any designing nor programming, just download a library and go.

I only created mine once, that was not nearly as flexible as YAML. It allowed me to address the parser's optimization much better, in addition, I created a separate program that could read the file and save (the data it was supposed to serialize) in a continuous binary format; this, in turn, could be read in a single read step or in chunks (defined depending on the size of these chunks).

But it is almost the same case of writing your own scripting language. Unless you have a really good reason to do it, I'd recommend using an existing standard and parsing library. The cost-benefit is usually better...

you've made a strong case for yaml's use here imo.  it's defiantly a strong contender

Slicer, for the last time, each time you use "defiantly" as a synonym of "definitively" (it isn't!), a puppy and all his puppy brothers die a horrible death in a pit of fire, blood and the suffering of all the lost souls that were banished from the mortal world since the beginning of times.

 

On topic: Nice article! First time I wanted to do some serialization I instantly looked around for alternatives to XML and found YAML (around the time I made the Why is XML all the rage? thread). Its very nice to work with, and for the little config files I have, its very simple to edit them with a text editor (I use jEdit which has YAML syntax highlighting).

Good article.  One of the better written and useful ones I've seen in recent months.

 

Regarding some of the other comments, I personally steer people away from JSON for reasons of merging data branches in your version control system. JSON has too many little fiddley bits that merge tools don't understand and humans tend to have trouble with (comma separators in sequences/maps, but not after the last one, being the prime example). Especially if we're talking art content, you don't want artists to have to mess with getting the delimiters just right.

YAML vs XML is a trickier question. Both are decent for merging, since they don't have separator tokens that get munged up when edits to the end of sequences are merged. I personally prefer XML since the various related tools are much more developed and mature, but YAML works just fine.  I find it useful to write schemas for the XML files so that the content pipeline can easily find mistakes while converting the XML to compressed and packed binary formats without it needing to have explicit code written for every type of file.  It also helps in those cases that you need to edit files directly since there are many high-quality XML editors that can use the schemas to make the editing experience that much nicer (auto-complete, data validation, etc.).

I don't buy any arguments about readability outside of merging concerns. If anyone is actually reading or editing your data files directly in day-to-day activities, you have seriously messed up when building out your technology. Build better tools and expect all content editing to happen though a purpose-built tool, not Notepad.  Your runtime data files may well end up being compressed binary files after all, so at the very least you'll need tools to dump the data in your file formats in whatever human-readable format you want (without needing to worry if that debug dump format is machine readable).

For configuration files I tend to prefer some form of the INI format.  It's easy to read, doesn't "probably take some tries" to get right, and it's what most games tend to use so gamers are familiar with it (though you should have an in-game options menu or a separate boot/config tool for anything worth editing in the config file, making its format irrelevant).

A useful and missing part of the discussion might be a brief word about the biggest difference between XML and YAML/JSON -- which is that XML is a formally-structured document (at least, optionally), whereas YAML/JSON are not formally-structured. That means, for example, that you can define an XML application, validate, and transform it with standard mechanisms, but there is no equivalent to that in either YAML or JSON -- the data has structure, but as a matter of convention alone. As a concrete example, the animation names in your example *imply* a corresponding set of values for each animation, but don't *require* that there be a corresponding set of values in the same way an XML application does.

 

There are pros and cons on both sides, but they're different things, and suitable for different tasks.

 

I'l also note that I'm not sure its fair to say that human-readable formats are unsuitable for heavy-use: Valve is working on some graphics diagnostics tools for openGL as part of their Linux push. What this involves is serializing all GL commands to a file using JSON. Its similar to PIX, or to Graphics Diagnostics in Visual Studio. The record files can be quite large, hundreds of megabytes or even several gigabytes per frame. Still, this is what they've chosen, and they are able to play back the recorded GL commands in real-time -- they even do periodic key-framing so that you can skip ahead or backwards. And at the same time, the format is also so simple that a student learning openGL can actually sit down and bang up one of these JSON files to experiment, rather than program and write a bunch of boilerplate (They showed a single-page JSON document that drew a single textured triangle).

