XML in Games
The good, clever folks down at the W3C have drafted another language (also derived from SGML, HTML's big daddy). This language is known as XML, and is a very good thing. Firstly, it's another mark-up language ("eXtensible Mark-up Language", to be precise). But it doesn't have any defined 'tags' or keywords whatsoever!
How is that useful? you ask. What can we do with a language that has no words in it?! Well, the reason it's known as an 'extensible' mark-up language is that you can 'extend' it - that is, make it up. You create the words, and everything adheres to the language 'grammar.' XML is meta-data: data about data. The full language specification is at http://www.w3.org/TR/REC-xml, and while it's heavy reading, it describes every aspect of the language from start to finish.
OK, so what's so great about this, then?
If your program dumps a whole load of data to a file, then what happens when you want to use that data in another program? You have to drag up format specifications, the code that created the file in the first place, and so on. The reason is that once your data's in the file, that's all it is: data. A stream of numbers with no real meaning to anyone or anything. You've effectively encrypted it - anyone who doesn't have the format specification will have no idea how to read the data. Sure, they could try and figure it out - but that's as slow and difficult as standard code-breaking.
Surely, in the days of object-orientation and massively-multiplayer online games, there must be a better way? I think XML can fill part of the gap.
A single 'item' in XML is called an "element". An element consists, at a bare minimum, of a tagname (in HTML, things like P, H1, or TABLE are tagnames), which is in an opening tag and a closing tag. If my tagname is "gibbon", I could write it like this:
<gibbon></gibbon>
In fact, because there are several cases where there's nothing between the two tags, you're allowed to shorten it to this:
<gibbon/>
You have to have the forward-slash at the end there, so that the XML parser knows not to look for a closing tag.
At its fullest, an element can have three things: Attributes, Children, and Data.
An Attribute is a "name=value" pair (e.g. 'family="mammal"'). All the attributes go in the opening tag, after the tagname:
<gibbon family="mammal" size="big" bottom="red"/>
Children are other elements, which are 'contained' within the first element. What that really means depends on how you interpret it; it could be that the child is 'inside' the parent (if the parent is, perhaps, a box of some sort), or it could be that the child is literally a child of the parent, like so:
<gibbon>
    <babyGibbon/>
</gibbon>
The babyGibbon, as a separate element, is in its simplest form - no attributes or children. However, because the gibbon now has a child, you can't write it in the condensed form; you have to have separate opening and closing tags, as shown.
Finally, an element can have 'data.' Data is anything that you put between the opening and closing tag, and which isn't an element. At its simplest, you can just have plain text in there - there's also something called CDATA, which you use when your text might contain <, > or & symbols (thus confusing the parser).
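To make the three parts concrete, here's a minimal sketch that reads a gibbon element and pulls out its attributes, children, and data. It uses Python's standard xml.etree.ElementTree module, which is my choice for illustration only; the "note" element and its contents are made up for this example.

```python
import xml.etree.ElementTree as ET

# A gibbon with attributes, one child element, and some character data.
# The CDATA section lets the text contain '<' and '>' without escaping.
doc = """<gibbon family="mammal" size="big">
  <babyGibbon/>
  <note><![CDATA[weight < 10kg, height > 40cm]]></note>
</gibbon>"""

gibbon = ET.fromstring(doc)

tagname = gibbon.tag                          # the tagname
attributes = gibbon.attrib                    # attributes, as a dict
children = [child.tag for child in gibbon]    # child elements, in document order
note_text = gibbon.find("note").text          # data (the CDATA wrapper is removed)

print(tagname, attributes, children)
print(note_text)

# The condensed and long forms of an empty element parse to the same thing.
short_form = ET.fromstring("<gibbon/>")
long_form = ET.fromstring("<gibbon></gibbon>")
print(short_form.tag == long_form.tag, len(short_form) == len(long_form) == 0)
```

Note that the parser hands back the CDATA contents as ordinary text; CDATA only changes how the text is written in the file, not what it means.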
There's one last rule about XML. All your XML has to be 'well-formed.' To do that, you just have to make sure that every opening tag has a matching close-tag (or is in the condensed form), and that you close things in the same order you open them. So, you can't do this:
<box>
    <bag>
        <thing/>
    </box>
</bag>
Instead, you need to do this:
<box>
    <bag>
        <thing/>
    </bag>
</box>
Keeping things well-formed will help you out a lot. It's much, much easier for the parser to treat the XML code as a tree or stack; and if your code isn't well-formed, it won't be able to. In the first example, after the 3rd line (<thing/>) my stack of elements looks like ":box:bag". After the next line, it becomes ":bag". That doesn't work, because there is no 'bag' element at the top level. HTML let you get away with this; XML is not so forgiving.
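The stack idea can be sketched in a few lines. This is a teaching toy, not how a real parser is written: the check_nesting function and its regular expression are my own invention, and they ignore comments, CDATA, and processing instructions. Unlike the walk-through above, it stops as soon as a closing tag doesn't match the top of the stack. For comparison, the last few lines hand the same bad input to a real parser.

```python
import re
import xml.etree.ElementTree as ET

# Matches opening tags, closing tags, and condensed (self-closing) tags.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)[^<>]*?(/?)>")

def check_nesting(xml_text):
    """Return (ok, message), tracking the open elements on a stack."""
    stack = []
    for match in TAG.finditer(xml_text):
        closing, name, condensed = match.groups()
        if condensed:
            continue                      # <thing/> opens and closes at once
        if not closing:
            stack.append(name)            # opening tag: push
        elif not stack or stack[-1] != name:
            expected = stack[-1] if stack else "nothing"
            return False, f"found </{name}> but expected </{expected}>"
        else:
            stack.pop()                   # matching closing tag: pop
    if stack:
        return False, "unclosed: " + ":".join(stack)
    return True, "well-nested"

bad = "<box> <bag> <thing/> </box> </bag>"
good = "<box> <bag> <thing/> </bag> </box>"
print(check_nesting(bad))
print(check_nesting(good))

# A real parser refuses the badly nested version too.
try:
    ET.fromstring(bad)
    parser_accepted_bad = True
except ET.ParseError as error:
    parser_accepted_bad = False
    print("parser says:", error)
```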
Conveniently, Internet Explorer (up till IE6, at least), when presented with an XML file, will check it and display it as a tree (and tell you if you messed it up), so you can check your XML syntax and layout by opening it in IE. There are plenty of other syntax-checking utilities out there, of course - including, I'm sure, something to write the XML for you, while you just build up a tree of your elements.
Here's a little chunk of XML:
<?xml version="1.0"?>
<fridge>
    <cheese type="cheddar" flavour="mild"/>
    <cola/>
    <tupperware_box size="large">
        <sandwich state="half-eaten"/>
    </tupperware_box>
</fridge>
And voilà, the contents of my fridge.
According to the code above, my fridge contains some mild cheddar cheese, a can of cola, and a Tupperware box containing a half-eaten sandwich. Could you get that just by reading it? I'll guess you did - well-written XML is very easy to understand like that. If I were to eat more of the sandwich, and put, I dunno, a piece of broccoli into the box, I could just change the code to:
<sandwich state="three-quarters-eaten"/>
<broccoli desirability="none"/>
and you get the idea.
You may be wondering about that first line - <?xml version="1.0"?>. It's the XML declaration. The XML 1.0 spec says that documents should begin with one (XML 1.1 makes it mandatory) - really, it just gives the version of the language used to make the file (as the language will, nay, has, changed - they're already up to 1.1, but the parsers are still catching up). It's not strictly necessary for a 1.0 file, if your file sizes are constrained or something, but it's a good thing to use.
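The fridge edit described above doesn't have to be done by hand. As a rough sketch (again using Python's standard xml.etree.ElementTree, which is my pick and not something the format requires), a program can load the fridge, update the sandwich, add the broccoli, and write the result back out:

```python
import xml.etree.ElementTree as ET

# The declaration has to be the very first thing in the document.
fridge = ET.fromstring(
    '<?xml version="1.0"?>'
    '<fridge>'
    '<cheese type="cheddar" flavour="mild"/>'
    '<cola/>'
    '<tupperware_box size="large">'
    '<sandwich state="half-eaten"/>'
    '</tupperware_box>'
    '</fridge>'
)

box = fridge.find("tupperware_box")

# Eat some more of the sandwich...
box.find("sandwich").set("state", "three-quarters-eaten")

# ...and put a piece of broccoli in the box next to it.
ET.SubElement(box, "broccoli", desirability="none")

# Serialise the whole tree back to text.
updated = ET.tostring(fridge, encoding="unicode")
print(updated)
```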
XML in games
In my opinion, XML could be a valuable technology in games. It may not seem so, but I'll give a couple of applied examples.
It's not too hard to describe adventure game worlds in XML. For example:
<object id="theBox" name="box" onLook="string:look_full">
    <string id="look_empty">It's a box. It's empty.</string>
    <string id="look_full">It's a box. I think there's something in it.</string>
    <object id="theBall" name="ball" onLook="It's a ball." onPickup="do: set theBox:onLook string:look_empty; get theBall"/>
</object>
So I've got a box, with a ball inside it, on the screen. I point at the box; the graphics have defined that rectangle as referencing object 'theBox,' so the game looks up the object 'theBox' and reads its 'name' attribute, and while I'm pointing the word 'box' appears on the screen.
I've fudged together a pseudo-scripting language as an example...
I click 'Look at' and then click on the box. The game looks up the 'onLook' attribute, and reads what it says. Normally, it would just print the value out literally, but because it begins with 'string:', it looks up the string "look_full" (part of the 'theBox' object), and displays it. "It's a box. I think there's something in it."
I click 'Pick up' and then click on the ball, which I can see in the box. The game looks up the 'onPickup' attribute, and because it starts with 'do:' it executes the command shown on my cunning adventure-game VM. The value of the 'onLook' attribute of theBox gets set to "string:look_empty", and the game "get"s the ball - something which I predefined as a command.
If I then look again at the box, the string "look_empty" is looked up, so I get "It's a box. It's empty."
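Here's one way that look/pick-up sequence might be wired up. This is only a sketch of the pseudo-scripting language, not a real engine: the names run, trigger, and inventory are hypothetical, "get" simply records the object's id in a list, and "string:" lookups only search the direct children of the object being examined.

```python
import xml.etree.ElementTree as ET

WORLD = ET.fromstring("""
<object id="theBox" name="box" onLook="string:look_full">
  <string id="look_empty">It's a box. It's empty.</string>
  <string id="look_full">It's a box. I think there's something in it.</string>
  <object id="theBall" name="ball" onLook="It's a ball."
          onPickup="do: set theBox:onLook string:look_empty; get theBall"/>
</object>
""")

# Index every <object> (including the root) by id, so commands can name them.
OBJECTS = {obj.get("id"): obj for obj in WORLD.iter("object")}
inventory = []

def run(owner, value):
    """Interpret one attribute value; return the text to display, if any."""
    if value.startswith("string:"):
        string_id = value[len("string:"):]
        return owner.find(f"string[@id='{string_id}']").text
    if value.startswith("do:"):
        for command in value[len("do:"):].split(";"):
            words = command.split()
            if not words:
                continue
            if words[0] == "set":          # set <object>:<attribute> <value>
                target, attribute = words[1].split(":")
                OBJECTS[target].set(attribute, " ".join(words[2:]))
            elif words[0] == "get":        # get <object>
                inventory.append(words[1])
        return None
    return value                           # plain text: display it literally

def trigger(object_id, event):
    owner = OBJECTS[object_id]
    return run(owner, owner.get(event, ""))

first_look = trigger("theBox", "onLook")
trigger("theBall", "onPickup")
second_look = trigger("theBox", "onLook")
print(first_look)
print(second_look)
print(inventory)
```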
Bear in mind that all the script engine work is done by my game engine; XML doesn't do that for you, it just allows you to store the relevant information in a simple way. It's easily represented with a few classes (and I mean, two or three), and can be serialised efficiently (for saving/loading games - the entire world can be saved just by recursively writing out each element with all children). It's also excellent for testing or creating; you can edit the world state in Notepad and test the effects, rather than having to play with a hex editor or compile with custom-built tools.
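As a rough illustration of the "few classes" claim, the sketch below gets by with a single hypothetical Element class and writes the world out by recursing through the children. The class layout and the to_xml method name are my own assumptions; escape and quoteattr come from Python's standard xml.sax.saxutils module.

```python
from xml.sax.saxutils import escape, quoteattr

class Element:
    """One XML element: a tagname, attributes, children, and data."""

    def __init__(self, tag, attributes=None, children=None, data=""):
        self.tag = tag
        self.attributes = dict(attributes or {})
        self.children = list(children or [])
        self.data = data

    def to_xml(self, depth=0):
        """Write this element, then recurse into each of its children."""
        pad = "    " * depth
        attrs = "".join(
            f" {name}={quoteattr(value)}"
            for name, value in self.attributes.items()
        )
        if not self.children and not self.data:
            return f"{pad}<{self.tag}{attrs}/>\n"        # condensed form
        if not self.children:
            return f"{pad}<{self.tag}{attrs}>{escape(self.data)}</{self.tag}>\n"
        text = f"{pad}    {escape(self.data)}\n" if self.data else ""
        body = "".join(child.to_xml(depth + 1) for child in self.children)
        return f"{pad}<{self.tag}{attrs}>\n{text}{body}{pad}</{self.tag}>\n"

ball = Element("object", {"id": "theBall", "name": "ball"})
label = Element("string", {"id": "look_full"},
                data="It's a box. I think there's something in it.")
box = Element("object", {"id": "theBox", "name": "box"}, children=[label, ball])

saved = box.to_xml()
print(saved)
```

Loading is the same recursion in reverse: a parser reports each opening tag, and you build one Element per tag and attach it to whichever element is currently open.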
Talking of saving/loading games, there's an entire application right there.
<npc class="hairy_monster" health="50"/>
<box class="water" state="full">
    <npc class="shark" health="74"/>
</box>
So, we've got a hairy_monster, a volume of water (which is full, as opposed to drained), containing a shark.
The only disadvantage of using XML as a format for saved games is that it might be a little too easy to read... you might not want people editing their saved games. However, in that case, all you need to do is encrypt/decrypt the file before/after you use it.
<folder alias="models" loc="game/media/models">
    <file type="model" name="player_front" loc="player/pfront.mdl"/>
    <file type="model" name="player_hand" loc="player/phand.mdl"/>
    <folder alias="weapon_models" loc="weapons">
        <file type="model" name="shotgun" loc="shotgun/shotgun.mdl"/>
        <file type="model" name="shotgun_inhand" loc="shotgun/shot_hnd.mdl"/>
    </folder>
    <pack loc="miscmdl.pak">
        <folder alias="gibs" loc="gibs">
            <file type="model" name="gib_1" loc="gib1.mdl"/>
        </folder>
    </pack>
</folder>
And so on and so on. The above code describes a simple layout of a few files on disk and in a .PAK file (or whatever you want to use, maybe ZIP files, maybe your own equivalent). It'd be brilliant for virtual file systems; perhaps as a packing list (to ensure that the whole package is present, or by adding checksums to each item to check that it hasn't been tampered with), or information about files to load at start-up. In my game code, I could call 'LoadModel("shotgun")' and it'd be able to look up the "model" "shotgun" (located at "shotgun/shotgun.mdl" inside folder "weapons" inside folder "game/media/models", making a grand total of "game/media/models/weapons/shotgun/shotgun.mdl"). If you happen to move all your assets around, you don't really want to be changing all your code. Not to mention the fact that this method gives you the capability to change things on the fly - you could change the "loc" of the "models" folder to "games/models" if you discover that you've been configured to use the old models, for example. It also lets you look at the files you've got without accessing the hard disk; if I want to spawn a random gib, I can just pick a random child from "gibs" and load it, without having to check through the directory itself.
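The path lookup can be sketched like so. The functions index_files, model_path, and random_gib are hypothetical stand-ins (model_path only returns the path that something like LoadModel would then open), and for simplicity the .PAK file is treated as just another path component; a real engine would open the pack and read the rest of the path from inside it.

```python
import posixpath
import random
import xml.etree.ElementTree as ET

MANIFEST = ET.fromstring("""
<folder alias="models" loc="game/media/models">
  <file type="model" name="player_front" loc="player/pfront.mdl"/>
  <file type="model" name="player_hand" loc="player/phand.mdl"/>
  <folder alias="weapon_models" loc="weapons">
    <file type="model" name="shotgun" loc="shotgun/shotgun.mdl"/>
    <file type="model" name="shotgun_inhand" loc="shotgun/shot_hnd.mdl"/>
  </folder>
  <pack loc="miscmdl.pak">
    <folder alias="gibs" loc="gibs">
      <file type="model" name="gib_1" loc="gib1.mdl"/>
    </folder>
  </pack>
</folder>
""")

def index_files(node, prefix=""):
    """Yield ((type, name), full_path) for every <file> under node."""
    loc = node.get("loc")
    path = posixpath.join(prefix, loc) if loc else prefix
    if node.tag == "file":
        yield (node.get("type"), node.get("name")), path
    else:                                  # <folder> or <pack>: descend
        for child in node:
            yield from index_files(child, path)

# Built once at start-up; lookups never touch the hard disk.
INDEX = dict(index_files(MANIFEST))

def model_path(name):
    """Resolve a model name to the path a loader would open."""
    return INDEX[("model", name)]

def random_gib():
    """Pick a random child of the folder whose alias is 'gibs'."""
    gibs = MANIFEST.find(".//folder[@alias='gibs']")
    return random.choice(list(gibs))

print(model_path("shotgun"))
print(random_gib().get("name"))
```

Because every path is assembled from the "loc" attributes, moving the assets means editing the manifest and rebuilding the index, with no change to the game code that asks for "shotgun".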
I hope I've set your mind off a little bit. These are only a few examples; but quite frankly, it appears to me that XML can be applied to, well, anything. You can describe structures or interfaces in it; dword name="dwWidth"> [Ed Note: no idea what is missing here] could be useful sometime. You can use items to reference other items (as I demonstrated with the adventure game example). Heck, you can do anything.
Next time, I'll look at how we actually use XML in code - I'll show you how to use a particular XML parser, expat (http://expat.sf.net/), to read the XML data in and get it into a tree structure, and then I'll show you how to write it back out again.
I wonder why I write all my articles at 2:30am... and then revise them at 3:00am...
Richard Fine (a.k.a. Superpig) email@example.com, or catch me in the forums or on #gamedev. Happy coding.