XML and XSLT, Part 1.

posted in There is no escape from the Washu

Published September 12, 2005

XML
XML is a particularly interesting format for me. For those of you who don't know what XML is, it's a text-based markup language that can be extended to include the tags and attributes you require for describing your data. It has many components that make it both simple and complex. You can write schemas, themselves in XML, that describe how an XML document should be laid out: which elements go where, and which attributes those elements have. You can also essentially free-form it and skip the schema, although at the cost of verifiability.

Now, before you go jumping all over XML to do everything, don't. XML has its uses, but it certainly isn't an all-purpose solution to every problem. Using XML is just like using patterns: for some people, once they've got the hammer, they see everything as a nail. There are some things I would never use XML for, such as a model file. While models as XML sound like a great idea at first, in reality it just isn't needed. First of all, a binary format will be smaller than the XML equivalent. Secondly, a model file really isn't humanly readable. The values don't make a whole hell of a lot of sense, and so there is no reason to bother with the added size that XML gives, since you get nothing back.
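To put a rough number on the size argument, here is a minimal Python sketch that stores one vertex both ways. The vertex values and the `vector` element with `x`, `y`, `z` attributes are assumptions chosen for illustration, and the binary layout (three little-endian 32-bit floats) is just one plausible choice.

```python
import struct

# Hypothetical vertex; the values are illustrative only.
vertex = (0.5, 1.25, -3.0)

# Binary: three little-endian 32-bit floats, 4 bytes each.
binary = struct.pack("<3f", *vertex)

# XML: the same vertex as a single element with three attributes.
xml = '<vector x="{}" y="{}" z="{}" />'.format(*vertex).encode("utf-8")

binary_size = len(binary)
xml_size = len(xml)
print(binary_size, xml_size)
```

The binary form is always 12 bytes per vertex, while the XML form grows with the number of digits in each value, before any indentation or enclosing elements are counted.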

Because XML is text, and hence can be read by a human, it lets us put into text form certain features of a game that are easier to express in text than in binary. Let's look at a simple example:

<?xml version="1.0" encoding="utf-8" ?>

"1">
Orc

"6" count="2" />

"4" count="1" />

"1" />
"3" />

"3" maxAmount="10" />
"5" chance="0.001" />

This looks fairly simple, and it's easy to read too. We can tell that we have an Orc here, with between 2 and 12 hit points, and a base armor class between 1 and 4. We also see that it has a weapon, and some armor, and drops an amount of gold, along with some item. But what are these weapons, armor, and items? Well, let's take a look at the item.xml:

<?xml version="1.0" encoding="utf-8" ?>

"1">
Short Sword
A small cheaply crafted iron sword.
"weapon" subtype="1hsword">

"4" count="2"/>

"3">
Leather Armor
A crudely crafted armor made from the skin of animals.
"armor" subtype="light">

"4" count="1" />

"5">
Ruby
A brilliant red gemstone.
"treasure" subtype="gemstone">

"6" count="3" />

So, now we can see that our Orc is wearing Leather Armor, holding a Short Sword, and has a 1 in 1000 chance of dropping a ruby, along with some gold. Looks fairly descriptive, easy to read and change, wouldn't you say?
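The lookup described above, following a monster's item ids into item.xml, can be sketched in Python with the standard library's ElementTree. Treat this as an illustration only: the element and attribute names (`items`, `item`, `monster`, `weapon`, `armor`, `drop`, and the `item` attribute) are assumptions, not necessarily the ones used in the original files. Only the ids, item names, and drop chance are taken from the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: all tag and attribute names are assumed.
ITEMS_XML = """
<items>
  <item id="1"><name>Short Sword</name></item>
  <item id="3"><name>Leather Armor</name></item>
  <item id="5"><name>Ruby</name></item>
</items>
"""

MONSTER_XML = """
<monster id="1">
  <name>Orc</name>
  <weapon item="1" />
  <armor item="3" />
  <drop item="5" chance="0.001" />
</monster>
"""

def load_item_names(xml_text):
    """Map each item id (as a string) to the item's name."""
    root = ET.fromstring(xml_text)
    return {item.get("id"): item.findtext("name") for item in root.iter("item")}

def describe(monster_xml, item_names):
    """Resolve a monster's item references against the item table."""
    monster = ET.fromstring(monster_xml)
    drop = monster.find("drop")
    return {
        "name": monster.findtext("name"),
        "weapon": item_names[monster.find("weapon").get("item")],
        "armor": item_names[monster.find("armor").get("item")],
        "drop": item_names[drop.get("item")],
        "drop_chance": float(drop.get("chance")),
    }

orc = describe(MONSTER_XML, load_item_names(ITEMS_XML))
print(orc)
```

Keeping items in their own file and referring to them by id means a designer can rebalance the Short Sword once and have every monster that carries it pick up the change.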

Next time we'll look at using XSLT to add more information to this document.


Comments

noaktree

Quote:never use XML for, such as a model file

I see that the Ogre system uses XML as an intermediate model file which is then translated to a binary format. I like this approach. It allows models to easily be imported and exported using various custom tools in the production pipeline. The XML file is compiled to binary at finish.

Quote:Secondly, a model file really isn't humanly readable.

I really wish you would have mentioned this earlier. *Stops manual parsing techniques* [grin]

September 12, 2005 09:37 PM

HopeDagger

Easy for the users to modify as well, unless that's the goal. Of course the user has nobody to blame but themself if they ruin the gaming experience by setting the Orc's strength to 5000. :)

September 12, 2005 09:50 PM

superpig

Quote:Original post by Washu
First of all, a binary format will be smaller than the XML equivalent.

I agree that you don't want to use XML for production assets. As source material though...

Quote:Secondly, a model file really isn't humanly readable. The values don't make a whole hell of a lot of sense, and so there is no reason to bother with the added size that XML gives, since you get nothing back.

You get a lot back - you get semantics.

Human readability isn't the major benefit in a situation like this. Machine readability is. If you establish and use a common dictionary of semantics, then you are free to throw your data out in whichever format really suits you, and other tools in your asset pipeline will be able to use it in a meaningful way without forcing you to write support for that particular layout and content type collection.

This is actually exactly what Microsoft's .X model format does - the uncompiled version of it, anyway. They define a syntax (< and >-level stuff), and a vocabulary of around 50 'templates' (better known as semantics) that you mark up the data with. 'Templates' can be hierarchical... sounds very much like XML to me. In fact, the only real differences are the syntax and the fact that each template name has an associated UUID to reduce name collisions.

I'm thus free to include as much or as little animation data, vertex colour data, progressive mesh data, etc... as I like. And my tools can use as much or as little of it as they like.

September 13, 2005 03:20 AM

Washu

Quote:
Human readability isn't the major benefit in a situation like this. Machine readability is. If you establish and use a common dictionary of semantics, then you are free to throw your data out in whichever format really suits you, and other tools in your asset pipeline will be able to use it in a meaningful way without forcing you to write support for that particular layout and content type collection.

Hence why a binary format for a model file would be beneficial: binary formats are inherently easier for a machine to read than parsing an XML file into a DOM tree (faster too). Binary formats can also be made just as flexible as XML, as long as you put the effort into adding such flexibility.

Quote:
This is actually exactly what Microsoft's .X model format does - the uncompiled version of it, anyway. They define a syntax (< and >-level stuff), and a vocabulary of around 50 'templates' (better known as semantics) that you mark up the data with. 'Templates' can be hierarchical... sounds very much like XML to me. In fact, the only real differences are the syntax and the fact that each template name has an associated UUID to reduce name collisions.

Yes, the X files and their templates are interesting, but you could just as easily accomplish this in a binary format, which machines will always have an easier time reading than POTF.

Quote:
I'm thus free to include as much or as little animation data, vertex colour data, progressive mesh data, etc... as I like. And my tools can use as much or as little of it as they like.

They can use as much as you initially included, that is. The thing is, though, you can do exactly the same with a binary format (see 3ds files). The only real benefit you gain from using XML is that everyone knows what they need to use in order to open and extract the data from these files; this shared piece of code would certainly reduce the number of maintenance points. However, coming up with a binary solution would not be that hard to do either, and would also be shared.

September 13, 2005 09:34 AM

EDI

We use XML 'extensively' (no pun intended) in Morning's Wrath.

We used it to define:

Generic Sprites

Characters (through extension of sprite objects and adding new XML fields)

Window Layout

Random Item Generator Definitions

It saved us incredible amounts of time, especially since for a long while our specifications for these things were changing frequently; using XML meant that an addition or a removal of a property or feature did not mean we had to tweak an editor or even do much tweaking of the XML querying code.

And of course XML is plain text, so editing is very simple and integrates with existing text editing software. It also means there is no need to write an editor, which, along with the maintenance of that editor, can be a huge pain.

An especially great use we found for XML was to parallel an XML format with subclassing, for instance:


<?xml version="1.0"?>
<sprite>
<image>image.jpg</image>
</sprite>

A Sprite object could read this file and determine what image to use.

You can also subclass the sprite object and make a Character Object.

The Character object will expect XML like this:


<?xml version="1.0"?>
<sprite>
<image>image.jpg</image>
<maxhealth>100</maxhealth>
</sprite>

Extraction of this data is as easy as overriding the 'load' method of the Sprite superclass.

And the real benefit?

A sprite object can use a character object's XML file, since it will simply ignore (not query) the extra data.

This was found to be very helpful in games, especially where you might want a character which is a full-featured instance of a Character, but you might also want a Sprite object which only /looks/ like the character; the same XML file can be used for both instances.
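A minimal Python sketch of the pattern, assuming a `load` method that takes the XML text directly (the class layout and method signature are illustrative guesses, not the Morning's Wrath code):

```python
import xml.etree.ElementTree as ET

# The character file from the example above.
CHARACTER_XML = """<sprite>
  <image>image.jpg</image>
  <maxhealth>100</maxhealth>
</sprite>"""

class Sprite:
    def load(self, xml_text):
        """Read only the fields a plain sprite cares about."""
        root = ET.fromstring(xml_text)
        self.image = root.findtext("image")
        return root

class Character(Sprite):
    def load(self, xml_text):
        """Let the superclass read its fields, then query the extra ones."""
        root = super().load(xml_text)
        self.max_health = int(root.findtext("maxhealth"))
        return root

# The same XML feeds both classes; Sprite never queries <maxhealth>.
sprite = Sprite()
sprite.load(CHARACTER_XML)

character = Character()
character.load(CHARACTER_XML)

print(sprite.image, character.image, character.max_health)
```

Because each class queries only the elements it knows about, extra elements cost nothing to the simpler reader.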

=D

September 13, 2005 12:58 PM

superpig

Quote:Hence why a binary format for a model file would be beneficial: binary formats are inherently easier for a machine to read than parsing an XML file into a DOM tree (faster too). Binary formats can also be made just as flexible as XML, as long as you put the effort into adding such flexibility.

So, how might you structure your binary format? Perhaps you'd have some magic value to identify the beginning of a chunk, followed by a chunk identifier of some sort. Would a single byte be enough for that? I don't know if we can come up with more than 256 different types of data to store. I guess to be futureproof we'd better allow more than a single byte... but most of the time we'll only need that initial dictionary of 256 identifiers, so there's no point wasting space. I know, we'll stick another magic number after the identifier, so that when you're using something in the initial 256 dictionary you don't need to waste a byte with stuff for the extended dictionary.

For magic numbers, I think we could use 0x3C for the 'chunk start' identifier and 0x3E for the 'end of chunk type identifier' identifier. Sound good to you?

September 13, 2005 03:52 PM

Washu

Ahh, an interesting question Superpig...

Let's start:
First of all, let's say we use 7-bit encoded identifiers for our chunks. This way, if we have 128 or fewer chunk types, each ID takes up only 1 byte; with up to 16,384 chunk types, each ID takes at most 2 bytes; and so on.

Secondly, we'll use a length-prefixed system for the data. This way we can have arbitrarily large chunks (think XML: do you really know how long a node is until you've fully parsed it? No, you don't, so we'll assume the same here). Again we'll use 7-bit encoded integers, since most of the data will be smaller than 128 bytes and so needs only a single length byte.

So, our current chunk layout looks like this:
[7bit ID][7bit Length][data]
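A minimal Python sketch of this layout. It assumes the common low-bits-first 7-bit scheme (seven payload bits per byte, high bit set when more bytes follow); the exact byte order is not specified above, and the function names are made up for illustration.

```python
def encode_7bit(value):
    """Encode a non-negative int, 7 bits per byte, low bits first."""
    if value < 0:
        raise ValueError("value must be non-negative")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)         # high bit clear: last byte
            return bytes(out)

def make_chunk(chunk_id, data):
    """Build one [7bit ID][7bit Length][data] chunk."""
    return encode_7bit(chunk_id) + encode_7bit(len(data)) + data

# A 12-byte payload with ID 1 costs two bytes of overhead.
example = make_chunk(1, b"\x00" * 12)
print(len(example))
```

With this scheme, values 0 to 127 fit in one byte and values up to 16,383 fit in two, so small IDs and short payloads stay cheap while large ones remain possible.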

So, what would be some of the chunks we would need? Well, how about a vertex chunk: [1][12][x,y,z], a normal chunk: [2][12][x,y,z]...but wait, those two are the same, so let's change them to be one and call them a vector: [1][12][x,y,z], then we'll use a storage chunk to differentiate between them:
Normals Chunk: [2][n][vectors]
Vertices Chunk: [3][n][vectors]

So, a sample file might look like this:


[3][3 * sizeof(vector)] [
  [1][12][0,0,0],
  [1][12][1,0,0],
  [1][12][0,1,0]
]
[2][3 * sizeof(vector)] [
  [1][12][0,0,1],
  [1][12][0,0,1],
  [1][12][0,0,1]
]

(Note that this is a textual representation of the binary data). So, what do we know about our format? Well, about as much as we know about any XML format. In fact, we have the same problems between the two formats: "In order to uniquely identify a node, it must have a unique name." In this case, that name is the ID. In XML it's the node name. Any IDs we don't know about, we can just skip over.
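Skipping unknown IDs falls out of the length prefix, as this Python sketch tries to show. It assumes low-bits-first 7-bit integers and 32-bit little-endian floats for x, y, z; the ID 99 chunk and all function names are invented for the example.

```python
import struct

VECTOR, NORMALS, VERTICES = 1, 2, 3  # chunk IDs from the layout above

def encode_7bit(value):
    """Encode a non-negative int, 7 bits per byte, low bits first."""
    out = bytearray()
    while True:
        byte, value = value & 0x7F, value >> 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def decode_7bit(buf, pos):
    """Decode a 7-bit encoded int at pos; return (value, next position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def chunk(chunk_id, data):
    return encode_7bit(chunk_id) + encode_7bit(len(data)) + data

def read_chunks(buf):
    """Yield (id, payload) pairs without interpreting the payload."""
    pos = 0
    while pos < len(buf):
        chunk_id, pos = decode_7bit(buf, pos)
        length, pos = decode_7bit(buf, pos)
        yield chunk_id, buf[pos:pos + length]
        pos += length

def read_vectors(payload):
    return [struct.unpack("<3f", data)
            for cid, data in read_chunks(payload) if cid == VECTOR]

def load_model(buf):
    model = {"vertices": [], "normals": []}
    for cid, payload in read_chunks(buf):
        if cid == VERTICES:
            model["vertices"] = read_vectors(payload)
        elif cid == NORMALS:
            model["normals"] = read_vectors(payload)
        # Any other ID is simply skipped: its length told us where it ends.
    return model

def vec(x, y, z):
    return chunk(VECTOR, struct.pack("<3f", x, y, z))

data = (chunk(VERTICES, vec(0.0, 0.0, 0.0) + vec(1.0, 0.0, 0.0) + vec(0.0, 1.0, 0.0))
        + chunk(99, b"data this reader does not understand")
        + chunk(NORMALS, vec(0.0, 0.0, 1.0) * 3))

model = load_model(data)
print(model)
```

The reader never needs to understand chunk 99 to get past it, which is the same property an XML reader relies on when it ignores elements it does not query.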

In XML this might look something like:


<model>
  <vertices>
    <vector x='0' y='0' z='0' />
    <vector x='1' y='0' z='0' />
    <vector x='0' y='1' z='0' />
  </vertices>
  <normals>
    <vector x='0' y='0' z='1' />
    <vector x='0' y='0' z='1' />
    <vector x='0' y='0' z='1' />
  </normals>
</model>
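For comparison, reading that XML takes only a few lines with Python's standard ElementTree. This is a sketch: the `load_model` name and the dictionary it returns are choices made for the example.

```python
import xml.etree.ElementTree as ET

MODEL_XML = """<model>
  <vertices>
    <vector x='0' y='0' z='0' />
    <vector x='1' y='0' z='0' />
    <vector x='0' y='1' z='0' />
  </vertices>
  <normals>
    <vector x='0' y='0' z='1' />
    <vector x='0' y='0' z='1' />
    <vector x='0' y='0' z='1' />
  </normals>
</model>"""

def load_model(xml_text):
    """Collect the vectors under <vertices> and <normals> as float tuples."""
    root = ET.fromstring(xml_text)

    def vectors(tag):
        return [tuple(float(v.get(axis)) for axis in "xyz")
                for v in root.findall(tag + "/vector")]

    return {"vertices": vectors("vertices"), "normals": vectors("normals")}

model = load_model(MODEL_XML)
print(model)
```

Elements the reader does not ask for are ignored, which plays the same role as skipping unknown chunk IDs in the binary layout.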

September 14, 2005 11:40 AM
