YAML vs JSON vs XML?

Started by
21 comments, last by Ravyne 10 years, 7 months ago

I'm working on a 2D RPG in Java using LibGdx, and need to find a way to store my game data. The things I've considered are YAML, JSON and XML. I only have experience in working with XML, but the others can't be too hard to learn. What I need to achieve is being able to define items to be generated by listing the components that the item would have and my EntityGenerator would turn them into game objects.

Additionally, I would like to be able to build myself an "Item creator" utility where I can use a GUI to simply check off the components to add for my item and it would add that into the data file.

Any suggestions?


I can't really speak to the pros/cons of YAML; I wrote a quick and dirty parser for it a while ago for a webapp, but that's my entire experience. As far as JSON is concerned, if you're not using JavaScript, there isn't much point. The reason that JSON is so nice in web development is that its syntax is a subset of JavaScript's object notation, so the browser's JS engine can turn it into an object natively, with no extra library. This is only true in JavaScript, though. Any other language needs a library to interpret JSON.

Personally, I would say work with XML. For one thing, you already know it. For another, a lot of people already know it, so if you end up bringing another person in on your project, they won't be stuck trying to familiarize themselves with a new data format (and if the person you're bringing in isn't at least somewhat comfortable with XML, you might need to ask if they're a good fit).

Those are the three semi-popular go-to choices.

The downside to YAML, as I've heard in a thread similar to this one, is that of the few library options that exist, no two are implemented in exactly the same way. Thus YAML is only loosely standardized in practice.

JSON is widely popular and relatively compact, but lacks real tooling.

XML is also widely popular, and has an extensive tools ecosystem (editors, validation via DTDs, XPath, transformations via XSLT), but XML formats tend to be very, very verbose. Verbosity is fine during development, but it's burdensome when you've got to write that data to every user's disk, or push it over the wire to them.

Lua can also be used for data interchange, in addition to scripting duties.

The question, really, is what do you actually need? XML is a structured data format language, whereas JSON and YAML are (loosely) structured data interchange formats. In other words, you can use XML to define complex, hierarchical, rigid, versionable data formats, and then use them to exchange data using that format; JSON and YAML are structured in a sense, but only by convention -- your program only understands the data as well as its parsing code has kept up with the latest changes, and no alarm bells go off when unexpected data is found.

Personally, I like XML for some things as a content-authoring format, but I dislike it as a content-delivery format. If you use XML to author content, you can use XSLT to transform the authoring format into a format that's more-suitable for delivery, which might be JSON or a stripped-down, lighter-weight XML schema -- perhaps JSON data wrapped in a CDATA section within an XML root node that's just used for versioning.

throw table_exception("(? ???)? ? ???");

I think I'll stick to XML as was suggested above. No need to learn extra technologies to achieve something that can be done with a format that I already know.

Does this look like a good structure to you? Or is there a better way?


<?xml version="1.0"?>
<item>
	<id>1</id>
	<name>sword</name>
	<components>
		<wieldable></wieldable>
	</components>
</item>

I think at the very least you could abbreviate that format -- remember that XML is really meant to be a semantic format -- I might do something like the following:


<item id="1" name="sword" category="weapon">
  <description> Just a regular sword.</description>
  <components>
    <wieldable />
  </components>
</item>
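
For what it's worth, reading that abbreviated format back in from Java could look roughly like the sketch below. It uses only the DOM parser that ships with the JDK; `ItemDefinition` and `ItemXmlReader` are hypothetical names made up for illustration (they are not LibGDX classes), and the attribute and element names are simply the ones from the example above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical holder for one parsed <item>; not part of any library. */
final class ItemDefinition {
    final int id;
    final String name;
    final String category;
    final String description;
    final List<String> components;

    ItemDefinition(int id, String name, String category,
                   String description, List<String> components) {
        this.id = id;
        this.name = name;
        this.category = category;
        this.description = description;
        this.components = components;
    }
}

/** Hypothetical reader for the <item> format sketched in this thread. */
final class ItemXmlReader {
    static ItemDefinition parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element item = doc.getDocumentElement();

            // Small, atomic values live in attributes on <item>.
            int id = Integer.parseInt(item.getAttribute("id"));
            String name = item.getAttribute("name");
            String category = item.getAttribute("category");

            String description = "";
            List<String> components = new ArrayList<String>();
            NodeList children = item.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() != Node.ELEMENT_NODE) {
                    continue; // skip whitespace text nodes between elements
                }
                if (child.getNodeName().equals("description")) {
                    description = child.getTextContent().trim();
                } else if (child.getNodeName().equals("components")) {
                    // Each child element of <components> names one component.
                    NodeList entries = child.getChildNodes();
                    for (int j = 0; j < entries.getLength(); j++) {
                        if (entries.item(j).getNodeType() == Node.ELEMENT_NODE) {
                            components.add(entries.item(j).getNodeName());
                        }
                    }
                }
            }
            return new ItemDefinition(id, name, category, description, components);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read item XML", e);
        }
    }
}
```

An entity generator could then walk `components` and attach the matching game component for each name; how those names map to classes is left open here.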

There's quite a bit of art to designing an XML schema, but good designs use elements and attributes (properties) together in a way that reduces redundancy and encourages correct use. You can define your format using a DTD and validate such files before they touch your game or content pipeline.
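
As a rough illustration of the DTD idea, here is a sketch that validates an item against a small DTD using the JDK's built-in parser. The DTD itself is an assumption written to match the example above (required `id` and `name`, optional `category` and `<description>`, and `<wieldable>` as the only known component); `ItemDtdValidator` is a hypothetical name. To keep it self-contained the DTD is prepended as an internal subset, so the item string passed in must not start with its own `<?xml ...?>` declaration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

/** Hypothetical validator; the DTD below is an assumed schema for illustration. */
final class ItemDtdValidator {
    static final String ITEM_DTD =
            "<!DOCTYPE item [\n"
          + "  <!ELEMENT item (description?, components)>\n"
          + "  <!ATTLIST item\n"
          + "      id       CDATA #REQUIRED\n"
          + "      name     CDATA #REQUIRED\n"
          + "      category CDATA #IMPLIED>\n"
          + "  <!ELEMENT description (#PCDATA)>\n"
          + "  <!ELEMENT components (wieldable)*>\n"
          + "  <!ELEMENT wieldable EMPTY>\n"
          + "]>\n";

    /** Returns the validation problems found; an empty list means the item is valid. */
    static List<String> validate(String itemXml) {
        final List<String> problems = new ArrayList<String>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override
                public void warning(SAXParseException e) {
                    problems.add(e.getMessage());
                }
                @Override
                public void error(SAXParseException e) {
                    problems.add(e.getMessage()); // validity errors land here
                }
                @Override
                public void fatalError(SAXParseException e) throws SAXParseException {
                    throw e; // not well-formed; handled by the catch below
                }
            });
            builder.parse(new InputSource(new StringReader(ITEM_DTD + itemXml)));
        } catch (Exception e) {
            problems.add(e.getMessage());
        }
        return problems;
    }
}
```

In a real pipeline the DTD would more likely live in its own file and be referenced from each document's DOCTYPE, with this check run as a build step before the game ever loads the data.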

Keep in mind that <item> might or might not be a valid element in your schema depending on how much commonality all the different kinds of items have with one another. Say, for example, that it makes no sense for some sub-class of items to have any components, but another sub-class of items might require at least one component. You could enforce this just in the DTD, but it could make maintaining the DTD complicated -- it might be better to not have a generic <item> element, but instead have elements for the different subsets, say <weapon> and <potion>. Like I said, there's some art and intuition behind these kinds of decisions; just think carefully about them rather than tossing together the first thing that comes to mind.


I'm not fully sure what I'll need for my items yet as I'm just starting development. I like the idea of not using a generic <item> element, but instead using elements for the different categories of items there could be. Do you have any other suggestions I should keep in mind? I'm sure I'll be running into complications in no time otherwise.

Well, like I said, it's all very dependent on your particular situation. The closest things I can give as rules-of-thumb are:

  • Include versioning information in your root element for maximum robustness. An application that reads your XML file should be able to understand all schema versions within a major version number -- it may have too little information (the program should choose reasonable defaults) or too much information (the program should ignore what it doesn't expect) so the experience may be degraded, but strive to make it work.
    • Increment the minor version number with each schema change.
    • Increment the major version number when the schema change is such that an application reading the file can no longer provide reasonable defaults for necessary information (e.g. a breaking change in the schema).
  • If a thing that exists in your schema can have children, or large and/or complex content, make that thing an element.
  • If a thing that exists in your schema can have siblings under the same parent element, make that thing an element.
  • If a set of things that exist in your schema are logical siblings (such as components), that is, they represent the same concept but do not share an element name, consider creating an element whose only job is to contain those things (like <components> above).
  • If a thing that exists in your schema does not have children and is small, simple or otherwise relatively "atomic", it's a good candidate for being an attribute.
  • Don't be afraid of using attributes to store data that you have to parse later. Examples in common use are attributes containing universal timestamps, or JavaScript expressions in the On<X> event handlers in HTML. However, if this data could get long-winded, consider allowing that attribute to be defined by a child element (or other element, by way of reference) as an option to maintain good readability.

The trouble, really, is that the decision criteria I've given above are often not immediately clear, especially as requirements evolve. That's really the best argument I can give you for aggressive versioning, and robust handling of incomplete/extra information in the XML file.
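
To make the versioning rules a bit more concrete, here is one way a tolerant reader might be sketched in Java. Everything specific in it is an assumption for illustration: the `version="major.minor"` attribute on the root element, the supported major version of 1, the default values, and the `VersionedItemReader` name. It refuses files from a different major version, fills in defaults for missing attributes, and ignores anything it does not recognise.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

/** Hypothetical reader showing major/minor version handling. */
final class VersionedItemReader {
    static final int SUPPORTED_MAJOR = 1; // assumed for this sketch

    static Map<String, String> read(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }

        // getAttribute returns "" when the attribute is absent.
        String version = root.getAttribute("version");
        if (!version.matches("\\d+\\.\\d+")) {
            throw new IllegalArgumentException("Missing or malformed version: '" + version + "'");
        }
        int major = Integer.parseInt(version.substring(0, version.indexOf('.')));
        if (major != SUPPORTED_MAJOR) {
            // A different major version signals a breaking change: refuse to guess.
            throw new IllegalArgumentException("Unsupported major version " + major);
        }

        // Any minor version is accepted: too little information gets defaults,
        // and attributes or children this reader does not know are never read.
        Map<String, String> item = new LinkedHashMap<String, String>();
        item.put("name", attributeOr(root, "name", "unnamed"));
        item.put("category", attributeOr(root, "category", "misc"));
        return item;
    }

    private static String attributeOr(Element element, String name, String fallback) {
        return element.hasAttribute(name) ? element.getAttribute(name) : fallback;
    }
}
```

With this shape, a file written against schema 1.7 still loads in a reader built for 1.0 (extra data is dropped), while a 2.0 file fails loudly instead of loading half-understood data.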

throw table_exception("(? ???)? ? ???");

As for the verbosity of XML, it is fairly simple to use zlib to compress the XML. In the case of network transfer, it may be well worth spending the extra CPU cycles to send a compressed XMLZ file.

You then have the tooling of XML in a fairly (albeit not the most) compact format.
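
In Java this needs no extra dependency, since `java.util.zip` produces and reads zlib-format data. A minimal sketch of the round trip, with `XmlCompression` being a made-up helper name:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

/** Hypothetical helper that zlib-compresses XML text and restores it. */
final class XmlCompression {
    static byte[] compress(String xml) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the stream (via try-with-resources) flushes the final zlib block.
        try (DeflaterOutputStream out = new DeflaterOutputStream(buffer)) {
            out.write(xml.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static String decompress(byte[] compressed) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (InflaterInputStream in =
                     new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int read;
            while ((read = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Repetitive markup like a long list of items tends to compress well, but very small files can come out larger than they went in because of the format overhead, so it is worth measuring on real data.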

Remember the various security pitfalls of parsing XML from an untrusted source. If you're not doing that, then you're fine though.
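
If you do end up parsing XML you didn't write (mods, downloaded content, network messages), a commonly recommended precaution with the JDK's built-in parser is to refuse DOCTYPE declarations outright, which shuts off external-entity and entity-expansion tricks. A sketch, assuming the default JDK parser implementation (the `disallow-doctype-decl` feature is a Xerces feature name that the bundled parser understands; other implementations may not) and using the made-up class name `HardenedXml`:

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical helper that builds a locked-down DOM parser for untrusted input. */
final class HardenedXml {
    static DocumentBuilder newBuilder() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Apply the parser's built-in processing limits.
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            // Reject any document that carries a DOCTYPE (assumes the JDK's default parser).
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            factory.setXIncludeAware(false);
            factory.setExpandEntityReferences(false);
            return factory.newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("XML parser does not support the requested features", e);
        }
    }

    static Document parse(String xml) {
        try {
            return newBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Rejected or malformed XML", e);
        }
    }
}
```

The trade-off is that documents parsed this way cannot carry a DTD at all, so DTD validation would have to stay on the trusted authoring side of the pipeline.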

As far as I can tell, many systems are moving towards JSON for config etc. I can't speak for the rest, but I personally switched from XML (or XML plists, actually) to JSON because editing (and finding errors) is so much easier in JSON. Not to mention that JSON can be much more compact while still retaining readability.

My vote's for JSON.

