Binary XML format?

Started by
4 comments, last by gimp 22 years, 9 months ago
I haven't found any information on a "binary XML" format, so I've decided to create my own. What I want is all the XML goodness, except in binary. I'm planning to use elements in a similar way, but rather than having end tags, each data section would carry its length. Rather than ASCII start tags, I'd use a 4-byte unique ID (maybe in ASCII). I can imagine how all this would work except for one thing: how would I embed elements? The problem is recognising where an embedded element is and where data is, since the data might happen to have the same binary image as an element ID.

Does anyone know much about this kind of thing? Does a standard binary format exist? I sort of got this idea from LightWave object files, but their format is strict compared with XML.

The other problem is how to treat attributes. The best I can come up with is treating them like elements. Thoughts?

Chris Brodie
http:\\fourth.flipcode.com
Why would you want to make a binary version of XML anyway? Most people only use binary files for a specific application, like objects in a modeller, or images, or whatever. The idea of XML is that it's easily extended (that's what the X stands for!) and portable. That means you don't have to worry about byte ordering, which is a problem with binary files: some computers store data with the most significant byte first, and some with the least significant byte first, so you'd need to convert between the two when sending data between different computers. And how would you make a binary format extensible? How do you add new tags or properties?
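
To make the byte-ordering point concrete, here is a minimal Python sketch using the standard `struct` module. The value `0x12345678` is just an arbitrary example, not something from this thread.

```python
import struct

value = 0x12345678

# The same 32-bit integer written in the two byte orders.
little = struct.pack("<I", value)  # least significant byte first
big = struct.pack(">I", value)     # most significant byte first

print(little.hex())  # 78563412
print(big.hex())     # 12345678

# Reading little-endian bytes as if they were big-endian gives a different
# number, so a binary format has to state which order it uses.
misread = struct.unpack(">I", little)[0]
print(hex(misread))  # 0x78563412
```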

If you want a binary format, then you can write your own conversion routine to convert XML to your own internal binary format. After all, the idea of binary formats is that they make loading fast and produce compact files. If you're mimicking XML with your binary files, then loading won't be much faster; the only thing you'll get is a smaller file. Remember that text files compress really well with general compression techniques, so if you're looking for smaller files you could just take an XML file and compress it with WinZip.
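
As a rough illustration of the compression point, the sketch below compresses some repetitive XML text with Python's `zlib` module (DEFLATE, the same family of compression that zip files use). It stands in for WinZip here, and the `<mesh>` and `<vert>` element names are made up for the example.

```python
import zlib

# Hypothetical mesh data written as XML text; the element names are invented.
xml_text = "<mesh>" + "".join(
    f'<vert x="{i}.0" y="0.0" z="0.0"/>' for i in range(1000)
) + "</mesh>"

raw = xml_text.encode("utf-8")
packed = zlib.compress(raw, 9)  # 9 = strongest compression level

# Compare the sizes; repeated tag and attribute names compress well.
print(len(raw), len(packed))

# Compression is lossless: decompressing gives back the original text.
assert zlib.decompress(packed) == raw
```

How much is saved depends on the data, so it is worth measuring on real files before choosing a format on size alone.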

War Worlds - A 3D Real-Time Strategy game in development.
Why create a flexible file format?

Say I create a file format to hold some verts, UVs, normals, tris, etc. Then a few months later I want to use a stripifier, so I want to start adding indices for strips to the format... without breaking the old format.

I'm already using factories for tag recognition, so a binary or ASCII file parser is essentially the same to me, except that the data in the binary file is more readily castable (i.e. say a vert has a position of 0,0,0 in an ASCII file, the first 0 will be the same as a null)... so that's my reason for going binary.

I'm only deploying to little-endian machines.

"The idea of XML is that it's easily extended." That's what I want, and I can already do that; embedding data hunks within each other is the problem. Unless the parser is expecting to find element 'tris' in tag 'poly', 'poly' is also a valid integer or float or whatever.

The problem seems to be how a data hunk can contain both embedded elements AND its own data...



Chris Brodie
http:\\fourth.flipcode.com
If you want a flexible file format, then what's wrong with using XML? If you're worried about reading XML files being too difficult, then don't... you can download lots of libraries for doing it for free (msxml is a good one, check out: http://msdn.microsoft.com/xml/default.asp). Using proper XML files also means you can edit them with a text editor.

Anyway, if your heart is set on a binary format, the way I would do it is to make it as much like the text version of XML as possible. Say you have a file that looks like this (anything in < and > is replaced with a one- or two-byte tag):

<TAG_CHUNK_START>
<TAG_PROPERTY><property-id><type>value
<TAG_PROPERTY><property-id><type>value
<TAG_DATA><length><length bytes of data>
<TAG_PROPERTY><property-id><type>value
<TAG_CHUNK_START>
.. repeat here for sub-chunk stuff
<TAG_CHUNK_END>
<TAG_PROPERTY><property-id><type>value
<TAG_CHUNK_END>

Properties might look like this:

<TAG_PROPERTY><PROP_ORIGIN><TYPE_VECTOR><xyz>

So you'd know that the size of the PROP_ORIGIN property is 3 * sizeof(float).

Anyway, that's how I'd do it. (Though I still recommend you just use XML)
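
The layout above could be read with something like the following Python sketch. Everything concrete in it is an assumption made for illustration: the one-byte tag values, the `TYPE_INT` type, the numeric value of `PROP_ORIGIN`, the 4-byte length field and the little-endian byte order are not specified in the post. It shows how a chunk can hold properties, raw data and sub-chunks together without ambiguity, because the reader always knows from the tag, type or length how many bytes to consume next.

```python
import struct

# Hypothetical one-byte tag values; the post only names the tags.
TAG_CHUNK_START, TAG_CHUNK_END, TAG_PROPERTY, TAG_DATA = 1, 2, 3, 4
# Hypothetical type IDs mapped to struct formats (little-endian assumed).
TYPE_INT, TYPE_VECTOR = 1, 2
TYPE_FORMATS = {TYPE_INT: "<i", TYPE_VECTOR: "<3f"}
PROP_ORIGIN = 10  # hypothetical property ID

def parse_chunk(buf, pos):
    """Parse one chunk starting at buf[pos]; return (chunk, next_pos)."""
    if buf[pos] != TAG_CHUNK_START:
        raise ValueError("expected TAG_CHUNK_START")
    pos += 1
    chunk = {"props": {}, "data": [], "children": []}
    while True:
        tag = buf[pos]
        if tag == TAG_CHUNK_END:
            return chunk, pos + 1
        if tag == TAG_CHUNK_START:
            # Sub-chunk: recurse, then continue after its end tag.
            child, pos = parse_chunk(buf, pos)
            chunk["children"].append(child)
        elif tag == TAG_PROPERTY:
            prop_id, type_id = buf[pos + 1], buf[pos + 2]
            fmt = TYPE_FORMATS[type_id]  # the type tells us the value size
            chunk["props"][prop_id] = struct.unpack_from(fmt, buf, pos + 3)
            pos += 3 + struct.calcsize(fmt)
        elif tag == TAG_DATA:
            # Raw bytes are skipped by length, so they are never read as tags.
            (length,) = struct.unpack_from("<I", buf, pos + 1)
            start = pos + 5
            chunk["data"].append(bytes(buf[start:start + length]))
            pos = start + length
        else:
            raise ValueError(f"unknown tag {tag} at offset {pos}")

# Build a small stream: one property, one data block, one empty sub-chunk.
stream = bytes([TAG_CHUNK_START])
stream += bytes([TAG_PROPERTY, PROP_ORIGIN, TYPE_VECTOR])
stream += struct.pack("<3f", 1.0, 2.0, 3.0)
stream += bytes([TAG_DATA]) + struct.pack("<I", 4) + b"\x01\x02\x03\x04"
stream += bytes([TAG_CHUNK_START, TAG_CHUNK_END])
stream += bytes([TAG_CHUNK_END])

root, end = parse_chunk(stream, 0)
print(root["props"][PROP_ORIGIN], root["data"], len(root["children"]))
```

Note that the data block deliberately contains bytes equal to the tag values, and the parser still reads it correctly, since data is consumed by length rather than scanned for tags.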

War Worlds - A 3D Real-Time Strategy game in development.
There is a standard called Tagged File Format or Interchange File Format. It was originally defined by Electronic Arts. The specification is here.

EA IFF 85 Standard for Interchange Format Files
(text version)

http://www.wotsit.org/download.asp?f=iff
http://www.concentric.net/~Bradds/iff.html


Here is how RealAudio used it:
http://globecom.net/ietf/draft/draft-heftagaub-rmff-00.html

Here is how it is used for WAV audio:
http://sharkysoft.com/software/lava/rocks/jwave/docs/javadocs/lava/riff/wave/doc-files/riffwave-content.htm

More stuff
http://www.udayton.edu/~cps/faculty/jloomis/cps592B/asgn/asgn1/riff.html
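
For a feel of how IFF-style files nest, here is a minimal Python sketch of a chunk walker. It assumes the basic EA IFF 85 layout: a 4-byte chunk ID, a 4-byte big-endian length (treated as unsigned here for simplicity), the payload, and one pad byte when the length is odd. Group chunks (`FORM`, `LIST`, `CAT `) start their payload with a 4-byte type and then contain further chunks. The `MESH`, `VERT` and `STRP` IDs are invented for the example.

```python
import struct

GROUP_IDS = {b"FORM", b"LIST", b"CAT "}

def read_chunks(buf):
    """Yield (chunk_id, payload) for each chunk laid end to end in buf."""
    pos = 0
    while pos + 8 <= len(buf):
        chunk_id, size = struct.unpack_from(">4sI", buf, pos)
        body = pos + 8
        yield chunk_id, buf[body:body + size]
        pos = body + size + (size & 1)  # skip the pad byte after odd sizes

def parse(buf):
    """Return a nested list describing every chunk in buf."""
    out = []
    for chunk_id, payload in read_chunks(buf):
        if chunk_id in GROUP_IDS:
            # Group chunk: 4-byte type, then nested chunks.
            out.append((chunk_id, payload[:4], parse(payload[4:])))
        else:
            # Plain data chunk: the reader does not need to understand it.
            out.append((chunk_id, payload))
    return out

def make_chunk(chunk_id, payload):
    """Helper for the example: serialise one chunk with padding."""
    pad = b"\x00" if len(payload) % 2 else b""
    return chunk_id + struct.pack(">I", len(payload)) + payload + pad

# Hypothetical file: a FORM of type MESH holding two data chunks.
form = make_chunk(
    b"FORM",
    b"MESH" + make_chunk(b"VERT", b"\x01\x02\x03") + make_chunk(b"STRP", b"\x00\x01"),
)

tree = parse(form)
print(tree)
```

Because every chunk carries its own length, a reader that does not recognise a chunk ID (say, strip indices added in a later version) can skip it and carry on, which is what lets the format grow without breaking older files.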

Hmm... thanks guys, plenty to think about now...

Chris Brodie
http:\\fourth.flipcode.com
