
## Binary vs Text file formats


14 replies to this topic

### #1PlayfulPuppy  Members

Posted 09 September 2006 - 08:03 PM

### #2Promit  Senior Moderators

Posted 09 September 2006 - 08:06 PM

Well, text based formats are certainly way, way better for debugging, hand edits, etc. But at runtime or at least for production code, binary formats should be in use. So the ideal case is that your content build compiles the text formats into a binary format and packages that. Your game then loads the binary format. (Optionally your game can load the text format directly if necessary.)

Since most people (most hobbyists, anyway) don't bother with proper content build pipelines, they simply lack the step that converts to a binary representation and simply stick to text. That's my theory, anyway.

Binary formats rock at runtime.
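
A minimal sketch of that build step, assuming a made-up text format of one `x y z` vertex per line and an equally made-up binary layout (a uint32 count followed by float32 triples). The names `compile_points` and `load_points` are hypothetical, not from any real pipeline:

```python
import struct

def compile_points(text: str) -> bytes:
    """Build step: turn lines of 'x y z' into a count-prefixed binary blob."""
    rows = [tuple(float(v) for v in line.split())
            for line in text.splitlines()
            if line.strip() and not line.lstrip().startswith("#")]
    blob = struct.pack("<I", len(rows))        # little-endian uint32 row count
    for x, y, z in rows:
        blob += struct.pack("<3f", x, y, z)    # three float32 values per row
    return blob

def load_points(blob: bytes) -> list:
    """Runtime step: read the blob back without any text parsing."""
    (count,) = struct.unpack_from("<I", blob, 0)
    return [struct.unpack_from("<3f", blob, 4 + 12 * i) for i in range(count)]

source = "# vertices\n0 0 0\n1 0 0\n0 1 0.5\n"
binary = compile_points(source)
points = load_points(binary)
```

The text file stays the editable source of truth; only the packed blob needs to ship.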

### #3T1Oracle  Members

Posted 09 September 2006 - 09:05 PM

Quote:
 Original post by Promit: Well, text based formats are certainly way, way better for debugging, hand edits, etc. But at runtime or at least for production code, binary formats should be in use. So the ideal case is that your content build compiles the text formats into a binary format and packages that. Your game then loads the binary format.

Agreed, every text format should be compilable into a binary version. I personally see no benefit to XML unless you need to share text data with outside sources. Otherwise a script (or even ini file) format can easily be made into a superior alternative.

Text for debugging, binary for release.
Programming since 1995.

### #4Extrarius  Members

Posted 09 September 2006 - 09:20 PM

Personally, I think the move to XML is a mistake, because XML is _way_ too verbose. If you want a text-based format, use something based on S-Expressions (used by Lisp). It's just as flexible, but it doesn't require repeating everything for an end tag, and if you make your scripting language also use S-Expressions, all your data can be stored as regular scripts (that possibly have access to different native functions).

Personally, I use binary formats for most things simply because they're easier to create with existing editors for images, models, levels, etc.

As for being community friendly, editing files manually is not at all community friendly, whatever information they store. You really would be better served by creating editors for things that don't already have editors.

For extensibility, binary formats can be just as extensible by using some kind of 'tagged format' somewhat like the TIFF image format or WAV audio format.
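
A rough sketch of such a tagged layout, loosely in the spirit of RIFF-style chunks (a 4-byte tag, a uint32 size, then the payload). The tags `NAME`, `GLOW` and `SIZE` and both helper functions are invented for illustration, and unlike real RIFF this sketch does not pad chunks to even sizes:

```python
import struct

def write_chunk(tag: bytes, payload: bytes) -> bytes:
    """Serialize one chunk: 4-byte tag, little-endian uint32 size, payload."""
    if len(tag) != 4:
        raise ValueError("tag must be exactly 4 bytes")
    return tag + struct.pack("<I", len(payload)) + payload

def read_chunks(blob: bytes, known: set) -> dict:
    """Collect the chunks this loader understands and skip the rest."""
    found = {}
    offset = 0
    while offset + 8 <= len(blob):
        tag = blob[offset:offset + 4]
        (size,) = struct.unpack_from("<I", blob, offset + 4)
        if tag in known:
            found[tag] = blob[offset + 8:offset + 8 + size]
        offset += 8 + size   # the size field lets us step over unknown tags
    return found

data = (write_chunk(b"NAME", b"crate")
        + write_chunk(b"GLOW", b"\x01\x02\x03")          # added in a newer version
        + write_chunk(b"SIZE", struct.pack("<f", 2.0)))

# An older loader that has never heard of GLOW still reads the file.
old_loader = read_chunks(data, {b"NAME", b"SIZE"})
```

Because every chunk carries its own length, adding a new tag does not break loaders written before it existed.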

### #5ToohrVyk  Members

Posted 09 September 2006 - 09:29 PM

Quote:
 Original post by Promit: So the ideal case is that your content build compiles the text formats into a binary format and packages that. Your game then loads the binary format. (Optionally your game can load the text format directly if necessary.)

A common improvement of that scheme is to load both text and binary formats (check for binary format first, then for text format), and to have the game output the binary equivalent of a text resource upon loading.

This has the advantage of allowing the game to output binary content highly compatible with its internal representation without duplicating code in an external compiler, and also allowing you to bundle the game with smaller, compressed text files that are turned into bulkier but faster-loading binary files when the game is first run (ideal when you want a small downloadable installer).
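
The scheme could be sketched roughly like this. Everything here is an assumption for illustration: the `key = value` text format, the `.bin` suffix, and `pickle` standing in for a real engine-specific binary layout (pickle should never be used on untrusted files). The sketch also does not check whether the text file is newer than its cached binary:

```python
import os
import pickle
import tempfile

def parse_text(path):
    """Slow path: parse 'key = value' lines into a dict."""
    settings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            if "=" in line:
                key, value = line.split("=", 1)
                settings[key.strip()] = value.strip()
    return settings

def load_resource(text_path):
    """Prefer the binary version; otherwise parse the text and write the binary."""
    bin_path = text_path + ".bin"
    if os.path.exists(bin_path):
        with open(bin_path, "rb") as f:
            return pickle.load(f), "binary"
    data = parse_text(text_path)
    with open(bin_path, "wb") as f:
        pickle.dump(data, f)          # output the binary equivalent on first load
    return data, "text"

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "level.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("name = Level1\ngravity = 9.8\n")
    first = load_resource(path)       # parsed from text, binary written
    second = load_resource(path)      # served from the binary file
```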

### #6Delfi  Members

Posted 09 September 2006 - 09:42 PM

Quote:
 - Accuracy: Floating-point numbers can be a real pain, but at least with a binary-based format you'll always get out the exact same value that was put in. With text-based formats, you'll probably have to limit the number of digits in the mantissa so you don't have to read in 30 characters for a tiny value. Whilst the difference between 0.000000005 and 0.00000001 is pretty damned minimal, it can have an effect on objects with a particularly small scale or where extreme accuracy is required (Very, very rare, but I have seen it happen).

please read what you wrote, it is totally wrong, you don't seem to understand how floating point numbers work! every number in memory can be represented exactly with a text description, but the 0.000000005 tolerance is NOT from conversion, it is what is actually stored in the data.

you just need to know when to use text files and when not to. if the file will be updated and tweaked often use text, if it is a 3d model or texture use binary, it is as simple as that. and try not to use XML, because it is overkill even for what it was originally intended.
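
A small sketch of the round-trip point, assuming IEEE 754 float32 values and a correctly rounding parser (which Python's `float()` is). Nine significant decimal digits are enough to recover any float32 exactly; the loss only appears when fewer digits are written out. The helper `to_f32` is just a convenience for this example:

```python
import struct

def to_f32(x: float) -> float:
    """Round a Python float (a double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

values = [to_f32(v) for v in (0.000000005, 0.1, 1.0 / 3.0, 123456.789)]

# Write each float32 with 9 significant digits, then read it back.
texts = [format(v, ".9g") for v in values]
restored = [to_f32(float(t)) for t in texts]

# Truncating to 6 digits is where information actually gets lost.
lossy = to_f32(float(format(values[2], ".6g"))) != values[2]
```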

Projects: Top Down City: http://mathpudding.com/

### #7Xetick  Members

Posted 09 September 2006 - 09:46 PM

As long as you only use attributes in an element you don't need an end tag, so you can make it quite small actually. I'm doing this for my own project file format and find it as compact as you can get it without removing the self-description that comes with it.

    <Nodes>
      <Node Type="Screen" Name="Screen1" Viewport="0 0 1 1" />
      <Node Type="RenderObject" Name="RenderObject1" Col="1 1 1 1" Pos="0 0 0" Rot="0 0 0" Scale="1 1 1" CamPos="0 0 -9" CamRot="0 0 0" CamFov="60" ScaleByAspect="false" />
    </Nodes>

It's still a lot more than a binary format of course. What I would like to see is a binary format of XML, so you could use the exact same interface to your XML file and just swap the file. I think they are working on that. Not sure how it's going though.

### #8ronnybrendel  Members

Posted 09 September 2006 - 10:07 PM

in most cases i think it's desirable to sacrifice some performance in order to debug better

although its only reading and writing performance which is lost ... if you want you can make cachefiles out of your xml data, in order to load them faster the next time?

as for the numbers: saving in hex makes it pretty fast to load into a program (at least faster than with base 10)

and if HD-Speed is your bottleneck, try zlib
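
A quick sketch of the zlib suggestion using Python's standard `zlib` module. The sample data is an invented, highly repetitive XML snippet, so real assets will compress less dramatically:

```python
import zlib

# Repetitive text, as verbose markup often is, compresses very well.
text = ('<Node Type="RenderObject" Pos="0 0 0" Rot="0 0 0" Scale="1 1 1" />\n'
        * 200).encode("utf-8")

packed = zlib.compress(text, 9)      # level 9 = best compression, slowest
unpacked = zlib.decompress(packed)

print(len(text), "bytes of text ->", len(packed), "bytes compressed")
```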

### #9Joakim_ar  Members

Posted 09 September 2006 - 10:16 PM

What I don't like about XML is its verbosity and that it has too many ways of doing the same thing (attributes are not really needed, they're just sugar, complicating the syntax). Most data that you need to store in a game is structured as a tree, which XML can represent, but in a very verbose way. A much simpler language for expressing tree structures can easily be thought up, such as the following:

    nodes(
        node(
            type:Screen;
            name:Screen1;
            viewport(
                from(x:0; y:0;)
                to(x:1; y:1;)
            )
        )
    )

I.e. two types of node: one that can contain other nodes (the ones with parentheses) and one that can contain a string (the ones with the colon/semicolon). The result is much less syntactic noise and fewer (one) special characters to 'escape', making it easier to read, write and parse. The only functionality that is lost is the ability to do text markup easily, such as specifying that a range of text should be bold (since strings cannot contain nodes describing this).
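
To give a feel for how little parsing such a format needs, here is a rough recursive-descent sketch. It is one reading of the grammar described above (`name(...)` for container nodes, `name:value;` for string nodes), it ignores escaping of `;` entirely, and `parse_nodes` is a made-up name:

```python
import re

# A node starts with an identifier followed by '(' (container) or ':' (string).
TOKEN = re.compile(r"\s*([A-Za-z_]\w*)\s*([(:])")
CLOSE = re.compile(r"\s*\)")

def parse_nodes(src, pos=0):
    """Parse sibling nodes starting at pos; return (nodes, position after them)."""
    nodes = []
    while True:
        m = TOKEN.match(src, pos)
        if not m:
            return nodes, pos                 # no more siblings at this level
        name, kind = m.group(1), m.group(2)
        pos = m.end()
        if kind == "(":
            children, pos = parse_nodes(src, pos)
            close = CLOSE.match(src, pos)
            if not close:
                raise ValueError(f"missing ')' for node {name!r}")
            pos = close.end()
            nodes.append((name, children))
        else:
            end = src.find(";", pos)
            if end < 0:
                raise ValueError(f"missing ';' for node {name!r}")
            nodes.append((name, src[pos:end].strip()))
            pos = end + 1

text = """
nodes(
    node(
        type:Screen;
        name:Screen1;
        viewport(
            from(x:0; y:0;)
            to(x:1; y:1;)
        )
    )
)
"""
tree, _ = parse_nodes(text)
```

Each node comes back as a `(name, value)` pair, where the value is either a string or a list of child nodes.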

ronnybrendel: Saving floating point values in hexadecimal kinda defeats the purpose of XML being easy to read. And besides I don't think the processing time required to parse a number means anything compared to the disk access.

### #10ronnybrendel  Members

Posted 09 September 2006 - 10:30 PM

Quote:
 Original post by Joakim_ar: ronnybrendel: Saving floating point values in hexadecimal kinda defeats the purpose of XML being easy to read. And besides I don't think the processing time required to parse a number means anything compared to the disk access.

you're right. for integers it is a bit faster, and it requires less space in most cases

but you're flexible with the size of your number, e.g. 1-8 characters for a 32-bit value ... i mean it could eat less space than binary (so you can say that loading a fully used binary integer may be as slow/fast as loading a hex value --- depending on the number itself) - and you can read it with your own eyes ;)

### #11Extrarius  Members

Posted 09 September 2006 - 10:34 PM

Quote:
 Original post by Xetick: As long as you only use attributes in an element you don't need an end tag, so you can make it quite small actually. I'm doing this for my own project file format and find it as compact as you can get it without removing the self-description that comes with it.[...]
You entirely lose the ability to nest elements, though, which you don't do if you use s-expression format:

    (Root
      :Type "Dialog"
      :SpeechFiles "/speech/bob/chapter2/"
      (Dialog
        :Name "Bob"
        (Statement
          :File "Hello"
          :Text "Hello. How are you?"
        )
        (Choice
          :FileSet "HelloResponse"
          :TimeLimit 10
          :Default :Neutral
          '(
            :Good "Good. How are you?"
            :Neutral "Fine."
            :Evil "Shut Up!"
          )
        )
      )
    )

and if you're using Lisp, Code Is Data Is Code, so you only have to write one scripting system.
Quote:
 Original post by Delfi: [...]please read what you wrote, it is totally wrong, you don't seem to understand how floating point numbers work! every number in memory can be represented exactly with a text description, but the 0.000000005 tolerance is NOT from conversion, it is what is actually stored in the data.[...]
Don't forget that converting text to binary involves a lot of math (for floating point numbers), and thus without a _LOT_ of effort, you will actually get less precision than is actually available in your floating point format.
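
One way to keep floats in text while avoiding decimal conversion altogether is hexadecimal float notation, which writes the stored bits directly. A small sketch using Python's built-in `float.hex()` and `float.fromhex()`; whether a given game's own parser supports such a notation is an assumption:

```python
values = [0.1, 1.0 / 3.0, 5e-9, 123456.789]

# Hex notation spells out the mantissa bits and a power-of-two exponent,
# so reading it back needs no decimal rounding at all.
as_text = [v.hex() for v in values]
restored = [float.fromhex(t) for t in as_text]

for v, t in zip(values, as_text):
    print(v, "->", t)
```

The trade-off is readability: the hex form is exact but much harder to edit by hand than a decimal number.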

### #12Kambiz  Members

Posted 09 September 2006 - 10:42 PM

POV-Ray : Documentation : 1.4.4.7 Why are triangle meshes in ASCII format?

"...meshes use floating point numbers.
It might come as a bit of surprise that it is far from easy to represent them in binary format so that they can be read in every possible system...
...In order to store floating point numbers so that they can be read in any system, you have to store them in an universal format. ASCII is as good as any other."
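
For comparison, a sketch of what a "universal" binary encoding might look like if both ends are assumed to use IEEE 754 floats (an assumption the POV-Ray FAQ deliberately does not make): the file format fixes the byte order instead of using whatever the host CPU prefers.

```python
import struct

value = 1.0

little = struct.pack("<f", value)   # float32, little-endian byte order
big = struct.pack(">f", value)      # same value, big-endian byte order

# The same number has two different byte layouts, so a portable format
# must pick one and every reader must decode with that same choice.
decoded = struct.unpack("<f", little)[0]
```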

### #13PlayfulPuppy  Members

Posted 09 September 2006 - 11:09 PM

Quote:
 Original post by Extrarius: As far as community friendly, editing files manually is not at all community friendly, whatever information they store. You really would be better served by creating editors for things that don't already have editors.

Well, what I should have pointed out is that it's a lazy form of community friendly. It doesn't really require any effort from programmers or marketing, whilst putting out a toolset post-release (usually) requires some additional clean-up to be done by the programmers, as internal-only dev tools are generally raw, ugly and error-prone. Then it normally needs to have a look-over by the legal department to make sure nothing incriminating is released with the dev tools (How many of you, even briefly for testing, have had an error dialog or log entry that contained profanity?).

It's more effort than most developers are willing to spend/publishers are willing to give, so most games either don't have toolsets released at all, or only have shitty little half-working plugins released. Text formats allow the community to take a stab at writing their own tools, which will probably get far more support than dev-released tools would.

(I'm not saying I agree with this practice, it's just the way it is a great deal of the time)

Quote:
 For extensibility, binary formats can be just as extensible by using some kind of 'tagged format' somewhat like the TIFF image format or WAV audio format.

For sure, as a matter of fact one of the primary goals of my (Binary) file format is to be extensible without breaking the format (My previous format was really getting on my nerves as I had to modify the exporter, synchronise the loader and re-export ALL the assets whenever I wanted to add a tiny new feature).

### #14Xai  Members

Posted 09 September 2006 - 11:23 PM

Also, there is a big difference between 100% internal data that you do not intend users, modders, or 3rd party tools to read or manipulate and data that might be customizable, etc.

And in favor of XML, for data that is editable by external sources, tool or human: ONE benefit is its readability, which can be thought of as a small amount of self-description ... although it is also a minus, since the words can be poorly chosen and confuse or deceive the user. The real benefit comes when you create a DTD / schema. In this way you can tell the world, in a 100% agreed-upon manner, a very fundamental description of your data's organization and rules. Sure, not everything that passes the schema is valid data (because schemas hold almost no business rules), but everything which fails the schema is invalid data. So it's a good starting point. And your modding tools can perform schema validation when loading or reading files, to let you know when you have broken anything.

Such checks can also be performed at game load, and in the case of my config files I embed a valid "default" file in the executable to be used in cases of invalid custom files. Obviously this is for small files you have few of, not large bodies of assets, which just have to be right or the program can't do whatever relies on them ... but that's the case no matter what you use.
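
A sketch of that fallback idea. Python's standard library has no DTD or XML Schema validator, so the hand-written `validate` function below merely stands in for real schema validation; the `<config>`/`<window>` layout and all names are invented for this example:

```python
import xml.etree.ElementTree as ET

# Known-good configuration embedded in the program itself.
DEFAULT_CONFIG = '<config><window width="800" height="600" /></config>'

def validate(root) -> bool:
    """Stand-in for schema validation: check structure and attribute types."""
    if root.tag != "config":
        return False
    window = root.find("window")
    if window is None:
        return False
    return all(window.get(key, "").isdigit() for key in ("width", "height"))

def load_config(xml_text: str):
    """Return (width, height), falling back to the embedded default."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        root = None                       # not even well-formed XML
    if root is None or not validate(root):
        root = ET.fromstring(DEFAULT_CONFIG)
    window = root.find("window")
    return int(window.get("width")), int(window.get("height"))

good = load_config('<config><window width="1024" height="768" /></config>')
bad = load_config('<config><window width="wide" /></config>')   # fails validation
broken = load_config('<config><window')                         # fails parsing
```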

### #15ApochPiQ  Moderators

Posted 09 September 2006 - 11:57 PM

This question should not be answered with "always" or "never" type responses. There are cases where text-based formats (even XML) are extremely advantageous, and cases where binary is simply the only acceptable medium.

The equivalent-binary scheme that Promit mentioned is in very widespread use, and pretty much solves all the issues involved. An even simpler method is to store all the data in a purely binary format, and create an editor tool that expands the binary data to a textual equivalent representation. This allows trivial hand-editing of data, without introducing a potential point of failure in the content build pipeline, and without requiring duplicated code (i.e. game code has to be able to read the textual format). We're currently migrating to that approach.

There's a nice heuristic question that can be used to determine whether to take this route, or to use a format like XML: does an editor for this type of data already exist? I don't think this is by any means a hard-and-fast rule, but it'll handle the majority of the cases. (Note that I personally find sexprs to be more elegant than XML, but infinitely less practical, due to the lack of schema and editor support.)

Consider a real-world case: 3D models have plenty of existing editing tools. All of the decent modelling packages have plugin infrastructures that support custom file-format exporters. Therefore, using XML is not really useful here, since you can export from the editor directly to the needed binary format. (This gets subtly more complicated when including portability formats like Collada, but that's beyond the scope of discussion I think.)

On the flip side, consider a data format that defines animated cutscene/movie sequences to be rendered in-engine. There is no editor for such things, at least not one that is guaranteed to suit our requirements. This is where XML becomes an attractive option: XML parsing libraries are ubiquitous, so the cost of integration with the game code is going to be minimal; once the tree is loaded in-memory it can be trivially converted to any equivalent tree structure desired, meaning that there's no reason to worry about the XML format affecting the rest of your code architecture if you don't want it to; and a good schema combined with a good XML editor (like VS2005) gives you a validation tool and file format documentation all in the same place.

XML in this case truly helps eliminate duplication of information to a huge extent: since schemas can have embedded human-readable documentation, you don't need separate doc files; schemas instantly eliminate the need for sophisticated validation, and in the worst case additional runtime validation on the tree (post-loading) is trivial; and you've totally removed the need for a custom editor tool without precluding the introduction of one at a later date.

Use the right tool for the job; don't mangle the job so you can do it with one particular tool.
Wielder of the Sacred Wands
