Binary vs Text file formats

13 comments, last by ApochPiQ 17 years, 7 months ago
Quote:Original post by Xetick
As long as you only use attributes in an element you don't need an end tag, so you can make it quite small actually. I'm doing this for my own project file format and find it as compact as you can get it without losing the self-description that comes with it.[...]
You entirely lose the ability to nest elements, though, which you don't do if you use s-expression format:
(Root
  :Type "Dialog"
  :SpeechFiles "/speech/bob/chapter2/"
  (Dialog
    :Name "Bob"
    (Statement
      :File "Hello"
      :Text "Hello. How are you?"
    )
    (Choice
      :FileSet "HelloResponse"
      :TimeLimit 10
      :Default :Neutral
      '(
        :Good "Good. How are you?"
        :Neutral "Fine."
        :Evil "Shut Up!"
      )
    )
  )
)
and if you're using Lisp, Code Is Data Is Code, so you only have to write one scripting system.
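For anyone not using Lisp, reading this kind of nested format is still not much work. Here is a minimal sketch in Python, assuming only parentheses, double-quoted strings without escapes, and bare symbols; the names TOKEN and parse are made up for illustration, the quote character gets no special treatment, and unbalanced input is not handled.

```python
import re

# A token is a double-quoted string, a single parenthesis, or a bare symbol.
TOKEN = re.compile(r'"[^"]*"|[()]|[^\s()"]+')

def parse(text):
    """Parse one s-expression into nested Python lists and strings."""
    tokens = TOKEN.findall(text)
    position = 0

    def read():
        nonlocal position
        token = tokens[position]
        position += 1
        if token == "(":
            items = []
            while tokens[position] != ")":
                items.append(read())   # nested lists recurse here
            position += 1              # consume the closing ")"
            return items
        if token.startswith('"'):
            return token[1:-1]         # strip the surrounding quotes
        return token                   # symbols and numbers stay as text

    return read()

dialog = parse('(Dialog :Name "Bob" (Statement :File "Hello" :Text "Hello. How are you?"))')
print(dialog)
```

Nesting falls out of the recursion for free, which is the point being made about attribute-only XML elements.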
Quote:Original post by Delfi
[...]please read what you wrote, it is totally wrong, you don't seem to understand how floating point numbers work! every number in memory can be represented with a text description, but the 0.000000005 tolerance is NOT from conversion, it is what is actually stored in the data.[...]
Don't forget that converting text to binary involves a lot of math (for floating point numbers), and thus without a _LOT_ of effort in the parser, you will end up with less precision than your floating point format actually offers.
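To make that concrete, here is a small sketch in Python comparing a hand-rolled decimal parser with the library one. The function naive_parse is a made-up example of the "obvious" approach, not code from any real loader, and the comparison assumes the library float() is correctly rounded, which as far as I know it is in CPython.

```python
def naive_parse(text: str) -> float:
    """Parse 'digits.digits' by summing digit * 10^-k, the obvious way."""
    whole, _, frac = text.partition(".")
    result = float(int(whole)) if whole else 0.0
    scale = 0.1
    for ch in frac:
        result += int(ch) * scale   # each step adds its own rounding error
        scale *= 0.1
    return result

x = 0.1 + 0.2

# Writing enough digits (17 significant digits for a 64-bit double) lets a
# correctly rounded parser recover the exact stored value.
restored = float(format(x, ".17g"))

print(naive_parse("0.3") == float("0.3"))
print(restored == x)
```

So the loss is not inherent to text, but you only avoid it if both the writer emits enough digits and the reader does the conversion carefully.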
"Walk not the trodden path, for it has borne its burden." -John, Flying Monk
Read this:
POV-Ray : Documentation : 1.4.4.7 Why are triangle meshes in ASCII format?

"...meshes use floating point numbers.
It might come as a bit of surprise that it is far from easy to represent them in binary format so that they can be read in every possible system...
...In order to store floating point numbers so that they can be read in any system, you have to store them in an universal format. ASCII is as good as any other."
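One piece of the problem the POV-Ray docs describe is byte order. A rough sketch in Python, assuming both machines use IEEE 754 single precision (which the quote above points out is not guaranteed on every system):

```python
import struct

value = 1.5

little = struct.pack("<f", value)   # explicit little-endian float32
big = struct.pack(">f", value)      # explicit big-endian float32

# Same number, opposite byte layouts.
print(little.hex(), big.hex())

# Reading with the byte order the writer used gives the value back...
same_order = struct.unpack("<f", little)[0]
# ...reading with the other order silently gives a different number.
wrong_order = struct.unpack(">f", little)[0]
print(same_order, wrong_order)
```

A binary format that fixes the byte order and the float representation in its spec sidesteps this, but the spec has to say so; text gets it by default.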
Quote:Original post by Extrarius
As far as community friendly, editing files manually is not at all community friendly, whatever information they store. You really would be better served by creating editors for things that don't already have editors.


Well, what I should have pointed out is that it's a lazy form of community friendly. It doesn't really require any effort from programmers or marketing, whilst putting out a toolset post-release (usually) requires some additional clean-up by the programmers, as internal-only dev tools are generally raw, ugly and error-prone. Then it normally needs a look-over by the legal department to make sure nothing incriminating is released with the dev tools (how many of you have, even briefly for testing, had an error dialog or log entry that contained profanity?).

It's more effort than most developers are willing to spend/publishers are willing to give, so most games either don't have toolsets released at all, or only have shitty little half-working plugins released. Text formats allow the community to take a stab at writing their own tools, which will probably get far more support than dev-released tools would.

(I'm not saying I agree with this practice, it's just the way it is a great deal of the time)

Quote:
For extensibility, binary formats can be just as extensible by using some kind of 'tagged format' somewhat like the TIFF image format or WAV audio format.


For sure, as a matter of fact one of the primary goals of my (Binary) file format is to be extensible without breaking the format (My previous format was really getting on my nerves as I had to modify the exporter, synchronise the loader and re-export ALL the assets whenever I wanted to add a tiny new feature).
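A minimal sketch of what such a tagged layout can look like, in Python. The chunk layout (4-byte tag, 4-byte little-endian length, payload) and the tag names are assumptions for illustration, loosely in the spirit of RIFF/WAV, and not the actual format described in this post.

```python
import io
import struct

def write_chunk(stream, tag: bytes, payload: bytes) -> None:
    """Write one chunk: 4-byte tag, 4-byte little-endian size, payload."""
    if len(tag) != 4:
        raise ValueError("tag must be exactly 4 bytes")
    stream.write(tag)
    stream.write(struct.pack("<I", len(payload)))
    stream.write(payload)

def read_chunks(stream, known_tags):
    """Collect the chunks this loader understands and skip the rest."""
    chunks = {}
    while True:
        header = stream.read(8)
        if len(header) < 8:
            break                            # end of file
        tag, size = struct.unpack("<4sI", header)
        if tag in known_tags:
            chunks[tag] = stream.read(size)
        else:
            stream.seek(size, io.SEEK_CUR)   # unknown chunk: jump over it
    return chunks

# A newer exporter adds a "GLOW" chunk the old loader has never heard of.
asset = io.BytesIO()
write_chunk(asset, b"NAME", b"crate01")
write_chunk(asset, b"GLOW", struct.pack("<f", 0.75))
write_chunk(asset, b"VERT", struct.pack("<3f", 0.0, 1.0, 2.0))

asset.seek(0)
old_loader_view = read_chunks(asset, known_tags={b"NAME", b"VERT"})
print(sorted(old_loader_view))
```

Because every chunk carries its own length, adding a feature means adding a tag; old assets and old loaders keep working without a re-export.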
Also, there is a big difference between 100% internal data that you do not intend users, modders, or 3rd party tools to read or manipulate and data that might be customizable, etc.

And in favor of XML, for data that is editable by external sources, tool or human: ONE benefit is its readability, which can be thought of as a small amount of self-description, although it is also a minus, since the words can be poorly chosen and confuse or deceive the user. The real benefit comes when you create a DTD / schema. In this way you can tell the world, in a 100% agreed-upon manner, a very fundamental description of your data's organization and rules. Sure, not everything that passes the schema is valid data (because schemas hold almost no business rules), but everything that fails the schema is invalid data. So it's a good starting point. And your modding tools can perform schema validation when loading or reading files, to let you know when you have broken anything.

Such checks can also be performed at game load, and in the case of my config files I embed a valid "default" file in the executable to be used in cases of invalid custom files. Obviously this is for small files you have few of, not large bodies of assets, which just have to be right or the program can't do whatever relies on them ... but that's the case no matter what you use.
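The fallback idea might look roughly like this in Python. Note the hedges: the Config element and its attributes are invented for the example, and is_valid is a hand-written stand-in for real DTD/schema validation, which the standard library's ElementTree does not provide.

```python
import xml.etree.ElementTree as ET

# Hypothetical default, embedded in the program itself.
DEFAULT_CONFIG = '<Config width="800" height="600" fullscreen="false" />'
REQUIRED_ATTRIBUTES = ("width", "height", "fullscreen")

def is_valid(root) -> bool:
    """Stand-in for schema validation: structure only, no business rules."""
    if root.tag != "Config":
        return False
    if any(name not in root.attrib for name in REQUIRED_ATTRIBUTES):
        return False
    return root.get("width").isdigit() and root.get("height").isdigit()

def load_config(text: str):
    """Return the parsed custom config, or the embedded default if it is bad."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        root = None                     # not even well-formed XML
    if root is None or not is_valid(root):
        root = ET.fromstring(DEFAULT_CONFIG)
    return root

good = load_config('<Config width="1024" height="768" fullscreen="true" />')
broken = load_config('<Config width="1024"')
incomplete = load_config('<Config width="1024" />')
print(good.get("width"), broken.get("width"), incomplete.get("height"))
```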
This question should not be answered with "always" or "never" type responses. There are cases where text-based formats (even XML) are extremely advantageous, and cases where binary is simply the only acceptable medium.

The equivalent-binary scheme that Promit mentioned is in very widespread use, and pretty much solves all the issues involved. An even simpler method is to store all the data in a purely binary format, and create an editor tool that expands the binary data to a textual equivalent representation. This allows trivial hand-editing of data, without introducing a potential point of failure in the content build pipeline, and without requiring duplicated code (i.e. game code has to be able to read the textual format). We're currently migrating to that approach.
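As a rough sketch of the expand-to-text approach in Python: the record layout (a 16-bit id followed by two 32-bit floats) and the field names are hypothetical, chosen only to show the round trip, and have nothing to do with any real pipeline.

```python
import struct

# Hypothetical binary record: unsigned 16-bit id, then float32 x and y.
RECORD = struct.Struct("<Hff")
FIELDS = ("id", "x", "y")

def binary_to_text(blob: bytes) -> str:
    """Editor side: expand the binary record into editable 'name = value' lines."""
    values = RECORD.unpack(blob)
    return "\n".join(f"{name} = {value!r}" for name, value in zip(FIELDS, values))

def text_to_binary(text: str) -> bytes:
    """Editor side: pack the edited lines back into the binary record."""
    parsed = {}
    for line in text.splitlines():
        name, _, value = line.partition("=")
        parsed[name.strip()] = value.strip()
    return RECORD.pack(int(parsed["id"]), float(parsed["x"]), float(parsed["y"]))

original = RECORD.pack(7, 1.5, -0.25)
as_text = binary_to_text(original)
print(as_text)

# Hand-edit one field in the text view, then write binary again.
edited = text_to_binary(as_text.replace("x = 1.5", "x = 2.0"))
print(RECORD.unpack(edited))
```

Only the editor tool contains the text code path; the game itself still reads nothing but the binary record.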


There's a nice heuristic question that can be used to determine whether to take this route, or to use a format like XML: does an editor for this type of data already exist? I don't think this is by any means a hard-and-fast rule, but it'll handle the majority of the cases. (Note that I personally find sexprs to be more elegant than XML, but infinitely less practical, due to the lack of schema and editor support.)

Consider a real-world case: 3D models have plenty of existing editing tools. All of the decent modelling packages have plugin infrastructures that support custom file-format exporters. Therefore, using XML is not really useful here, since you can export from the editor directly to the needed binary format. (This gets subtly more complicated when including portability formats like Collada, but that's beyond the scope of discussion I think.)

On the flip side, consider a data format that defines animated cutscene/movie sequences to be rendered in-engine. There is no editor for such things, at least not one that is guaranteed to suit our requirements. This is where XML becomes an attractive option: XML parsing libraries are ubiquitous, so the cost of integration with the game code is going to be minimal; once the tree is loaded in-memory it can be trivially converted to any equivalent tree structure desired, meaning that there's no reason to worry about the XML format affecting the rest of your code architecture if you don't want it to; and a good schema combined with a good XML editor (like VS2005) gives you a validation tool and file format documentation all in the same place.
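To illustrate the "trivially converted to any equivalent tree structure" part, here is a short Python sketch using the standard library parser. The cutscene element and attribute names are invented for the example; a real format would define its own.

```python
import xml.etree.ElementTree as ET

def element_to_dict(element):
    """Convert a parsed XML element into a plain nested dict."""
    return {
        "tag": element.tag,
        "attributes": dict(element.attrib),
        "children": [element_to_dict(child) for child in element],
    }

# Hypothetical cutscene description.
CUTSCENE = """
<Cutscene name="Intro">
  <Camera from="0 0 10" to="0 2 4" seconds="3.5" />
  <Actor name="Bob">
    <Say file="Hello" />
  </Actor>
</Cutscene>
""".strip()

tree = element_to_dict(ET.fromstring(CUTSCENE))
print(tree["tag"], [child["tag"] for child in tree["children"]])
```

After this step nothing downstream needs to know the data ever came from XML, so the format choice stays confined to the loader.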

XML in this case truly helps eliminate duplication of information to a huge extent: since schemas can have embedded human-readable documentation, you don't need separate doc files; schemas instantly eliminate the need for sophisticated validation, and in the worst case additional runtime validation on the tree (post-loading) is trivial; and you've totally removed the need for a custom editor tool without precluding the introduction of one at a later date.


Use the right tool for the job; don't mangle the job so you can do it with one particular tool.

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

