Working with changing file formats

Started by UnshavenBastard · 4 comments, last by UnshavenBastard 15 years, 7 months ago
Hey there, first of all, I'm not 100% sure whether this is the best place to put this thread; I guess I'll know when the thread suddenly gets relocated ^^

My question is: are there proven "best practices" for working with file formats that are subject to change? Let me elaborate. Suppose I have something like editors for game content. Due to very tight time constraints of a project, the editors already have to be used to create content, while some more or less subtle parts of the game concept might still change and might force changes to the (XML based) file format that describes the game content. Then there's the content that has already been created using the old file formats.

Seems like a nasty problem to me, but maybe there are already solutions to it. (Or maybe not, and the answer is "make a better schedule next time!" hah. duh.)

Well, thanks in advance for helpful input.

- unshaven
Quote:Original post by UnshavenBastard
create content, while some more or less subtle parts of the game concept might still change and might force changes to the (XML based) file format that describes the game content.
Needless to say, this shouldn't happen. However, if your game's library supports not only reading state but also writing it, it's not such a terrible problem.
When it happens, read in the XML file, which will initialize data/state/whatever. Fields that aren't contained in the old format are initialized to default values. Then save your state, which will generate some hopefully valid XML files in your "new" format. Lastly, just to be on the safe side, have someone look over it again.
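To illustrate the round trip, here's a minimal C# sketch using .NET's XmlSerializer (the type and field names are invented for the example): elements missing from an old file simply leave the field initializers untouched, and re-serializing writes the current format.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical content type; the names are made up for illustration.
public class LevelData
{
    public string Name = "";
    public int MaxPlayers = 4;       // existed in the old format
    public float Difficulty = 1.0f;  // new field: old files simply lack it,
                                     // so this default survives deserialization
}

public static class FormatUpgrader
{
    public static void Upgrade(string oldPath, string newPath)
    {
        var serializer = new XmlSerializer(typeof(LevelData));

        LevelData data;
        using (var input = File.OpenRead(oldPath))
            data = (LevelData)serializer.Deserialize(input);

        // Saving the state back out produces the file in the new format,
        // defaults and all.
        using (var output = File.Create(newPath))
            serializer.Serialize(output, data);
    }
}
```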

Alternatively, you could probably use XSLT to transform "old" to "new", but writing an explicit converter that way is probably more error-prone than simply reusing the load/save logic that already works reliably. Also, if you don't know XSLT already, it's something new you'd have to learn.
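For completeness, invoking such a transform from .NET is only a few lines; all the real work lives in the stylesheet itself ("old-to-new.xslt" is a hypothetical name):

```csharp
using System.Xml.Xsl;

public static class XsltUpgrader
{
    public static void Upgrade(string inputPath, string outputPath)
    {
        var transform = new XslCompiledTransform();
        transform.Load("old-to-new.xslt"); // stylesheet mapping old elements to new
        transform.Transform(inputPath, outputPath);
    }
}
```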
No, I'm not familiar with XSLT.

Yeah, just setting default values for things that aren't there could be acceptable.
I was thinking of writing an explicit converter "by hand": just use the old loading module and then save to the new format, yeah, with default values.

But generally "trying" to load stuff rather than demanding its presence might be even better, at least if it only happens when some temporary switch like "tryLoadOldFileFormat" is enabled, so that a permanently lenient loader doesn't hide genuine bugs.

Then either the user would have to explicitly state that he wants to import old files, or, if the loader encounters an old file, it notifies the user that it's an old format, that some values have been set to defaults, and that he should check whether these make sense in their specific contexts.
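Something like this is what I have in mind (a rough sketch; the names are made up): required reads throw unless the legacy switch is on, so missing fields in new files still surface as bugs.

```csharp
using System;
using System.Globalization;
using System.Xml.Linq;

public class ContentLoader
{
    private readonly bool tryLoadOldFileFormat;

    public ContentLoader(bool tryLoadOldFileFormat)
    {
        this.tryLoadOldFileFormat = tryLoadOldFileFormat;
    }

    // Reads a required element; a missing value is tolerated (and defaulted)
    // only when the old-format switch is enabled.
    public float ReadFloat(XElement parent, string name, float legacyDefault)
    {
        XElement element = parent.Element(name);
        if (element != null)
            return float.Parse(element.Value, CultureInfo.InvariantCulture);
        if (tryLoadOldFileFormat)
            return legacyDefault;
        throw new FormatException("Missing required element <" + name + ">");
    }
}
```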

Thanks =)
Put a version number on it. When the new editor loads an old version, it uses the old code to load it. Default out new values, discard unused values.
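A minimal sketch of that dispatch (the "version" attribute name and the loader shapes are assumptions, not from the post):

```csharp
using System;
using System.Xml.Linq;

public class Project
{
    public string Title = "";
    public int Difficulty = 1; // added in v2; v1 files leave the default
}

public static class ProjectLoader
{
    public static Project Load(string path)
    {
        XDocument doc = XDocument.Load(path);
        // Assume the root carries something like <project version="2">.
        int version = (int?)doc.Root.Attribute("version") ?? 1;

        switch (version)
        {
            case 1: return LoadV1(doc.Root); // old code path, kept around
            case 2: return LoadV2(doc.Root); // current format
            default:
                throw new NotSupportedException("Unknown project version " + version);
        }
    }

    private static Project LoadV1(XElement root)
    {
        // v1 had no difficulty element; the field default fills it in.
        // Any v1-only elements are simply never read, i.e. discarded.
        return new Project { Title = (string)root.Element("title") ?? "" };
    }

    private static Project LoadV2(XElement root)
    {
        return new Project
        {
            Title = (string)root.Element("title") ?? "",
            Difficulty = (int?)root.Element("difficulty") ?? 1
        };
    }
}
```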
Yeah, the version number thing has been there from the beginning, since I "smelled" that this could be a problem ^^

As for my thoughts on perhaps letting the user explicitly and consciously load old files: I thought this might be good since the user then knows very clearly that it's an old version he is dealing with. Like, he has to use "import old project file" or something instead of the regular "open project",
so he's not just clicking away a nagging screen saying "this version was old"; he's more aware of the fact that he has to edit some values. Maybe... ;)
For one-off type stuff we use the Xml serialization attributes built into .NET, just serialize everything out, and put in reasonable defaults for new fields.
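That approach can look something like this (an illustrative sketch; the element and attribute names are invented):

```csharp
using System.Xml.Serialization;

[XmlRoot("enemy")]
public class EnemyDefinition
{
    [XmlAttribute("name")]
    public string Name = "";

    [XmlElement("health")]
    public int Health = 100;

    // New in the current format: old files omit it, so the field
    // initializer doubles as the "reasonable default".
    [XmlElement("aggroRadius")]
    public float AggroRadius = 5.0f;
}
```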

For long-term files, this is our (complex) Xml file strategy developed over some time:

) We use LiquidXml Studio to edit our Xml schemas.

) We only have 1 root element in all of our schemas with a common header (which includes our Xml schema version) and then a choice that has each of our file-types in it. This is important due to the brain-dead way xsd.exe works - it only generates 1 .cs file, not 1 .cs per schema. So to get reuse in the classes, avoid name collisions, etc., you have to generate all the code from 1 schema.

) We have 1 assembly responsible for reading and writing all of our Xml files.
) All schemas are included into the assembly as an 'embedded resource'.

) We have a custom tool for MSVC that runs the xsd.exe program with the /classes switch. The built-in custom tool for schemas generates heinous dataset code - avoid it. (If you google for "xsd custom tool classes" you should find the source code we based ours on.) This custom tool is associated with the root schema that has the header & choice element.

) Minor schema changes are a minor version bump
) Breaking schema changes are a major version bump
) Each major schema version has its own namespace/folder/generated code/etc... we never throw away the old ones.
) We have a GetXmlSchemaVersion function that opens the file as text and quickly retrieves the schema version (it regexes the first few lines; see the sketch after this list)
) A custom XmlUrlResolver (called XmlResourceResolver) retrieves the embedded schemas from the assembly - references look like res://Xml.dll/Xml1/Config.xsd
) Based on the version retrieved we go get the de/serializer class for the correct version and suck in the file.
) Using partial classes we extend the root generated Xml class with static functions to write out specific file-types and 1 deserializer to bring them all back in (using a static XmlSerializer).
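Here's roughly what that version sniff can look like (the "schemaVersion" attribute name and the five-line cutoff are assumptions; the post only says the version lives in a common header):

```csharp
using System.IO;
using System.Text.RegularExpressions;

public static class XmlVersionSniffer
{
    private static readonly Regex VersionPattern =
        new Regex("schemaVersion=\"(\\d+)\\.(\\d+)\"");

    // Opens the file as plain text and regexes the first few lines,
    // avoiding a full parse just to learn the version.
    public static string GetXmlSchemaVersion(string path)
    {
        using (var reader = new StreamReader(path))
        {
            for (int i = 0; i < 5; i++)
            {
                string line = reader.ReadLine();
                if (line == null) break;
                Match match = VersionPattern.Match(line);
                if (match.Success)
                    return match.Groups[1].Value + "." + match.Groups[2].Value; // e.g. "2.1"
            }
        }
        return null;
    }
}
```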

We now have a self-contained assembly that contains our history of schema-generated code and can read & write all versions of our Xml.

That would be enough if you never made a mistake in your Xml schemas and never made a mistake writing data into the Xml classes. For example, we have a time field that is supposed to be in seconds, but some data was written into it in nanoseconds.
The original code that wrote out the Xml files did not have its XmlSerializerNamespaces set correctly, so it wrote out whacked-out Xml with tons of q1, q2, ... q153 etc. namespaces (ugly Xml, but harmless), but it also put some attributes & elements in a redundant namespace (which fails validation once you fix your code).

When reading in Xml data it goes through 2 extra steps...

) PreParseFixUp
Something like bad namespaces must be fixed pre-parse; so the file is read into a memory-stream, broken into lines (string[]), and PreParseFixUp checks the actual data-type and the schema version, fixes stuff (if it needs to), and returns a Stream. (See the sketch after this list.)

) Parse
Now it's parsed into a generated Xml class. If stuff has been fixed, the Xml is parsed from the Stream; if not, it just uses the original data.

) PostParseFixUp
It then goes off to PostParseFixUp, where something like times in nanoseconds are converted into seconds (again double-switched on file-type and then schema version).

) The scrubbed Xml class is now returned to the application.

) In the application there is always a class in the object model responsible for actually holding the data in a meaningful way. It must be able to suck in all previous Xml versions but only needs to (and only should) write out the latest version. I.e., don't use the generated Xml classes directly in your code - go through an object model first.
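The read path then has roughly this shape (all names here are invented; only the structure follows the description above, and the nanosecond repair mirrors the earlier example):

```csharp
using System;
using System.IO;
using System.Text;

public static class XmlReadPipeline
{
    // "version" comes from the GetXmlSchemaVersion sniff described earlier.
    public static ConfigDocument Read(string path, string version)
    {
        string[] lines = File.ReadAllLines(path);

        // 1) PreParseFixUp: textual repairs (e.g. bad namespaces) that must
        //    happen before the deserializer ever sees the data.
        Stream stream = PreParseFixUp(lines, version);

        // 2) Parse into the generated class for that schema version.
        ConfigDocument doc = ParseWithVersion(stream, version);

        // 3) PostParseFixUp: semantic repairs on the parsed object.
        PostParseFixUp(doc, version);

        return doc;
    }

    private static Stream PreParseFixUp(string[] lines, string version)
    {
        if (version == "1.0")
            for (int i = 0; i < lines.Length; i++)
                lines[i] = lines[i].Replace(" xmlns:q1=\"urn:bogus\"", "");
        return new MemoryStream(Encoding.UTF8.GetBytes(string.Join("\n", lines)));
    }

    private static void PostParseFixUp(ConfigDocument doc, string version)
    {
        // Illustrative heuristic: absurdly large values were written in nanoseconds.
        if (version == "1.0" && doc.TimeoutSeconds > 1e6)
            doc.TimeoutSeconds /= 1e9;
    }

    private static ConfigDocument ParseWithVersion(Stream stream, string version)
    {
        // Dispatch to the generated deserializer for this schema version
        // (one per major version, as described above).
        throw new NotImplementedException();
    }
}

public class ConfigDocument { public double TimeoutSeconds; }
```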

Finally, we have backward compatibility for all our data: the newest code can read every old version.
- The trade-off between price and quality does not exist in Japan. Rather, the idea that high quality brings on cost reduction is widely accepted.-- Tajima & Matsubara
Hey, thanks for your reply.

That sounds interesting. I'll read/think about it more thoroughly when I've got more time; it sounds a lot more solid than what I'm doing ATM anyway ^^

cheers

