binary save, adding vars, and changing savegame file formats


One of the problems I run into building games, and have never found a satisfactory solution for, is this: getting save games to run fast means using binary files, but adding variables to structs that get saved changes the file format, and that breaks saved playtest games.

Long-term playtesting is done concurrently with development, and many of the features added require additional variables that need to be loaded and saved. Each time you add a variable, the save game file format changes, and you have to convert your save game files, use a game editor to edit a new game up to where you were, or add a routine you run once that "edits up" a new game after it starts.

So at first I'll use text files for save games. As variables are added, you add them to savegame() and init them in loadgame(). Then you load and save each of your saved games once, and all your files are converted to the new format. Then you change the code to load the new variables instead of initializing them, and you're up and running with the new variables in your saved playtest games.
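A minimal sketch of that workflow, with made-up variables (numcows is an existing var, numgoats the newly added one):

#include <stdio.h>

int numcows, numgoats;              // hypothetical game state

void savegame(FILE* f)
{
    fprintf(f, "%d\n", numcows);
    fprintf(f, "%d\n", numgoats);   // new variable: just append it
}

void loadgame(FILE* f)
{
    fscanf(f, "%d", &numcows);
    numgoats = 5;                   // step 1: init only; after one
                                    // load/save pass converts the old
                                    // files, replace this line with:
    // fscanf(f, "%d", &numgoats);  // step 2: load it like the rest
}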

All fine and good until the file size gets big and your "quick save" is pushing 10 seconds, almost time enough to grab a beer real quick, and definitely too much time if you're saving every 30 seconds or so because you die a lot. You end up spending a lot of your testing time waiting for saves.

So then you go binary, and your 10+ second save is down to 2-3 seconds, quite reasonable for capturing the entire state of a simulation program. But as soon as you add a new variable to a struct that already gets saved, your file format changes. Then you need old and new versions of the struct declaration, and old and new versions of the load and save routines, to convert the saved playtest games.

This often drives me to reuse or double up on the usage of existing variables that already get saved, to avoid converting save games. However, I consider this poor design. While doubling up on variable usage does speed up loads and saves and avoids file format changes (which is the reason I do it), it must be done with care to ensure the two usages of the variable never overlap, and it requires good commenting to make clear what's going on. This leads to readability and maintainability issues, hence poor design.

Another thing I've tried is adding filler fields to the end of struct declarations, such as: unsigned char filler[100]; // 100 bytes of filler

Then when you add a new variable, you remove some filler, and the file format doesn't change. The problem with this is that adding enough filler to ensure room to grow slows the binary save back down to almost the speed of the text file save.
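To illustrate the carving, with a hypothetical critter struct: the new field takes its bytes from the filler, so the struct size, and therefore the file format, is unchanged.

struct critter_t {                  // before
    int hp;
    int x, y;
    unsigned char filler[100];      // 100 bytes of room to grow
};

struct critter_t_v2 {               // after adding a 4-byte field
    int hp;
    int x, y;
    int mana;                       // new variable, carved from filler
    unsigned char filler[96];       // 4 bytes smaller; sizeof() unchanged
};

// old files load their old (presumably zeroed) filler bytes into mana,
// so the new field still needs a sensible zero/default interpretation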

Another thing I've considered is declaring structs with just a few obvious variables plus something like a data[10] array to hold whatever else is needed. As vars are needed, data[n] is assigned that duty.

Here's the challenge:

You want a way to save and load data such that you can add a new variable to the file format and easily convert all old-format files to the new format, with less hassle than the usual drill when changing binary file formats (i.e. old and new format declarations and versions of load and save). I found that with text files I could just add new variables to the end of load and save: you init a blank game to default values, then load. Any variables not in the file because it's an old format simply don't get loaded, and the default values already set are used. I guess I'm looking for similar functionality at binary speeds.

Does anyone have any suggestions?

Clever database tricks were never my strong suit.

Norm Barrows

Rockland Software Productions

"Building PC games since 1989"

rocklandsoftware.net

PLAY CAVEMAN NOW!

http://rocklandsoftware.net/beta.php

For structs, there are generally two problems:

- You add a new field, but the old file doesn't have it.
- You remove a field, but the old file still has it.

There are two main types of implementations I've seen:

=== Option 1 ===

Keep a version number for each struct, serialize the version number (either per struct or once for the entire save file), then write a deserializer which can handle any version up to now. The main benefit here is that you can fread the entire struct if its version number matches your code's struct. If not, you fall back to a generalized reader which if/else reads each individual field based on whether it existed at that version or not (including skipping fields that you've removed). The idea is that it only uses the slow deserializer when the app gets patched, and then when you overwrite the save file, it will be blazing fast again.

Terraria uses the if/else approach, but most of its loading time is spent rebuilding data that isn't actually saved to the file.

The code can look like:

if (version == kCurrentVersion)
{
  ReadEntireStruct(); // fread in C, other languages may or may not have a fast equivalent.
}
else
{
  field1 = ReadString();

  if (version > 12)
    field2 = ReadInt32();

  if (version > 13) // Needed to increase bitfield size to 32 bits from 16
    field3 = ReadInt32();
  else
    field3 = ReadInt16();
}
=== Option 2 ===

Each field keeps a 'field id', and the serialized struct stores that field ID before the field contents. In addition, if you want the ability to deserialize "unknown" (removed) fields, you need to add more data that indicates how long (or what type) the field is.

Google's protobuf uses this approach.

The code here can be written in various ways. For example, protobuf.net can generate deserialization classes on the fly and compile them with JIT.

while (fieldInfo = ReadFieldId())
{
  switch (fieldInfo.id)
  {
    case 1: field1 = ReadString(); break;
    case 2: field2 = ReadInt32(); break;
    case 3: field3 = ReadInt16(); break;
    case 4: field3 = ReadInt32(); break;   // notice the field id also had to change when changing the field's type.

    default: SkipField(); break;
  }
}
It's a basic serialization problem. Google can find a few bajillion web pages on serialization methods.


The solution I prefer is to store objects in the format:

{size, id, version, key1, a, key2, b, key3, c ... }

Read and write values by keys.

When serializing an object, give it a writer object that accepts keys and values. When you get that back, encode the size, the ID, the version, and then the table of values.

When deserializing, the object reads the next /size/ bytes, then the ID so it knows what to load, then the version in case you need to completely invalidate the values with a different structure version, followed by the table of values.

You then can have a family of functions:

bool Writer::WriteBool  ( u32 uniqueKey, bool value );
bool Writer::WriteInt16 ( u32 uniqueKey, s16 value );
bool Writer::WriteUInt16( u32 uniqueKey, u16 value );
bool Writer::WriteInt32 ( u32 uniqueKey, s32 value );
bool Writer::WriteString( u32 uniqueKey, const string& value );
....

And when reading them in:

bool Reader::ReadBool  ( u32 uniqueKey, bool& value, bool defaultValue );
bool Reader::ReadInt32 ( u32 uniqueKey, s32& value, s32 defaultValue );
bool Reader::ReadString( u32 uniqueKey, string& value, const string& defaultValue );
...

If a value wasn't written in the last save version, or if the version doesn't match, you can replace it with your own expected default.
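For concreteness, here is a rough sketch of what the write side of that could look like, assuming a simple growable byte buffer. The Writer class, Append helper, and Finish method are illustrative names, not anyone's actual API, and this handles fixed-size values only (strings would also need a length prefix, as noted above):

#include <stdio.h>
#include <stdint.h>
#include <vector>

typedef uint32_t u32;

class Writer {
    std::vector<uint8_t> buf;   // accumulates key/value pairs in memory

    void Append(const void* p, size_t n) {
        const uint8_t* b = (const uint8_t*)p;
        buf.insert(buf.end(), b, b + n);
    }
public:
    bool WriteInt32(u32 uniqueKey, int32_t value) {
        Append(&uniqueKey, sizeof uniqueKey);   // key first...
        Append(&value, sizeof value);           // ...then the value
        return true;
    }

    // prepend {size, id, version} and emit the whole record in one write
    void Finish(FILE* f, u32 id, u32 version) {
        u32 size = (u32)buf.size();
        fwrite(&size, sizeof size, 1, f);
        fwrite(&id, sizeof id, 1, f);
        fwrite(&version, sizeof version, 1, f);
        fwrite(buf.data(), 1, buf.size(), f);
    }
};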

The solution I prefer is to store objects in the format:

{size, id, version, key1, a, key2, b, key3, c ... }

How do you handle the case where an unknown key occurs? It seems like you have to abandon the remainder of the struct at that point because you don't have info to skip individual fields.

Do your keys include some bits that identify the field's size?

As stated in the first post, when loading a value there is a default value parameter if the key/value pair does not exist.

So perhaps:

reader.ReadUInt32( kColorField, mystruct.Color, kDefaultColor );

If the field exists then mystruct.Color is loaded with the rgba value that was persisted. If the field does not exist, then mystruct.Color is set to a default value.

For structs, there are generally two problems:

- You add a new field, but the old file doesn't have it.
- You remove a field, but the old file still has it.

Yep, that's it in a nutshell. Actually, it's case #1 that's my big problem. If I stop using a var, I just leave it in the save game file. When I need a new var, I already have a slot ready to go (assuming sizeof() is the same). Anything unused gets stripped out at the end, in the final format.

Keep a version number for each struct

This is the hassle I'm trying to avoid.

Granted, it's not that much work to copy and paste a struct, call it struct2, add a field, copy and paste loadgame and savegame, call them loadgame2 and savegame2, and change them to use struct2's. But after a while you get a lot of struct declarations. Since I try to keep all in-house saved playtest games updated as the file format changes, I usually only have 2 versions of a struct going at once. And all that copying and pasting is much more work than adding writefileint(f,newvar); and newvar=readfileint(f);

Each field keeps a 'field id', and the serialized struct stores that field ID before the field contents.

Saving more data slows things down, the same problem as adding filler. I'm already using fwrite_nolock to get 3 second save times. Plus you can't do that one nice big write of an entire array at once, and it's probably as much work to code as keeping multiple versions of structs. It would be nice to have the speed of binary with the ease of coding of text.

I did consider something along these lines: instead of saving an entire struct (or array of structs) as one binary chunk, you save the individual fields one at a time as binary. No tags, flags, counts, etc., just read and write in the order declared. When you add a var, you add it to the end of the load and save routines. It does require looping through the array again to write the new field, but you're not writing any extra data, and you're still at fast binary speed. But many small writes vs a few big ones may still be slow. On the load side, you test for EOF and only attempt to read a new var if you're not at EOF.
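A minimal sketch of that idea, using a hypothetical struct with made-up fields: each field is written in declared order with no tags, and the loader keeps a default when an old-format file runs out early.

#include <stdio.h>

struct critter { int hp, x, mana; };   // hypothetical; mana is the new field

void save_critters(FILE* f, struct critter* c, int count)
{
    int i;
    for (i = 0; i < count; i++) fwrite(&c[i].hp, sizeof(int), 1, f);
    for (i = 0; i < count; i++) fwrite(&c[i].x,  sizeof(int), 1, f);
    // new field: just add another loop at the end of the routine
    for (i = 0; i < count; i++) fwrite(&c[i].mana, sizeof(int), 1, f);
}

void load_critters(FILE* f, struct critter* c, int count)
{
    int i;
    for (i = 0; i < count; i++) fread(&c[i].hp, sizeof(int), 1, f);
    for (i = 0; i < count; i++) fread(&c[i].x,  sizeof(int), 1, f);
    for (i = 0; i < count; i++) {
        // old-format file ends here: keep the default instead
        if (fread(&c[i].mana, sizeof(int), 1, f) != 1)
            c[i].mana = 0;
    }
}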

Norm Barrows

I think it's not worth writing whole structs unformatted, even in a binary file. It's totally inflexible and doesn't gain much.

Many small writes are not a problem if you buffer the data, and the processing time to write individual fields is negligible.

For versioning, just store a version number in the file and write loading code that can handle old versions.
Didn't notice your comment about how long it takes you:

>> "quick save" is pushing on 10 seconds

Even an inexpensive 5400 RPM HDD can handle around 100 MB/sec bursts. I really don't think the size of your save data is the real issue. Unless you are trying to save out some massive 600+ MB data files, or you are writing to a relatively slow SD card or USB storage, you likely have a bottleneck somewhere else.

You could test this by commenting out the code that actually writes to files. Profile your app and figure out where the real bottlenecks are taking place.

Over the years I've seen quite a few things that can slow a persistence system down: using the wrong file I/O functions, or using the correct functions poorly, such as writing out single bytes at a time instead of larger blocks; using fancy binary packers to save space (which can make sense if you have tightly enforced file sizes, such as on a game console); using improper buffering techniques, such as a std::vector without reserved space that is forced to resize itself many times over its lifetime; poor navigation of the tree of objects to be saved; etc.


You cannot just blame the size of an individual struct as the root cause of slow writes. With ten seconds you should easily be able to get a gigabyte or more of raw output.

Many small writes is not a problem if you buffer the data

So if writing one field at a time in binary is too slow, write 2048 bytes to a buffer, then fwrite() the buffer, eh?

Didn't notice your comment about how long it takes you:

>> "quick save" is pushing on 10 seconds

That was the speed of the text-mode save I was using at first. It was easier to add new vars to text load and save, but when I got to 10 seconds I switched to binary, which was the plan all along. The 10 second text save dropped to 2-3 seconds with binary fwrite_nolock.

Unless you are trying to save out some massive 600+ MB data files, or you are writing to a relatively slow SD card or USB storage, you likely have a bottleneck somewhere else.

Savegame backs up the last save, then overwrites the last save game, then saves the 10 local maps currently cached in memory.

A savegame file is 57 meg. The backup is also 57 meg (obviously). Each local map is 273K. Grand total: 116.73 meg.

I may need to speed up the copy routine used for the backup. It takes about as long as a save; the local maps take practically no time by comparison.

Without the automatic backup, save times would be on the order of 1-2 seconds.

Norm Barrows

Many small writes is not a problem if you buffer the data

So if writing one field at a time in binary is too slow, write 2048 bytes to a buffer, then fwrite() the buffer, eh?

Yes. Always buffer your I/O. Doing it one item at a time, you get killed by the overhead of millions of unnecessary function calls. Don't do one thing at a time when you can do millions of things at once for the same effort.
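For example, a minimal hand-rolled write buffer might look like the following sketch. Note that setvbuf() on the FILE* can achieve a similar effect, and the 64K size here is an arbitrary choice:

#include <stdio.h>
#include <string.h>

#define WBUF_SIZE 65536
static unsigned char wbuf[WBUF_SIZE];
static size_t wpos = 0;

void flush_buf(FILE* f)
{
    if (wpos) { fwrite(wbuf, 1, wpos, f); wpos = 0; }
}

void write_buf(FILE* f, const void* data, size_t n)
{
    if (n >= WBUF_SIZE) {           // huge item: bypass the buffer
        flush_buf(f);
        fwrite(data, 1, n, f);
        return;
    }
    if (wpos + n > WBUF_SIZE)       // buffer full: one big fwrite
        flush_buf(f);
    memcpy(wbuf + wpos, data, n);   // common case: just a memcpy
    wpos += n;
}

Every per-field write then becomes a memcpy, and the disk only ever sees large blocks.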

Didn't notice your comment about how long it takes you:

>> "quick save" is pushing on 10 seconds

That was the speed of the text-mode save I was using at first. It was easier to add new vars to text load and save, but when I got to 10 seconds I switched to binary, which was the plan all along. The 10 second text save dropped to 2-3 seconds with binary fwrite_nolock.

It should not have taken ten seconds.

Our save game system generates about 50MB of data, and writes it out in about a half second, with another half second or so to figure out exactly what to save. The bottleneck is not writing the data.

Unless you are trying to save out some massive 600+ MB data files, or you are writing to a relatively slow SD card or USB storage, you likely have a bottleneck somewhere else.

Savegame backs up the last save, then overwrites the last save game, then saves the 10 local maps currently cached in memory.

A savegame file is 57 meg. The backup is also 57 meg (obviously). Each local map is 273K. Grand total: 116.73 meg.

I may need to speed up the copy routine used for the backup. It takes about as long as a save; the local maps take practically no time by comparison.

Without the automatic backup, save times would be on the order of 1-2 seconds.

Again, with the PC game I'm currently working on, we write about 50MB in half a second. The bottleneck is not in writing the data to disk; our performance hotspot is pruning the tree of what to include in the save data and what to leave behind.

If your write time really is too large, you can buffer the entire save file in memory (100MB is cheap these days with virtual memory) and then start an asynchronous write. Then the only time required is the time to navigate your object tree and serialize the data into a buffer.
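A sketch of that approach using std::thread (C++11; names hypothetical, error handling omitted):

#include <cstdio>
#include <string>
#include <thread>
#include <vector>

static void write_file(std::vector<unsigned char> buf, std::string path)
{
    FILE* f = fopen(path.c_str(), "wb");
    if (f) {
        fwrite(buf.data(), 1, buf.size(), f);
        fclose(f);
    }
}

// serialize the whole game state into 'buf' in memory (fast), then
// hand the slow disk write off to a background thread and keep playing
void async_save(std::vector<unsigned char> buf, const std::string& path)
{
    // note: a real version would join or signal completion before exit
    // so that quitting the game can't truncate the file
    std::thread(write_file, std::move(buf), path).detach();
}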
