Serialization libraries in C

16 comments, last by Khatharr 7 years, 1 month ago

So I'm programming a simple game in C with SDL and I'm trying to make a game save system. From what I understand I need to serialize the data to store the game variables in files.

So I was wondering if anyone knew of some good libraries for this. The libraries I've found so far are libtpl and protobuf-c. libtpl seems like it hasn't been updated in a while, and I don't really like google (though if it's the best option I'm happy to use protobuf-c).

So are there better libraries out there, or should I use one of the aforementioned?


I would just use fprintf and fscanf and write a ToFile/FromFile function pair:


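void ToFile( FILE* fp, const data_t* data )
{
    fprintf( fp, "%d ", data->a );
    fprintf( fp, "%d ", data->b );
    fprintf( fp, "%d ", data->c );
    fprintf( fp, "%s\n", data->string );
}

void FromFile( FILE* fp, data_t* data )
{
    fscanf( fp, "%d", &data->a );
    fscanf( fp, "%d", &data->b );
    fscanf( fp, "%d", &data->c );
    // scanf has no "%.*s": the field width must be a literal in the format string.
    // 255 assumes data->string holds at least 256 chars (MAX_CHARS == 256); %s also stops at whitespace.
    fscanf( fp, "%255s", data->string );
}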

You may want to add a separator between the numbers, so the reader knows where one value ends and the next begins :)

Thanks Randy. I didn't think to use that (I tend to overcomplicate).

Is this the most efficient method? The data is not overly complicated but there is an awful lot of it.

There are many definitions of "efficient". What was described is both functional and common, although there are some slight modifications typically used.

It is one method. Many games use it, or something similar. They often use more advanced libraries that handle things like migrating from older data to newer data by saving a version number, or by saving key/value pairs and reading key/value pairs with a default. The libraries may include things like compression or encryption or error detection. There are often markers before each object indicating their size, and other format elements. Even so, the pattern described above is common.
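To make the version number and key/value-with-default ideas a bit more concrete, here is a rough sketch. It is only an illustration: the SaveData struct, its fields and the key names are made up for this example, and it assumes one "key value" pair per line with integer values only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical save data; the fields are invented for this sketch. */
typedef struct {
    int version;
    int gold;
    int level;
} SaveData;

/* Write one "key value" pair per line. */
void SaveKeyValues(FILE* fp, const SaveData* d)
{
    assert(fp != NULL && d != NULL);
    fprintf(fp, "version %d\n", d->version);
    fprintf(fp, "gold %d\n", d->gold);
    fprintf(fp, "level %d\n", d->level);
}

/* Start from defaults, then overwrite whatever keys the file provides.
   Unknown keys are ignored, so files with extra or missing keys still load. */
void LoadKeyValues(FILE* fp, SaveData* d)
{
    char key[32];
    int value;

    assert(fp != NULL && d != NULL);
    d->version = 1;  /* defaults used when a key is missing */
    d->gold = 0;
    d->level = 1;

    while (fscanf(fp, "%31s %d", key, &value) == 2) {
        if (strcmp(key, "version") == 0)    d->version = value;
        else if (strcmp(key, "gold") == 0)  d->gold = value;
        else if (strcmp(key, "level") == 0) d->level = value;
    }
}
```

A file written by an older build that never stored "level" would load with level set to its default, and a key this build does not know is simply skipped.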

There are other methods that may satisfy different definitions of "efficient", meaning requiring less clock time, or requiring less disk space.

If you are talking about many megabytes/gigabytes of data you'd want to follow the pattern but possibly use a different buffering than the standard IO package uses internally. There are ways to deal with larger packets of data so you have fewer function calls with less overhead. There are also OS-specific techniques you can use for slightly better clock time.
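As one example of "different buffering", the standard setvbuf call lets you give a stream a larger buffer, so that many small writes turn into fewer calls to the OS. A minimal sketch, where OpenBufferedForWrite is a made-up helper name and the right buffer size is something you would have to measure:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: open a file for binary writing with a large,
   fully buffered stream. Returns NULL on failure. */
FILE* OpenBufferedForWrite(const char* path, size_t bufferSize)
{
    FILE* fp;

    assert(path != NULL && bufferSize > 0);
    fp = fopen(path, "wb");
    if (fp == NULL) return NULL;

    /* setvbuf must be called before any other operation on the stream.
       Passing NULL lets the library allocate the buffer itself. */
    if (setvbuf(fp, NULL, _IOFBF, bufferSize) != 0) {
        fclose(fp);
        return NULL;
    }
    return fp;
}
```

Whether this actually helps depends on the platform and on how the data is written, so treat it as something to profile rather than a guaranteed win.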

The short answer is yes, this method is efficient, but no, it is not the "most efficient".

This method should work well for almost all use cases. A good optimization might be to dump values in binary instead of text, but this is a little more difficult and probably not necessary for many projects. In that case fwrite can be used instead of fprintf. But care must be taken to handle endianness (for example always read and write little-endian values, and on big-endian machines perform additional run-time byte swaps).
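One way to follow the "always little-endian in the file" rule without writing explicit byte swaps is to build each value from individual bytes with shifts. A small sketch, with WriteU32LE and ReadU32LE as made-up names; because it never reinterprets memory, it behaves the same on little-endian and big-endian hosts:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Write a 32-bit value as four bytes, least significant byte first.
   Returns 1 on success, 0 on failure. */
int WriteU32LE(FILE* fp, uint32_t v)
{
    unsigned char b[4];

    assert(fp != NULL);
    b[0] = (unsigned char)(v & 0xFF);
    b[1] = (unsigned char)((v >> 8) & 0xFF);
    b[2] = (unsigned char)((v >> 16) & 0xFF);
    b[3] = (unsigned char)((v >> 24) & 0xFF);
    return fwrite(b, 1, 4, fp) == 4;
}

/* Read four bytes and reassemble them, least significant byte first.
   Returns 1 on success, 0 on failure. */
int ReadU32LE(FILE* fp, uint32_t* out)
{
    unsigned char b[4];

    assert(fp != NULL && out != NULL);
    if (fread(b, 1, 4, fp) != 4) return 0;
    *out = (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
    return 1;
}
```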

Also some people like to write a tool that outputs serialization functions automatically, as in, a tool that writes the ToFile and FromFile functions code. There are all kinds of ways to implement this stuff, and it can get very complicated. Personally I stick to what I described above.
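Short of writing a separate generator tool, a related trick that stays inside C is the "X-macro": list the fields once and let the preprocessor produce the struct and both functions. This is a different technique from an external tool, and the Player struct and its fields below are invented for the sketch, which only handles int fields:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical field list: each X(name) entry is one int field. */
#define PLAYER_FIELDS(X) \
    X(hp)                \
    X(mana)              \
    X(gold)

/* Expands to: int hp; int mana; int gold; */
typedef struct {
#define X(name) int name;
    PLAYER_FIELDS(X)
#undef X
} Player;

/* Writes every field in list order, separated by spaces. */
void PlayerToFile(FILE* fp, const Player* p)
{
    assert(fp != NULL && p != NULL);
#define X(name) fprintf(fp, "%d ", p->name);
    PLAYER_FIELDS(X)
#undef X
    fputc('\n', fp);
}

/* Reads every field in list order. Returns 1 if all were read, else 0. */
int PlayerFromFile(FILE* fp, Player* p)
{
    int ok = 1;

    assert(fp != NULL && p != NULL);
#define X(name) ok = ok && (fscanf(fp, "%d", &p->name) == 1);
    PLAYER_FIELDS(X)
#undef X
    return ok;
}
```

Adding a field then means adding one line to PLAYER_FIELDS, and the writer and reader cannot drift out of sync.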

"But care must be taken to handle endianness (for example always read and write little-endian values, and on big-endian machines perform additional run-time byte swaps)."

Every machine you are likely to work with these days is little endian.

If you're using Solaris or AIX or HP-UX you might encounter big endian, but chances are against it. Big endian is effectively dead except for items that need to be in "network byte ordering", which is big endian for historical reasons. Even so, many game protocols don't bother with that and pass network data in little endian.

So unless you need to do save files that are compatible with some big-iron mainframe computers or some other extremely rare scenario, just ignore it because it isn't a real concern.

It always puzzles me that this is somehow becoming a lost art. Binary serialization in C is very straightforward, it simply requires a bit of planning and organization.
If you're wanting to create compact save files then you'll want a header, an optional data table, and a payload. The header can begin with a fourcc of your own choosing and can be used both to verify that the file is intended to represent the format indicated and that the endianness matches (otherwise the character order will mismatch predictably). You can create a fourcc in C as follows:


unsigned int fourcc = 'SAVF';
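One caveat: the value of a multi-character constant like 'SAVF' is implementation-defined in C, so it is worth checking what your compiler does with it. If you would rather not rely on that, the tag can be built from the individual characters. A sketch, where MAKE_FOURCC, SwapBytes32 and CheckFourcc are made-up names and the byte order is chosen to match what I believe MSVC and GCC produce for the literal:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macro: packs four characters with 'a' in the most
   significant byte. */
#define MAKE_FOURCC(a, b, c, d)               \
    (((uint32_t)(unsigned char)(a) << 24) |   \
     ((uint32_t)(unsigned char)(b) << 16) |   \
     ((uint32_t)(unsigned char)(c) << 8)  |   \
      (uint32_t)(unsigned char)(d))

/* Reverse the byte order of a 32-bit value. A file written on a machine
   of the opposite endianness shows the fourcc in this form. */
uint32_t SwapBytes32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Classify a fourcc read from a file:
   1 = matches, -1 = byte-swapped (endian mismatch), 0 = something else. */
int CheckFourcc(uint32_t fromFile, uint32_t expected)
{
    /* A palindromic tag such as 'ABBA' could not reveal a swap. */
    assert(expected != SwapBytes32(expected));
    if (fromFile == expected) return 1;
    if (fromFile == SwapBytes32(expected)) return -1;
    return 0;
}
```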

Following that you may wish to place a 16 bit major version number and a 16 bit minor version number. If you choose to include a data table (more on this later) you can indicate the post-header file length and the table length also.
For data of known length you can simply pack it into a struct and write it to the file. This works for quite a lot of things, but there are often cases where data of variable length needs to be recorded. For example, a string containing the name of the player's character could have a variable length. Alternatively, you may need to save the player's inventory, which would contain some variable number of items.
One approach to this is to place maximum lengths on these sets of data and simply zero out the unused data sections. This is fast and simple, but a bit wasteful, and imposing arbitrary limits may not suit your taste. As an example, if your maximum name length is 10 bytes then you could store it as an array of 10 char or an array of 11 char. In the 10 char case you must be sure to load it into an 11 char array and append a zero at load time, in case the name uses all 10 characters. In the 11 char case you simply include the terminating zero in the save file. Likewise, if you can describe your inventory entries as item ID and item quantity then a simple array of structs can be used to describe the whole inventory. If the length is fixed then this is POD data and you can simply read and write it to the file without alteration.
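The 10-char case might look something like the sketch below. SaveFixedName, LoadFixedName and NAME_FIELD_LEN are names made up for the illustration, and silently truncating longer names is just one possible policy:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAME_FIELD_LEN 10  /* bytes stored in the file, no terminator */

/* Write the name into a fixed 10-byte field, zero-padding unused bytes.
   Returns 1 on success, 0 on failure. */
int SaveFixedName(FILE* fp, const char* name)
{
    char field[NAME_FIELD_LEN];
    size_t len;

    assert(fp != NULL && name != NULL);
    len = strlen(name);
    if (len > NAME_FIELD_LEN) len = NAME_FIELD_LEN;  /* truncate long names */
    memset(field, 0, sizeof field);
    memcpy(field, name, len);
    return fwrite(field, 1, sizeof field, fp) == sizeof field;
}

/* Read the field into an 11-byte buffer and append the terminating zero,
   in case the name used all 10 characters. Returns 1 on success. */
int LoadFixedName(FILE* fp, char out[NAME_FIELD_LEN + 1])
{
    assert(fp != NULL && out != NULL);
    if (fread(out, 1, NAME_FIELD_LEN, fp) != NAME_FIELD_LEN) return 0;
    out[NAME_FIELD_LEN] = '\0';
    return 1;
}
```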
Alternatively, if you want to save a bit of space, or if you really want to avoid fixed-lengths, you can use either a data table or data segment prefixes. A data table would come immediately after the file header and would be a list of offsets into the payload in a known order.

Alternatively, you could store it as a list of section lengths. If you store offsets then you can read the entire data section into a single allocation and then add the address of that allocation to all the offsets (loaded into a struct) in order to cheaply create a table of contents for the loaded data. If you store lengths then you can easily create separate allocations for each segment of data and then load them one at a time.
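To illustrate the offset variant: once the whole payload sits in one allocation, turning the stored offsets into pointers is a single addition per entry. The OffsetTable and Contents structs below are hypothetical and assume just two sections in a fixed order. The bounds check is my addition, since an offset read from a file should not be trusted blindly:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table as stored in the file: offsets from the start of
   the payload, in a known order. */
typedef struct {
    uint32_t nameOffset;
    uint32_t inventoryOffset;
} OffsetTable;

/* The in-memory table of contents built from those offsets. */
typedef struct {
    const char* name;
    const unsigned char* inventory;
} Contents;

/* Add the payload address to each offset.
   Returns 1 on success, 0 if an offset points outside the payload. */
int BuildContents(const unsigned char* payload, size_t payloadLen,
                  const OffsetTable* table, Contents* out)
{
    assert(payload != NULL && table != NULL && out != NULL);
    if (table->nameOffset >= payloadLen) return 0;
    if (table->inventoryOffset >= payloadLen) return 0;
    out->name = (const char*)(payload + table->nameOffset);
    out->inventory = payload + table->inventoryOffset;
    return 1;
}
```

The resulting pointers stay valid only as long as the payload allocation does, so the two should be freed together.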
Segment prefixes work similarly to a table of lengths, but instead of using a table you simply prefix each segment with an unsigned int representing its length. When loading the file you read in the length, allocate for the segment according to that value, read the segment, then simply repeat the process for the next segment.
Here's a small example. I've foregone error checking on the file I/O and memory allocations for the sake of brevity, but you should definitely use error checking when you write code that you intend to make use of. The strategy here was to use structs that are arranged for easy reading and writing. Any POD (plain-old-data == no pointers) struct is very easy to work with because you can simply read and write it directly to the file. Arrays of PODs are the same. For non-POD structs I made sure to put the non-POD entries at the end of the struct, declared a constant expressing the length of the POD section of the struct, and then wrote load/save functions for that struct that handle the allocation and serialization. In this way a complex struct containing all the data for the party in an RPG can be saved and loaded with single function calls. The functions simply read or write the POD section, then delegate to the helper functions of their non-POD members.


#define _CRT_SECURE_NO_WARNINGS
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <assert.h>

//make sure that there's no element padding, which would interfere with the layout of structures in memory
#pragma pack(1)

typedef struct {
  //lead with known-length data
  unsigned char str;
  unsigned short hp;
  unsigned short hpmax;
  //put variable-length data at the end
  char* name;
} PartyMember;
//length of the POD section; this sum is only correct because of the pack(1) above
const size_t PARTYMEMBER_POD_SECTION_LENGTH = sizeof(unsigned char) + (sizeof(short) * 2);

void saveString(const char* str, FILE* fp) {
  unsigned int len = strlen(str);
  //record string length
  fwrite(&len, sizeof(unsigned int), 1, fp);
  //record string
  fwrite(str, sizeof(char), len, fp);
}

void loadString(char** pstr, FILE* fp) {
  //get length
  unsigned int len;
  fread(&len, sizeof(unsigned int), 1, fp);

  //allocate and read
  char* str = malloc(len + 1); //allocate an additional byte
  fread(str, sizeof(char), len, fp);
  str[len] = 0; //remember to add the terminating zero

  *pstr = str;
}

void savePartyMember(PartyMember* member, FILE* fp) {
  fwrite(member, PARTYMEMBER_POD_SECTION_LENGTH, 1, fp);
  saveString(member->name, fp);
}

void loadPartyMember(PartyMember* member, FILE* fp) {
  fread(member, PARTYMEMBER_POD_SECTION_LENGTH, 1, fp);
  loadString(&member->name, fp);
}

typedef enum {
  NONE = 0,
  POTION = 1,
  SWORD = 2,
  BACON = 3
} ItemID;

typedef struct {
  unsigned int id;
  unsigned int quantity;
} Item;

typedef struct {
  unsigned int gold;
  int memberCt;
  int itemCt;
  Item* inventory;
  PartyMember* members;
} Party;
//length of the POD section (gold, memberCt, itemCt)
const size_t PARTY_POD_SECTION_LENGTH = sizeof(unsigned int) + (sizeof(int) * 2);

void saveParty(Party* party, FILE* fp) {
  fwrite(party, PARTY_POD_SECTION_LENGTH, 1, fp);

  //Item is POD, so we can just dump the whole array
  fwrite(party->inventory, sizeof(Item), party->itemCt, fp);

  //Party members have variable length strings, so we need to save them individually
  for(int i = 0; i < party->memberCt; i++) {
    savePartyMember(&party->members[i], fp);
  }
}

void loadParty(Party* party, FILE* fp) {
  fread(party, PARTY_POD_SECTION_LENGTH, 1, fp);

  //allocate for the inventory and just read it all in at once, since it's POD
  party->inventory = malloc(sizeof(Item) * party->itemCt);
  fread(party->inventory, sizeof(Item), party->itemCt, fp);

  //load party members individually since they're non-POD
  party->members = malloc(sizeof(PartyMember) * party->memberCt);
  for(int i = 0; i < party->memberCt; i++) {
    loadPartyMember(&party->members[i], fp);
  }
}

typedef struct {
  unsigned short major;
  unsigned short minor;
} VersionNumber;

const VersionNumber VERSION = { 1, 0 };

typedef struct {
  unsigned int fourcc;
  VersionNumber ver;
} SaveHeader;

void saveGame(Party* party, const char* filename) {
  SaveHeader header;
  header.fourcc = 'SAVF';
  memcpy(&header.ver, &VERSION, sizeof(header.ver));

  FILE* fp = fopen(filename, "wb");
 
  //write header
  fwrite(&header, sizeof(header), 1, fp);

  //write party data
  saveParty(party, fp);

  fclose(fp);
}

void loadGame(Party* party, const char* filename) {
  FILE* fp = fopen(filename, "rb");

  SaveHeader header;
  fread(&header, sizeof(header), 1, fp);

  if(header.fourcc == 'FVAS') {
    printf("Endian mismatch!\n");
    exit(1);
  }

  if(header.fourcc != 'SAVF') {
    printf("Invalid savefile!\n");
    exit(1);
  }

  if(memcmp(&header.ver, &VERSION, sizeof(VERSION)) != 0) {
    printf("Savefile version mismatch!\n");
    exit(1);
  }

  loadParty(party, fp);

  //close the file, otherwise the handle leaks
  fclose(fp);
}

void addItem(Party* party, ItemID id, unsigned int quant) {
  Item* curItems = party->inventory;

  size_t oldLen = sizeof(Item) * party->itemCt;
  party->itemCt++;
  size_t newLen = sizeof(Item) * party->itemCt;

  Item* newItems = malloc(newLen);
  if(curItems) {
    memcpy(newItems, curItems, oldLen);
  }
 
  newItems[party->itemCt - 1].id = id;
  newItems[party->itemCt - 1].quantity = quant;

  party->inventory = newItems;
  free(curItems);
}

int main(int argc, char** argv) {
  Party a;
  memset(&a, 0, sizeof(a));

  a.gold = 42;
  addItem(&a, BACON, 3);
  addItem(&a, SWORD, 1);
  addItem(&a, POTION, 12);

  a.memberCt = 2;
  a.members = malloc(sizeof(PartyMember) * a.memberCt);

  a.members[0].str = 5;
  a.members[0].hp = 50;
  a.members[0].hpmax = 100;
  a.members[0].name = malloc(5);
  sprintf(a.members[0].name, "Derp");

  a.members[1].str = 15;
  a.members[1].hp = 200;
  a.members[1].hpmax = 200;
  a.members[1].name = malloc(7);
  sprintf(a.members[1].name, "Hamlet");

  saveGame(&a, "save.bin");

  Party b;
  loadGame(&b, "save.bin");

  assert(a.gold == b.gold);

  assert(a.itemCt == b.itemCt);
  for(int i = 0; i < a.itemCt; i++) {
    assert(a.inventory[i].id == b.inventory[i].id);
    assert(a.inventory[i].quantity == b.inventory[i].quantity);
  }

  assert(a.memberCt == b.memberCt);
  for(int i = 0; i < a.memberCt; i++) {
    PartyMember* ma = &a.members[i];
    PartyMember* mb = &b.members[i];
    assert(ma->hp == mb->hp);
    assert(ma->hpmax == mb->hpmax);
    assert(ma->str == mb->str);
    assert(strcmp(ma->name, mb->name) == 0);
  }

  printf("Tested OK!\n");
  return 0;
}

This test succeeded for me in VS2015CE. Here's the file that was produced (note that the fourcc appears "backward" in the file because I'm on a little endian system):
[Image: contents of the generated save file]
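Since the example skips error checking, here is roughly what a checked version of loadString could look like. This is only a sketch: loadStringChecked and the MAX_STRING_LEN limit are made up, and returning NULL on every kind of failure is just one possible convention.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_STRING_LEN 4096u  /* hypothetical sanity limit for this sketch */

/* Like loadString above, but reports failure instead of trusting the file.
   Returns a malloc'd string on success or NULL on any error. */
char* loadStringChecked(FILE* fp)
{
    unsigned int len;
    char* str;

    assert(fp != NULL);
    if (fread(&len, sizeof len, 1, fp) != 1) return NULL;  /* truncated file */
    if (len > MAX_STRING_LEN) return NULL;                  /* corrupt length */

    str = malloc((size_t)len + 1);
    if (str == NULL) return NULL;                           /* out of memory */

    if (fread(str, 1, len, fp) != len) {                    /* truncated data */
        free(str);
        return NULL;
    }
    str[len] = '\0';
    return str;
}
```

The length limit matters because a damaged or hostile file could otherwise ask for a multi-gigabyte allocation.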


One little addition to what Khatharr said. I add a chunk ID to my serializer/deserializer to make savegames backwards compatible. Newly added state goes into a chunk with a new ID, so older versions of the game read everything except the new parts, and newer versions don't conflict with old files, all without much extra code.

My utility class just has a couple of templated global function definitions:


template<typename T> inline void Serialize(T& instance, IDataWriter& stream) //uses binary as default
{
  BinarySerializer bs;
  Serialize(instance, stream, bs);
}
template<typename T> void Serialize(T& instance, IDataWriter& stream, ISerializer& writer); //may choose a different serializer like JSON

And any class/type may create its own overload for the second global function so you may call Serialize(myObject, myStream); for whatever it is declared for.

ISerializer is an abstract class whose implementations handle the data format, such as reading/writing JSON or plain binary for base types and strings, as well as BeginChunk and EndChunk to delimit sections. It is simple and needs just a few functions.
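The chunk ID idea itself carries over to plain C without any templates. A rough sketch, assuming each chunk is stored as a 32-bit ID, a 32-bit payload length and then the payload; the chunk IDs and function names are invented for the example, and it writes values in host byte order like the code earlier in the thread:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

enum { CHUNK_GOLD = 1, CHUNK_QUESTS = 2 };  /* hypothetical chunk IDs */

/* Write one chunk: ID, payload length, then the payload.
   Returns 1 on success, 0 on failure. */
int WriteChunk(FILE* fp, uint32_t id, const void* data, uint32_t len)
{
    assert(fp != NULL && (data != NULL || len == 0));
    if (fwrite(&id, sizeof id, 1, fp) != 1) return 0;
    if (fwrite(&len, sizeof len, 1, fp) != 1) return 0;
    return len == 0 || fwrite(data, 1, len, fp) == len;
}

/* Reader for a game version that only knows CHUNK_GOLD. Every other
   chunk is skipped using its stored length.
   Returns the number of chunks skipped, or -1 on a read error. */
int ReadChunks(FILE* fp, uint32_t* gold)
{
    uint32_t id, len;
    int skipped = 0;

    assert(fp != NULL && gold != NULL);
    while (fread(&id, sizeof id, 1, fp) == 1) {
        if (fread(&len, sizeof len, 1, fp) != 1) return -1;
        if (id == CHUNK_GOLD && len == sizeof *gold) {
            if (fread(gold, sizeof *gold, 1, fp) != 1) return -1;
        } else {
            /* Unknown chunk: jump over its payload. */
            if (fseek(fp, (long)len, SEEK_CUR) != 0) return -1;
            skipped++;
        }
    }
    return skipped;
}
```

Because the length travels with every chunk, an older reader does not need to understand a newer chunk in order to step over it.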

All that template stuff does not apply --- the language is C, not C++.

This topic is closed to new replies.
