Committing persistent server data

Hi.

Following the advice some of you gave, I've switched to a simple C-struct-based data store where I just dump the in-memory data into a file, instead of using a database. It works wonders for the most part, but I have a concern about certain types of data that can reach very high entry counts.

Currently I commit all of the data, as I don't have a way of detecting which entries have changed. I ran a test and it took 160 seconds to commit 5 million entries (each entry being 256 bytes). This is what I have at the moment:


struct PersistentData
{
       ItemData       item_data[MAX_ITEMS];         /* every item, live or removed */
       unsigned int   num_item_data;                /* number of slots in use */
       unsigned char  removed_item_data[MAX_ITEMS]; /* 1 = slot is logically deleted */
       /* Other types of data in same format */
};
 
ItemData *addItemData(PersistentData *pd, ItemData *data)
{
   if(pd->num_item_data >= MAX_ITEMS)
   {
      assert(0);
      return 0;
   }   
   memcpy(&pd->item_data[pd->num_item_data], data, sizeof(ItemData));
   return &pd->item_data[pd->num_item_data++];
}
void getItemData(PersistentData *pd, ItemData **list, unsigned int max_list_size, unsigned int *num_results /* comparison parameters, etc. */)
{
   unsigned int num_res = 0;
   for (unsigned int i = 0; i < pd->num_item_data && num_res < max_list_size; i++)
   {
       if(0 /* compare */)
       {
           list[num_res++] = &pd->item_data[i];
       }
   }
   *num_results = num_res;
}
int removeItemData(PersistentData *pd /* comparison parameters */)
{
   for (unsigned int i = 0; i < pd->num_item_data; i++)
   {
       if(0 /* compare */)
       {
            pd->removed_item_data[i] = 1; /* mark as logically removed; the slot is never reclaimed */
           return 1;
       }
   }
   return 0;
}
int commitPersistentData(PersistentData *pd)
{
   /* Open file, etc. */
   for (unsigned int i = 0; i < pd->num_item_data; i++)
   {
         if(pd->removed_item_data[i] == 0)
         {
             /* Write to file */
         }
   }
   /* Close file */
   return 1;
}

I believe it takes so long because I'm writing each entry one by one. When I ignore the removed entries and write the whole block at once, it takes about 15 seconds! If I could move the last entry into a removed slot I could write everything in one go, but that isn't an option either, as another part of the program could be holding a pointer to the last entry.

Is there a solution, complicated or, even better, simple, that I could implement?

I don't have a way of detecting which entries have changed

If it's your program that does the changing, then you know which entries have changed. Fix this problem, first. You don't ever want to be writing 5 million of anything.

Why do you have everything in this massive array? Why do you set a 'removed' flag instead of actually removing the item? Why do you want to write everything at once?

I also think you have misinterpreted the previous advice on when to use a database and when not to. It's true that you shouldn't use a database for every piece of game data, but neither should you be reading and writing all of it to and from disk. Usually there is (or at least needs to be) a clear segregation between static data (read at initialisation time, never written, and probably stored in files) and dynamic data (read when needed, written when needed, probably stored in a database).

If it's your program that does the changing, then you know which entries have changed. Fix this problem, first. You don't ever want to be writing 5 million of anything.

Thanks for the reply. I guess I could add a flag which I would set each time I change something in an entry.

Why do you have everything in this massive array? Why do you set a 'removed' flag instead of actually removing the item? Why do you want to write everything at once?

Wouldn't it be better to have them in contiguous memory to iterate faster? How would I go about removing an entry? I mean, I can't move the last entry into the one I remove.

I guess I could add a flag which I would set each time I change something in an entry.

Yes, dirty flags are a common approach to this kind of problem (see the sketch below), but it's probably better to have the data stored in a more effective structure too.
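
As a minimal sketch of that idea, building on the PersistentData struct from the first post: the dirty_item_data array, markItemDirty, and commitDirtyItemData below are invented names, and the sketch assumes the file contains nothing but the item records laid out at fixed offsets.

#include <stdio.h>

/* One dirty bit per slot: set whenever the entry is modified,
   cleared once the entry has been written back to the file. */
unsigned char dirty_item_data[MAX_ITEMS];

void markItemDirty(PersistentData *pd, ItemData *item)
{
   dirty_item_data[item - pd->item_data] = 1; /* slot index via pointer arithmetic */
}

int commitDirtyItemData(PersistentData *pd, FILE *f)
{
   for (unsigned int i = 0; i < pd->num_item_data; i++)
   {
      if (!dirty_item_data[i])
         continue; /* unchanged since the last commit: skip the write entirely */
      /* Each record lives at a fixed offset, so overwrite it in place.
         (Very large files may need fseeko/64-bit offsets instead of long.) */
      if (fseek(f, (long)(i * sizeof(ItemData)), SEEK_SET) != 0)
         return 0;
      if (fwrite(&pd->item_data[i], sizeof(ItemData), 1, f) != 1)
         return 0;
      dirty_item_data[i] = 0;
   }
   return 1;
}

With millions of entries but only a few thousand modified between commits, this should reduce the full rewrite to a handful of small in-place writes.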

Wouldn't it be better to have them in contiguous memory to iterate faster?

Why are you making life difficult for yourself on an assumption? Yes, some processing will work faster if the data it operates on is all in contiguous memory. Now tell me, what complex operation are you executing on ItemData that needs to be run on hundreds or thousands of them each frame, very quickly? (It might help if I understood what you're trying to do with getItemData and removeItemData, because you've removed all the actual semantics from your example.)

I call getItemData each time a player connects. It simply compares the character name, and optionally the status of the item, with each entry in the array and, depending on the result, adds them to the list. Once I get the persistent item data, I assign it to a real item object (with the object holding a pointer to the persistent data) and add that item object to the player's inventory. So each time I make a change to an item I'm directly editing the contents of the persistent data. When I want to remove an item, I call removeItemData and also remove the real item object.

Okay, I don't understand why you're doing that or why you'd want to. Are you trying to hold every single item ever created or ever held by a player, in memory, at all times?

That is what I'm doing. It was the suggestion made by some people in the other thread: keep the entire dataset in memory at all times. Again, it works quite well for the most part, with the only issue being committing around the removed data. I'll try adding a flag inside ItemData to indicate whether the entry has changed or been removed and see if that helps.

When people say "keep the entire dataset in memory at all times", they don't mean keeping every player's data live all the time. They just mean that the database (or filesystem) is basically a write-back cache for the game, rather than being operated on directly as a result of game logic. Player data gets loaded or created when the player logs in; when they log out, their information is written to disk or database, and the memory for them and for the objects they're carrying is deallocated.
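
A rough sketch of that login/logout lifecycle with one flat file per player; PlayerData, its fields, and the "<name>.dat" layout are all invented for illustration:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct PlayerData { char name[32]; int gold; /* inventory, etc. */ } PlayerData;

/* Load "<name>.dat" into freshly allocated memory when the player logs in. */
PlayerData *onPlayerLogin(const char *name)
{
   PlayerData *p = malloc(sizeof *p);
   if (!p) return NULL;
   char path[64];
   snprintf(path, sizeof path, "%s.dat", name);
   FILE *f = fopen(path, "rb");
   if (f) {
      if (fread(p, sizeof *p, 1, f) != 1)
         memset(p, 0, sizeof *p); /* short or corrupt file: start from defaults */
      fclose(f);
   } else {
      memset(p, 0, sizeof *p);    /* first login: fresh state */
   }
   return p;
}

/* Write the player's state back and free it; nothing stays resident after logout. */
void onPlayerLogout(const char *name, PlayerData *p)
{
   char path[64];
   snprintf(path, sizeof path, "%s.dat", name);
   FILE *f = fopen(path, "wb");
   if (f) {
      fwrite(p, sizeof *p, 1, f);
      fclose(f);
   }
   free(p);
}

Combined with the atomic-rename pattern described below, a crash should at worst lose the changes made since a player's last save.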

Some random points:

On writing binary data:
If you have a big struct of "everything" without any pointers in it, a single call to write() will write that out just fine.
Typically, you'll want to write() to a new file (foo.data.tmp) and then use rename() to replace the old copy with the new in an atomic operation (rename(foo.data.tmp, foo.data))
Typically, you also fdatasync() the new file before closing it, and then fsync() the containing directory after the rename, so both the data and the rename survive a crash (a sketch of the whole pattern follows this list).
If you find that you need to be selective about what you write to disk, writing by appending to a big memory buffer, and then writing that big buffer to disk all at once, is another way to get good I/O performance.
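
Put together, the pattern might look like this on POSIX; savePersistentBlob and the foo.data file names are placeholders rather than anything from the thread:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Write the whole blob to a temp file, flush it to stable storage, then
   atomically replace the old file. A crash leaves either the complete old
   file or the complete new one, never a half-written mix. */
int savePersistentBlob(const void *blob, size_t size)
{
   int fd = open("foo.data.tmp", O_WRONLY | O_CREAT | O_TRUNC, 0644);
   if (fd < 0) return 0;
   if (write(fd, blob, size) != (ssize_t)size) { close(fd); return 0; } /* a robust version would loop on short writes */
   if (fdatasync(fd) != 0) { close(fd); return 0; }                     /* data reaches disk before the rename */
   close(fd);
   if (rename("foo.data.tmp", "foo.data") != 0) return 0;
   /* For full durability, also fsync() the containing directory here. */
   return 1;
}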

On keeping data in memory:
You need to keep all ACTIVE data in memory, so that you don't block gameplay on accessing the database or disk.
However, when a user logs out, it's totally fine to purge that data from memory. When a user logs in again, re-load the data.

On arranging data for querying:
Whenever you find that you have to "look for data," you typically need to use a hash table (such as std::unordered_map<>).
Linear scans through arrays are fine for small arrays that fit in a couple of cache lines, but they will suck fiercely once the array gets big.
If you need the data to be sorted, use a tree of some sort instead (such as std::map<>).
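
Since the code in this thread is plain C rather than C++, here is roughly what a minimal linear-probing index over the item array could look like. ItemIndex, hashKey, and INDEX_SIZE are invented for this sketch, and it assumes each item has a unique nonzero integer key:

/* Maps an item's key to its slot in pd->item_data so lookups stop scanning
   all entries. INDEX_SIZE must be a power of two comfortably larger than
   MAX_ITEMS; the table is large, so allocate it with calloc, not on the stack. */
#define INDEX_SIZE (1u << 24)

typedef struct ItemIndex
{
   unsigned int key[INDEX_SIZE];  /* 0 = empty bucket; assumes 0 is never a real key */
   unsigned int slot[INDEX_SIZE]; /* index into pd->item_data */
} ItemIndex;

static unsigned int hashKey(unsigned int k)
{
   k *= 2654435761u;              /* Knuth multiplicative hash */
   return k & (INDEX_SIZE - 1);
}

void indexInsert(ItemIndex *ix, unsigned int key, unsigned int slot)
{
   unsigned int h = hashKey(key);
   while (ix->key[h] != 0)        /* linear probing: step to the next bucket */
      h = (h + 1) & (INDEX_SIZE - 1);
   ix->key[h] = key;
   ix->slot[h] = slot;
}

int indexFind(const ItemIndex *ix, unsigned int key, unsigned int *slot_out)
{
   unsigned int h = hashKey(key);
   while (ix->key[h] != 0)
   {
      if (ix->key[h] == key) { *slot_out = ix->slot[h]; return 1; }
      h = (h + 1) & (INDEX_SIZE - 1);
   }
   return 0;                      /* not present */
}

Removing keys from a linear-probing table needs tombstones, and lookups by character name that can match several items would need each bucket to hold a list; both are omitted here to keep the sketch short.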
enum Bool { True, False, FileNotFound };

Thanks for the tips. One thing I'm confused about: if I were to keep the data in a file on disk, and the data is unsorted, wouldn't it take too long to query what I need from that file?

Also, if memory usage isn't a concern, is it still a bad idea to keep all the data in memory?
