Major Performance Issues with yaml-cpp

Started by
4 comments, last by 3dmodelerguy 5 years, 10 months ago

I have been experimenting with yaml-cpp because I want to use YAML for most of my game data files. However, I am running into some pretty big performance issues, and I am not sure whether it is something I am doing or the library itself.

I created this code in order to test a moderately sized file:


Player newPlayer = Player();
newPlayer.name = "new player";
newPlayer.maximumHealth = 1000;
newPlayer.currentHealth = 1;

Inventory newInventory;
newInventory.maximumWeight = 10.9f;

for (int z = 0; z < 10000; z++) {
  // Heap-allocated and held by raw pointer; nothing in this snippet deletes these items.
  InventoryItem* newItem = new InventoryItem();
  newItem->name = "Stone";
  newItem->baseValue = 1;
  newItem->weight = 0.1f;

  newInventory.items.push_back(newItem);
}

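// Assigning Player and Inventory to a YAML::Node requires YAML::convert<Player>
// and YAML::convert<Inventory> specializations (not shown here).
YAML::Node newSavedGame;
newSavedGame["player"] = newPlayer;
newSavedGame["inventory"] = newInventory;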

This is where I ran into my first issue: memory consumption.

Before I added this code, my game used about 22MB of memory. After I added everything except the YAML::Node code, it went up to 23MB; so far, nothing unexpected. When I then created the YAML::Node and added the data to it, memory went up to 108MB. I am not sure why adding the class instances costs only about 1MB, while copying that same data into a YAML::Node instance takes another 85MB.
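One plausible contributor to a gap like this (an assumption, not something verified against yaml-cpp's source) is that a generic document tree keeps every field as its own heap-allocated, text-backed entry, while the plain structs store an int or float in place. The stdlib-only sketch below counts heap allocations to illustrate the effect. `PlainItem`, `GenericItem`, the counting `operator new` and the helper functions are hypothetical; `GenericItem` is just a `std::map` of strings, not yaml-cpp's node type.

```cpp
#include <cstddef>
#include <cstdlib>
#include <map>
#include <new>
#include <string>
#include <utility>
#include <vector>

// Incremented by every call to the replaced global operator new below.
static std::size_t g_allocationCount = 0;

void* operator new(std::size_t size) {
  ++g_allocationCount;
  if (void* p = std::malloc(size == 0 ? 1 : size)) return p;
  throw std::bad_alloc();
}
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Hypothetical plain item, standing in for InventoryItem.
struct PlainItem {
  std::string name;
  int baseValue;
  float weight;
};

// Hypothetical "generic tree" item: each field is a separate keyed text entry.
// This only loosely mimics a document tree; it is NOT yaml-cpp's implementation.
using GenericItem = std::map<std::string, std::string>;

// Runs build() and returns how many heap allocations it performed.
template <class F>
std::size_t countAllocations(F&& build) {
  const std::size_t before = g_allocationCount;
  build();
  return g_allocationCount - before;
}

std::size_t plainAllocations(int n) {
  return countAllocations([n] {
    std::vector<PlainItem> items;
    items.reserve(n);
    for (int i = 0; i < n; ++i) items.push_back(PlainItem{"Stone", 1, 0.1f});
  });
}

std::size_t genericAllocations(int n) {
  return countAllocations([n] {
    std::vector<GenericItem> items;
    items.reserve(n);
    for (int i = 0; i < n; ++i) {
      GenericItem item;
      item["name"] = "Stone";  // each insertion allocates a separate map node
      item["baseValue"] = "1";
      item["weight"] = "0.1";
      items.push_back(std::move(item));
    }
  });
}
```

Each allocation also carries bookkeeping overhead, and the MSVC debug heap adds extra per-block headers and guard bytes, which would be consistent with the much lower release-mode figure reported later in the thread.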

Putting that issue aside, I wanted to test the performance of writing out the files. My initial attempt looked like this:


void YamlUtility::saveAsFile(YAML::Node node, std::string filePath) {
  std::ofstream myfile;

  myfile.open(filePath);
  myfile << node << std::endl;

  myfile.close();
}

Writing out the file (which ends up being about 570KB) took about 8 seconds, which seems really slow to me.
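To compare timings like these between approaches, and between Debug and Release builds, it helps to measure every variant the same way. A minimal sketch of a timing helper, assuming nothing beyond the standard library; the name `secondsToRun` is hypothetical:

```cpp
#include <chrono>
#include <utility>

// Runs `work` once and returns the elapsed wall-clock time in seconds.
// steady_clock is monotonic, so system clock adjustments do not skew the result.
template <class F>
double secondsToRun(F&& work) {
  const auto start = std::chrono::steady_clock::now();
  std::forward<F>(work)();
  const auto end = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(end - start).count();
}
```

For example, `secondsToRun([&] { YamlUtility::saveAsFile(newSavedGame, "save.yaml"); })` would time one save (the file name is illustrative). Running the same measurement under each build configuration keeps the comparison fair.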

After reading the documentation a little more, I decided to try a different route using YAML::Emitter. The implementation looked like this:


static void buildYamlManually(std::ofstream& file, YAML::Node node) {
  YAML::Emitter out;
  out << YAML::BeginMap;  // root map

  // player: a nested map
  out << YAML::Key << "player" << YAML::Value << YAML::BeginMap
      << YAML::Key << "name" << YAML::Value << node["player"]["name"].as<std::string>()
      << YAML::Key << "maximumHealth" << YAML::Value << node["player"]["maximumHealth"].as<int>()
      << YAML::Key << "currentHealth" << YAML::Value << node["player"]["currentHealth"].as<int>()
      << YAML::EndMap;

  // inventory: a map holding maximumWeight and the items sequence
  // (every entry in a map needs a YAML::Key before its YAML::Value)
  out << YAML::Key << "inventory" << YAML::Value << YAML::BeginMap
      << YAML::Key << "maximumWeight" << YAML::Value << node["inventory"]["maximumWeight"].as<float>()
      << YAML::Key << "items" << YAML::Value << YAML::BeginSeq;

  std::vector<InventoryItem*> items = node["inventory"]["items"].as<std::vector<InventoryItem*>>();

  for (InventoryItem* const value : items) {
    out << YAML::BeginMap << YAML::Key << "name" << YAML::Value << value->name << YAML::Key << "baseValue"
        << YAML::Value << value->baseValue << YAML::Key << "weight" << YAML::Value << value->weight << YAML::EndMap;
  }

  out << YAML::EndSeq;  // items
  out << YAML::EndMap;  // inventory
  out << YAML::EndMap;  // root map

  file << out.c_str() << std::endl;
}

This did improve the speed slightly, but it still took about 7 seconds instead of 8.

It has been a while since I used C++ and I was not sure whether this was normal, so for testing I wrote a simple method that generates the YAML for this use case by hand. It looked something like this:


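// Taking the save by const reference avoids copying it (and its item vector) on every call.
static void buildYamlManually(std::ofstream& file, const SavedGame& savedGame) {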
  file << "player: \n"
       << "  name: " << savedGame.player.name << "\n  maximumHealth: " << savedGame.player.maximumHealth
       << "\n  currentHealth: " << savedGame.player.currentHealth << "\ninventory:"
       << "\n  maximumWeight: " << savedGame.inventory.maximumWeight << "\n  items:";

  for (InventoryItem* const value : savedGame.inventory.items) {
    file << "\n    - name: " << value->name << "\n      baseValue: " << value->baseValue
         << "\n      weight: " << value->weight;
  }
}

This wrote the same file in about 0.15 seconds, which is much closer to what I was expecting.
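One trade-off of writing YAML by hand is that string values are emitted unquoted, so a name containing `: `, `#` or a newline would change the document's structure. A minimal sketch of a quoting helper, assuming values are UTF-8 and that YAML double-quoted scalars are acceptable in the output; `yamlQuoted` is a hypothetical name, not a yaml-cpp function:

```cpp
#include <cstdio>
#include <string>

// Returns `text` as a YAML double-quoted scalar. Backslash, double quote and
// control characters are escaped; all other bytes (including UTF-8) pass through.
std::string yamlQuoted(const std::string& text) {
  std::string out = "\"";
  for (unsigned char c : text) {
    switch (c) {
      case '\\': out += "\\\\"; break;
      case '"':  out += "\\\""; break;
      case '\n': out += "\\n"; break;
      case '\t': out += "\\t"; break;
      case '\r': out += "\\r"; break;
      default:
        if (c < 0x20 || c == 0x7F) {
          char buf[5];  // "\xHH" plus terminator
          std::snprintf(buf, sizeof buf, "\\x%02X", static_cast<unsigned>(c));
          out += buf;
        } else {
          out += static_cast<char>(c);
        }
    }
  }
  out += '"';
  return out;
}
```

In the hand-written writer this would be used as `file << "\n    - name: " << yamlQuoted(value->name)`. Always quoting is simpler than deciding when quoting is required, at the cost of slightly noisier files.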

I would expect some overhead from using yaml-cpp to manage and write out YAML files, but having it consume 70x+ as much memory and run 40x+ slower when writing files seems really bad.

I am not sure whether I am doing something wrong in how I use yaml-cpp that causes this, or whether it was simply never designed to handle large files. Does anyone have any insight into what might be happening here (or an alternative for dealing with YAML in C++)?


Impressive. Yeah, text is slow. It's been rather amusing watching the progression of fashion conscious programmers rave about xml, then json and now perhaps yaml is the new Yeezy of the fashion world. A word of warning though, just because something is fashionable, doesn't mean it's any good.

First obviously double check you need to use text for larger files, it's nearly always going to be orders of magnitude slower than a good binary format, whatever you do.
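For contrast with the text formats, here is a minimal sketch of what a simple binary layout for the inventory items might look like: a count, then for each item a length-prefixed name followed by the raw int and float. This layout is an illustrative assumption, not an established format; it uses host byte order, so files are not portable between machines with different endianness, and `Item`, `writeItems` and `readItems` are hypothetical names.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical item type mirroring InventoryItem.
struct Item {
  std::string name;
  std::int32_t baseValue = 0;
  float weight = 0.0f;
};

// Raw bytes of a trivially copyable value, in host byte order.
template <class T>
void writeRaw(std::ostream& out, const T& v) { out.write(reinterpret_cast<const char*>(&v), sizeof v); }
template <class T>
bool readRaw(std::istream& in, T& v) { return static_cast<bool>(in.read(reinterpret_cast<char*>(&v), sizeof v)); }

void writeItems(std::ostream& out, const std::vector<Item>& items) {
  writeRaw(out, static_cast<std::uint32_t>(items.size()));
  for (const Item& item : items) {
    writeRaw(out, static_cast<std::uint32_t>(item.name.size()));  // length prefix
    out.write(item.name.data(), static_cast<std::streamsize>(item.name.size()));
    writeRaw(out, item.baseValue);
    writeRaw(out, item.weight);
  }
}

// Returns false if the stream ends early or a read fails.
bool readItems(std::istream& in, std::vector<Item>& items) {
  std::uint32_t count = 0;
  if (!readRaw(in, count)) return false;
  items.clear();
  for (std::uint32_t i = 0; i < count; ++i) {
    Item item;
    std::uint32_t length = 0;
    if (!readRaw(in, length)) return false;
    item.name.resize(length);
    if (length > 0 && !in.read(&item.name[0], static_cast<std::streamsize>(length))) return false;
    if (!readRaw(in, item.baseValue) || !readRaw(in, item.weight)) return false;
    items.push_back(std::move(item));
  }
  return true;
}
```

There is no text to format or parse: numbers are copied byte for byte, which is where most of the speed difference comes from, but the file can no longer be read or edited by hand.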

Then consider whether you need an all singing all dancing implementation. Often times you know what is going to be in the file so you can write a far simpler parser that will be loads quicker, and doesn't dance around with dynamic allocations for 3 minutes.
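As an illustration of a parser that only handles what you know will be in the file, here is a minimal sketch that reads back exactly the layout produced by the hand-written writer in the first post: unindented section names, indented `key: value` lines, and `- ` starting each item. It assumes unquoted values and does not handle comments, quoting, flow style or anything else from the full YAML spec. `Item`, `Save` and `parseSave` are hypothetical names.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical value types standing in for Player, Inventory and InventoryItem.
struct Item {
  std::string name;
  int baseValue = 0;
  float weight = 0.0f;
};
struct Save {
  std::string playerName;
  int maximumHealth = 0;
  int currentHealth = 0;
  float maximumWeight = 0.0f;
  std::vector<Item> items;
};

// Splits an indented "key: value" or "- key: value" line. Returns false for
// blank lines and lines without a colon; sets newListItem when "- " is present.
static bool splitKeyValue(std::string line, std::string& key, std::string& value, bool& newListItem) {
  const std::string::size_type first = line.find_first_not_of(' ');
  if (first == std::string::npos) return false;
  line.erase(0, first);
  newListItem = line.compare(0, 2, "- ") == 0;
  if (newListItem) line.erase(0, 2);
  const std::string::size_type colon = line.find(':');
  if (colon == std::string::npos) return false;
  key = line.substr(0, colon);
  value = line.substr(colon + 1);
  value.erase(0, value.find_first_not_of(' '));  // drop the space(s) after the colon
  return true;
}

// Reads only the fixed layout of the hand-written writer; unknown keys are skipped.
Save parseSave(std::istream& in) {
  Save save;
  std::string line, key, value, section;
  bool newListItem = false;
  while (std::getline(in, line)) {
    if (!splitKeyValue(line, key, value, newListItem)) continue;
    if (line[0] != ' ') {  // unindented line: "player:" or "inventory:"
      section = key;
      continue;
    }
    if (section == "player") {
      if (key == "name") save.playerName = value;
      else if (key == "maximumHealth") save.maximumHealth = std::stoi(value);
      else if (key == "currentHealth") save.currentHealth = std::stoi(value);
    } else if (section == "inventory") {
      if (newListItem) save.items.emplace_back();  // "- " starts a new item
      if (key == "maximumWeight") {
        save.maximumWeight = std::stof(value);
      } else if (!save.items.empty()) {
        Item& item = save.items.back();
        if (key == "name") item.name = value;
        else if (key == "baseValue") item.baseValue = std::stoi(value);
        else if (key == "weight") item.weight = std::stof(value);
      }
    }
  }
  return save;
}
```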

Also, if you can, preload the entire text into memory when loading, and when writing, build the output in memory and then write it to disk as one final operation. While disk caching is in theory meant to deal with this for you, there might be all kinds of hidden overheads.
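A minimal sketch of that whole-file approach using only the standard library: the document is built in memory first and written with a single call, and loading reads the entire file into one string before any parsing. The helper names `writeWholeFile` and `readWholeFile` are hypothetical.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Writes an already-built document to disk in a single write call.
bool writeWholeFile(const std::string& path, const std::string& contents) {
  std::ofstream file(path, std::ios::binary);
  if (!file) return false;
  file.write(contents.data(), static_cast<std::streamsize>(contents.size()));
  file.close();  // flush now so write/close errors show up in the stream state
  return !file.fail();
}

// Reads the entire file into `contents` so parsing works on an in-memory string.
bool readWholeFile(const std::string& path, std::string& contents) {
  std::ifstream file(path, std::ios::binary);
  if (!file) return false;
  std::ostringstream buffer;
  buffer << file.rdbuf();
  contents = buffer.str();
  return true;
}
```

With yaml-cpp, the loaded string can be passed to `YAML::Load`, which accepts a `std::string`. For writing, the hand-written writer from the first post could target a `std::ostringstream` if its parameter were changed from `std::ofstream&` to `std::ostream&`, and the resulting string handed to `writeWholeFile`.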

As for yaml sorry I don't know, but I hear 'zaml' is going to be big in next summer's collection. :)

4 hours ago, lawnjelly said:

Impressive. Yeah, text is slow. It's been rather amusing watching the progression of fashion conscious programmers rave about xml, then json and now perhaps yaml is the new Yeezy of the fashion world. A word of warning though, just because something is fashionable, doesn't mean it's any good.

I am not trying to use YAML because it is hip or whatever, but because my game is going to involve a lot of data loaded from files, and I want writing those files (by hand) and reading them to be as clean as possible. In my opinion, YAML has the best combination of readability and functionality.

4 hours ago, lawnjelly said:

First obviously double check you need to use text for larger files, it's nearly always going to be orders of magnitude slower than a good binary format, whatever you do.

That might be something I do later on; however, I want most of the data that can be stored as text to have the option of being stored as text, just for easier debuggability.

4 hours ago, lawnjelly said:

Then consider whether you need an all singing all dancing implementation. Often times you know what is going to be in the file so you can write a far simpler parser that will be loads quicker, and doesn't dance around with dynamic allocations for 3 minutes.

This is true, and it is also something I might do later on, but for right now I think using a library is easier than implementing a custom parser that supports only the features I need (which, I agree, is probably a pretty small subset of the full YAML spec).

As for my performance issue, the main cause seems to be running the code in debug mode (using the default Visual Studio debug settings). In release mode the file writes in about 0.15 seconds (instead of about 7 seconds) and memory only spikes to 36MB (instead of 108MB). I thought it was good practice to always run code in debug mode while developing, but this kind of performance loss seems bad. I am not sure whether there are tweaks I can make to the debug settings to avoid this kind of performance hit while still keeping most of the benefits of debug mode.

I don't know how yaml-cpp works, but at first glance it looks like you can build it separately as a library (without debugging stuff), and then link it with your application. Obviously you won't get debugging facilities in the yaml-cpp code or any useful stack trace in case of a crash there, but your primary interest is not in debugging yaml-cpp :)

 

Following some advice on Stack Overflow, I added this preprocessor definition:

_HAS_ITERATOR_DEBUGGING=0

and set Basic Runtime Checks to Default, which got the code down to running in just over 1 second. I guess I will leave it at that for the time being.
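Microsoft documents `_ITERATOR_DEBUG_LEVEL` as the macro that supersedes `_HAS_ITERATOR_DEBUGGING`, so a quick way to confirm which setting a translation unit actually ended up with is to report that macro. This matters if yaml-cpp is built separately as suggested above, because MSVC refuses to link objects compiled with different values (a LNK2038 mismatch error). A minimal sketch; `iteratorDebugLevel` is a hypothetical helper, and it simply returns -1 on standard libraries that do not define the macro:

```cpp
#include <vector>  // any MSVC standard header brings in the configuration macros

// Iterator debug level this translation unit is compiled with under Microsoft's
// STL (0 = off, 1 = checked iterators, 2 = full iterator debugging), or -1 elsewhere.
constexpr int iteratorDebugLevel() {
#ifdef _ITERATOR_DEBUG_LEVEL
  return _ITERATOR_DEBUG_LEVEL;
#else
  return -1;
#endif
}
```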

2 hours ago, Alberth said:

I don't know how yaml-cpp works, but at first glance it looks like you can build it separately as a library (without debugging stuff), and then link it with your application. Obviously you won't get debugging facilities in the yaml-cpp code or any useful stack trace in case of a crash there, but your primary interest is not in debugging yaml-cpp :)

 

I might look at doing this at some point.

