Representing data-driven concepts alongside instances of those concepts

5 comments, last by GameDev.net 10 years, 9 months ago

In many games, it's typical to load in several 'concepts' or 'definitions' from the data - for example, you might load in vehicle types, character classes, item types, etc. This data might be passed to some sort of factory which creates one of several related classes each time. And then in-game, you create instances of these concepts - vehicles, individual characters, individual items, etc. These will reference the definition class to get access to various pieces of data.
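A minimal C++ sketch of the definition/instance split described above, often called the Type Object pattern (as in the gameprogrammingpatterns.com article mentioned in this post). VehicleType, Vehicle and their members are hypothetical names chosen for illustration. The definition is shared and read-only, while each instance holds its own state plus a pointer back to its definition.

```cpp
#include <cassert>
#include <string>

// The 'concept' or definition: loaded once from data, shared, read-only.
struct VehicleType {
    std::string name;
    float maxSpeed;
    int seats;
};

// The instance: per-object state plus a reference back to its definition.
class Vehicle {
public:
    explicit Vehicle(const VehicleType& type) : type_(&type) {}

    const VehicleType& type() const { return *type_; }

    void accelerate(float amount) {
        assert(amount >= 0.0f);  // braking would be a separate operation in this sketch
        speed_ += amount;
        if (speed_ > type_->maxSpeed) speed_ = type_->maxSpeed;  // limit comes from the definition
    }

    float speed() const { return speed_; }

private:
    const VehicleType* type_;  // non-owning; the definition must outlive the instance
    float speed_ = 0.0f;       // generic state that every vehicle has
};
```

Note that the only instance state here (speed_) is generic to all vehicles; the difficulty raised in the original post starts when that state has to vary with the concept.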

But where it seems to get tricky is when the instances are used in some sort of algorithm, and need their own set of state data, which might vary depending on the concept being referenced. If you can have fully-generic concepts, or fully-generic instances, it's not an issue. But often you don't, and the specifics of the instance may depend on the specifics of the concept.

I can think of several ways to approach this in C++, but none of them are fully satisfactory.

  • If there's one generic 'instance' shared across all related concepts, it needs to accommodate all possible state, which is awkward to maintain (e.g. the Vehicle object might need current_gear for cars, landing_gear_down for planes, rudder_position for boats, and so on).
  • If there's a generic data store used as the state data (e.g. a std::map of key/value objects), then it will work for any case... as long as you don't mind all the error-checking in the definition to ensure that important keys exist, that the values are of the right type, etc.
  • If there's a separate instance class for every concept class, it's error-prone. You have to be very sure to create the matching instance class for each concept, and then several parts of the instance class need to cast to the assumed type.
  • If the definition class is re-used as the instance class (e.g. using the Prototype pattern), then you have one C++ class essentially handling two responsibilities. The code dealing only with definitions has several state variables it doesn't need to touch, and the code dealing only with instances has several definition variables it shouldn't touch. Plus, duplicating the definition in every instance wastes memory.
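To make the second option concrete, here is a rough sketch of a per-instance key/value state store, assuming C++17 (std::variant, std::optional). InstanceState and its set/get methods are hypothetical; the point is that every read has to check both that the key exists and that it holds the expected type.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <variant>

// The value types this hypothetical store can hold.
using Value = std::variant<int, float, bool>;

// Generic per-instance state: works for any concept, checks nothing up front.
class InstanceState {
public:
    void set(const std::string& key, Value v) {
        assert(!key.empty());
        values_[key] = std::move(v);
    }

    // Returns an empty optional if the key is missing OR holds another type,
    // so every caller must handle both failure cases.
    template <typename T>
    std::optional<T> get(const std::string& key) const {
        auto it = values_.find(key);
        if (it == values_.end()) return std::nullopt;
        if (const T* p = std::get_if<T>(&it->second)) return *p;
        return std::nullopt;
    }

private:
    std::map<std::string, Value> values_;
};
```

Nothing stops a car from being given a key that only planes understand, and a mistyped key simply reads as missing, which is exactly the error-checking burden the list describes.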

This does seem like a problem that a lot of intermediate-level developers will face. How are people handling issues like this?

(Edit: there's a good article on gameprogrammingpatterns.com about this, but it basically just agrees that behaviour becomes more difficult: http://gameprogrammingpatterns.com/type-object.html#it%27s-harder-to-define-behavior-for-each-type)

This is always a pain in the butt. I look at it from three points of view:

  • I will not sacrifice simple and flexible engine code for design-time considerations. It is not that I don't care about those considerations; I just don't think they apply at this level. I use a component system, which implies a lack of hierarchy, so the example of changing the data for 50 elves is a pain, but that can be dealt with separately.
  • I start with the simple key/value solution you mentioned, which is of course relatively slow. The key/values are set up such that I can replace them with binary data blobs later.
  • Whatever sucks in the above gets fixed in the tools.

That last point is the key one: I write the code as simply as possible for the engine and clean up the usability in the tools. You want hierarchy? Create it in the tools; the related data never needs to exist in the game itself. When I say "tools", I tend to start with something very simple, such as an XML (JSON these days) file which describes the objects as groups of components and initializes each component. I generally just use a console tool to turn that file into game-usable data, which doesn't contain the hierarchy or the bits only the user needs. This can be called at game startup (not unreasonable for a couple thousand object definitions), from an editor, or manually. The key trick is then that you can use the simple starting point of the key/values, and as long as that one portion is designed such that you can switch it to binary, you can keep the concerns separate.
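One way to picture "create hierarchy in the tools" is a tool-side pass that resolves inheritance between definitions and emits flat key/value sets for the game. This is a sketch under assumptions: SourceDef, flatten and the parent field are hypothetical, values are kept as strings, and the depth limit of 32 is an arbitrary guard against inheritance cycles.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Tool-side definition: may name a parent and override some of its keys.
struct SourceDef {
    std::string parent;                         // empty means no parent
    std::map<std::string, std::string> values;  // key/value pairs as authored
};

using SourceDefs = std::map<std::string, SourceDef>;
using FlatDef = std::map<std::string, std::string>;

// Resolve one definition by walking up its parent chain; child values win.
// Only the flattened result is shipped; the game never sees the hierarchy.
FlatDef flatten(const SourceDefs& defs, const std::string& name, int depth = 0) {
    if (depth > 32) throw std::runtime_error("inheritance cycle at " + name);
    auto it = defs.find(name);
    if (it == defs.end()) throw std::runtime_error("unknown definition: " + name);

    FlatDef result;
    if (!it->second.parent.empty())
        result = flatten(defs, it->second.parent, depth + 1);
    for (const auto& [key, value] : it->second.values)
        result[key] = value;  // override inherited values
    return result;
}
```

With this shape, changing a value on a base elf definition and re-running the tool updates every derived elf, while the runtime data stays completely flat.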

So, all said and done, I tend to start with the second solution of totally generic input data with a completely flat hierarchy. It keeps everything nice and separate, allows you to upgrade only as required/desired and you can work out a lot of problems using the simple solution before hardening up the result with the binary output.

Not sure if this helps the thought process since I don't go into any detail but as a starting point I like to think about the separation of concerns first.


The key trick is then that you can use the simple starting point of the key/values and as long as that one portion is designed such that you can switch it to binary you can keep the concerns separate.

What do you mean by switch to binary? Do you mean the file format or the in-memory representation? I'm not worried about the file format, as firstly that's just an optimisation issue, and secondly that is data which is read-only anyway. But if you're talking about the in-memory representation then that is exactly the problem I don't have a solution for - because each concept class (or interchangeable component, in your case) could potentially have a totally different set of data that it needs, and there seems to be no way of representing all of those possibilities that is both clean and safe.

In many games, it's typical to load in several 'concepts' or 'definitions' from the data - for example, you might load in vehicle types, character classes, item types, etc. This data might be passed to some sort of factory which creates one of several related classes each time. And then in-game, you create instances of these concepts - vehicles, individual characters, individual items, etc. These will reference the definition class to get access to various pieces of data.

But where it seems to get tricky is when the instances are used in some sort of algorithm, and need their own set of state data, which might vary depending on the concept being referenced. If you can have fully-generic concepts, or fully-generic instances, it's not an issue. But often you don't, and the specifics of the instance may depend on the specifics of the concept.

I work on a game that has a very generic, proprietary data system for storing some of its most important data; very often it's useful, and very often it's a huge mess. I wouldn't try to get there until you need it. I think it can wait until you are beyond intermediate-level development.

If I started from scratch and wanted to handle generic data, I'd incorporate a mature (old & stable) scripting language and use it often. A well-supported interpreter will have those generic data structures you want, and I'd much rather get a nice scripting error instead of crashing in the middle of whatever homebrew data-driven mess I've created myself, or whatever proprietary system is giving me a 30-level deep callstack of unreadable functions because some data file wasn't updated correctly.

I cannot really give you much of an answer here, without at least one fleshed-out example of the problem. Just don't think too far ahead. Code exactly the variation you need, no more. If you're a programmer first and foremost, whatever you're working on will be easier to do if you can do things like easily read your state at a breakpoint, and step through your logic in a debugger. Once you go more data-driven, some of those debugging features will be lost until you create them yourself.


The key trick is then that you can use the simple starting point of the key/values and as long as that one portion is designed such that you can switch it to binary you can keep the concerns separate.

What do you mean by switch to binary? Do you mean the file format or the in-memory representation? I'm not worried about the file format, as firstly that's just an optimisation issue, and secondly that is data which is read-only anyway. But if you're talking about the in-memory representation then that is exactly the problem I don't have a solution for - because each concept class (or interchangeable component, in your case) could potentially have a totally different set of data that it needs, and there seems to be no way of representing all of those possibilities that is both clean and safe.

Figured this might come up. The way I work with this is a bit odd, but I like it. I don't put any serialization in the objects/components at all; they initialize from a data structure I call a Descriptor (or Desc for short in code). During development I load the structures from text files or whatever, and the structures are passed into the object/component initialization, so the objects don't care how the data got to them. In development there are various functions to modify those structures, reload them, etc. in a database-like system. At release time, though, I just bulk load the descriptors directly into memory. The DB still exists, so you can load/unload descriptors if required, but they are usually tiny, so for small games I generally just pop them all into memory in one shot. These are purely read-only items; any modification only takes place in dev builds.
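A rough sketch of the descriptor database idea, assuming descriptors derive from a common polymorphic base. DescriptorDB, add and find are hypothetical names, not the poster's actual API; the sketch only shows components receiving read-only descriptors without knowing how they were loaded.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical base: components never parse files, they only see these structs.
struct Descriptor {
    virtual ~Descriptor() = default;
};

struct HealthDesc : Descriptor {
    int MaxHealth = 0;
};

// Hypothetical database: owns every descriptor and hands out const pointers.
// Whether entries came from text files (dev) or one bulk binary load
// (release) is invisible to the callers of find().
class DescriptorDB {
public:
    void add(const std::string& name, std::unique_ptr<Descriptor> desc) {
        descs_[name] = std::move(desc);  // replacing an entry models a dev-build reload
    }

    const Descriptor* find(const std::string& name) const {
        auto it = descs_.find(name);
        return it == descs_.end() ? nullptr : it->second.get();
    }

private:
    std::unordered_map<std::string, std::unique_ptr<Descriptor>> descs_;
};
```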

In general, it might be:

struct HealthDesc : public Descriptor
{
  int MaxHealth;
};
 
class HealthComponent : public Component
{
public:
  // ... other members omitted ...
  bool    Initialize( const Descriptor* desc ) override;
};
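Filling in the fragment above, a self-contained version might look like the following. The Descriptor and Component base classes are assumptions (the post does not show them), and dynamic_cast is swapped in here as a simple way to reject a mismatched descriptor; it is not the poster's own validation mechanism.

```cpp
#include <cassert>

// Assumed bases; the post only shows the derived types.
struct Descriptor {
    virtual ~Descriptor() = default;
};

struct HealthDesc : public Descriptor {
    int MaxHealth = 0;
};

class Component {
public:
    virtual ~Component() = default;
    virtual bool Initialize(const Descriptor* desc) = 0;
};

class HealthComponent : public Component {
public:
    bool Initialize(const Descriptor* desc) override {
        // dynamic_cast yields nullptr if desc is not really a HealthDesc,
        // so a mismatched descriptor is rejected instead of misread.
        const auto* health = dynamic_cast<const HealthDesc*>(desc);
        if (health == nullptr || health->MaxHealth <= 0) return false;
        maxHealth_ = health->MaxHealth;
        current_ = maxHealth_;  // instance state starts from the definition
        return true;
    }

    void Damage(int amount) { current_ = current_ > amount ? current_ - amount : 0; }
    int Current() const { return current_; }
    int Max() const { return maxHealth_; }

private:
    int maxHealth_ = 0;  // copied from the descriptor
    int current_ = 0;    // per-instance state the descriptor never touches
};
```

The descriptor supplies only MaxHealth, while current_ is state that exists only on the instance, which is how this layout keeps definitions and instances apart.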

While that is a silly, small, focused component, hopefully you get the idea. How I get the Desc doesn't matter this way. The tools and the descriptor manager are the only things which have to be kept synchronized, and it is even pretty easy to use things like JSON/XML to describe the data and write a little tool to generate the header information for you. Generating headers has its downsides, of course, but there are still lots of options for dealing with this later, since you have completely disconnected the serialization from the repetitive bits of code.
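As an illustration of generating header information from a data description, a tool could emit the descriptor struct from a list of fields. FieldSpec and GenerateDescHeader are hypothetical, and a real tool would read the field list from the JSON/XML description rather than take it as an argument.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical field description, as a tool might read it from JSON/XML.
struct FieldSpec {
    std::string type;  // C++ type name to emit, e.g. "int"
    std::string name;
};

// Emit a Descriptor-derived struct so the engine header and the tool's
// data description are produced from the same source.
std::string GenerateDescHeader(const std::string& structName,
                               const std::vector<FieldSpec>& fields) {
    std::ostringstream out;
    out << "struct " << structName << " : public Descriptor\n{\n";
    for (const auto& f : fields)
        out << "  " << f.type << ' ' << f.name << ";\n";
    out << "};\n";
    return out.str();
}
```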

Hope this clarifies my thinking on this.

Pink Horror: I guess the idea of using a scripting language for it is basically pushing the casts out into that language. Instead of a crash in the C++ side you'll get an exception in the script. I think that is better in some cases. Unfortunately, it's not an option for me, and I absolutely do need to be able to have data-driven concepts with varying degrees of state when handling those concepts.

AllEightUp: it looks like you're describing my 3rd approach in my original message, i.e. "a separate instance class for every concept class". You have one Descriptor class for every Component, and each Component has to trust that it is given the right Descriptor, because it makes an unsafe cast from the base type to the derived type on that assumption. Obviously if your data handling is robust enough then you'll never have a problem. I'm always wary of any situation where errors in tools can crash the game engine, though!

Ah yes, it does fit your third case, though I do use a safety mechanism to perform the cast and validate it. Dev builds go through a templated cast function which validates that the underlying data is of the correct type. Not only does this guarantee the conversion is proper, it removes the ugly-ass reinterpret_cast or C-style cast of the pointer. Pretty much I just put a 32-bit ID in front of each initializer, and the cast in dev builds validates that the ID is for the correct type. That catches a lot of errors by itself, and it is much prettier. :)
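A sketch of an ID-checked cast along these lines, under stated assumptions: every descriptor begins with a 32-bit type ID, DescCast and kTypeId are hypothetical names, and the tag values are arbitrary. In dev builds (NDEBUG not defined) a mismatched ID yields nullptr; in release builds the check compiles away and only the static_cast remains, which is well-defined here because each descriptor really derives from Descriptor.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout: every descriptor starts with a 32-bit type ID.
struct Descriptor {
    std::uint32_t typeId;
};

struct HealthDesc : Descriptor {
    static constexpr std::uint32_t kTypeId = 0x48454C54;  // arbitrary tag, "HELT"
    int MaxHealth;
};

struct SpeedDesc : Descriptor {
    static constexpr std::uint32_t kTypeId = 0x53504544;  // arbitrary tag, "SPED"
    float MaxSpeed;
};

// Checked downcast: dev builds verify the ID before converting; release
// builds (NDEBUG defined) skip the check and just cast.
template <typename T>
const T* DescCast(const Descriptor* desc) {
#ifndef NDEBUG
    if (desc == nullptr || desc->typeId != T::kTypeId) return nullptr;
#endif
    return static_cast<const T*>(desc);
}
```

In this sketch the loader is assumed to write the correct typeId when it creates each descriptor; a component then calls DescCast<HealthDesc>(desc) instead of casting the pointer by hand.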

As to errors in the tools causing crashes: well, I don't think there is any real way around that. Even if I passed in the key/value pairs directly and did a bunch of error checks per object/component, it would still be possible to get crap in that could crash things. As it is, the tool and DB manager relationship gives a nice clean pipeline to debug any problems and double-check things. I actually use a full-circle tester: read the JSON, write it to binary, read the binary, write it back to JSON, and compare the result to the original JSON. It is pretty tough to pass that test if there are any goofs. :)
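The full-circle test can be sketched in miniature. To stay self-contained, this version substitutes a trivial "key=value" text format for JSON and a two-field descriptor for real data; HealthDesc, FromText, ToText, ToBinary, FromBinary and RoundTrips are hypothetical names. Because it compares text verbatim, it is sensitive to field order and whitespace, which a real JSON comparison would normalize.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical two-field descriptor.
struct HealthDesc {
    std::int32_t MaxHealth = 0;
    std::int32_t Regen = 0;
};

// Text side, standing in for JSON: "MaxHealth=100 Regen=2".
HealthDesc FromText(const std::string& text) {
    HealthDesc d;
    std::istringstream in(text);
    std::string token;
    while (in >> token) {
        auto eq = token.find('=');
        if (eq == std::string::npos) continue;  // malformed token: dropped
        std::string key = token.substr(0, eq);
        int value = std::stoi(token.substr(eq + 1));  // throws on non-numeric input
        if (key == "MaxHealth") d.MaxHealth = value;
        else if (key == "Regen") d.Regen = value;      // unknown keys are dropped
    }
    return d;
}

std::string ToText(const HealthDesc& d) {
    return "MaxHealth=" + std::to_string(d.MaxHealth) + " Regen=" + std::to_string(d.Regen);
}

// Binary side: the two fields packed back to back.
std::vector<unsigned char> ToBinary(const HealthDesc& d) {
    std::vector<unsigned char> bytes(2 * sizeof(std::int32_t));
    std::memcpy(bytes.data(), &d.MaxHealth, sizeof(std::int32_t));
    std::memcpy(bytes.data() + sizeof(std::int32_t), &d.Regen, sizeof(std::int32_t));
    return bytes;
}

HealthDesc FromBinary(const std::vector<unsigned char>& bytes) {
    HealthDesc d;
    std::memcpy(&d.MaxHealth, bytes.data(), sizeof(std::int32_t));
    std::memcpy(&d.Regen, bytes.data() + sizeof(std::int32_t), sizeof(std::int32_t));
    return d;
}

// text -> struct -> binary -> struct -> text, then compare with the input.
bool RoundTrips(const std::string& text) {
    try {
        return ToText(FromBinary(ToBinary(FromText(text)))) == text;
    } catch (const std::exception&) {
        return false;  // unparsable values fail the test instead of crashing the tool
    }
}
```

A misspelled key or a non-numeric value does not survive the trip, so the problem shows up as a failed test in the tool rather than as bad data in the game.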
