Ecs Inheritance Problem (C++)

Started by
26 comments, last by lexington 7 years, 7 months ago

Hey guys,

Im working on creating an entity component system but I am stuck on one particular problem that I need some help with.

Currently i have a base class entity that have a vector of component pointers like the following


class entity
{
  int id;
  std::vector<component*> mycomponents;
  int entity_type;
  bool deleteme;

}

I have public functions that will allow me to both add and delete components from the entity. after this i will have a bunch of systems that will do the updating for me (the components will just have information other then maybe the ai component which would contain a behaviour tree/FSM)

The problem that I am trying to figure out is how would i get the correct component i need for a system (say i need the position and velocity components for the movement system). After having a google a round I can see that i could do a dynamic cast (as my base component class will have variable to say what type of component it is) but i have heard that this is not really a very efficient method and not really good practice (feel free to correct me). I could also do some kind of messaging system for changing the data in a component but getting data out of it would be harder (would need a virtual return function of each data type).

Any advice on a way I could get around this problem. is their a method that i have over looked?

Advertisement

The way I handled this (and my early implementation was very close to the somewhat popular EntityX-framework):

Each component class/type gets an unique id, counting from 0 onwards. This is handled via CRTP, but could be done otherwise. In any way, now instead of dynamic_cast, you can just static_cast and check for your own type id (you've thus implemented a simplistic, fast but less flexible runtime type system).

Then, in an simple implementation, you can eighter 1) change your vector to an std::map/unordered map, and store each component directly with their type index or 2) simply store the component at their type index in the vector, and fill all non-used entries with nullptr. This could potentially waste memory if you have many components and many of your entities are using like say 31 and 39 but nothing else, though I don't know the exact memory footprint of map/unordered_map in comparison.

Though this is not the cleanest/purest of designs for an ECS (and it goes against the idea of cache coherence that ECS heavily implies), it should be sufficient most of the time. Now if you want/need to optimize at some point, instead of storing the components in the entity, store them in the system/a specific component pool. Then, when accessing components for a system (like in your example position + velocity), instead of querying all the entities and retriving the components from them, you instead ask the pool for the components, and build a subset of all components that are in both pools. If you want to go even further, you can cache this operation so it does not have to be redone unless you added entities/components, but I really wouldn't worry about this until it proves that this is a major bottleneck in your code.

Now back to the type-system, to give you an idea how this is handled, have a look at this code:


struct ACCLIMATE_API BaseComponent
{
    using Family = unsigned int; ///< Component struct type

    virtual Family GetFamily(void) const = 0;
    
    static Family GetFamilyCount(void);

protected:
    static Family GenerateFamilyId(void); // returns family_count and increases it by 1.
private:
    static Family family_count; /**<    Counter of instanciated components.
        *   Each component of a different type that is created, or whose
        *   family function is called the first time will increase this
        *   counter. The counter is used as the ID of the next component. */

};

/*********************************
* Component struct
**********************************/

/// The component parent struct
/** Each component must derive from this struct. It automatically sets its
*   unique family id on creation, which can then be accessed via a static
*   method, so you can compare the type of a component object with the type
*   of a component struct. */
template <typename Derived> //template to pick implementation
struct Component :
    public BaseComponent
{
    /** Returns the "family" type of the component.
     *    This method sets the type id for the specific component struct
     *  on the first call, and returns it from there on. The id
     *  is consistend between the running time of a program and cannot
     *  be altered, but it may alter between different runs, depending
     *  on the first creation order of the components.
     *
     *  @return The unique id of this component instance
     */
    static Family family(void)
    {
        //increase base family count on first access
        static const BaseComponent::Family Family = GenerateFamilyId();
        return Family;
    }

    Family GetFamily(void) const override final
    {
        return family();
    }

};

Now every component will derive from Component<> in this manner:


class Position : 
    public Component<Position>
{
};

This will ensure that for each different component type, they will have an unique type id (accessible via Type::family() or baseComponent.GetFamily()). Now you can make methods like this:


Entity entity;

entity.AddComponent<Position>();
auto pPosition = entity.GetComponent<Position>();

That are able to use this static runtime type id and store/access components reliably.

Store handles to components in the entity, instead. The components themselves of course would live in their own systems. Make your handles store the component type as well as the index of the component.

Then, when you want to get a particular component of type:
- walk the list of component handles on the entity looking for one with the right type id. You probably don't need a map or even a hash table for this, unless you actually have more than a few components per entity.
- when you have the handle of the relevant type, you already know which system to look at - just look up the component in that system using the handle.

This approach also lets you move the components themselves around in memory as need be, as long as you preserve their order.

There are many reasons to avoid dynamic_cast, and I would (almost) never recommend sprinkling it throughout runtime game code. The most common reasons look something like the following:

  • dynamic_cast relies on RTTI... therefore, in order to use dynamic_cast, you must enable RTTI. This will significantly bloat your application size, which can be a serious consideration on some media distribution protocols (hard disc and OTA download limits for mobile come to mind).
  • dynamic_cast relies on RTTI (noticing a trend?)... the data for which will be primarily stored on disk at runtime. This means most dynamic_cast operations will necessarily contain a fetch from disk in order to retrieve that data. Not only does this make the dynamic_cast itself slow, but it can really hold hostage your ability to exercise hardware caching mechanisms for other systems, like streaming implementation.
  • dynamic_cast relies on RTTI ( :))... therefore, in order to use dynamic_cast, you must enable RTTI. If you enable it, it will get used. In most cases, your junior engineers will be the ones using it, without even realizing that they are doing it. It will prove much more difficult to remove dependencies on RTTI from existing code than it would have been to simply write code that didn't depend on RTTI in the first place.

The concepts presented by Juliean are also basically the same way that I solved this problem (though I explicitly avoided creating virtual tables on my Component objects), and I don't consider it particularly unclean. My implementation makes an additional assertion that I have not yet seen articulated here - an Entity can only ever have a single instance of a given Component type. For example, my entities can have both a ModelComponent and an AnimationComponent, but a single entity can't have two AnimationComponents. I also make use of hash tables rather than maps to get average-case constant-time lookups, under the assumption that I will be executing lookups more often than I will be executing inserts or deletes.

I would actually shy away from maintaining this kind of data in globally managed lists. Those global lists will cost you headaches, cycles, or both the first time someone tries to create, delete, or access a component on a background thread.

dynamic_cast relies on RTTI (noticing a trend?)... the data for which will be primarily stored on disk at runtime. This means most dynamic_cast operations will necessarily contain a fetch from disk in order to retrieve that data. Not only does this make the dynamic_cast itself slow, but it can really hold hostage your ability to exercise hardware caching mechanisms for other systems, like streaming implementation.

Really? This calls for "citation needed", as I don't see why RTTI should store its data on the disk (why not keep it in RAM/memory?), and a quick google search showed up nothing to back the claim that RTTI will access disk data.

The concepts presented by Juliean are also basically the same way that I solved this problem (though I explicitly avoided creating virtual tables on my Component objects), and I don't consider it particularly unclean.

Yeah, avoiding virtual methods really is a good think when dealing with components. In case that you can avaoid the vtable altogether, you can just store the component type as a member variable instead :) (In my case I cannot yet, as for some editor functionality I reguire virtual methods... at some point I will separate editor/runtime-data altogether though, because its much cleaner this way).

I also make use of hash tables rather than maps to get average-case constant-time lookups, under the assumption that I will be executing lookups more often than I will be executing inserts or deletes.

Note that std::unordered_map which I mentioned is an hashtable :) - though in the case of entities, it can be disadvantegious, because of the sheer size of the data structure - 8 byte (map) vs 32 byte (unordered_map). In my test cases using unordered map would be way slower than map, probably due more than double the size of an entity, effectively making cache hit rate much worse. In any case, benchmark it out yourself before deciding - except for "reserve" on an hashtable, switching both out should be fairly easy.

I've been grappling recently with this very problem also. Although I'm not fully convinced what the best answer is, I'll give my opinions and hope they help. In short, I implemented a very similar solution to that described by Juliean, having component types inherit for a base class which allows you to generate unique ids for all component types. It's simply a DIY-way of doing dynamic_cast that avoids RTTI (as stressed by Thinias above). This at least helps with avoiding doing the dynamic casting. Another idea I've been experimenting with is using ids to represent entities and doing away with the component-pointer vector entirely; however that has its own share of difficulties associated with it.

The other part of the problem of course is that one of the advantages of decoupling your code with the ECS approach is that systems can operate entirely (in the ideal case) on a batch of data that is contiguous in memory and therefore cache-optimal. So if you're looking up other components for every entity in your system, then that's a lot of cache misses. Very often the biggest culprit is you'll need the Transform/position component in almost every system. I've decided to try to mitigate this by simply storing a local copy of the transform in components that need it (e.g. RenderingComponent). If most of your entities are static, then it's simply a case of communicating the new transform/position data of moving entities and then you'll have nice cache-friendly components for your systems. You cannot fix this with every case of course, but it's a useful optimisations for when cache misses become a problem.

Really? This calls for "citation needed", as I don't see why RTTI should store its data on the disk (why not keep it in RAM/memory?), and a quick google search showed up nothing to back the claim that RTTI will access disk data.

You seem to be right :) - I can't find any documentation articulating where this misconception came from on my end. I suspect my own personal training to avoid RTTI caused the misconception. You're probably right, and this data is probably just being memory mapped. If it is infrequently accessed, you'll still end up paying for the disk cost of cycling the page back into memory... but if it is frequently accessed, you probably pay for it once and move on. Since I never frequently access RTTI data... I always end up paging the request in, and hitting disk in the process.

Thanks for helping me to improve my understanding of the subject :).

@[member='Juliean'] (and others), Thanks for the detail reply. if you don't mind, can I just run something past you to confirm i understand the logic of what you said.

So for my entity class, i would now have a map for the component and its ID (the id so i can check what type of component it is). The map will be (from your example) component and an unsigned int for the ID. each component for an entity will have a different ID.

The only thing i do not understand is how will the component parent class (not the basecomponent) return a key for each component i create for an entity as surely i will be doing xcomponent x = new component each time. Would i not have to create a separate system for this and have an instance for it per entity.

Also am i just filling the id in the std::map with what is returned from the BaseComponent.

Then im guessing the getcomponent function on the entity will return the static cast of the component that we requested, by using the ID and matching it with the correct component.

Thanks for the time in advance :)

You don't even need an std::map. Your vector will work fine (although it might be a bit slower) - just iterate through until you find the component with the type ID you need.


// untested pseudo-code
class component
{
    virtual int GetTypeID() const = 0;
};

class derived_component
{
public:
    static const int TypeID = 1234;
    virtual int GetTypeID() { return TypeID; } const;
};

template <typename T>
T* entity::GetComponent()
{
    for (int ii=0; ii<mycomponents.size(); ++ii)
    {
        if (mycomponents[ii].GetTypeID() == T::TypeID)
        {
            return static_cast<T*>(mycomponents[ii]);
        }
    }
    return nullptr;
}

I had some idea without being able to implement it yet. I hope this isn't completely useless, it's possible that I have just overlooked something simple and this just doesn't work. Anyways.. I think the following way of component storage is familiar to most people who have implemented an ECS system, just a short recap:

Have global arrays of the same fixed size for each component type where the entity ID is a literal index into the array. Creating a new entity anywhere just increases the array size for all component types by 1.

If your game is entirely coded in C++ and don't access or create component types from outside scripts, all necessary information is available statically and getting any component from any type is just a fixed array access. I.e.


auto position = g_components.positionArray[entityID];
auto mesh = g_components.meshArray[entityID];

This sounds pretty the first time you see ECS implemented like this because component access can't get much faster than this. Problems with this approach (and why people use others):

  • entities that only have component 1, but not 2 3 4 and 5 still use memory for 2 3 4 and 5
  • Systems that iterate through all components will have to skip some NULL/nullptr components that are unused
  • Most engines do need some kind of data driven way to define and use components. With the added complication of tracking component type IDs, this also means component access becomes at least two array accesses, something like:

auto position = g_components[componentID][entityID]

So people do stuff like storing components in an associative manner instead, which is theoretically much slower in terms of how you access these components.

My proposal for a different solution: Create code that generates these global array structures for each used permutation of entities and their used component types. So there's not just a single global component collection, and instead multiple, each for one permutation of components used(not actual C++ syntax):


g_positionMeshComponents = {
vector<vec3> positions;
vector<mesh> meshes;
}

g_positionWeaponInventoryComponents = {
vector<vec3> positions;
vector<weapon> weapons;
vector<inventory> inventories;
}

etc. So when you create an entity that has a position and mesh component (and nothing else), you will know to access the data from that particular global variable.

Of course, in practice these global collections wouldn't be stored in static variables. You can't create these statically, because the number of possible permutations is too large if you have any realistic amount of component types. So you need to only keep those permutations around that are actually used. So instead you could create a key from the component types an entity uses (for example: bit mask where each component type is 1 bit in the mask, or a hash with low chance of collision...) and then access components like


auto componentCollection = g_components[key];
auto position = componentCollection[typeID][entityID];

or something like that. The key only needs to be updated when you add or remove components to / from an entity. This operation would be exceedingly rare even in complex games, at least in my experience. As such you can either store a pointer to the relevant component container in your entity struct, or work the key into the first 32 bits of the entity ID, or whatever...

Now you have a system where everything is stored nicely in a linear fashion, and only uses the memory it actually requires, plus the component access is really fast. It's still triple indirection, but it should come ahead easily of even highly optimized associate solutions.

But now there's one bigger problem: What about systems that would really just like to iterate through all components of a particular type and do stuff? The components are now spread over several arrays. My proposal is to write these systems separately, such that they have separate component storage and the actual component types used by entities are just wrappers used for access to these components, and they are kept in sync at some point in the code.

You need to do this anyways... when you use third party systems like physics engines and whatnot that have their own ideas about what is optimal for data storage and management...

Another problem: In this lookup


auto componentCollection = g_components[key];
auto position = componentCollection[typeID][entityID];

the componentCollection might only have two component types, but typeID can be pretty big (e.g. if every component type has just an incremented ID). Since you want to avoid associative lookup here, you would need to take the overhead and make each array in those componentCollections large enough to contain all component types... I'm not sure how bad this would be in terms of cache misses if a component lies in the middle of the array and the pointers to other type arrays right next to it are unused. Is there a better strategy?

What do you think about it?

This topic is closed to new replies.

Advertisement