How to resolve Entity - Component association in an ECS

Started by
17 comments, last by Oberon_Command 6 years, 11 months ago

After reading countless articles around the net (from RandyGaul, T-Machine, articles or topics here in gamedev.net etc) i find them almost always lacking the description on how to efficiently resolve the issue of associating entity with components and vice versa (especially).

Before getting into details i want to clarify that i'm doing all of these researches for a knowledge reason, not exactly because of a specific pratical need. I'm not doing any AAA, nor AA, but not even single A ( :P) games (just a for fun and study one).

What i'm trying to implement and research is the most data oriented way of dealing with components/entities/systems

Now most articles suggests that systems should own the data/components and store them contiguosly when possible. Though 9 out of 10 the system described somehow need only one component type and or need to loop on each component to do work.

I find this to be a bit unrealistic, and most often then not system need multiple component types and only half of the requested component types are actually stored in the current system.

Moreover not every component owned by the system have a/the same counterpart (i.e. in a multiplayer game, client side, one can have entities with PositionComponent and VelocityComponent, so something not simulated by the server, and then other entities with PositionComponent and NetworkComponent, where the second stores the remote position of the entity; both of these combinations would need to be processed by a MovementSystem lets say). And again it's not said that it is wise to try to process all the existing PositionComponents if only few of their NetworkComponent got updated... but this would either require a dreaded if/branch or an (expensive?) sort of the NetworkComponents, so that all the modified ones are near together.

So in a situation like this especially where you have a NetworkComponent that is owned and modified by a NetworkSystem, how do one then associate the PositionComponent stored inside the MovementSystem (lets say) in some position X with the NetworkComponent stored in the NetworkSystem in some position Y, with X and Y not known in advance?

I would think that the entity id should play some role here (or THE role), but again, i fail to see on how to do it efficiently.

I know i can actually have an Entity class rather then only and ID, have handles to the components the entity is made of, though then i would have to store the entity id on each component too... and then do several lookups to first retrieve the entity and then the NetworkComponent (if i'm in the MovementSystem, processing PositionComponents), but doesn't this kills the cache?

Another way i saw is to have a 1 to 1 mapping between indices of the components among all the systems, so that the index = entityid. But this yields to poor memory usage/holes due to entities obviously not being made always of all the possible component types.

Yet another way i've seen is to make some assumptions on what group of components might be needed together and so store them together, but this seems very inflexible to me, especially when thinking of creating entities with "custom" groups of components at runtime.

Other ways are simply.. avoiding the situation designing either systems that require only one component and somehow receive data to apply to those components as input, already magically sorted in the right order so that each one is applied to right component.

Or yet again having systems that work on components which always have a/the same counterpart.

I understand that in several cases the required indirections to get another component aren't that bad because the lookup doesn't happen every frame.. though i'm concerned when this happens and it doesn't make sense to rearrange systems in another way.

TLRD: I'm trying to learn and research the proposed most efficient ECS, which follows DoD. It's all fine and dandy but i see holes in usability/flexibility (and performance) anyway when not having the ideal situation where you have systems dealing with one component, but actually multiple components not owned by the system.

Advertisement

Another way i saw is to have a 1 to 1 mapping between indices of the components among all the systems, so that the index = entityid. But this yields to poor memory usage/holes due to entities obviously not being made always of all the possible component types.
You can implement a sparse array that stores ranges of components. It's way heavier for storing/removal (you have to search into the range array, shift elements around), but access should be pretty fast.

I got one implemented in my Java ECS framework: https://github.com/dustContributor/dustArtemis/blob/entity-filter/src/com/artemis/CompactComponentHandler.java

Thats a tightly packed sparse array. If you allocate components for entities 50, 51, 52, and 60, 61, 62, it will store all 6 components tightly in the same array, and store two used ranges from 50 to 53, and from 60 to 63. Then for access it searches the matching range for the entity, offsets by the id and reads. No space wasted.

That particular implementation pretty much aint worth it, because I can't even have contiguous components in memory in Java, and it destroys performance if I use it for all component types, but it might be doable with actual structs and with a more compact implementation, or just use it for the most wasteful components (ie, those rarely used and with wide gaps of entity IDs in between).

You can implement some component manager that allocates components contiguously, and shell out indices into it instead. So you'd have a tightly packed array of components, and a not-so tightly packed array of indices. To access a component you'd do component[indices[entityId]].

Indices can be int32 or int16 to better use each memory trip (64k components ought to be enough for everybody right?), and the CPU can probably figure out you're reading the component array sequentially thus prefetching the reads as you go (which is what you want).

"I AM ZE EMPRAH OPENGL 3.3 THE CORE, I DEMAND FROM THEE ZE SHADERZ AND MATRIXEZ"

My journals: dustArtemis ECS framework and Making a Terrain Generator

Now most articles suggests that systems should own the data/components and store them contiguosly when possible. Though 9 out of 10 the system described somehow need only one component type and or need to loop on each component to do work.

I find this to be a bit unrealistic, and most often then not system need multiple component types and only half of the requested component types are actually stored in the current system.


Why some components are stored in the system and others aren't? (well that probably has to do with the different meanings that ECS has for different people)

But, regarding the use of contiguous memory:

With a scene graph, components inside game objects, it's fairly possible: objects just need to be added in the flat array *after* the parent. When parsing the game objects, the updates will occur in the correct order, parents first, children later.

With a scene graph, components inside a flat array and game objects as IDs: components need to be added in the flat array close to a component with the same game object ID.

No scene graph: both methods are fairly easy without a scene graph, since there's no need to worry about the order of update.

To be honest, I don't see much difference in putting the logic of a component inside a method (in the component) and putting the logic in a system. If the logic was in the game objects, retrieving and modifying the components according with their specific needs, then I would see a difference.

So in a situation like this especially where you have a NetworkComponent that is owned and modified by a NetworkSystem, how do one then associate the PositionComponent stored inside the MovementSystem (lets say) in some position X with the NetworkComponent stored in the NetworkSystem in some position Y, with X and Y not known in advance?


The NetworkComponent should contain information about what variable (in this component or sibling component) the update will modify.

So if you want your NetworkComponent to change the variable "X" in the sibling TransformComponent, you could have a table containing the information of what to modify, maybe a list structures:


/* pseudo code */
structure {
    Type VariableType; /* the type of the variable we will update */
    String IncomingVariableName; /* "X" is the variable we will parse in the incoming data */
    String PathToVariableToUpdate; /* that would be "./TransformComponent/X" */
}
Another way i saw is to have a 1 to 1 mapping between indices of the components among all the systems, so that the index = entityid. But this yields to poor memory usage/holes due to entities obviously not being made always of all the possible component types.


If *I* was going to make a ECS, I would make that every game object has every component, but some components are just disabled. That would guarantee that the memory is contiguous, and you would have to parse everything just once per frame (no excess of lookups).

Anyways, ECS (whatever definition of it) isn't a magic bullet, it has pitfalls, too. And sometimes doesn't solve real problems (write less code by reusing components, but at the same time write more code by writing more lookups and adding conditions whether the game object has a sibling of specific type or other?). On the other hand, some people argue that it makes easier to make composition with it.

I didn't have time to read all the replies but I went the way of allowing my entity to be a small struct.


struct Entity
{
    std::uint64_t m_entity_name;
    std::bitset<NumberOfComponents> m_component_list;

    bool m_is_alive = true;
}

This way you can have multiple components, they just know the ID (and it can be 16/32/64bit whatever you need). I hash the string name, and in debug mode keep a list of hash->std::string for debug reasons. Then you know what each entity has component wise. My systems are updated in order, work is then multi-threaded as it should not depend on any other of the same data. That way it can get the info from previous systems that it depends upon.

If it interests you at all, I can write up a larger reply. I'll revisit this one later tonight or this weekend.

"Those who would give up essential liberty to purchase a little temporary safety deserve neither liberty nor safety." --Benjamin Franklin

The most common solution I've seen for component lookup by entity is a hash-table with entity ID as the key and component index as the value. This level of indirection means that you can defragment (fill holes in the contiguity), sort into arbitrary orders, etc...
On one game that I worked on, we realised that we only ever had about 100 entities per level, so instead of a traditional hash-map, we used an array of 128 bytes (two cache lines!) which were indexed by entity ID and stored component index.

TLRD: I'm trying to learn and research the proposed most efficient ECS, which follows DoD.

DoD says that you should view your game as a series of data transformation (input->process->output) and then restructure the data layouts and the code to minimise the amount of work that the machine actually has to do.
ECS says you should be a generic framework that will dictate the data layout and corresponding transformation pipeline for your whole game. Moreover it dictates that your pipeline be built around mutable records/objects.

You can use the common optimization mantras of DoD (think about how the machine actually works; cache is important) to optimise an ECS framework, but the two philosophies are fundamentally at odds with each other.

A DoD ECS would allow every entity, component and system to break the rules of the framework in order to reach their optimal design, instead of forcing a general design across the board... Which means you'd no longer be building an ECS framework but more of a collection of utilities that lets you perform ecs style tasks in situations where it is an optimal solution.

My compromise approach would be something like this, much hated by some of the ECS purists:

  • Entities are classes, containing weak references to their components
  • Components contain data and methods that act on the data
  • Components are created by a factory method, which allocates them in a contiguous memory data structure like a std::vector

This way, if I need to have one component interacting with another via a Entity->GetComponentOfType<Whatever>() method, I can. And for the few areas where it makes sense to iterate quickly through homogeneous components without referring to other component types, I can do that as well.

As soon as you want to be able to use different permutations of multiple components to working together to perform single functions, you're pretty much stuffed from the extreme "everything must be contiguous to help the cache" point of view, so I prefer to just treat components as subobjects and arrange them contiguously as an optimisation where possible.

You can implement a sparse array that stores ranges of components. It's way heavier for storing/removal (you have to search into the range array, shift elements around), but access should be pretty fast.
I got one implemented in my Java ECS framework: https://github.com/dustContributor/dustArtemis/blob/entity-filter/src/com/artemis/CompactComponentHandler.java
Thats a tightly packed sparse array. If you allocate components for entities 50, 51, 52, and 60, 61, 62, it will store all 6 components tightly in the same array, and store two used ranges from 50 to 53, and from 60 to 63. Then for access it searches the matching range for the entity, offsets by the id and reads. No space wasted.

Thanks i will check it!
A couple of comments/questions though, if i understood it right, in the array you have components of different types one after the other, grouped by entityid?
Wouldn't that be an issue for the cache, since the Systems, dealing only with some types, would end up loading components that shouldn't be processed?

Why some components are stored in the system and others aren't? (well that probably has to do with the different meanings that ECS has for different people)

I meant a different thing: components are all stored in systems, but when, lets say, updating the MovementSystem, NetworkComponents are not stored there, but in the NetworkSystem, so they have to somehow be passed to the NetworkSystem, in an way/order that matches the PositionComponent counterpart.

But, regarding the use of contiguous memory:
With a scene graph, components inside game objects, it's fairly possible: objects just need to be added in the flat array *after* the parent. When parsing the game objects, the updates will occur in the correct order, parents first, children later.
With a scene graph, components inside a flat array and game objects as IDs: components need to be added in the flat array close to a component with the same game object ID.
No scene graph: both methods are fairly easy without a scene graph, since there's no need to worry about the order of update.
To be honest, I don't see much difference in putting the logic of a component inside a method (in the component) and putting the logic in a system. If the logic was in the game objects, retrieving and modifying the components according with their specific needs, then I would see a difference.

But, if i understood it right, this go "against" the Systems part if an ECS.
The reason of having components separated by type in different containers and stored in systems, plus the logic be separated from data, is to avoid spaghetti code.
Then also it's a matter of performance: components would need to access different parts of memory (stored in various components), which are scattered around... while having systems storing the components, they can process them in bulk.
And it's also then easier to make the component processing concurrent.

The NetworkComponent should contain information about what variable (in this component or sibling component) the update will modify.

So if you want your NetworkComponent to change the variable "X" in the sibling TransformComponent, you could have a table containing the information of what to modify, maybe a list structures:

/* pseudo code */
structure {
    Type VariableType; /* the type of the variable we will update */
    String IncomingVariableName; /* "X" is the variable we will parse in the incoming data */
    String PathToVariableToUpdate; /* that would be "./TransformComponent/X" */
}

I'm not sure what you mean here, but i might not have explained myself. With X and Y, i wasn't referring to PositionComponent coordinates, or even some generic field of a struct.. i meant indexes of the components inside their array, stored into their system.

If *I* was going to make a ECS, I would make that every game object has every component, but some components are just disabled. That would guarantee that the memory is contiguous, and you would have to parse everything just once per frame (no excess of lookups).

That would guarantee that components of different types are contiguous, but the update pattern of an ECS is to update components of the same type all together, so with your layout you would have to jump around in memory. The ECS idea is to loop components not entities.

Anyways, ECS (whatever definition of it) isn't a magic bullet, it has pitfalls, too. And sometimes doesn't solve real problems (write less code by reusing components, but at the same time write more code by writing more lookups and adding conditions whether the game object has a sibling of specific type or other?). On the other hand, some people argue that it makes easier to make composition with it.

And in fact my question is also half a critique to try to uncover the pitfalls, or the easy optimizations, where thinking about how to structure the code and data isn't a premature optimization.


I didn't have time to read all the replies but I went the way of allowing my entity to be a small struct.

struct Entity
{
    std::uint64_t m_entity_name;
    std::bitset<NumberOfComponents> m_component_list;

    bool m_is_alive = true;
}
This way you can have multiple components, they just know the ID (and it can be 16/32/64bit whatever you need). I hash the string name, and in debug mode keep a list of hash->std::string for debug reasons. Then you know what each entity has component wise. My systems are updated in order, work is then multi-threaded as it should not depend on any other of the same data. That way it can get the info from previous systems that it depends upon.
If it interests you at all, I can write up a larger reply. I'll revisit this one later tonight or this weekend.

I'm not sure i follow you here; I wasn't particularly worried about how to associate entities and components on a debug or editor point of view, but how to solve the needs of algorithms/logic in the systems to work on multiple component types, in a way that cannot be independent from their association with the entity.


The most common solution I've seen for component lookup by entity is a hash-table with entity ID as the key and component index as the value. This level of indirection means that you can defragment (fill holes in the contiguity), sort into arbitrary orders, etc...

So you mean that for each component type there's a hash-table entityid->component index?
Wouldn't this again blow up the cache, given that keys and values in a hash-table are not that contiguous?
Or you actually meant something like
std::unordered_map<int, std::vector<Component>>
where the int key is the entity id and Component is the base class of all Components?

DoD says that you should view your game as a series of data transformation (input->process->output) and then restructure the data layouts and the code to minimise the amount of work that the machine actually has to do.
ECS says you should be a generic framework that will dictate the data layout and corresponding transformation pipeline for your whole game. Moreover it dictates that your pipeline be built around mutable records/objects.

You can use the common optimization mantras of DoD (think about how the machine actually works; cache is important) to optimise an ECS framework, but the two philosophies are fundamentally at odds with each other.

A DoD ECS would allow every entity, component and system to break the rules of the framework in order to reach their optimal design, instead of forcing a general design across the board... Which means you'd no longer be building an ECS framework but more of a collection of utilities that lets you perform ecs style tasks in situations where it is an optimal solution.

So you mean DOD and ECS are kind of working against each other?
What, i think, i have understood is that while ECS per se doesn't dictate that one should use DOD (i've forgot a capitalized O earlier :P), it makes it simpler to use it, because separating code and data and then putting data all together in systems kind of follows what DOD "asks".
Moreveover again, putting data in systems, so that it's contiguous etc, helps the cache.

What i'm trying to understand now is how much of DOD can be used in an ECS without hanging up myself with premature optimizations and architecture rigidity.
Sure the architecture won't be rigid as a normal inheritance based game, but deciding how to resolve the lookups means changing data structures and also the actual code that deals with the lookups.
Probably it cannot be avoided, but again i'm trying to understand if there's some "gotchas" i'm missing here or it really ends up being a case by case decision.

My compromise approach would be something like this, much hated by some of the ECS purists:


    • Entities are classes, containing weak references to their components
    • Components contain data and methods that act on the data
    • [...]

    I see, while i wouldn't put logic in components (unless they are created by some scripting language, hello data drive design!), it really seems to me too that one cannot avoid to have that kind of association of entities and components you describe.
    I mean again i've seen articles solving that, as i've said earlier, storing components in a way that's easier and direct to deduce the association, but it seems that they sacrifice a lot of flexibility and runtime properties of an ECS system.

    I think the Artemis system has quite elegant ways of handling cross-component dependencies via placing an 'Aspect' abstraction between Systems and Components. But it doesn't remotely give you any of the data-oriented caching benefits. At that point, you have to ask what benefit it's giving you as compared to a simpler Unity or Unreal style component, i.e. as a simple encapsulated sub-object.

    Most of the aversion to putting behaviour in components stems from several people incorrectly asserting that components are somehow 'against' object-orientation, and that therefore we need to ditch all object oriented features to truly embrace components. In fact components have been a key part of object-oriented design for decades. The Design Patterns book was saying "favor 'object composition' over 'class inheritance'" way back in 1995.

    So you mean that for each component type there's a hash-table entityid->component index? Wouldn't this again blow up the cache, given that keys and values in a hash-table are not that contiguous?

    Being nice to cache doesn't always mean "make all the things contiguous". In the example implementation I mentioned, the entire hash table was two cache lines. Once they're fetched once, they likely remain in cache for the entire system update. The same can be true for larger hash tables too - sometimes you may be able to pre-fetch the entire hash map...

    So you mean DOD and ECS are kind of working against each other?

    Extremely. They're opposite philosophies -- one says "look at the actual problems that need to be solved and craft minimal solutions to them" and the other says "lets make a generic framework that will solve every problem, and then solve every problem in the one way that makes it fit into our framework".

    Moreveover again, putting data in systems, so that it's contiguous etc, helps the cache.

    As a general rule of thumb having contiguous data helps the cache. That's not true for every situation though. Sometimes it may be better to allocate a clump of three different components together in a mixed allocation. Other times it may be better to have an array of a single type of component. Sometimes it may make sense to have thirty different arrays of the same kind of component. DOD tells you to pick individual solutions to each problem.

    Sure the architecture won't be rigid as a normal inheritance based game

    Wait wait wait. Deep inheritance hierarchies are not normal. A core rule of OOP is that composition (i.e. the use of components) is preferred over inheritance.
    If you've been reading the kinds of blogs that say "OOP is bad because deep inheritance trees have a bunch of problems, so here's ECS and it solves everything!"... then I'm sorry but these are straw-man arguments made by people who suck at OOP and instead of choosing to learn how to do OOP properly, they've instead given up and invented a new, completely unrecognised/informal pattern as a crutch for their ego issues. They're basically saying "OOP is bad because I'm bad at it, so I'm going to reinvent an ad-hoc version of the relational model and claim it as a new invention". Those blogs are bad.

    i'm trying to understand if there's some "gotchas" i'm missing here or it really ends up being a case by case decision

    If you drink the ECS cool aid, then it doesn't matter, just implement any solution and use it everywhere. If you drink the DOD cool aid, then everything is a case by case decision and you shouldn't bother with a component framework at all. If you stick with OOP, then everything is a component already, you have a multitude of choices for how they're connected to each other and you don't need a framework.

    This topic is closed to new replies.

    Advertisement