Sign in to follow this  
Smjert

How to resolve Entity - Component association in an ECS

Recommended Posts

After reading countless articles around the net (from RandyGaul, T-Machine, articles or topics here in gamedev.net etc) i find them almost always lacking the description on how to efficiently resolve the issue of associating entity with components and vice versa (especially).

Before getting into details i want to clarify that i'm doing all of these researches for a knowledge reason, not exactly because of a specific pratical need. I'm not doing any AAA, nor AA, but not even single A ( :P) games (just a for fun and study one). 

What i'm trying to implement and research is the most data oriented way of dealing with components/entities/systems

Now most articles suggests that systems should own the data/components and store them contiguosly when possible. Though 9 out of 10 the system described somehow need only one component type and or need to loop on each component to do work.

I find this to be a bit unrealistic, and most often then not system need multiple component types and only half of the requested component types are actually stored in the current system.

Moreover not every component owned by the system have a/the same counterpart (i.e. in a multiplayer game, client side, one can have entities with PositionComponent and VelocityComponent, so something not simulated by the server, and then other entities with PositionComponent and NetworkComponent, where the second stores the remote position of the entity; both of these combinations would need to be processed by a MovementSystem lets say). And again it's not said that it is wise to try to process all the existing PositionComponents if only few of their NetworkComponent got updated... but this would either require a dreaded if/branch or an (expensive?) sort of the NetworkComponents, so that all the modified ones are near together.

So in a situation like this especially where you have a NetworkComponent that is owned and modified by a NetworkSystem, how do one then associate the PositionComponent stored inside the MovementSystem (lets say) in some position X with the NetworkComponent stored in the NetworkSystem in some position Y, with X and Y not known in advance?

I would think that the entity id should play some role here (or THE role), but again, i fail to see on how to do it efficiently.

I know i can actually have an Entity class rather then only and ID, have handles to the components the entity is made of, though then i would have to store the entity id on each component too... and then do several lookups to first retrieve the entity and then the NetworkComponent (if i'm in the MovementSystem, processing PositionComponents), but doesn't this kills the cache?

Another way i saw is to have a 1 to 1 mapping between indices of the components among all the systems, so that the index = entityid. But this yields to poor memory usage/holes due to entities obviously not being made always of all the possible component types.

Yet another way i've seen is to make some assumptions on what group of components might be needed together and so store them together, but this seems very inflexible to me, especially when thinking of creating entities with "custom" groups of components at runtime.

Other ways are simply.. avoiding the situation designing either systems that require only one component and somehow receive data to apply to those components as input, already magically sorted in the right order so that each one is applied to right component.

Or yet again having systems that work on components which always have a/the same counterpart.

I understand that in several cases the required indirections to get another component aren't that bad because the lookup doesn't happen every frame.. though i'm concerned when this happens and it doesn't make sense to rearrange systems in another way.

 

TLRD: I'm trying to learn and research the proposed most efficient ECS, which follows DoD. It's all fine and dandy but i see holes in usability/flexibility (and performance) anyway when not having the ideal situation where you have systems dealing with one component, but actually multiple components not owned by the system.

Edited by Smjert

Share this post


Link to post
Share on other sites

Another way i saw is to have a 1 to 1 mapping between indices of the components among all the systems, so that the index = entityid. But this yields to poor memory usage/holes due to entities obviously not being made always of all the possible component types.
You can implement a sparse array that stores ranges of components. It's way heavier for storing/removal (you have to search into the range array, shift elements around), but access should be pretty fast. 

I got one implemented in my Java ECS framework: https://github.com/dustContributor/dustArtemis/blob/entity-filter/src/com/artemis/CompactComponentHandler.java

Thats a tightly packed sparse array. If you allocate components for entities 50, 51, 52, and 60, 61, 62, it will store all 6 components tightly in the same array, and store two used ranges from 50 to 53, and from 60 to 63. Then for access it searches the matching range for the entity, offsets by the id and reads.  No space wasted.

That particular implementation pretty much aint worth it, because I can't even have contiguous components in memory in Java, and it destroys performance if I use it for all component types, but it might be doable with actual structs and with a more compact implementation, or just use it for the most wasteful components (ie, those rarely used and with wide gaps of entity IDs in between).

You can implement some component manager that allocates components contiguously, and shell out indices into it instead. So you'd have a tightly packed array of components, and a not-so tightly packed array of indices. To access a component you'd do component[indices[entityId]].

Indices can be int32 or int16 to better use each memory trip (64k components ought to be enough for everybody right?), and the CPU can probably figure out you're reading the component array sequentially thus prefetching the reads as you go (which is what you want).

Share this post


Link to post
Share on other sites
Now most articles suggests that systems should own the data/components and store them contiguosly when possible. Though 9 out of 10 the system described somehow need only one component type and or need to loop on each component to do work.

I find this to be a bit unrealistic, and most often then not system need multiple component types and only half of the requested component types are actually stored in the current system.


Why some components are stored in the system and others aren't? (well that probably has to do with the different meanings that ECS has for different people)

But, regarding the use of contiguous memory:

With a scene graph, components inside game objects, it's fairly possible: objects just need to be added in the flat array *after* the parent. When parsing the game objects, the updates will occur in the correct order, parents first, children later.

With a scene graph, components inside a flat array and game objects as IDs: components need to be added in the flat array close to a component with the same game object ID.

No scene graph: both methods are fairly easy without a scene graph, since there's no need to worry about the order of update.

To be honest, I don't see much difference in putting the logic of a component inside a method (in the component) and putting the logic in a system. If the logic was in the game objects, retrieving and modifying the components according with their specific needs, then I would see a difference.
 

 

So in a situation like this especially where you have a NetworkComponent that is owned and modified by a NetworkSystem, how do one then associate the PositionComponent stored inside the MovementSystem (lets say) in some position X with the NetworkComponent stored in the NetworkSystem in some position Y, with X and Y not known in advance?


The NetworkComponent should contain information about what variable (in this component or sibling component) the update will modify.

So if you want your NetworkComponent to change the variable "X" in the sibling TransformComponent, you could have a table containing the information of what to modify, maybe a list structures:

/* pseudo code */
structure {
    Type VariableType; /* the type of the variable we will update */
    String IncomingVariableName; /* "X" is the variable we will parse in the incoming data */
    String PathToVariableToUpdate; /* that would be "./TransformComponent/X" */
}
Another way i saw is to have a 1 to 1 mapping between indices of the components among all the systems, so that the index = entityid. But this yields to poor memory usage/holes due to entities obviously not being made always of all the possible component types.


If *I* was going to make a ECS, I would make that every game object has every component, but some components are just disabled. That would guarantee that the memory is contiguous, and you would have to parse everything just once per frame (no excess of lookups).
 

Anyways, ECS (whatever definition of it) isn't a magic bullet, it has pitfalls, too. And sometimes doesn't solve real problems (write less code by reusing components, but at the same time write more code by writing more lookups and adding conditions whether the game object has a sibling of specific type or other?). On the other hand, some people argue that it makes easier to make composition with it.

Edited by felipefsdev

Share this post


Link to post
Share on other sites

I didn't have time to read all the replies but I went the way of allowing my entity to be a small struct.

struct Entity
{
    std::uint64_t m_entity_name;
    std::bitset<NumberOfComponents> m_component_list;

    bool m_is_alive = true;
}

This way you can have multiple components, they just know the ID (and it can be 16/32/64bit whatever you need).  I hash the string name, and in debug mode keep a list of hash->std::string for debug reasons.  Then you know what each entity has component wise.  My systems are updated in order, work is then multi-threaded as it should not depend on any other of the same data.  That way it can get the info from previous systems that it depends upon.

If it interests you at all, I can write up a larger reply.  I'll revisit this one later tonight or this weekend.

Share this post


Link to post
Share on other sites
The most common solution I've seen for component lookup by entity is a hash-table with entity ID as the key and component index as the value. This level of indirection means that you can defragment (fill holes in the contiguity), sort into arbitrary orders, etc...
On one game that I worked on, we realised that we only ever had about 100 entities per level, so instead of a traditional hash-map, we used an array of 128 bytes (two cache lines!) which were indexed by entity ID and stored component index.

TLRD: I'm trying to learn and research the proposed most efficient ECS, which follows DoD.

DoD says that you should view your game as a series of data transformation (input->process->output) and then restructure the data layouts and the code to minimise the amount of work that the machine actually has to do.
ECS says you should be a generic framework that will dictate the data layout and corresponding transformation pipeline for your whole game. Moreover it dictates that your pipeline be built around mutable records/objects.

You can use the common optimization mantras of DoD (think about how the machine actually works; cache is important) to optimise an ECS framework, but the two philosophies are fundamentally at odds with each other.

A DoD ECS would allow every entity, component and system to break the rules of the framework in order to reach their optimal design, instead of forcing a general design across the board... Which means you'd no longer be building an ECS framework but more of a collection of utilities that lets you perform ecs style tasks in situations where it is an optimal solution.

Share this post


Link to post
Share on other sites

My compromise approach would be something like this, much hated by some of the ECS purists:

  • Entities are classes, containing weak references to their components
  • Components contain data and methods that act on the data
  • Components are created by a factory method, which allocates them in a contiguous memory data structure like a std::vector

This way, if I need to have one component interacting with another via a Entity->GetComponentOfType<Whatever>() method, I can. And for the few areas where it makes sense to iterate quickly through homogeneous components without referring to other component types, I can do that as well.

As soon as you want to be able to use different permutations of multiple components to working together to perform single functions, you're pretty much stuffed from the extreme "everything must be contiguous to help the cache" point of view, so I prefer to just treat components as subobjects and arrange them contiguously as an optimisation where possible.

Share this post


Link to post
Share on other sites

You can implement a sparse array that stores ranges of components. It's way heavier for storing/removal (you have to search into the range array, shift elements around), but access should be pretty fast. 
I got one implemented in my Java ECS framework: https://github.com/dustContributor/dustArtemis/blob/entity-filter/src/com/artemis/CompactComponentHandler.java
Thats a tightly packed sparse array. If you allocate components for entities 50, 51, 52, and 60, 61, 62, it will store all 6 components tightly in the same array, and store two used ranges from 50 to 53, and from 60 to 63. Then for access it searches the matching range for the entity, offsets by the id and reads.  No space wasted.

Thanks i will check it!
A couple of comments/questions though, if i understood it right, in the array you have components of different types one after the other, grouped by entityid?
Wouldn't that be an issue for the cache, since the Systems, dealing only with some types, would end up loading components that shouldn't be processed?
 

Why some components are stored in the system and others aren't? (well that probably has to do with the different meanings that ECS has for different people)

I meant a different thing: components are all stored in systems, but when, lets say, updating the MovementSystem, NetworkComponents are not stored there, but in the NetworkSystem, so they have to somehow be passed to the NetworkSystem, in an way/order that matches the PositionComponent counterpart.
 

But, regarding the use of contiguous memory:
With a scene graph, components inside game objects, it's fairly possible: objects just need to be added in the flat array *after* the parent. When parsing the game objects, the updates will occur in the correct order, parents first, children later.
With a scene graph, components inside a flat array and game objects as IDs: components need to be added in the flat array close to a component with the same game object ID.
No scene graph: both methods are fairly easy without a scene graph, since there's no need to worry about the order of update.
To be honest, I don't see much difference in putting the logic of a component inside a method (in the component) and putting the logic in a system. If the logic was in the game objects, retrieving and modifying the components according with their specific needs, then I would see a difference.

But, if i understood it right, this go "against" the Systems part if an ECS.
The reason of having components separated by type in different containers and stored in systems, plus the logic be separated from data, is to avoid spaghetti code.
Then also it's a matter of performance: components would need to access different parts of memory (stored in various components), which are scattered around... while having systems storing the components, they can process them in bulk.
And it's also then easier to make the component processing concurrent.
 

The NetworkComponent should contain information about what variable (in this component or sibling component) the update will modify.

So if you want your NetworkComponent to change the variable "X" in the sibling TransformComponent, you could have a table containing the information of what to modify, maybe a list structures:

/* pseudo code */
structure {
    Type VariableType; /* the type of the variable we will update */
    String IncomingVariableName; /* "X" is the variable we will parse in the incoming data */
    String PathToVariableToUpdate; /* that would be "./TransformComponent/X" */
}

I'm not sure what you mean here, but i might not have explained myself. With X and Y, i wasn't referring to PositionComponent coordinates, or even some generic field of a struct.. i meant indexes of the components inside their array, stored into their system.
 

If *I* was going to make a ECS, I would make that every game object has every component, but some components are just disabled. That would guarantee that the memory is contiguous, and you would have to parse everything just once per frame (no excess of lookups).

That would guarantee that components of different types are contiguous, but the update pattern of an ECS is to update components of the same type all together, so with your layout you would have to jump around in memory. The ECS idea is to loop components not entities.
 

Anyways, ECS (whatever definition of it) isn't a magic bullet, it has pitfalls, too. And sometimes doesn't solve real problems (write less code by reusing components, but at the same time write more code by writing more lookups and adding conditions whether the game object has a sibling of specific type or other?). On the other hand, some people argue that it makes easier to make composition with it.

And in fact my question is also half a critique to try to uncover the pitfalls, or the easy optimizations, where thinking about how to structure the code and data isn't a premature optimization.

 

I didn't have time to read all the replies but I went the way of allowing my entity to be a small struct.

struct Entity
{
    std::uint64_t m_entity_name;
    std::bitset<NumberOfComponents> m_component_list;

    bool m_is_alive = true;
}
This way you can have multiple components, they just know the ID (and it can be 16/32/64bit whatever you need).  I hash the string name, and in debug mode keep a list of hash->std::string for debug reasons.  Then you know what each entity has component wise.  My systems are updated in order, work is then multi-threaded as it should not depend on any other of the same data.  That way it can get the info from previous systems that it depends upon.
If it interests you at all, I can write up a larger reply.  I'll revisit this one later tonight or this weekend.

I'm not sure i follow you here; I wasn't particularly worried about how to associate entities and components on a debug or editor point of view, but how to solve the needs of algorithms/logic in the systems to work on multiple component types, in a way that cannot be independent from their association with the entity.

 

The most common solution I've seen for component lookup by entity is a hash-table with entity ID as the key and component index as the value. This level of indirection means that you can defragment (fill holes in the contiguity), sort into arbitrary orders, etc...

So you mean that for each component type there's a hash-table entityid->component index?
Wouldn't this again blow up the cache, given that keys and values in a hash-table are not that contiguous?
Or you actually meant something like
std::unordered_map<int, std::vector<Component>>
where the int key is the entity id and Component is the base class of all Components?
 

DoD says that you should view your game as a series of data transformation (input->process->output) and then restructure the data layouts and the code to minimise the amount of work that the machine actually has to do.
ECS says you should be a generic framework that will dictate the data layout and corresponding transformation pipeline for your whole game. Moreover it dictates that your pipeline be built around mutable records/objects.

You can use the common optimization mantras of DoD (think about how the machine actually works; cache is important) to optimise an ECS framework, but the two philosophies are fundamentally at odds with each other.

A DoD ECS would allow every entity, component and system to break the rules of the framework in order to reach their optimal design, instead of forcing a general design across the board... Which means you'd no longer be building an ECS framework but more of a collection of utilities that lets you perform ecs style tasks in situations where it is an optimal solution.

So you mean DOD and ECS are kind of working against each other?
What, i think, i have understood is that while ECS per se doesn't dictate that one should use DOD (i've forgot a capitalized O earlier :P), it makes it simpler to use it, because separating code and data and then putting data all together in systems kind of follows what DOD "asks".
Moreveover again, putting data in systems, so that it's contiguous etc, helps the cache.

What i'm trying to understand now is how much of DOD can be used in an ECS without hanging up myself with premature optimizations and architecture rigidity.
Sure the architecture won't be rigid as a normal inheritance based game, but deciding how to resolve the lookups means changing data structures and also the actual code that deals with the lookups.
Probably it cannot be avoided, but again i'm trying to understand if there's some "gotchas" i'm missing here or it really ends up being a case by case decision.

Share this post


Link to post
Share on other sites

My compromise approach would be something like this, much hated by some of the ECS purists:
[list]
[list]

  • Entities are classes, containing weak references to their components
  • Components contain data and methods that act on the data
  • [...]
     
  • [/list]

I see, while i wouldn't put logic in components (unless they are created by some scripting language, hello data drive design!), it really seems to me too that one cannot avoid to have that kind of association of entities and components you describe.
I mean again i've seen articles solving that, as i've said earlier, storing components in a way that's easier and direct to deduce the association, but it seems that they sacrifice a lot of flexibility and runtime properties of an ECS system. Edited by Smjert

Share this post


Link to post
Share on other sites

I think the Artemis system has quite elegant ways of handling cross-component dependencies via placing an 'Aspect' abstraction between Systems and Components. But it doesn't remotely give you any of the data-oriented caching benefits. At that point, you have to ask what benefit it's giving you as compared to a simpler Unity or Unreal style component, i.e. as a simple encapsulated sub-object.

Most of the aversion to putting behaviour in components stems from several people incorrectly asserting that components are somehow 'against' object-orientation, and that therefore we need to ditch all object oriented features to truly embrace components. In fact components have been a key part of object-oriented design for decades. The Design Patterns book was saying "favor 'object composition' over 'class inheritance'" way back in 1995.

Share this post


Link to post
Share on other sites

So you mean that for each component type there's a hash-table entityid->component index? Wouldn't this again blow up the cache, given that keys and values in a hash-table are not that contiguous?

Being nice to cache doesn't always mean "make all the things contiguous". In the example implementation I mentioned, the entire hash table was two cache lines. Once they're fetched once, they likely remain in cache for the entire system update. The same can be true for larger hash tables too - sometimes you may be able to pre-fetch the entire hash map...

So you mean DOD and ECS are kind of working against each other?

Extremely. They're opposite philosophies -- one says "look at the actual problems that need to be solved and craft minimal solutions to them" and the other says "lets make a generic framework that will solve every problem, and then solve every problem in the one way that makes it fit into our framework".

Moreveover again, putting data in systems, so that it's contiguous etc, helps the cache.

As a general rule of thumb having contiguous data helps the cache. That's not true for every situation though. Sometimes it may be better to allocate a clump of three different components together in a mixed allocation. Other times it may be better to have an array of a single type of component. Sometimes it may make sense to have thirty different arrays of the same kind of component. DOD tells you to pick individual solutions to each problem.

Sure the architecture won't be rigid as a normal inheritance based game

Wait wait wait. Deep inheritance hierarchies are not normal. A core rule of OOP is that composition (i.e. the use of components) is preferred over inheritance.
If you've been reading the kinds of blogs that say "OOP is bad because deep inheritance trees have a bunch of problems, so here's ECS and it solves everything!"... then I'm sorry but these are straw-man arguments made by people who suck at OOP and instead of choosing to learn how to do OOP properly, they've instead given up and invented a new, completely unrecognised/informal pattern as a crutch for their ego issues. They're basically saying "OOP is bad because I'm bad at it, so I'm going to reinvent an ad-hoc version of the relational model and claim it as a new invention". Those blogs are bad.

i'm trying to understand if there's some "gotchas" i'm missing here or it really ends up being a case by case decision

If you drink the ECS cool aid, then it doesn't matter, just implement any solution and use it everywhere. If you drink the DOD cool aid, then everything is a case by case decision and you shouldn't bother with a component framework at all. If you stick with OOP, then everything is a component already, you have a multitude of choices for how they're connected to each other and you don't need a framework.

Share this post


Link to post
Share on other sites

I think the Artemis system has quite elegant ways of handling cross-component dependencies via placing an 'Aspect' abstraction between Systems and Components. But it doesn't remotely give you any of the data-oriented caching benefits. At that point, you have to ask what benefit it's giving you as compared to a simpler Unity or Unreal style component, i.e. as a simple encapsulated sub-object.

Thanks for reminding me to check it again, more in depth.
 

Most of the aversion to putting behaviour in components stems from several people incorrectly asserting that components are somehow 'against' object-orientation, and that therefore we need to ditch all object oriented features to truly embrace components. In fact components have been a key part of object-oriented design for decades. The Design Patterns book was saying "favor 'object composition' over 'class inheritance'" way back in 1995.

Well yes, zealotry is never good and i've almost always think about OOP code not in term of deep inheritance hierarchies, or basically all the "bad" characteristics people give to it.
At the same time i recognize that sometimes in the pursue of giving better interfaces, this complicates refactoring for concurrency and giving priority to data flow.

 

Being nice to cache doesn't always mean "make all the things contiguous". In the example implementation I mentioned, the entire hash table was two cache lines. Once they're fetched once, they likely remain in cache for the entire system update. The same can be true for larger hash tables too - sometimes you may be able to pre-fetch the entire hash map...

Here i was thinking of a loop that needs to access to components of the same type, one after the other. It obviously all depend on the pattern access and data sizes.
 

Extremely. They're opposite philosophies -- one says "look at the actual problems that need to be solved and craft minimal solutions to them" and the other says "lets make a generic framework that will solve every problem, and then solve every problem in the one way that makes it fit into our framework".

And this surprise me, i supposed that the fact that ECS talks about systems which have the logic and components are only data was to kind of enforce DOD.
 

As a general rule of thumb having contiguous data helps the cache. That's not true for every situation though. [...]

As before i was still thinking of the same type of loop.
 

Wait wait wait. Deep inheritance hierarchies are not normal. A core rule of OOP is that composition (i.e. the use of components) is preferred over inheritance.
If you've been reading the kinds of blogs that say "OOP is bad because deep inheritance trees have a bunch of problems, so here's ECS and it solves everything!"... then I'm sorry but these are straw-man arguments made by people who suck at OOP and instead of choosing to learn how to do OOP properly, they've instead given up and invented a new, completely unrecognised/informal pattern as a crutch for their ego issues. They're basically saying "OOP is bad because I'm bad at it, so I'm going to reinvent an ad-hoc version of the relational model and claim it as a new invention". Those blogs are bad.

Well i find that to be pretty wrong too, as i've said with Kylotan.
What i meant with normal (probably poor choice of word) is what commonly (and probably wrongly) people referred to inheritance based games which have to be despised.
 

If you drink the ECS cool aid, then it doesn't matter, just implement any solution and use it everywhere. If you drink the DOD cool aid, then everything is a case by case decision and you shouldn't bother with a component framework at all. If you stick with OOP, then everything is a component already, you have a multitude of choices for how they're connected to each other and you don't need a framework.

Not trying to follow any specific religion, if it seems so is because i still want to evaluate them fully to understand what seems the best and the worst parts, keeping the worst ones limited to few areas, where possible.
And you all are helping with that :).

Share this post


Link to post
Share on other sites

The T-Machine style of "entity systems" (as he calls it) is not a very consistent design philosophy. In his first blog post on it, he says that components are for "speeding up the development time". Okay. But by post 5 it's off down the rabbit hole of how to store components in relational databases and only more recently did he jump on the data-oriented programming bandwagon and start worrying about how to lay things out in memory, ultimately admitting that none of these approaches are perfect and you probably don't need to worry about that.

From the other direction, some people have come at it entirely from the data-oriented side, but although the main body of this approach is all about homogeneous transformations on contiguous arrays of data, as soon as it mentions component-based entities for games you can see from the example there that they're back to "if this entity has X component, do Something(X)". They use sparse arrays and check for index existence instead of null pointers inside entity objects but the basic logic is the same, thus throwing away much of the other DoD advantages such as to avoid boolean conditionals or to perform little other than straight data transformations.

Since both of these viewpoints include a disdain for OOP, and both viewpoints prefer to think about the data first and the data owner second, it's easy to think that they're two aspects of the same thing, but really they are 2 separate tools that have a small overlap - and the area where they overlap is arguably too small to base your whole game project around it.

The fact is, to go right back to the original question, there are several efficient ways of locating components, each more or less suitable for a given program or subsystem, and each probably breaking some subjective 'rule' about how this 'should' be done:

  1. entity contains all components inline (possibly 'switched off' if necessary)
  2. entity holds pointer or reference to relevant components, which may be null
  3. entity holds index of component in some external vector or array
  4. system holds sparse array (e.g. std::unordered_map), mapping entity IDs to components
  5. system holds contiguous array of components, only ever accessing them sequentially
  6. system holds list of aspects, each of which maps an entity to a tuple of components (pointers or indices)

When it comes to pragmatic coding, in the general case of game logic code, I prefer option 2.

Edited by Kylotan

Share this post


Link to post
Share on other sites

From the other direction, some people have come at it entirely from the data-oriented side, but although the main body of this approach is all about homogeneous transformations on contiguous arrays of data, as soon as it mentions component-based entities for games you can see from the example there that they're back to "if this entity has X component, do Something(X)". They use sparse arrays and check for index existence instead of null pointers inside entity objects but the basic logic is the same, thus throwing away much of the other DoD advantages such as to avoid boolean conditionals or to perform little other than straight data transformations.


You wouldn't be doing "if entity has X component" in the most common operation on the components, though - you would be batch processing them sequentially. If you're not doing that, don't decompose the data this way. The point of using the sparse arrays is the batch processing, so if you don't need or can't meaningfully rework the data into a batch-processable form, don't bother doing it. The index checks aren't really any worse than your option 2, anyway - it's still a double indirection, just in a different form.

I would also point out that my first choice for a sparse array would NOT be std::unordered_map, but a flat vector with an index-to-index map. Eg.
 
// NOT this
std::unordered_map<int, Component> components;
Component* TryAndGetComponent(int index)
{
   auto componentIter = components.find(index);
   if (componentIter != std::end(components))
   {
       return &*componentIter;
   }
 
   return nullptr;
}

// but this - componentIndices is the size of the number of entities, componentData is the number of active components
// as you add and remove components, you'd update componentIndices so the same entity index always points at the right component
// this is still a double-indirection, but it's guaranteed O(1) rather than worst-case O(n) as with std::unordered_map, and you don't need hashes at all
// plus, all the components are contiguous in memory, so you can iterate over them in a uniform fashion!
// of course, you'd want to wrap this up in a SparseArray class... and maybe have a type-safe handle class...
std::vector<int> componentIndices;
std::vector<Component> componentData;
Component* TryAndGetComponent(int index)
{
   int actualIndex = componentIndices[index];
   if (actualIndex != -1)
   {
        return &componentData[actualIndex];
   }

   return nullptr;
}
But again, the whole point of DOD is not to be dogmatic and lock yourself into a single architecture for everything. ECS and DOD can coexist, but only if the ECS is the end result of the DOD. Don't do this if you don't get anything out of it. Edited by Oberon_Command

Share this post


Link to post
Share on other sites

I like to try to think of components as big pieces that can be compartmentalized. For a 2D game something like: rigid body, sprite, logic, input, character controller, and probably some other game-specific ones. Here's what each might do:

  • Sprite : reference animations or images, perform actual animating logic, store any bones/keyframes, wraps whatever underlying code necessary (maybe some DirectX or PNG loading as an example)
  • Rigid body : wraps underlying physics library, holds rigid body, colliders, deals with collision callbacks and filtering
  • Logic : stores scripts, swaps scripts around as necessary, holds whatever gameplay or logic related things as necessary, like structs or attributes needed to define itself
  • Input : relays or filters messages on input keystrokes, mouse clicks, sets up listeners and whatnot
  • Character controller : Pretty self explanatory. Character controllers are important and big enough to warrant their own component

It really is a Bad IdeaTM to try and break components down into smaller pieces. Breaking things down all the way to "velocity" and "position" components is way too far. There is not real benefit from such granularity, and in the end this forces the developer to highly optimize the component storage/association... Which is a waste of time for many developers. Really, the powerful thing about components is that they can be separated into different isolated services, each self-contained. This is really similar to the old "actor-model" concept, where actors contain private data, and can modify their own private data, and interact with other actors by sending messages. A message can be safely ignored by any actor at any time.

What more modern ECS try to do is to setup arrays of data-only components, and then loop over them to do processing with "systems". A system merely comprising (at least conceptually) of a for loop running over an array of data. Thus the messaging becomes indirect communication between systems as defined by the operating order on the components. System A comes before system B, and modifies the components. B then reads the components and sees the modifications, and so on to system C. Communication, or messaging, is thrusted into the system processing order. Personally I find this is super overkill for anything short of a large commercial project with multiple engineers.

Try to focus on defining large components that can be self-contained, and wrap underlying pre-existing libraries. Try to focus on how to setup a nice messaging system, something very straightforward to use and debug. Function calls are perfectly fine for implementing messaging. The idea here is to merely to decouple dependencies through an abstraction! A message should be used as a dependency decoupling mechanism. It will definitely take some practice and skill to figure out what should and should not be a message.

At this point it's suddenly not all that important how the components are tied together to form an entity. In my own personal code I do this:

struct Entity
{
	EntityID id;
	Entity* next;
	Entity* prev;

	virtual void Update( Entity* entity, float dt ) = 0;
	virtual void Draw( Entity* entity ) = 0;
};

struct Monster : public Entity
{
	RigidBody body;
	Sprite some_sprite;
	Sprite another_sprite;
	Logic logic;

	void Update( Entity* entity, float dt ) override;
	void Draw( Entity* entity ) override;
};

It is immediately clear that my components are not stored contiguously in memory, nor are they even sorted. Each different kind of entity can place different component structs inside it in any order, and even have duplicates. It's very dead simple. I'm not really worried about efficiency here. Underlying libraries that handle rendering and physics are written in very optimized ways. For example when sprites figure out what they want to finally render they push quads into an array stored inside of a rendering compositor. The rigid body is just wrapping the underlying physics system, and calling functions on the rigid body and relaying callbacks.

So my answer is forget DoD for components, forget heavy optimization for components, and let underlying systems worry about this stuff. Use components mostly as wrappers, and don't make very many types of them. Compartmentalize them, make them self contained when possible.

Edit: Similar to Kylotan's post, I would recommend to not worry about any of the t-machine stuff. I've read it in the past and long ago decided it wasn't very useful stuff.

Edited by Randy Gaul

Share this post


Link to post
Share on other sites

You wouldn't be doing "if entity has X component" in the most common operation on the components, though - you would be batch processing them sequentially. If you're not doing that, don't decompose the data this way.

You say this often, but the premise of this thread - and almost every thread on component-based systems, is that it's very impractical in any real-world software to make a component that you can process sequentially in isolation from other components. Hence why even the canonical data-oriented site does exactly these checks for component existence when talking about game code! Many of the desirable traits it talks about elsewhere on the site are ditched for their game code example, because they're simply not a good fit.

The short answer is that usable game logic code requires pragmatic communication between components, and therefore any dogmatic attempts to lay out components in contiguous memory with an eye to batch processing with no conditionals, branching, or cache hits to other parts of the memory, are flawed. (And I know you're not recommending this. Just pointing this out as a wider concept.)

Share this post


Link to post
Share on other sites

You wouldn't be doing "if entity has X component" in the most common operation on the components, though - you would be batch processing them sequentially. If you're not doing that, don't decompose the data this way.

You say this often, but the premise of this thread - and almost every thread on component-based systems, is that it's very impractical in any real-world software to make a component that you can process sequentially in isolation from other components.


That hasn't been my experience. Sure, a big legacy system like the ones I've worked with for the last few years have twisted, convoluted game logic where various entity types are constantly referring to one another - but that doesn't mean it has to be that way. Come to think of it, some of the refactoring I did just a couple of weeks ago was extracting something from the spaghetti that could be processed sequentially, but wasn't.

Also, processing in isolation from other components is a kind of decoupling, is it not? I mean, if you can separate a component out as an object, encapsulate it, and decouple it from the rest - a goal in OOP, too! - you should be able to do sequential processing on components of that type.
 

Hence why even the canonical data-oriented site


Given that data-orientation is fundamentally anti-dogma, I don't think "canonical" is a word that applies to anything "data-oriented." As Hodg said, if you drink the "DOD koolaid", everything is a case-by-case decision, so there I don't think there even is a canonical data-oriented architecture.
 

does exactly these checks for component existence when talking about game code!


The example has checks, yes - but it's still doing sequential processing of disjoint components and getting advantages from that. That code could be refactored further to remove the extra checks, as well, but the article makes the point that even going just this far makes the code more parallelizable and easier to reason about than the previous mishmash of methods that could be called from anywhere, at any time. Extracting state into sequentially-processed components is not JUST about cache misses, contrary to a weirdly popular belief I've seen posted before.
 

Many of the desirable traits it talks about elsewhere on the site are ditched for their game code example, because they're simply not a good fit.


I mean, that's very much in the spirit of DOD principles. :)

You're also treating a simple example of a refactoring process as being some shining "canonical form" of data-oriented design, when that simply doesn't exist. There's no "right way to do DOD" that is exemplified by some particular code example. DOD is not actually about code or even architecture. In DOD thinking, those things are just what you end up with when you solve data transformation problems. The mistake "ECS kool-aid drinkers" make is basically the same thing - asserting that an architecture solution is "canonical" irrespective of the actual problem at hand.
 

Now most articles suggests that systems should own the data/components and store them contiguosly when possible. Though 9 out of 10 the system described somehow need only one component type and or need to loop on each component to do work.

I find this to be a bit unrealistic, and most often then not system need multiple component types and only half of the requested component types are actually stored in the current system.


If you're going with an ECS-like architecture, I'd say it's fine to store heavily-intertwined components in the same system - even desirable. If there's a tight coupling between components that otherwise need to be processed separately, and the "outside world" doesn't need to know about both separately, then they should be tracked by the same system or even put in the same component.
 

And again it's not said that it is wise to try to process all the existing PositionComponents if only few of their NetworkComponent got updated... but this would either require a dreaded if/branch or an (expensive?) sort of the NetworkComponents, so that all the modified ones are near together.


You could also store a list of the entity indices of the modified NetworkComponents and iterate over that.
 

Another way i saw is to have a 1 to 1 mapping between indices of the components among all the systems, so that the index = entityid. But this yields to poor memory usage/holes due to entities obviously not being made always of all the possible component types.


You can somewhat solve this with the "compacting sparse array" I described above. The only "holes" are in the index table, which should be fairly small. The actual components would be stored contiguously. Edited by Oberon_Command

Share this post


Link to post
Share on other sites
A couple of comments/questions though, if i understood it right, in the array you have components of different types one after the other, grouped by entityid?

Nope. It's one array/sparse array per component type. You'd access them like:

Position s = positions.get(entityId);

Velocity v = velocities.get(entityId); 

Where the actual implementation of that 'get' might be a direct array indexing or something a bit more complex as I mentioned.

I don't store components in systems, they're stored in a separate structure that has all the containers for components. Systems request particular component containers when they're initialized. Thus any system can request any component they need.

Moreover, systems get notified when an entity is added/removed/mutated (as in, adding/removing components from it), and perform checks on them to determine if they're useful to the system (ie, they have the right components present), if they do, they go into the active entity list of the system. So you end up avoiding the "if(positions.has(entityId))" checks some people mentioned, since it is guaranteed that the active entity list will match the "shape" the system expects it to have.

I've seen this multiple times:

"How do you share components between systems!?" -> You don't, they live somewhere else, and systems can request whatever component types they need. If two systems happen to use the same component type, so be it, that's how one system gets notified of the effects of another system, through the component's data.

"How do I avoid branching by component in the system loop!?" -> With an explicit filtering step that happens when it actually makes sense to check the shape of the entity. You have to code for that, and define stages where entities get added/removed/mutated, so they only get checked the single time it could matter and no more.

If you make the entity filtering fast enough (which it is using bitsets to mark present components in the entity and AND/OR operations to match against a system defined mask), it becomes a non-issue.

Edited by TheChubu

Share this post


Link to post
Share on other sites

Thanks Oberon and TheChubu, I kind of had the "solution" (at least for the issue I had) in my hands a couple of days ago already but I didn't realize it: I was already using a sparse array to dense array implementation to hold things, but I wasn't controlling where in the sparse array things were put.. and I didn't think to do that.

Oberon: just want to say that it's not always feasible (or make any sense) to put components in the same system, I mean, like my example before.. why should I store a NetworkComponent, which is read and written by a NetworkSystem, inside a MovementSystem vs store it in the NetworkSystem?
To solve that I have to store references to NetworkComponents in the MovementSystem. The only confusion I had was how to link the NetworkComponent with its PositionComponent (or vice versa).

Now there's a caveat in having fixed mapping entity id -> index in sparse array... it's that the sparse array has to be as big as the highest entity id value.
And moreover if you store components (and references to them) into systems, therefore duplicating them multiple times.. even if the entityid is just 24 bits lets, say, you still risk to use a lot of memory (depending obviously on how many systems there are, how many component types and how many of them are used by more than one system).

So I guess that keeping them in the system works only if you don't have many entities, but when you start having tens of thousands maybe it's better to have them in a central place (or have them there from the start).
Also if you do that sparse array->dense array for every component type, if you have a lot of entities using one type but not the others, you still have to allocate an array as big as the highest entity id value... and those sparse array will be pretty empty.
But i guess that in this case, if they are that empty, it means that not many of them has to be looped onto, so one can use a dense sorted array instead of a sparse array, and do a binary search... or use some other data structure.

Am i making any sense?

Share this post


Link to post
Share on other sites

Oberon: just want to say that it's not always feasible (or make any sense) to put components in the same system, I mean, like my example before.. why should I store a NetworkComponent, which is read and written by a NetworkSystem, inside a MovementSystem vs store it in the NetworkSystem?
To solve that I have to store references to NetworkComponents in the MovementSystem. The only confusion I had was how to link the NetworkComponent with its PositionComponent (or vice versa).


Oh, of course, in this particular case putting them in the same system doesn't make a whole lot of sense - I was just speaking to the general case.

I would say don't have the movement system OR the position system reference the network system. Another thing you could do with the sparse arrays is have the network state be separate from the entity state entirely - so NetworkComponents would track entity indices associated with them, but wouldn't be tied directly to the entity. There are some other benefits to having your network "replication" layer separate from the game state - notably, your game state no longer needs to worry about updating the network state, the network replication system can grab values to send from the game state and push data received to the game state. Effectively, the NetworkComponents would live on special, separate entities that never share components with the game state entities.
 

Now there's a caveat in having fixed mapping entity id -> index in sparse array... it's that the sparse array has to be as big as the highest entity id value.


This is a bit of a console mindset speaking, but - if this is an issue you could place a maximum limit on the number of entities and preallocate ahead of time.

Another idea is to do something similar to the network replication - if you know ahead of time that certain entities will always have the same "shape" and won't share components, then you can have separate lists of disjoint entities and keep those sizes nice and small. So for instance, you might have a group of entities that don't ever move and only exist to make sound effects and a group of entities that move, don't produce sound, and only ever render (eg. particle effects). Those entities will never share components, so you can put them into separate lists - in fact, you can have multiple entities with the same index as long as they aren't the same "type" of entity, and you have some way of determining that. This does put some design burden on the systems when some of their components are on non-disjoint entities and some of them are not - especially if that distinction can be made at run time! - but again, pre-sorting into separate buckets is your friend here. You could create a mechanism for components to in multiple "buckets", and assign buckets to different entity classes.

edit: look, verb tenses are hard when you woke up at 5:30am...
edit2: here are some articles for you to read that may prove useful, which touch on the ideas I've mentioned:
http://gamesfromwithin.com/managing-data-relationships
http://gamesfromwithin.com/data-oriented-design-now-and-in-the-future Edited by Oberon_Command

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this