How to resolve Entity - Component association in an ECS

Started by
17 comments, last by Oberon_Command 6 years, 11 months ago

I think the Artemis system has quite elegant ways of handling cross-component dependencies via placing an 'Aspect' abstraction between Systems and Components. But it doesn't remotely give you any of the data-oriented caching benefits. At that point, you have to ask what benefit it's giving you as compared to a simpler Unity or Unreal style component, i.e. as a simple encapsulated sub-object.

Thanks for reminding me to check it again, more in depth.

Most of the aversion to putting behaviour in components stems from several people incorrectly asserting that components are somehow 'against' object-orientation, and that therefore we need to ditch all object oriented features to truly embrace components. In fact components have been a key part of object-oriented design for decades. The Design Patterns book was saying "favor 'object composition' over 'class inheritance'" way back in 1995.

Well yes, zealotry is never good and i've almost always think about OOP code not in term of deep inheritance hierarchies, or basically all the "bad" characteristics people give to it.
At the same time i recognize that sometimes in the pursue of giving better interfaces, this complicates refactoring for concurrency and giving priority to data flow.


Being nice to cache doesn't always mean "make all the things contiguous". In the example implementation I mentioned, the entire hash table was two cache lines. Once they're fetched once, they likely remain in cache for the entire system update. The same can be true for larger hash tables too - sometimes you may be able to pre-fetch the entire hash map...

Here i was thinking of a loop that needs to access to components of the same type, one after the other. It obviously all depend on the pattern access and data sizes.

Extremely. They're opposite philosophies -- one says "look at the actual problems that need to be solved and craft minimal solutions to them" and the other says "lets make a generic framework that will solve every problem, and then solve every problem in the one way that makes it fit into our framework".

And this surprise me, i supposed that the fact that ECS talks about systems which have the logic and components are only data was to kind of enforce DOD.

As a general rule of thumb having contiguous data helps the cache. That's not true for every situation though. [...]

As before i was still thinking of the same type of loop.

Wait wait wait. Deep inheritance hierarchies are not normal. A core rule of OOP is that composition (i.e. the use of components) is preferred over inheritance.
If you've been reading the kinds of blogs that say "OOP is bad because deep inheritance trees have a bunch of problems, so here's ECS and it solves everything!"... then I'm sorry but these are straw-man arguments made by people who suck at OOP and instead of choosing to learn how to do OOP properly, they've instead given up and invented a new, completely unrecognised/informal pattern as a crutch for their ego issues. They're basically saying "OOP is bad because I'm bad at it, so I'm going to reinvent an ad-hoc version of the relational model and claim it as a new invention". Those blogs are bad.

Well i find that to be pretty wrong too, as i've said with Kylotan.
What i meant with normal (probably poor choice of word) is what commonly (and probably wrongly) people referred to inheritance based games which have to be despised.

If you drink the ECS cool aid, then it doesn't matter, just implement any solution and use it everywhere. If you drink the DOD cool aid, then everything is a case by case decision and you shouldn't bother with a component framework at all. If you stick with OOP, then everything is a component already, you have a multitude of choices for how they're connected to each other and you don't need a framework.

Not trying to follow any specific religion, if it seems so is because i still want to evaluate them fully to understand what seems the best and the worst parts, keeping the worst ones limited to few areas, where possible.
And you all are helping with that :).
Advertisement

The T-Machine style of "entity systems" (as he calls it) is not a very consistent design philosophy. In his first blog post on it, he says that components are for "speeding up the development time". Okay. But by post 5 it's off down the rabbit hole of how to store components in relational databases and only more recently did he jump on the data-oriented programming bandwagon and start worrying about how to lay things out in memory, ultimately admitting that none of these approaches are perfect and you probably don't need to worry about that.

From the other direction, some people have come at it entirely from the data-oriented side, but although the main body of this approach is all about homogeneous transformations on contiguous arrays of data, as soon as it mentions component-based entities for games you can see from the example there that they're back to "if this entity has X component, do Something(X)". They use sparse arrays and check for index existence instead of null pointers inside entity objects but the basic logic is the same, thus throwing away much of the other DoD advantages such as to avoid boolean conditionals or to perform little other than straight data transformations.

Since both of these viewpoints include a disdain for OOP, and both viewpoints prefer to think about the data first and the data owner second, it's easy to think that they're two aspects of the same thing, but really they are 2 separate tools that have a small overlap - and the area where they overlap is arguably too small to base your whole game project around it.

The fact is, to go right back to the original question, there are several efficient ways of locating components, each more or less suitable for a given program or subsystem, and each probably breaking some subjective 'rule' about how this 'should' be done:

  1. entity contains all components inline (possibly 'switched off' if necessary)
  2. entity holds pointer or reference to relevant components, which may be null
  3. entity holds index of component in some external vector or array
  4. system holds sparse array (e.g. std::unordered_map), mapping entity IDs to components
  5. system holds contiguous array of components, only ever accessing them sequentially
  6. system holds list of aspects, each of which maps an entity to a tuple of components (pointers or indices)

When it comes to pragmatic coding, in the general case of game logic code, I prefer option 2.

From the other direction, some people have come at it entirely from the data-oriented side, but although the main body of this approach is all about homogeneous transformations on contiguous arrays of data, as soon as it mentions component-based entities for games you can see from the example there that they're back to "if this entity has X component, do Something(X)". They use sparse arrays and check for index existence instead of null pointers inside entity objects but the basic logic is the same, thus throwing away much of the other DoD advantages such as to avoid boolean conditionals or to perform little other than straight data transformations.


You wouldn't be doing "if entity has X component" in the most common operation on the components, though - you would be batch processing them sequentially. If you're not doing that, don't decompose the data this way. The point of using the sparse arrays is the batch processing, so if you don't need or can't meaningfully rework the data into a batch-processable form, don't bother doing it. The index checks aren't really any worse than your option 2, anyway - it's still a double indirection, just in a different form.

I would also point out that my first choice for a sparse array would NOT be std::unordered_map, but a flat vector with an index-to-index map. Eg.

// NOT this
std::unordered_map<int, Component> components;
Component* TryAndGetComponent(int index)
{
   auto componentIter = components.find(index);
   if (componentIter != std::end(components))
   {
       return &*componentIter;
   }
 
   return nullptr;
}

// but this - componentIndices is the size of the number of entities, componentData is the number of active components
// as you add and remove components, you'd update componentIndices so the same entity index always points at the right component
// this is still a double-indirection, but it's guaranteed O(1) rather than worst-case O(n) as with std::unordered_map, and you don't need hashes at all
// plus, all the components are contiguous in memory, so you can iterate over them in a uniform fashion!
// of course, you'd want to wrap this up in a SparseArray class... and maybe have a type-safe handle class...
std::vector<int> componentIndices;
std::vector<Component> componentData;
Component* TryAndGetComponent(int index)
{
   int actualIndex = componentIndices[index];
   if (actualIndex != -1)
   {
        return &componentData[actualIndex];
   }

   return nullptr;
}
But again, the whole point of DOD is not to be dogmatic and lock yourself into a single architecture for everything. ECS and DOD can coexist, but only if the ECS is the end result of the DOD. Don't do this if you don't get anything out of it.

I like to try to think of components as big pieces that can be compartmentalized. For a 2D game something like: rigid body, sprite, logic, input, character controller, and probably some other game-specific ones. Here's what each might do:

  • Sprite : reference animations or images, perform actual animating logic, store any bones/keyframes, wraps whatever underlying code necessary (maybe some DirectX or PNG loading as an example)
  • Rigid body : wraps underlying physics library, holds rigid body, colliders, deals with collision callbacks and filtering
  • Logic : stores scripts, swaps scripts around as necessary, holds whatever gameplay or logic related things as necessary, like structs or attributes needed to define itself
  • Input : relays or filters messages on input keystrokes, mouse clicks, sets up listeners and whatnot
  • Character controller : Pretty self explanatory. Character controllers are important and big enough to warrant their own component

It really is a Bad IdeaTM to try and break components down into smaller pieces. Breaking things down all the way to "velocity" and "position" components is way too far. There is not real benefit from such granularity, and in the end this forces the developer to highly optimize the component storage/association... Which is a waste of time for many developers. Really, the powerful thing about components is that they can be separated into different isolated services, each self-contained. This is really similar to the old "actor-model" concept, where actors contain private data, and can modify their own private data, and interact with other actors by sending messages. A message can be safely ignored by any actor at any time.

What more modern ECS try to do is to setup arrays of data-only components, and then loop over them to do processing with "systems". A system merely comprising (at least conceptually) of a for loop running over an array of data. Thus the messaging becomes indirect communication between systems as defined by the operating order on the components. System A comes before system B, and modifies the components. B then reads the components and sees the modifications, and so on to system C. Communication, or messaging, is thrusted into the system processing order. Personally I find this is super overkill for anything short of a large commercial project with multiple engineers.

Try to focus on defining large components that can be self-contained, and wrap underlying pre-existing libraries. Try to focus on how to setup a nice messaging system, something very straightforward to use and debug. Function calls are perfectly fine for implementing messaging. The idea here is to merely to decouple dependencies through an abstraction! A message should be used as a dependency decoupling mechanism. It will definitely take some practice and skill to figure out what should and should not be a message.

At this point it's suddenly not all that important how the components are tied together to form an entity. In my own personal code I do this:


struct Entity
{
	EntityID id;
	Entity* next;
	Entity* prev;

	virtual void Update( Entity* entity, float dt ) = 0;
	virtual void Draw( Entity* entity ) = 0;
};

struct Monster : public Entity
{
	RigidBody body;
	Sprite some_sprite;
	Sprite another_sprite;
	Logic logic;

	void Update( Entity* entity, float dt ) override;
	void Draw( Entity* entity ) override;
};

It is immediately clear that my components are not stored contiguously in memory, nor are they even sorted. Each different kind of entity can place different component structs inside it in any order, and even have duplicates. It's very dead simple. I'm not really worried about efficiency here. Underlying libraries that handle rendering and physics are written in very optimized ways. For example when sprites figure out what they want to finally render they push quads into an array stored inside of a rendering compositor. The rigid body is just wrapping the underlying physics system, and calling functions on the rigid body and relaying callbacks.

So my answer is forget DoD for components, forget heavy optimization for components, and let underlying systems worry about this stuff. Use components mostly as wrappers, and don't make very many types of them. Compartmentalize them, make them self contained when possible.

Edit: Similar to Kylotan's post, I would recommend to not worry about any of the t-machine stuff. I've read it in the past and long ago decided it wasn't very useful stuff.

You wouldn't be doing "if entity has X component" in the most common operation on the components, though - you would be batch processing them sequentially. If you're not doing that, don't decompose the data this way.

You say this often, but the premise of this thread - and almost every thread on component-based systems, is that it's very impractical in any real-world software to make a component that you can process sequentially in isolation from other components. Hence why even the canonical data-oriented site does exactly these checks for component existence when talking about game code! Many of the desirable traits it talks about elsewhere on the site are ditched for their game code example, because they're simply not a good fit.

The short answer is that usable game logic code requires pragmatic communication between components, and therefore any dogmatic attempts to lay out components in contiguous memory with an eye to batch processing with no conditionals, branching, or cache hits to other parts of the memory, are flawed. (And I know you're not recommending this. Just pointing this out as a wider concept.)

You wouldn't be doing "if entity has X component" in the most common operation on the components, though - you would be batch processing them sequentially. If you're not doing that, don't decompose the data this way.

You say this often, but the premise of this thread - and almost every thread on component-based systems, is that it's very impractical in any real-world software to make a component that you can process sequentially in isolation from other components.


That hasn't been my experience. Sure, a big legacy system like the ones I've worked with for the last few years have twisted, convoluted game logic where various entity types are constantly referring to one another - but that doesn't mean it has to be that way. Come to think of it, some of the refactoring I did just a couple of weeks ago was extracting something from the spaghetti that could be processed sequentially, but wasn't.

Also, processing in isolation from other components is a kind of decoupling, is it not? I mean, if you can separate a component out as an object, encapsulate it, and decouple it from the rest - a goal in OOP, too! - you should be able to do sequential processing on components of that type.

Hence why even the canonical data-oriented site


Given that data-orientation is fundamentally anti-dogma, I don't think "canonical" is a word that applies to anything "data-oriented." As Hodg said, if you drink the "DOD koolaid", everything is a case-by-case decision, so there I don't think there even is a canonical data-oriented architecture.

does exactly these checks for component existence when talking about game code!


The example has checks, yes - but it's still doing sequential processing of disjoint components and getting advantages from that. That code could be refactored further to remove the extra checks, as well, but the article makes the point that even going just this far makes the code more parallelizable and easier to reason about than the previous mishmash of methods that could be called from anywhere, at any time. Extracting state into sequentially-processed components is not JUST about cache misses, contrary to a weirdly popular belief I've seen posted before.

Many of the desirable traits it talks about elsewhere on the site are ditched for their game code example, because they're simply not a good fit.


I mean, that's very much in the spirit of DOD principles. :)

You're also treating a simple example of a refactoring process as being some shining "canonical form" of data-oriented design, when that simply doesn't exist. There's no "right way to do DOD" that is exemplified by some particular code example. DOD is not actually about code or even architecture. In DOD thinking, those things are just what you end up with when you solve data transformation problems. The mistake "ECS kool-aid drinkers" make is basically the same thing - asserting that an architecture solution is "canonical" irrespective of the actual problem at hand.

Now most articles suggests that systems should own the data/components and store them contiguosly when possible. Though 9 out of 10 the system described somehow need only one component type and or need to loop on each component to do work.

I find this to be a bit unrealistic, and most often then not system need multiple component types and only half of the requested component types are actually stored in the current system.


If you're going with an ECS-like architecture, I'd say it's fine to store heavily-intertwined components in the same system - even desirable. If there's a tight coupling between components that otherwise need to be processed separately, and the "outside world" doesn't need to know about both separately, then they should be tracked by the same system or even put in the same component.

And again it's not said that it is wise to try to process all the existing PositionComponents if only few of their NetworkComponent got updated... but this would either require a dreaded if/branch or an (expensive?) sort of the NetworkComponents, so that all the modified ones are near together.


You could also store a list of the entity indices of the modified NetworkComponents and iterate over that.

Another way i saw is to have a 1 to 1 mapping between indices of the components among all the systems, so that the index = entityid. But this yields to poor memory usage/holes due to entities obviously not being made always of all the possible component types.


You can somewhat solve this with the "compacting sparse array" I described above. The only "holes" are in the index table, which should be fairly small. The actual components would be stored contiguously.
A couple of comments/questions though, if i understood it right, in the array you have components of different types one after the other, grouped by entityid?

Nope. It's one array/sparse array per component type. You'd access them like:

Position s = positions.get(entityId);

Velocity v = velocities.get(entityId);

Where the actual implementation of that 'get' might be a direct array indexing or something a bit more complex as I mentioned.

I don't store components in systems, they're stored in a separate structure that has all the containers for components. Systems request particular component containers when they're initialized. Thus any system can request any component they need.

Moreover, systems get notified when an entity is added/removed/mutated (as in, adding/removing components from it), and perform checks on them to determine if they're useful to the system (ie, they have the right components present), if they do, they go into the active entity list of the system. So you end up avoiding the "if(positions.has(entityId))" checks some people mentioned, since it is guaranteed that the active entity list will match the "shape" the system expects it to have.

I've seen this multiple times:

"How do you share components between systems!?" -> You don't, they live somewhere else, and systems can request whatever component types they need. If two systems happen to use the same component type, so be it, that's how one system gets notified of the effects of another system, through the component's data.

"How do I avoid branching by component in the system loop!?" -> With an explicit filtering step that happens when it actually makes sense to check the shape of the entity. You have to code for that, and define stages where entities get added/removed/mutated, so they only get checked the single time it could matter and no more.

If you make the entity filtering fast enough (which it is using bitsets to mark present components in the entity and AND/OR operations to match against a system defined mask), it becomes a non-issue.

"I AM ZE EMPRAH OPENGL 3.3 THE CORE, I DEMAND FROM THEE ZE SHADERZ AND MATRIXEZ"

My journals: dustArtemis ECS framework and Making a Terrain Generator

Thanks Oberon and TheChubu, I kind of had the "solution" (at least for the issue I had) in my hands a couple of days ago already but I didn't realize it: I was already using a sparse array to dense array implementation to hold things, but I wasn't controlling where in the sparse array things were put.. and I didn't think to do that.

Oberon: just want to say that it's not always feasible (or make any sense) to put components in the same system, I mean, like my example before.. why should I store a NetworkComponent, which is read and written by a NetworkSystem, inside a MovementSystem vs store it in the NetworkSystem?
To solve that I have to store references to NetworkComponents in the MovementSystem. The only confusion I had was how to link the NetworkComponent with its PositionComponent (or vice versa).

Now there's a caveat in having fixed mapping entity id -> index in sparse array... it's that the sparse array has to be as big as the highest entity id value.
And moreover if you store components (and references to them) into systems, therefore duplicating them multiple times.. even if the entityid is just 24 bits lets, say, you still risk to use a lot of memory (depending obviously on how many systems there are, how many component types and how many of them are used by more than one system).

So I guess that keeping them in the system works only if you don't have many entities, but when you start having tens of thousands maybe it's better to have them in a central place (or have them there from the start).
Also if you do that sparse array->dense array for every component type, if you have a lot of entities using one type but not the others, you still have to allocate an array as big as the highest entity id value... and those sparse array will be pretty empty.
But i guess that in this case, if they are that empty, it means that not many of them has to be looped onto, so one can use a dense sorted array instead of a sparse array, and do a binary search... or use some other data structure.

Am i making any sense?

Oberon: just want to say that it's not always feasible (or make any sense) to put components in the same system, I mean, like my example before.. why should I store a NetworkComponent, which is read and written by a NetworkSystem, inside a MovementSystem vs store it in the NetworkSystem?
To solve that I have to store references to NetworkComponents in the MovementSystem. The only confusion I had was how to link the NetworkComponent with its PositionComponent (or vice versa).


Oh, of course, in this particular case putting them in the same system doesn't make a whole lot of sense - I was just speaking to the general case.

I would say don't have the movement system OR the position system reference the network system. Another thing you could do with the sparse arrays is have the network state be separate from the entity state entirely - so NetworkComponents would track entity indices associated with them, but wouldn't be tied directly to the entity. There are some other benefits to having your network "replication" layer separate from the game state - notably, your game state no longer needs to worry about updating the network state, the network replication system can grab values to send from the game state and push data received to the game state. Effectively, the NetworkComponents would live on special, separate entities that never share components with the game state entities.

Now there's a caveat in having fixed mapping entity id -> index in sparse array... it's that the sparse array has to be as big as the highest entity id value.


This is a bit of a console mindset speaking, but - if this is an issue you could place a maximum limit on the number of entities and preallocate ahead of time.

Another idea is to do something similar to the network replication - if you know ahead of time that certain entities will always have the same "shape" and won't share components, then you can have separate lists of disjoint entities and keep those sizes nice and small. So for instance, you might have a group of entities that don't ever move and only exist to make sound effects and a group of entities that move, don't produce sound, and only ever render (eg. particle effects). Those entities will never share components, so you can put them into separate lists - in fact, you can have multiple entities with the same index as long as they aren't the same "type" of entity, and you have some way of determining that. This does put some design burden on the systems when some of their components are on non-disjoint entities and some of them are not - especially if that distinction can be made at run time! - but again, pre-sorting into separate buckets is your friend here. You could create a mechanism for components to in multiple "buckets", and assign buckets to different entity classes.

edit: look, verb tenses are hard when you woke up at 5:30am...
edit2: here are some articles for you to read that may prove useful, which touch on the ideas I've mentioned:
http://gamesfromwithin.com/managing-data-relationships
http://gamesfromwithin.com/data-oriented-design-now-and-in-the-future

This topic is closed to new replies.

Advertisement