I had some idea without being able to implement it yet. I hope this isn't completely useless, it's possible that I have just overlooked something simple and this just doesn't work. Anyways.. I think the following way of component storage is familiar to most people who have implemented an ECS system, just a short recap:
Have global arrays of the same fixed size for each component type where the entity ID is a literal index into the array. Creating a new entity anywhere just increases the array size for all component types by 1.
If your game is entirely coded in C++ and don't access or create component types from outside scripts, all necessary information is available statically and getting any component from any type is just a fixed array access. I.e.
auto position = g_components.positionArray[entityID];
auto mesh = g_components.meshArray[entityID];
This sounds pretty the first time you see ECS implemented like this because component access can't get much faster than this. Problems with this approach (and why people use others):
- entities that only have component 1, but not 2 3 4 and 5 still use memory for 2 3 4 and 5
- Systems that iterate through all components will have to skip some NULL/nullptr components that are unused
- Most engines do need some kind of data driven way to define and use components. With the added complication of tracking component type IDs, this also means component access becomes at least two array accesses, something like:
auto position = g_components[componentID][entityID]
So people do stuff like storing components in an associative manner instead, which is theoretically much slower in terms of how you access these components.
My proposal for a different solution: Create code that generates these global array structures for each used permutation of entities and their used component types. So there's not just a single global component collection, and instead multiple, each for one permutation of components used(not actual C++ syntax):
g_positionMeshComponents = {
vector<vec3> positions;
vector<mesh> meshes;
}
g_positionWeaponInventoryComponents = {
vector<vec3> positions;
vector<weapon> weapons;
vector<inventory> inventories;
}
etc. So when you create an entity that has a position and mesh component (and nothing else), you will know to access the data from that particular global variable.
Of course, in practice these global collections wouldn't be stored in static variables. You can't create these statically, because the number of possible permutations is too large if you have any realistic amount of component types. So you need to only keep those permutations around that are actually used. So instead you could create a key from the component types an entity uses (for example: bit mask where each component type is 1 bit in the mask, or a hash with low chance of collision...) and then access components like
auto componentCollection = g_components[key];
auto position = componentCollection[typeID][entityID];
or something like that. The key only needs to be updated when you add or remove components to / from an entity. This operation would be exceedingly rare even in complex games, at least in my experience. As such you can either store a pointer to the relevant component container in your entity struct, or work the key into the first 32 bits of the entity ID, or whatever...
Now you have a system where everything is stored nicely in a linear fashion, and only uses the memory it actually requires, plus the component access is really fast. It's still triple indirection, but it should come ahead easily of even highly optimized associate solutions.
But now there's one bigger problem: What about systems that would really just like to iterate through all components of a particular type and do stuff? The components are now spread over several arrays. My proposal is to write these systems separately, such that they have separate component storage and the actual component types used by entities are just wrappers used for access to these components, and they are kept in sync at some point in the code.
You need to do this anyways... when you use third party systems like physics engines and whatnot that have their own ideas about what is optimal for data storage and management...
Another problem: In this lookup
auto componentCollection = g_components[key];
auto position = componentCollection[typeID][entityID];
the componentCollection might only have two component types, but typeID can be pretty big (e.g. if every component type has just an incremented ID). Since you want to avoid associative lookup here, you would need to take the overhead and make each array in those componentCollections large enough to contain all component types... I'm not sure how bad this would be in terms of cache misses if a component lies in the middle of the array and the pointers to other type arrays right next to it are unused. Is there a better strategy?
What do you think about it?