Dsigning ECS framework for my egnine

Started by
2 comments, last by All8Up 4 years, 7 months ago

alright,

So I have been doing research about Data-Oriented design and ECS and how contiguous memory is important and how stalling the CPU because of a cache miss is really bad when you have a lot of objects to update or render

 

So I have this UML diagram that I created

lolgg.thumb.PNG.6a2ed8b14d14f97732663e43456b829d.PNG

 

it is simple enough, I have a position component that only deals with positions, a transform system that iterate through a vector and update the positions. All that data is contiguous. The total size of the position component 20 bytes, assuming we have a 64 bytes cache line that means we can fit 4 of them in a single cache line before we have a cache miss. Now e also have a sprite component that has color of the sprite, a pointer to the texture and a pointer to the gameObject where it came from. The sprite component is also contiguous in memory and each sprite is 28 bytes, which means we can fit two sprites on a single cache line before we have a cache miss. Finally, our GameObject which hold a position component to know where to render the sprite on the screen and a sprite component to know the sprite texture and color.

 

Now the sprite renderer will need to access the position component from the game object to know where to render the sprite on the screen. However, if we assume we have 1 million sprites that means have 1 million game objects and 1 million position components. Now every time the sprite renderer iterate through the sprite it has to make a memory jump to the position component to get the position. since the position component is not contiguous in memory with the sprite it means it has to go to main memory to read that data which would be a big issue which means we have a cache miss.

 

My question is, how do I reduce the number of cache misses I have here?

 

How can I improve this ECS system?

 

Advertisement

You don't need a GameObject. An Entity exists iff any component with its Entity-ID exists.

Now you need to associate an Entity-ID with each Component and there is a few possibilities:

- just use the index of the component-vector -> imo not good cause of needing useless holes for syncronizing all component arrays

- add the EntityID to all components -> bloats components and may need searches on component-vectors, but if you keep the component-vector sorted by Entity-ID, you can iterate more than one at once in a cachefriendly manner or use binary search for single components

- add an index-datastructure for each component-vector -> more complex, but maybe faster depending on what you use (vector, hashtable, ...)

 

As wintertime points out, the GameObject is not needed, but I think that there is a missunderstanding involved here about what/why/where the issues are.  Break it down conceptually to the most primitive level of what an Ecs represents as follows:


// Assume you have one entry in each of these items for each entity and they are both in order.
std::vector<Position> positions;
std::vector<Sprites> sprites;

for (size_t i=0; i<positions.size(); ++i)
{
  auto& p = positions[i];
  auto& s = sprites[i];
  ... do stuff ...
}

This is basically what you are trying to accomplish with the Ecs.  Now, I think from reading your description you may be having a misconception that the above has a cache problem when in fact this is a near optimal access pattern.  Cpu's are multi-associative when it comes to caching, meaning that they can track multiple streams of memory simultaneously.  Or, in other words, once this loop has been primed there will be no cache misses caused by p & s being in separate areas of memory, the Cpu will just maintain two cache lines simultaneously.  Additionally, the compilers may be smart enough to unroll the loop and do some optimizations based on this easily predicted memory access pattern, if not you can do it by hand of course.  So, this is exactly what you are attempting to accomplish with the Ecs and it is a very good thing.

Now, the downside of Ecs is actually maintaining things such that the above is what you are actually accomplishing.  This is where Ecs gets complicated and also where I believe a lot of confusion and dislike of the pattern comes from because so many implementations do it poorly.  Just for example, take 4 entities with a p&s component, what you want in memory is the following:


e  0, 1, 2, 3
p  0, 1, 2, 3
s  0, 1, 2, 3

Assume that you remove the 's' component from entity 1 and you end up with:


e  0, 1, 2, 3
p  0, 1, 2, 3
s  0, 3, 2

You have just broken the iteration pattern if you are using sparse arrays per component since now when you iterate 0..3 based on the entity index you are no longer linearly indexing into the s array.  Overtime, this gets so disorganized that you are not getting any real benefits over a normal OO pattern (actually it generally gets slower) and generally speaking Ecs becomes a waste unless the only thing you care about is minimizing memory overhead.

To solve such things there are several approaches, one is to do lazy sorts where over time you reorganize the 's' array to be linear.  This is relatively simple to implement and honestly not the most horrible thing in the world but if you have a lot of add/remove going on, you may never get the sorting to stabilize.  The second and more common high performance solution is archetypes/groups.  This can be rather complicated and has it's own set of performance gaffs though.  Generally speaking, a really simple outline of archetypes is that you end up with the following after removing the component:


Group p+s
e  0, 3, 2
p  0, 3, 2
s  0, 3, 2

Group p
e  1
p  1

How this works is that you don't use the 'e' as an index, that's just a reverse map to the original entity id and other than internal management it is pretty much ignored.  More importantly you have these two different groups (separate arrays which means we had to copy e+p into the new 'p' group) now rather than disjoint arrays.  So, if you need to iterate p+s, you only iterate that group and ignore the 'p' group.  If you need to iterate all things with 'p', you end up performing the iteration twice, once over p and once over p+s.  This can be difficult to maintain and complicates things but once again you are back to the original performance intentions of Ecs in terms of maintaining linear data access at all times.

Anyway, hope this is helpful, Ecs has a great number of advantages and an equally great number of disadvantages and misunderstandings.

This topic is closed to new replies.

Advertisement