Just a couple of Data-Oriented Design questions.

Started by
11 comments, last by SeanMiddleditch 9 years, 10 months ago

So I'm just barely getting into this, and I think it's a really neat way to think about things. I'm going to program a simple Asteroids-like game (more of a Lunatic Fringe clone) just to get my feet wet, and I'm pretty excited about it. I just have a couple of things that I need to have clarified about doing a component-system type arrangement with Data-Oriented Design.

My first question concerns how to organize the arrays of components. My first instinct is that there should just be one array of each type of component. For instance, there would only be one master array of transforms, and any game entity that owns a transform would just add their transform to this array. That way, any systems that need to affect the transforms can just iterate linearly through the whole array. Several drawbacks (none of them insurmountable) became immediately apparent:

  1. Entity deletion becomes more difficult. Not every entity needs every component, so the component arrays will all be of different sizes, and there is going to need to be some piece of code (in the Entity itself, perhaps) that keeps track of which components in which arrays belong to which entity. If it's keeping track via array indices, then these indices will have to be updated every time components are deleted from the arrays (assuming that components are always added to the end, and that deletion involves an actual erase-remove operation rather than just marking certain array elements as 'dead' and recycling them later).
  2. Not every system that works on transforms (say) needs to work on every transform. There may be certain systems that only need to perform operations on Asteroid transforms. This means that each component will have to identify the type of the entity to which it belongs, so that the system in question can check the type before operating on the component. Maybe this isn't such a bad thing. Conditional branches aren't incredibly expensive, and although this does mean you've potentially wasted a prefetch on an object that you don't need, doing nothing to an object is still faster than doing something.

So my next thought is that there should be a different component array for each Entity type that uses the component. In my surfing, I've come across several examples of code that looks like this:


struct Asteroids
{
    std::vector<Transform> transforms;
    std::vector<CollisionSphere> collisionSpheres;
    // etc...
};

So, the entity type here is Asteroid, and one instance of the Asteroids class holds a list of all asteroids in the scene. If you have a system that performs operations on asteroid transforms, but perhaps not the transforms of some other entity types, then you can just feed the system asteroids.transforms, and the system can do it's usual linear thing. There are a couple of drawbacks, it seems, to this approach as well:

  1. This breaks up a single big array (per component) into N smaller arrays per component (where N is the number of entity types that use that component), and some of those arrays (like players.transforms) might be very small. Running a system on a series of tiny arrays is probably not much better than jumping around in memory. However, there would still be significant savings, I would imagine, if your world contains several entity types that exist in large numbers (and thus have large component arrays).
  2. It seems that, suddenly, there are going to be a lot of areas of code that are going to need to know about the concrete entity types, which would not had they been programmed in a more traditional OOP style. In traditional OOP, you often just have an array of EntityBase* and you just call a polymorphic Update method on each entity. The code that iterates through this array doesn't know or care what the concrete types are. But suppose we're doing a data-oriented entity/system/component architecture, and suppose we have a system that performs some operation on transforms, but we only want Asteroids and BlackHoles to use it. There is going to need to be some code that calls system.update(asteroids.transforms) and then calls system.update(blackholes.transforms). My intuition (and maybe this is a good intuition, or maybe it's brought on by an overdose of OOP-thinking) is that main, central pieces of glue code should not know about concrete types; the only code that should care about the behavior of concrete types are the concrete types themselves, and the central "glue" code that brings all of these types together should know only about the abstract classes, and be completely decoupled from the concrete types themselves.

Despite these potential drawbacks, I still feel that option #2 (having entity classes like Asteroids that have their own arrays of components, rather than using a master array for each component that all entity types must share) is far preferable. However, I am interested in some other opinions, and to see if perhaps someone has thought of a third way that is even better.

My second question (or perhaps more of an observation) is that it seems like you can't entirely avoid random-access of objects. Take collision detection and response. Suppose you have some CollisionDetectionSystem whose job it is to iterate through all of the entities and find out which ones are colliding. For each collision, it adds a Collision object to an array of collisions. Then comes the collision resolution step. Even if you have just one CollisionResolutionSystem, that system is going to have to read each Collision object, find out which entities were involved in the collision, and then find the appropriate components for these entities in some other list. This find operation is going to necessarily entail jumping hither and yon through the component arrays, modifying them as called-for by the collision response.

Is there a clever solution to that sort of situation, or is it just a performance weakness that is accepted on the basis that it's still faster than the traditional OOP method of always jumping all over memory for everything?

Advertisement


If it's keeping track via array indices, then these indices will have to be updated every time components are deleted from the arrays (assuming that components are always added to the end, and that deletion involves an actual erase-remove operation rather than just marking certain array elements as 'dead' and recycling them later).

Another option is to use a double indirection for your component lookup (see this presentation). Then deleting a component just involves swapping two components and updating two entries in a roster index array.


There may be certain systems that only need to perform operations on Asteroid transforms. This means that each component will have to identify the type of the entity to which it belongs,

If you're writing code that needs to know the "entity type", you're already heading in the wrong direction. An entity should be completely defined by its components/data. Why would code need to treat the Transform on an Asteroid differently than the Transform on some other object? Can you give an example?

assuming that components are always added to the end, and that deletion involves an actual erase-remove operation rather than just marking certain array elements as 'dead' and recycling them later

I wouldn't make that assumption. I'd probably default to using a pool to store the objects - a fixed-size array of elements, where the unused ones are in a free list. Erasing an object involves destructing it and then adding it's pointer/index to the free list. Allocating an object involves popping the front of the freelist and constructing the object there.

Not every system that works on transforms (say) needs to work on every transform. There may be certain systems that only need to perform operations on Asteroid transforms. This means that each component will have to identify the type of the entity to which it belongs, so that the system in question can check the type before operating on the component.

I see that as a huge compromise / "code smell", like a dynamic_cast in an OOP system... sad.png
You can avoid that pollution by reorganizing your algorithm to first select only the components that it's interested in:
1) For each asteroid as A, add A.transform to list of transform handles.
2) For each transform in this list, do work...

Or you can have more than one transform pool.

But suppose we're doing a data-oriented entity/system/component architecture, and suppose we have a system that performs some operation on transforms, but we only want Asteroids and BlackHoles to use it.

As above, don't do this by iterating through all the transforms... Iterate through the asteroids or black holes! Either
* for each asteroid, fetch asteroid->transform, do stuff
or, as above
*1. for each asteroid, add asteroid->transform to list of handles
*2. for each item in list, fetch it, do stuff

My intuition (and maybe this is a good intuition, or maybe it's brought on by an overdose of OOP-thinking) is that main, central pieces of glue code should not know about concrete types; the only code that should care about the behavior of concrete types are the concrete types themselves, and the central "glue" code that brings all of these types together should know only about the abstract classes, and be completely decoupled from the concrete types themselves.

I would say the opposite (In all of OOP/CES/DOD). All of the components should be fairly isolated and self contained, knowing as little about the outside world as possible.

The high level game code (which glues all the components together) is the part that deals with many, many different concrete types, plugging them together to form useful gameplay systems.

This find operation is going to necessarily entail jumping hither and yon through the component arrays, modifying them as called-for by the collision response.

After generating the list of collision events, you can sort the list by collider-index, so that then when processing the collisions, you're iterating through the colliders in index-order.

In traditional OOP, you often just have an array of EntityBase* and you just call a polymorphic Update method on each entity.

Just for the record, I would call this "Bad OOP", not "traditional" tongue.png Although, traditionally a lot of OOP code is bad laugh.png biggrin.png

Game companies still use OOP day-in/day-out, but a lot better than they used to (with things like DoD and component-composition in mind, rather than using the inheritance-will-solve-all-our-problems methodology). Also, a lot of the proper ways to use OOP were known in the 90's, but not well taught, so not popularly used throughout the 90's... The whole modern "component system" idea is actually a core OOP idea that's been ignored for a long time in many circles.

If you're writing code that needs to know the "entity type", you're already heading in the wrong direction.

I see that as a huge compromise / "code smell", like a dynamic_cast in an OOP system...

Agreed 100%. I don't like that option at all. I definitely see that as a drawback of the single master-array method.

An entity should be completely defined by its components/data. Why would code need to treat the Transform on an Asteroid differently than the Transform on some other object? Can you give an example?

The transform for each entity should itself behave in exactly the same way (i.e., it should define the position, rotation, and scale of the game entity for physics and rendering purposes). However, the operations performed on the transform might be different from one entity to the next. For instance, a player-controlled spaceship would have a transform that is ultimately influenced by button presses, an AI-controlled spaceship would have a transform that is ultimately altered by some AI logic, and an Asteroid entity would have a transform that is altered by physics (some simple drift physics using Euler Integration or whatever).

Now, I know that some people would say that these should all be separate components -- TransformComponent, ShipAIComponent, PlayerControlComponent, DriftPhysicsComponent, etc. -- and that an Asteroid behaves the way it does because it owns a TransformComponent and a DriftPhysicsComponent. When the Asteroid updates, it calls Update on each of its components. When the DriftPhysicsComponent needs to update the transform, it needs to ask the Entity for the TransformComponent and use that interface to alter the transform, like so (naive example):


void DriftPhysicsComponent::Update(double dt)
{
    vec3 dv = a*dt;
    v += dv;

    vec3 dx = v*dt;

    TransformComponent& xform = dynamic_cast<TransformComponent&>(entity.GetComponent("Transform"));
    const vec3& pos = xform.GetPos();
    xform.SetPos(pos+dx);
}

However, ignoring for the moment the problem of the dynamic cast, my understanding is that this is not done in a Data-Oriented way. The first problem is that the Entities own the components. So, the components themselves might be spread hither and yon throughout the heap, and updating them necessarily entails jumping all over memory. This could be solved somewhat by storing the components in a pool or array somewhere else, and giving the entities a reference to the components rather than having them own the components themselves. However, since you are iterating over the Entities and then allowing the Entities to update their own components, you still have a problem with the update order, where the components (despite being stored in a contiguous array) are being updated according to when they are encountered in the update loop:

  1. Update Entity 0 (Asteroid)
    1. Update TransformComponent
    2. Update DriftPhysicsComponent
  2. Update Entity 1 (Asteroid)
    1. Update TransformComponent
    2. Update DriftPhysicsComponent
  3. Update Entity 2 (Black Hole)
    1. Update TransformComponent
    2. Update DriftPhysicsComponent
  4. Update Entity 3 (Enemy Ship)
    1. Update TransformComponent
    2. Update ShipAIComponent

Ok, so let's say we solve that problem by keeping the entities out of the loop (literally) -- don't iterate over entities. Iterate over the homogeneous arrays of components, and call Update on them directly. Now, you're actually updating the components in a sequential, cache-friendly way (to my understanding). However, there is still one problem -- there is a lot of cross-talk between components, as you can see from my naive dynamic cast implementation. Updating a DriftPhysicsComponent necessarily entails updating a TransformComponent, which may entail jumping around the TransformComponentArray a bit to find the TransformComponent that corresponds with the current DriftPhysicsComponent (although I'm sure that the sorting of these arrays can be optimized to minimize this problem).

So, I was reading this post, which argues that there should be a separation from the state of a component and the behavior of the component. The author uses the term Component to refer to a PODS-like struct with no behavior (so, a TransformComponent would be little more than a matrix, maybe with some GetPos/SetPos-like convenience methods) and he uses the term System to refer to an algorithm that acts on the component. So, a DriftPhysicsSystem, for example, would apply Euler integration to a TransformComponent. In this scheme, you'd feed the DriftPhysicsSystem an array of TransformComponents and it would loop through all of them, like so:


void DriftPhysicsSystem::Update(double dT, TransformComponentArray& xforms, MotionComponentArray& motions)
{
    for(size_t i = 0, iCount = xforms.size(); i < iCount; ++i)
    {
        vec3 dv = motions[i].a*dT;
        motions[i].vel += dv;

        vec3 dx = motions[i].vel*dT;
        xforms[i].pos += dx;
    }
}

The only catch now is that you need to have these contiguous arrays of the same size, where the ith element of xforms belongs to the same entity as the ith element of motions. Also, it can only contain components belonging to entities for whom DriftPhysics applies, e.g. BlackHoles and Asteroids.

So what I was sort of leaning toward was the second solution, where entities are defined like this:


struct Asteroids
{
   std::vector<TransformComponent> xforms;
   std::vector<MotionComponentArray> motions;
   // etc...
};

Asteroids asteroids;

And then, updating all of the asteroids would look something like this:


driftPhysics.Update(dT, asteroids.xforms, asteroids.motions);

Anyway, that is sort of a long-winded explanation, but I realized that perhaps I have a different (i.e., wrong) idea of how Data-Oriented component-entity architecture is supposed to work, and so I figured I ought to explain myself before there is any more confusion. Thank you very much for your patience, if you've read this far. Hopefully this gives a good idea of what I meant by certain behavioral "systems" applying to certain Transforms and not others.

As above, don't do this by iterating through all the transforms... Iterate through the asteroids or black holes! Either
* for each asteroid, fetch asteroid->transform, do stuff
or, as above
*1. for each asteroid, add asteroid->transform to list of handles
*2. for each item in list, fetch it, do stuff

I think I understand what you're saying, but I am struggling to understand (not your fault) how this confers the cache-friendly benefits promised by data-oriented design. In the first option you listed, where are these transforms stored? And how do I ensure that the transforms are going to be accessed in a cache-friendly way if I'm iterating over Asteroids instead of over Transforms? In your second option, I can see how this would create a nice, contiguous, homogeneous array of transform handles. However, if they are merely handles, and not the transforms themselves, then doesn't that just add a level of indirection to the actual transforms, which themselves might be stored all over the place in memory? Even if they're stored in some contiguous array elsewhere, how can I be sure that the handles will be in the right order? Sort them? Or ensure, in the first place, that the ordering of the TransformComponent array matches the ordering of the Entity array?

I would say the opposite (In all of OOP/CES/DOD). All of the components should be fairly isolated and self contained, knowing as little about the outside world as possible.

Absolutely, yes, but the way I usually do it is this: I have some high-level factory that knows how to piece together the various entities. It is game-specific and has intimate knowledge of what makes up an Asteroid or a BlackHole or whatever. However, this factory will return the objects as an EntityBase*, and then there will be some lower-level framework code that has to update and draw these entities, so it will iterate over them and call entity->Draw or entity->Update or whatever. This is what I was referring to by "glue" code.

But I guess I could keep that sort of system, even if I take the above approach with my Asteroids struct. I could just make Asteroids a class, and make the arrays private. Have Asteroids inherit from EntityBase, from which it gets a virtual Update method. From inside Asteroids::Update, have it call driftPhysics.Update(). In fact, at that point, I don't see the point in making DriftPhysics a class. It seems that a non-member function would probably work there, and it could be shared by BlackHoles.

Anyway, whether that's a wise design choice is a separate question I guess! tongue.png Any input you could provide would be awesome.

Ah, I understand where you're coming from now. It seems you have a good understanding of some of the issues with entity component systems.

In my case I've chosen the route of not being perfectly cache-friendly when I have systems that need to iterate through multiple sets of components. I doubt it will matter with the size of a game I would end up making on my own.

Regarding your proposed solution of having the physics/transform components for objects of a certain type stored together... with the way you've described it so far, I have trouble immediately seeing how it would fit into everything cleanly. I think it has merit though. I guess you have some higher level code that knows about all the objects with "grouped" components. I think I've come across someone's description of their entity component system that had a more formalized/generic description of something similar to this (unfortunately I can't remember where).

Other strategies I've heard discussed to address this are:

1) Duplicating transform data into other components (e.g. into the DriftPhysicsComponent). This needs to be kept in sync with the actual Transform components of course (thus potentially incuring some random access of one of the component arrays), but if you tracked dirty state and only sync'd when necessary this could be a win (obviously drifting asteroids move every frame, so in this case it wouldn't be a win).

2) Making some attempt to keep component arrays sorted according to access patterns (so iterating through DriftPhysicsComponents in order would result a more predictable in order access for the matching TransformComponents for those entities).

However, ignoring for the moment the problem of the dynamic cast, my understanding is that this is not done in a Data-Oriented way


Who cares? Use tools when you need them and don't try to slavishly apply them to every situation.

Even if you do want to write every last bit of your code using data-oriented design... that term doesn't say that you have to avoid every single cache miss ever no matter what. It just says that you should think about your data and how it's accessed when designing your architecture. If you've done that and haven't identified a verifiable issue that needs addressing, have a beer and move on to a real problem that needs your time and attention. smile.png

Sean Middleditch – Game Systems Engineer – Join my team!


Regarding your proposed solution of having the physics/transform components for objects of a certain type stored together... with the way you've described it so far, I have trouble immediately seeing how it would fit into everything cleanly

Yeah, one thing I don't really like about it is that I have these classes called Asteroids, BlackHoles, etc. I would prefer to just have an Entity class. No EntityBase, no hierarchy. All game objects are Entities, and how the Entity behaves, renders, etc., is fully defined by which Components it takes on. I think it wouldn't be too difficult, though, to generalize this idea a bit so that it has the best of both worlds. Maybe I could define something called an EntityType. It would have a list of ComponentBehaviors (like DriftPhysics) that would each update in a certain order. As above, an EntityType instance would contain arrays of data, where the ith element from each array corresponds to the ith Entity of that type. I would need a factory that could read in an EntityType specification, instantiate the arrays, instantiate the behaviors, and then bind the arrays to the behaviors. At that point, adding new Entity instances is simply a matter of pushing new data onto the arrays. For entity deletion, I could do something similar to the swap-delete thing in the Insomniac presentation you linked to (although I suppose this version of it would not be much different than erase-remove).

I should probably read that Insomniac presentation more in-depth, though (and other stuff as well) just to make sure I'm not going down a bad road here.

However, ignoring for the moment the problem of the dynamic cast, my understanding is that this is not done in a Data-Oriented way


Who cares? Use tools when you need them and don't try to slavishly apply them to every situation.

Even if you do want to write every last bit of your code using data-oriented design... that term doesn't say that you have to avoid every single cache miss ever no matter what. It just says that you should think about your data and how it's accessed when designing your architecture. If you've done that and haven't identified a verifiable issue that needs addressing, have a beer and move on to a real problem that needs your time and attention. smile.png

Point taken, definitely. A lot times I don't feel that I understand a concept unless I've implemented it, though. So the point of this exercise is "Understand Data-Oriented Design". Once I've done that, I'll feel ready to make decisions about where DOD makes sense.

Personally, I think we get too lost sometimes in the rigidity of the 'rules' of a given paradigm. It's the reason why you often see questions around here about "true" or "real" OOP. Instead, we should be viewing the rules as loose sets of guidelines that can help you achieve a specific end.

In the case of DOD, I see that the end goal is a more cache-friendly data layout. What that means in practice is going to vary from project to project. In some cases it may very well be possible to have a very clear distinction between data types such that you can keep everything in separate arrays and without sacrificing cache coherency in the slightest. In others, it may mean that some data ought to be interleaved, or it might mean a broader category of objects that group related data (like your Asteroids example with the xforms and motions).

The short of it is, in order to understand how best to layout your data, you need to understand how you're using it. IMO, you're absolutely on the right track and asking all the right questions. Just don't box yourself in by trying to fit into a rigid ideal, but rather adapt the paradigm to your particular use case.

Thanks very much for your help! Now that you mention it, I think this plan I'm referring-to will work pretty well with my project, because I'm going to have relatively few entity types, but dozens or hundreds of each. In a project where there are hundreds of entity types, and perhaps only a few of each, and perhaps even entities that dynamically change their composition, then this plan probably wouldn't work that well.

Then again, for an Asteroids clone, anything approaching a data-driven, data-oriented, component-entity system is hugely overkill. :p

But, it's good to have a small project to practice a new concept on.

This topic is closed to new replies.

Advertisement