Followers 0

# Just a couple of Data-Oriented Design questions.

## 12 posts in this topic

So I'm just barely getting into this, and I think it's a really neat way to think about things. I'm going to program a simple Asteroids-like game (more of a Lunatic Fringe clone) just to get my feet wet, and I'm pretty excited about it. I just have a couple of things that I need to have clarified about doing a component-system type arrangement with Data-Oriented Design.

My first question concerns how to organize the arrays of components. My first instinct is that there should just be one array of each type of component. For instance, there would only be one master array of transforms, and any game entity that owns a transform would just add their transform to this array. That way, any systems that need to affect the transforms can just iterate linearly through the whole array. Several drawbacks (none of them insurmountable) became immediately apparent:

1. Entity deletion becomes more difficult. Not every entity needs every component, so the component arrays will all be of different sizes, and there is going to need to be some piece of code (in the Entity itself, perhaps) that keeps track of which components in which arrays belong to which entity. If it's keeping track via array indices, then these indices will have to be updated every time components are deleted from the arrays (assuming that components are always added to the end, and that deletion involves an actual erase-remove operation rather than just marking certain array elements as 'dead' and recycling them later).
2. Not every system that works on transforms (say) needs to work on every transform. There may be certain systems that only need to perform operations on Asteroid transforms. This means that each component will have to identify the type of the entity to which it belongs, so that the system in question can check the type before operating on the component. Maybe this isn't such a bad thing. Conditional branches aren't incredibly expensive, and although this does mean you've potentially wasted a prefetch on an object that you don't need, doing nothing to an object is still faster than doing something.

So my next thought is that there should be a different component array for each Entity type that uses the component. In my surfing, I've come across several examples of code that looks like this:

struct Asteroids
{
std::vector<Transform> transforms;
std::vector<CollisionSphere> collisionSpheres;
// etc...
};


So, the entity type here is Asteroid, and one instance of the Asteroids class holds a list of all asteroids in the scene. If you have a system that performs operations on asteroid transforms, but perhaps not the transforms of some other entity types, then you can just feed the system asteroids.transforms, and the system can do it's usual linear thing. There are a couple of drawbacks, it seems, to this approach as well:

1. This breaks up a single big array (per component) into N smaller arrays per component (where N is the number of entity types that use that component), and some of those arrays (like players.transforms) might be very small. Running a system on a series of tiny arrays is probably not much better than jumping around in memory. However, there would still be significant savings, I would imagine, if your world contains several entity types that exist in large numbers (and thus have large component arrays).
2. It seems that, suddenly, there are going to be a lot of areas of code that are going to need to know about the concrete entity types, which would not had they been programmed in a more traditional OOP style. In traditional OOP, you often just have an array of EntityBase* and you just call a polymorphic Update method on each entity. The code that iterates through this array doesn't know or care what the concrete types are. But suppose we're doing a data-oriented entity/system/component architecture, and suppose we have a system that performs some operation on transforms, but we only want Asteroids and BlackHoles to use it. There is going to need to be some code that calls system.update(asteroids.transforms) and then calls system.update(blackholes.transforms). My intuition (and maybe this is a good intuition, or maybe it's brought on by an overdose of OOP-thinking) is that main, central pieces of glue code should not know about concrete types; the only code that should care about the behavior of concrete types are the concrete types themselves, and the central "glue" code that brings all of these types together should know only about the abstract classes, and be completely decoupled from the concrete types themselves.

Despite these potential drawbacks, I still feel that option #2 (having entity classes like Asteroids that have their own arrays of components, rather than using a master array for each component that all entity types must share) is far preferable. However, I am interested in some other opinions, and to see if perhaps someone has thought of a third way that is even better.

My second question (or perhaps more of an observation) is that it seems like you can't entirely avoid random-access of objects. Take collision detection and response. Suppose you have some CollisionDetectionSystem whose job it is to iterate through all of the entities and find out which ones are colliding. For each collision, it adds a Collision object to an array of collisions. Then comes the collision resolution step. Even if you have just one CollisionResolutionSystem, that system is going to have to read each Collision object, find out which entities were involved in the collision, and then find the appropriate components for these entities in some other list. This find operation is going to necessarily entail jumping hither and yon through the component arrays, modifying them as called-for by the collision response.

Is there a clever solution to that sort of situation, or is it just a performance weakness that is accepted on the basis that it's still faster than the traditional OOP method of always jumping all over memory for everything?

Edited by CDProp
0

##### Share on other sites

If you're writing code that needs to know the "entity type", you're already heading in the wrong direction.

I see that as a huge compromise / "code smell", like a dynamic_cast in an OOP system...

Agreed 100%. I don't like that option at all. I definitely see that as a drawback of the single master-array method.

An entity should be completely defined by its components/data. Why would code need to treat the Transform on an Asteroid differently than the Transform on some other object? Can you give an example?

The transform for each entity should itself behave in exactly the same way (i.e., it should define the position, rotation, and scale of the game entity for physics and rendering purposes). However, the operations performed on the transform might be different from one entity to the next. For instance, a player-controlled spaceship would have a transform that is ultimately influenced by button presses, an AI-controlled spaceship would have a transform that is ultimately altered by some AI logic, and an Asteroid entity would have a transform that is altered by physics (some simple drift physics using Euler Integration or whatever).

Now, I know that some people would say that these should all be separate components -- TransformComponent, ShipAIComponent, PlayerControlComponent, DriftPhysicsComponent, etc. -- and that an Asteroid behaves the way it does because it owns a TransformComponent and a DriftPhysicsComponent. When the Asteroid updates, it calls Update on each of its components. When  the DriftPhysicsComponent needs to update the transform, it needs to ask the Entity for the TransformComponent and use that interface to alter the transform, like so (naive example):

void DriftPhysicsComponent::Update(double dt)
{
vec3 dv = a*dt;
v += dv;

vec3 dx = v*dt;

TransformComponent& xform = dynamic_cast<TransformComponent&>(entity.GetComponent("Transform"));
const vec3& pos = xform.GetPos();
xform.SetPos(pos+dx);
}


However, ignoring for the moment the problem of the dynamic cast, my understanding is that this is not done in a Data-Oriented way. The first problem is that the Entities own the components. So, the components themselves might be spread hither and yon throughout the heap, and updating them necessarily entails jumping all over memory. This could be solved somewhat by storing the components in a pool or array somewhere else, and giving the entities a reference to the components rather than having them own the components themselves. However, since you are iterating over the Entities and then allowing the Entities to update their own components, you still have a problem with the update order, where the components (despite being stored in a contiguous array) are being updated according to when they are encountered in the update loop:

1. Update Entity 0 (Asteroid)
1. Update TransformComponent
2. Update DriftPhysicsComponent
2. Update Entity 1 (Asteroid)
1. Update TransformComponent
2. Update DriftPhysicsComponent
3. Update Entity 2 (Black Hole)
1. Update TransformComponent
2. Update DriftPhysicsComponent
4. Update Entity 3 (Enemy Ship)
1. Update TransformComponent
2. Update ShipAIComponent

Ok, so let's say we solve that problem by keeping the entities out of the loop (literally) -- don't iterate over entities. Iterate over the homogeneous arrays of components, and call Update on them directly. Now, you're actually updating the components in a sequential, cache-friendly way (to my understanding). However, there is still one problem -- there is a lot of cross-talk between components, as you can see from my naive dynamic cast implementation. Updating a DriftPhysicsComponent necessarily entails updating a TransformComponent, which may entail jumping around the TransformComponentArray a bit to find the TransformComponent that corresponds with the current DriftPhysicsComponent (although I'm sure that the sorting of these arrays can be optimized to minimize this problem).

So, I was reading this post, which argues that there should be a separation from the state of a component and the behavior of the component. The author uses the term Component to refer to a PODS-like struct with no behavior (so, a TransformComponent would be little more than a matrix, maybe with some GetPos/SetPos-like convenience methods) and he uses the term System to refer to an algorithm that acts on the component. So, a DriftPhysicsSystem, for example, would apply Euler integration to a TransformComponent. In this scheme, you'd feed the DriftPhysicsSystem an array of TransformComponents and it would loop through all of them, like so:

void DriftPhysicsSystem::Update(double dT, TransformComponentArray& xforms, MotionComponentArray& motions)
{
for(size_t i = 0, iCount = xforms.size(); i < iCount; ++i)
{
vec3 dv = motions[i].a*dT;
motions[i].vel += dv;

vec3 dx = motions[i].vel*dT;
xforms[i].pos += dx;
}
}


The only catch now is that you need to have these contiguous arrays of the same size, where the ith element of xforms belongs to the same entity as the ith element of motions. Also, it can only contain components belonging to entities for whom DriftPhysics applies, e.g. BlackHoles and Asteroids.

So what I was sort of leaning toward was the second solution, where entities are defined like this:

struct Asteroids
{
std::vector<TransformComponent> xforms;
std::vector<MotionComponentArray> motions;
// etc...
};

Asteroids asteroids;

And then, updating all of the asteroids would look something like this:

driftPhysics.Update(dT, asteroids.xforms, asteroids.motions);


Anyway, that is sort of a long-winded explanation, but I realized that perhaps I have a different (i.e., wrong) idea of how Data-Oriented component-entity architecture is supposed to work, and so I figured I ought to explain myself before there is any more confusion. Thank you very much for your patience, if you've read this far. Hopefully this gives a good idea of what I meant by certain behavioral "systems" applying to certain Transforms and not others.

As above, don't do this by iterating through all the transforms... Iterate through the asteroids or black holes! Either
* for each asteroid, fetch asteroid->transform, do stuff
or, as above
*1. for each asteroid, add asteroid->transform to list of handles
*2. for each item in list, fetch it, do stuff

I think I understand what you're saying, but I am struggling to understand (not your fault) how this confers the cache-friendly benefits promised by data-oriented design. In the first option you listed, where are these transforms stored? And how do I ensure that the transforms are going to be accessed in a cache-friendly way if I'm iterating over Asteroids instead of over Transforms? In your second option, I can see how this would create a nice, contiguous, homogeneous array of transform handles. However, if they are merely handles, and not the transforms themselves, then doesn't that just add a level of indirection to the actual transforms, which themselves might be stored all over the place in memory? Even if they're stored in some contiguous array elsewhere, how can I be sure that the handles will be in the right order? Sort them? Or ensure, in the first place, that the ordering of the TransformComponent array matches the ordering of the Entity array?

I would say the opposite (In all of OOP/CES/DOD). All of the components should be fairly isolated and self contained, knowing as little about the outside world as possible.

Absolutely, yes, but the way I usually do it is this: I have some high-level factory that knows how to piece together the various entities. It is game-specific and has intimate knowledge of what makes up an Asteroid or a BlackHole or whatever. However, this factory will return the objects as an EntityBase*, and then there will be some lower-level framework code that has to update and draw these entities, so it will iterate over them and call entity->Draw or entity->Update or whatever. This is what I was referring to by "glue" code.

But I guess I could keep that sort of system, even if I take the above approach with my Asteroids struct. I could just make Asteroids a class, and make the arrays private. Have Asteroids inherit from EntityBase, from which it gets a virtual Update method. From inside Asteroids::Update, have it call driftPhysics.Update(). In fact, at that point, I don't see the point in making DriftPhysics a class. It seems that a non-member function would probably work there, and it could be shared by BlackHoles.

Anyway, whether that's a wise design choice is a separate question I guess!  Any input you could provide would be awesome.

0

##### Share on other sites

Regarding your proposed solution of having the physics/transform components for objects of a certain type stored together... with the way you've described it so far, I have trouble immediately seeing how it would fit into everything cleanly

Yeah, one thing I don't really like about it is that I have these classes called Asteroids, BlackHoles, etc. I would prefer to just have an Entity class. No EntityBase, no hierarchy. All game objects are Entities, and how the Entity behaves, renders, etc., is fully defined by which Components it takes on. I think it wouldn't be too difficult, though, to generalize this idea a bit so that it has the best of both worlds. Maybe I could define something called an EntityType. It would have a list of ComponentBehaviors (like DriftPhysics) that would each update in a certain order. As above, an EntityType instance would contain arrays of data, where the ith element from each array corresponds to the ith Entity of that type. I would need a factory that could read in an EntityType specification, instantiate the arrays, instantiate the behaviors, and then bind the arrays to the behaviors. At that point, adding new Entity instances is simply a matter of pushing new data onto the arrays. For entity deletion, I could do something similar to the swap-delete thing in the Insomniac presentation you linked to (although I suppose this version of it would not be much different than erase-remove).

I should probably read that Insomniac presentation more in-depth, though (and other stuff as well) just to make sure I'm not going down a bad road here.

0

##### Share on other sites

However, ignoring for the moment the problem of the dynamic cast, my understanding is that this is not done in a Data-Oriented way

Who cares? Use tools when you need them and don't try to slavishly apply them to every situation.

Even if you do want to write every last bit of your code using data-oriented design... that term doesn't say that you have to avoid every single cache miss ever no matter what. It just says that you should think about your data and how it's accessed when designing your architecture. If you've done that and haven't identified a verifiable issue that needs addressing, have a beer and move on to a real problem that needs your time and attention.

Point taken, definitely. A lot times I don't feel that I understand a concept unless I've implemented it, though. So the point of this exercise is "Understand Data-Oriented Design". Once I've done that, I'll feel ready to make decisions about where DOD makes sense.

0

##### Share on other sites

Thanks very much for your help! Now that you mention it, I think this plan I'm referring-to will work pretty well with my project, because I'm going to have relatively few entity types, but dozens or hundreds of each. In a project where there are hundreds of entity types, and perhaps only a few of each, and perhaps even entities that dynamically change their composition, then this plan probably wouldn't work that well.

Then again, for an Asteroids clone, anything approaching a data-driven, data-oriented, component-entity system is hugely overkill. :p

But, it's good to have a small project to practice a new concept on.

0

##### Share on other sites
I'm not trying to use it for every piece of code. I'm trying to use it to speed up entity/component updates. I must say, what I'm asking seems like a totally reasonable request. That is, help me understand how data-oriented design is used in games. So far, I've gotten some good replies along with at least two lectures about how I shouldn't be using data-oriented design like a golden hammer. Is that what I'm doing? Because most of the reading I've done on the subject uses this sort of entity-component update as an example of precisely where DOD comes in handy. Is there some way in which I'm misapplying the concept? If so, it would most helpful if you would be explicit about it.
1

## Create an account

Register a new account