So I'm just barely getting into this, and I think it's a really neat way to think about things. I'm going to program a simple Asteroids-like game (more of a Lunatic Fringe clone) just to get my feet wet, and I'm pretty excited about it. I just have a couple of things that I need to have clarified about doing a component-system type arrangement with Data-Oriented Design.
My first question concerns how to organize the arrays of components. My first instinct is that there should just be one array of each type of component. For instance, there would only be one master array of transforms, and any game entity that owns a transform would just add their transform to this array. That way, any systems that need to affect the transforms can just iterate linearly through the whole array. Several drawbacks (none of them insurmountable) became immediately apparent:
- Entity deletion becomes more difficult. Not every entity needs every component, so the component arrays will all be of different sizes, and there is going to need to be some piece of code (in the Entity itself, perhaps) that keeps track of which components in which arrays belong to which entity. If it's keeping track via array indices, then these indices will have to be updated every time components are deleted from the arrays (assuming that components are always added to the end, and that deletion involves an actual erase-remove operation rather than just marking certain array elements as 'dead' and recycling them later).
- Not every system that works on transforms (say) needs to work on every transform. There may be certain systems that only need to perform operations on Asteroid transforms. This means that each component will have to identify the type of the entity to which it belongs, so that the system in question can check the type before operating on the component. Maybe this isn't such a bad thing. Conditional branches aren't incredibly expensive, and although this does mean you've potentially wasted a prefetch on an object that you don't need, doing nothing to an object is still faster than doing something.
So my next thought is that there should be a different component array for each Entity type that uses the component. In my surfing, I've come across several examples of code that looks like this:
struct Asteroids
{
std::vector<Transform> transforms;
std::vector<CollisionSphere> collisionSpheres;
// etc...
};
So, the entity type here is Asteroid, and one instance of the Asteroids class holds a list of all asteroids in the scene. If you have a system that performs operations on asteroid transforms, but perhaps not the transforms of some other entity types, then you can just feed the system asteroids.transforms, and the system can do it's usual linear thing. There are a couple of drawbacks, it seems, to this approach as well:
- This breaks up a single big array (per component) into N smaller arrays per component (where N is the number of entity types that use that component), and some of those arrays (like players.transforms) might be very small. Running a system on a series of tiny arrays is probably not much better than jumping around in memory. However, there would still be significant savings, I would imagine, if your world contains several entity types that exist in large numbers (and thus have large component arrays).
- It seems that, suddenly, there are going to be a lot of areas of code that are going to need to know about the concrete entity types, which would not had they been programmed in a more traditional OOP style. In traditional OOP, you often just have an array of EntityBase* and you just call a polymorphic Update method on each entity. The code that iterates through this array doesn't know or care what the concrete types are. But suppose we're doing a data-oriented entity/system/component architecture, and suppose we have a system that performs some operation on transforms, but we only want Asteroids and BlackHoles to use it. There is going to need to be some code that calls system.update(asteroids.transforms) and then calls system.update(blackholes.transforms). My intuition (and maybe this is a good intuition, or maybe it's brought on by an overdose of OOP-thinking) is that main, central pieces of glue code should not know about concrete types; the only code that should care about the behavior of concrete types are the concrete types themselves, and the central "glue" code that brings all of these types together should know only about the abstract classes, and be completely decoupled from the concrete types themselves.
Despite these potential drawbacks, I still feel that option #2 (having entity classes like Asteroids that have their own arrays of components, rather than using a master array for each component that all entity types must share) is far preferable. However, I am interested in some other opinions, and to see if perhaps someone has thought of a third way that is even better.
My second question (or perhaps more of an observation) is that it seems like you can't entirely avoid random-access of objects. Take collision detection and response. Suppose you have some CollisionDetectionSystem whose job it is to iterate through all of the entities and find out which ones are colliding. For each collision, it adds a Collision object to an array of collisions. Then comes the collision resolution step. Even if you have just one CollisionResolutionSystem, that system is going to have to read each Collision object, find out which entities were involved in the collision, and then find the appropriate components for these entities in some other list. This find operation is going to necessarily entail jumping hither and yon through the component arrays, modifying them as called-for by the collision response.
Is there a clever solution to that sort of situation, or is it just a performance weakness that is accepted on the basis that it's still faster than the traditional OOP method of always jumping all over memory for everything?