Data oriented design

Started by
3 comments, last by Randy Gaul 9 years, 1 month ago

Hey you cool code crunchers!

I'm trying to wrap my head around data oriented design. There are many good explanations and examples out there, but I would love to verify that my understanding is correct before making changes to my old code.

So far I have been trying to keep my data simple by avoiding inheritance and using only PODs, std::vectors, and nested structs. As a very simplified example:

struct State
{
    struct Unit
    {
        I32 maxHealth;
        I32 health;
        Float rotationAngle;
        Vector3 position;
    };
    struct Player
    {
        std::vector<Unit> units;
    };
    std::vector<Player> players;
};

My first intuition would be to redefine the data like this:

struct State
{
    struct Unit
    {
        I32 maxHealth;
        I32 health;
        I32 positionVector3Index;
        Float rotationAngle;
    };
    struct Player
    {
        std::vector<I32> unitIndices;
    };
    std::vector<Player> players;
    IndexedVector<Unit> units;
    IndexedVector<Vector3> vector3s;
};

IndexedVector is just a std::vector that keeps element indices stable by flagging vacant slots instead of erasing them (unless the vacant slots form a run at the end, in which case they are deallocated). I'm sure a more refined variation exists somewhere.
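For readers who want something concrete, here is a minimal sketch of what such an IndexedVector could look like. It is not the poster's implementation: the member names (insert, erase, at, isOccupied, slotCount) are hypothetical, vacancy is flagged with std::optional (C++17), and insertion does a simple linear scan for a free slot where a real version would more likely keep a free list.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical sketch of the IndexedVector idea: indices handed out by
// insert() stay valid until that same index is erased.
template <typename T>
class IndexedVector {
public:
    // Stores the value in the lowest vacant slot, or appends a new slot.
    // Returns the index of the slot that now holds the value.
    std::int32_t insert(const T& value) {
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            if (!slots_[i].has_value()) {
                slots_[i] = value;
                return static_cast<std::int32_t>(i);
            }
        }
        slots_.push_back(value);
        return static_cast<std::int32_t>(slots_.size() - 1);
    }

    // Flags the slot as vacant, then drops any run of vacant slots at the end.
    // Note: pop_back shrinks the size; it does not release vector capacity.
    void erase(std::int32_t index) {
        assert(isOccupied(index));
        slots_[static_cast<std::size_t>(index)].reset();
        while (!slots_.empty() && !slots_.back().has_value()) {
            slots_.pop_back();
        }
    }

    // True if the index refers to a slot that currently holds a value.
    bool isOccupied(std::int32_t index) const {
        return index >= 0 &&
               static_cast<std::size_t>(index) < slots_.size() &&
               slots_[static_cast<std::size_t>(index)].has_value();
    }

    // Access by stable index; the slot must be occupied.
    T& at(std::int32_t index) {
        assert(isOccupied(index));
        return *slots_[static_cast<std::size_t>(index)];
    }

    // Number of slots, occupied or vacant.
    std::size_t slotCount() const { return slots_.size(); }

private:
    std::vector<std::optional<T>> slots_;
};
```

One trade-off worth noting: the per-slot flag makes each element larger and leaves holes in the array, so a loop over all elements has to skip vacant slots.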

Does this look right, or should I even separate PODs of different types but equal size, such as 32-bit integers and floats? It seems to me that when an index would occupy as much memory as the data it references, or more, separation wouldn't be a good idea. Thank you very much for any tips!

P.S. I managed to close my browser while writing this, so the post had to be restored from memory. Sorry if I lost some point along the way.

- David


You have the idea down, yes. Though you should try not to worry so much about how to separate your data before you try things out. It will be more important to have your code stay simple, readable, and easy to modify in the future.

Once you start running into performance problems in practice, you can apply a data-oriented way of thinking to find and fix the problem. This will give you valuable experience you can draw on the next time you write code, helping you plan ahead more appropriately. But for now, since you may not have that experience, trying to "guess" exactly how to organize your data can be a big waste of time.

Hello Randy! I simplified one behemoth of a structure for the purpose of this post. In reality it is part of a quite elaborate (and mostly finished) project that I got some critique for when I passed a code sample to a potential employer. I suppose it might be a good start to unite objects contained in vectors and apply the changes step by step, not necessarily going "all the way".

Ironically, it's hard to tell if you're on the right track because you've only shown us the data!

What transformations do you apply to that data?
How does that data change?
What produces it? What consumes it?

The answers to those questions will dictate the best way to organise the data.

As is, there are no processes associated with your data, so the optimal thing to do is delete all your structs for being unused :lol: (j/k)

[edit]re:
IndexedVector<Vector3> vector3s;
^^^ Having a collection of vec3s called "vector3s" doesn't seem too useful, as you can't perform any operations/transforms on data unless you know what it represents.

Having:
IndexedVector<Vector3> positions;
...makes more DoD sense, as then you can have a process that iterates through each position, tests if it's on screen, and outputs a list of unit ID#s representing the visible units, etc...
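A rough sketch of such a process follows. Everything in it is an assumption made for illustration: Vector3 is taken to be three floats, "on screen" is reduced to a hypothetical axis-aligned ViewBounds rectangle in x and y, the function name findVisibleUnits is invented, and a plain std::vector stands in for IndexedVector so that the index doubles as the unit ID.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed layout; the thread never shows the real Vector3.
struct Vector3 { float x, y, z; };

// Hypothetical stand-in for "the screen": a rectangle in x and y.
struct ViewBounds { float minX, maxX, minY, maxY; };

// Input: all unit positions. Output: the indices (unit IDs) of the
// positions that fall inside the view rectangle, in ascending order.
std::vector<std::int32_t> findVisibleUnits(const std::vector<Vector3>& positions,
                                           const ViewBounds& view) {
    assert(view.minX <= view.maxX && view.minY <= view.maxY);

    std::vector<std::int32_t> visible;
    visible.reserve(positions.size());  // worst case: everything is visible

    for (std::size_t i = 0; i < positions.size(); ++i) {
        const Vector3& p = positions[i];
        const bool inside = p.x >= view.minX && p.x <= view.maxX &&
                            p.y >= view.minY && p.y <= view.maxY;
        if (inside) {
            visible.push_back(static_cast<std::int32_t>(i));
        }
    }
    return visible;
}
```

The point of the example is that the loop touches only positions: health and rotation never enter the cache while visibility is being computed, and the output list is itself plain data that a later pass (rendering, for instance) can consume.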

Hodgman is right. To give more specific advice, the details of the problem need to be known. Once you know how the data is going to be used, it becomes much clearer how to perform the data transformations.

You can start by asking "When does this data need to be accessed?", "What transformations do I need to perform on this data?", and "What kind of output do these transformations create?". Then you can start to really reason about which pieces of data should be packed together and which should be processed separately.

In general you just want the code to loop over arrays. The loops would ideally have no branching and not create too much register pressure. On top of this, you want to use (ideally) 100% of every single cache line brought in from memory.
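As an illustration of that kind of loop, here is a small sketch that regenerates unit health. The UnitHealth struct and the regenerate function are hypothetical; the health and maxHealth fields come from the original post but are split into parallel arrays here. The clamp is written with std::min rather than an if statement, which compilers can usually turn into branch-free code, though that is not guaranteed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical structure-of-arrays layout: one tightly packed array per
// field, where element i of each array belongs to unit i.
struct UnitHealth {
    std::vector<std::int32_t> health;
    std::vector<std::int32_t> maxHealth;
};

// Adds `amount` to every unit's health, clamped to that unit's maximum.
void regenerate(UnitHealth& units, std::int32_t amount) {
    assert(units.health.size() == units.maxHealth.size());

    const std::size_t count = units.health.size();
    for (std::size_t i = 0; i < count; ++i) {
        // No explicit branch in the source: the clamp is a min() of two values.
        units.health[i] = std::min(units.health[i] + amount, units.maxHealth[i]);
    }
}
```

Because the loop reads only the two arrays it needs, every byte of each fetched cache line is used; with the original Unit struct, the same pass would also drag rotationAngle and position through the cache.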

This topic is closed to new replies.
