Data Flow Between Systems

3 comments, last by Hodgman 8 years, 7 months ago

Hello,

I was recently watching Mike Acton's talk at CppCon on "Data-Oriented Design", and I've been thinking a lot about how I might be able to structure my game code to better take advantage of cache coherency. The ideal here would be that all major processing is done on contiguous arrays of data, directly in-order.

But I've been struggling on one particular aspect of this: the problem is that the ordering requirement between various systems is not the same. For example, suppose I have a typical flow of data: I have high-level behavior that updates positions/rotations, which then needs to be transformed by collision and collision response, and then converted into matrix transforms for draw calls. There are distinct ordering requirements for each step here: behavior will want data grouped by type, collision will want it grouped by spatial locality, and rendering will want it grouped by draw order/material/etc. Beyond that, these requirements will change as the state of my game changes ( not referring to static objects here ).

This means that I cannot simply set up fixed arrays and run my processing in the same order across the board. Somehow, I have to convert between the different orderings as I go. Specifically I am considering the efficiency of the reads from the source, the writes to the destination, and then the reads at the destination. I have thought of three possibilities to do this:

Maintain the data in linked lists per-system

- This is what I have done for many years to solve this problem. It is easy to use, and there is no major processing required to maintain ordering. But the issue is, you end up iterating ( reads at the destination ) out of order, and you have all the node values filling up your cache.

Use linked lists for indexing and copy to arrays per-system

- I have just recently tried this out. The iteration at the destination is noticeably faster, but the copy ( writes to the destination ) is now out of order and takes longer. Overall, it has been only a minor speed boost for me.
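As a minimal sketch of this second option (all names here are illustrative, not from any particular engine): a cached index order, built once from the linked list, drives a gather copy into a contiguous per-system array. The reads from the source jump around, but the writes and the later iteration are sequential.

```cpp
#include <cstdint>
#include <vector>

struct Transform { float x, y, z; };

// Gather source records into a packed array using a cached index order.
// Reads from `source` are scattered; writes to `packed` are sequential.
std::vector<Transform> gatherInOrder(const std::vector<Transform>& source,
                                     const std::vector<uint32_t>& order)
{
    std::vector<Transform> packed;
    packed.reserve(order.size());
    for (uint32_t srcIndex : order)
        packed.push_back(source[srcIndex]);
    return packed;
}
```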

Copy to arrays directly and sort per-system

- I have not tried this out yet, but I intend to. Theoretically this could have the least amount of cache misses as everything is done in order. It's hard to believe that the cost of running the sort algorithm would not outweigh the gains, but I'll have to test that.
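A sketch of what this third option might look like, assuming the sort operates on small key/payload records rather than on the heavyweight data itself (the `Record` layout here is hypothetical):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Small record sorted per-system: the key encodes whatever order that
// system wants (material id, spatial cell, Morton code, ...), and the
// payload is an index or handle into the actual data.
struct Record {
    uint64_t sortKey;
    uint32_t payload;
};

void sortForSystem(std::vector<Record>& records)
{
    std::sort(records.begin(), records.end(),
              [](const Record& a, const Record& b) {
                  return a.sortKey < b.sortKey;
              });
}
```

Keeping the records small (16 bytes here) means the sort itself stays reasonably cache-friendly even when the keys are in arbitrary order.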

Does anybody else out there have solutions that I am not thinking of? I'd be greatly interested to know how people tackle this kind of problem.

Thanks for reading!


The only approaches I can think of are to either duplicate the data to support multiple orderings (only the data you need to duplicate, since this is already expensive enough), or only optimize the ordering for the system where it matters the most (the rest can do indirect access).

I would think that most of the time there is just one system that is the bottleneck, so it should be enough to optimize for that system. Duplication has a lot of overhead (both RAM and CPU) so I would avoid that if it's not strictly necessary.

If you do maintain multiple orderings, make sure to take advantage of temporal coherency in the order by choosing the correct sorting algorithm. The order is likely to be similar or even exactly the same as in the last frame, so don't throw that data away, and use an algorithm that has good complexity if the data is already sorted (I think bubble sort has this property, even though it sucks with unsorted data).
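Insertion sort is one algorithm with the property described above: it runs in O(n) on already-sorted input and degrades gracefully when only a few elements have moved since last frame (bubble sort with an early-exit flag has the same best case). A minimal sketch, operating on a cached key array:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Insertion sort: O(n) when `keys` is already sorted, so re-sorting a
// nearly-unchanged order from the previous frame is cheap.
void insertionSortByKey(std::vector<uint32_t>& keys)
{
    for (size_t i = 1; i < keys.size(); ++i) {
        uint32_t key = keys[i];
        size_t j = i;
        while (j > 0 && keys[j - 1] > key) {
            keys[j] = keys[j - 1];  // shift larger elements right
            --j;
        }
        keys[j] = key;
    }
}
```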

o3o

Definitely just duplicate the data.

This is basically what you have to do anyway in many cases. Consider physics. Almost every engine uses an external physics library like Havok, PhysX, Bullet, Box2D, etc. Those libraries maintain their own data structures. When you modify a position (teleport* an object) you must inform the library that a body has been moved, which means copying the data into the physics library's data structures.

Far fewer games use third-party graphics libraries, though they certainly exist. The same design applies here, though: let your graphics code be an independent "library" that has zero dependencies on your game object model (so the core graphics code has no freakin' clue what an entity or a component is). Let graphics manage its data how it sees fit. All you ever do is copy transform updates into the graphics library.

You may well end up with yet other specialized systems that need specialized data structures. They, also, should get copies.

Remember that this also helps massively with parallelism. There's no reason to worry about synchronizing graphics and physics if they're operating on entirely distinct copies of data. When physics is done simulating a tick, you can queue up a buffer of transform updates for graphics to consume next time that the graphics code is in a good place to do so. The same goes for your specialized AI systems and your general gameplay systems.
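The queued-buffer hand-off described above could be sketched as a double-buffered update queue (type and member names here are made up for illustration): physics appends updates to a write buffer, and at a safe frame boundary the buffers are swapped so graphics can consume the read buffer without any locking during either phase.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

struct TransformUpdate { uint32_t renderableId; float matrix[16]; };

// Double-buffered hand-off: the producer (physics) only touches
// writeBuf_, the consumer (graphics) only touches readBuf_. The swap
// happens at a sync point when neither side is running.
class UpdateQueue {
public:
    void push(const TransformUpdate& u) { writeBuf_.push_back(u); }

    std::vector<TransformUpdate>& swapAndRead() {
        std::swap(writeBuf_, readBuf_);
        writeBuf_.clear();
        return readBuf_;
    }
private:
    std::vector<TransformUpdate> writeBuf_, readBuf_;
};
```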

What you can do is make a generalized transform data structure that stores all the transforms in a packed and contiguous block of memory and provides accessors mapping from your general object ids to the transform indices. This at least lets you efficiency copy around those buffers of updates between modules. You'll still need to copy in to and out of the structure for your individual third-party libraries or specialized spatial structures, though.
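One possible shape for that generalized transform structure, under the assumptions above (all names hypothetical): transforms live in one packed vector, and a map takes a stable object id to the current slot, so bulk copies between modules can work on the contiguous block.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Transform { float m[16]; };

// Packed transform storage with an id -> index accessor, so systems can
// bulk-copy the contiguous vector while gameplay code looks up by id.
class TransformStore {
public:
    uint32_t add(uint64_t objectId, const Transform& t) {
        uint32_t index = static_cast<uint32_t>(transforms_.size());
        transforms_.push_back(t);
        indexOfId_[objectId] = index;
        return index;
    }
    Transform& get(uint64_t objectId) {
        return transforms_[indexOfId_.at(objectId)];
    }
    const std::vector<Transform>& packed() const { return transforms_; }
private:
    std::vector<Transform> transforms_;                 // contiguous block
    std::unordered_map<uint64_t, uint32_t> indexOfId_;  // id -> slot
};
```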

* Never move an object explicitly. That is your physics library's job. You should almost never be copying positions into physics; you only ever copy them _out_ of physics.

Sean Middleditch – Game Systems Engineer – Join my team!

Thanks guys,

I am going with the assumption that I should duplicate the data, for the reasons you describe, Sean. But my problem is the data needs to be in different orders for different systems, so when I duplicate it I end up having to jump all over memory to reorder it ( it's never a linear memcpy ). I think that's probably inevitable, and I am trying to consider the best way to do it.

@Waterlimon - That's an interesting thought, regarding keeping it sorted over multiple frames. However, although it's true that the order is unlikely to change, the data itself almost certainly will ( assuming these are all dynamic objects ), so I still need to re-write data into the sorted "slots" each frame. I can cache the sorted index to avoid running the algorithm, but that's almost identical to the second method I was discussing ( which never actually entails a sort ). Unless I am misunderstanding your suggestion?

There are distinct ordering requirements for each step here: behavior will want data grouped by type, collision will want it grouped by spatial locality, and rendering will want it grouped by draw order/material/etc.

Behaviour produces a list of (rigidBodyID, forceVector) pairs.
Physics consumes those, perhaps sorting them in place before applying those forces to the requested bodies. It then does position/velocity/acceleration updates in its own internal order, along with collision/etc.
Physics then produces an array of (renderableID, transformMatrix) pairs.
The rendering system then consumes that data to -
A) update the positions of its bounding volumes for culling, and
B) stream them into cbuffers for use by the GPU.
The rendering system can then frustum cull its bounding volumes, and use the resulting visibility data to get an array of visible objects.
It can then sort that array into material/etc order and submit draw-calls -- n.b. this step only needs to consume cbuffer IDs, not the transforms.
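The hand-offs in that pipeline could be expressed as plain flat records passed between systems (field and type names here are illustrative only); note that the final sort touches only the small draw records, never the transforms themselves.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct ForceCommand { uint32_t rigidBodyId;  float force[3];    }; // behaviour -> physics
struct TransformOut { uint32_t renderableId; float matrix[16];  }; // physics -> rendering
struct DrawItem     { uint64_t materialKey;  uint32_t cbufferId; }; // culling -> submission

// Final submission sort: operates on 16-byte DrawItems only, so the
// heavyweight transform data is never shuffled at this stage.
void sortDraws(std::vector<DrawItem>& items)
{
    std::sort(items.begin(), items.end(),
              [](const DrawItem& a, const DrawItem& b) {
                  return a.materialKey < b.materialKey;
              });
}
```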

This topic is closed to new replies.
