Data Oriented Design: How to disable elements

Started by
10 comments, last by daniel_collin 12 years, 9 months ago
After some reading on Data Oriented Design (DOD) I have a question I can't find an answer for:

How to disable/cull elements in a Data Oriented flow?

Let's say you have a game populated with some thousand 'objects'. Their data is put into linear arrays so they can all be updated at once, nicely decoupled from calls or callbacks.
Now you need to cull your 'objects' to the view frustum and most of them are not visible and should not anymore be processed.
How is this achieved using DOD (without using an ugly 'Active Flag', which would make the whole DOD approach almost useless)?
One way to do it is to use different arrays for input and output. To cull an object with this method, you simply do not write it to the output array.

EDIT: If you really want to do it in place, you can overwrite culled objects by "shifting" all the following elements down. This is quite simple since you only need separate read and write indexes.
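The in-place variant can be sketched like this (a minimal sketch; the `Object` struct and its `visible` field are placeholders for your own data and cull test):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-object data; 'visible' stands in for the cull result.
struct Object { float x; bool visible; };

// In-place compaction with separate read and write indexes: walk the array
// with 'read', copy each survivor down to 'write', then shrink the array.
// Survivors keep their relative order.
inline std::size_t compactVisible(std::vector<Object>& objs) {
    std::size_t write = 0;
    for (std::size_t read = 0; read < objs.size(); ++read) {
        if (objs[read].visible) {
            objs[write++] = objs[read];   // shift survivor down
        }
    }
    objs.resize(write);                   // drop the culled tail
    return write;
}
```

This is essentially the erase-remove idiom done by hand, which is why it only needs one pass over the data.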

I can't think of a case where input and output could be the same. Besides feedback effects, we usually want to transfer data, not overwrite the source.
Anyway, your reply doesn't fit my actual question; it just describes how to skip elements in a loop. My question was how this is handled/avoided in a DOD approach.

Maybe this active/inactive concept of elements in a game doesn't work at all with DOD. Reading the "Culling the Battlefield" slides from DICE, they just ALWAYS PROCESS EVERYTHING in their DOD culling.
The DICE slides clearly show IIRC that they have divided their world into a grid that they then perform the culling on. So for each visible grid cell, iterate over all the entities, culling them and either directly output rendering information or create a list of visible entities for later use.
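That grid walk might look roughly like this (a hypothetical sketch, not DICE's actual code; the two visibility predicates are passed in as stand-ins for the real per-cell and per-entity frustum tests):

```cpp
#include <cstddef>
#include <vector>

// Entities bucketed by grid cell; each cell just holds entity indices.
struct Cell { std::vector<std::size_t> entities; };

// Cull whole cells first, then only test entities inside visible cells,
// appending survivors to a compact list of visible indices for later use.
template <class CellVisible, class EntityVisible>
std::vector<std::size_t> cullGrid(const std::vector<Cell>& cells,
                                  CellVisible cellVisible,
                                  EntityVisible entityVisible) {
    std::vector<std::size_t> out;
    for (std::size_t c = 0; c < cells.size(); ++c) {
        if (!cellVisible(c)) continue;          // reject a whole cell at once
        for (std::size_t e : cells[c].entities) {
            if (entityVisible(e)) out.push_back(e);
        }
    }
    return out;
}
```

The output is a dense array of visible indices, so the culled entities never reach the later pipeline stages at all.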
Only the occluders which passed the test should go to the next steps of the pipeline. I don't see any reason to output the state of all occluders. The result of the occlusion culling shouldn't modify the occluders.
"they just ALWAYS PROCESS EVERYTHING in their DOD culling."

That sentence makes no sense.
@Swx: Thanks for pointing out the grid approach, but whether or not you use a grid is not the point. At the end you get back an array of visible indices from the culling process. Now the real problem is how to linearly update your visible elements (that together define the behaviour of a game object) based on the random access order you get back from culling. It seems like I'm really missing something obvious :)
Sort the resulting indices, and then iterate through the (sorted) index array. This gives you a linear-but-with-holes iteration of the visible objects.

If objects in the same areas are allocated at the same time, then they're likely to have similar indices, and they're also likely to have similar culling results -- so you're likely to skip over large parts of the array, and fully consume other parts of the array.
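A sketch of that sort-then-iterate pattern (the per-object update is reduced to a sum here purely for illustration; the point is the sorted, ascending-memory-order walk):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sort the visible indices returned by culling, then iterate them in order.
// This turns random access into a linear-but-with-holes pass over the data.
inline float updateVisible(const std::vector<float>& data,
                           std::vector<std::size_t>& visible) {
    std::sort(visible.begin(), visible.end());  // ascending memory order
    float sum = 0.0f;
    for (std::size_t idx : visible) {
        sum += data[idx];   // stand-in for the real per-object update
    }
    return sum;
}
```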


Also, ugly 'Active Flag' approaches don't make DOD completely useless... they still have their uses -- though in some cases you'd probably want to make a separate bit-array for storing them, so you can iterate the flags independently of the objects they're controlling. It all depends on your data -- if 1% of objects might get disabled, then active flags are probably ok. If 99% are likely to be disabled, then they're probably not a good idea. Know your data, know the statistics!
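A minimal sketch of such a separate bit-array (names are made up; the point is that 64 flags pack into one 8-byte word, so scanning the flags touches far less memory than a bool embedded in each object would):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Active flags stored separately from the objects they control, one bit
// each, so the flags can be iterated without dragging object data through
// the cache.
struct ActiveSet {
    std::vector<std::uint64_t> bits;
    explicit ActiveSet(std::size_t n) : bits((n + 63) / 64, 0) {}
    void set(std::size_t i)   { bits[i / 64] |=  (std::uint64_t(1) << (i % 64)); }
    void clear(std::size_t i) { bits[i / 64] &= ~(std::uint64_t(1) << (i % 64)); }
    bool test(std::size_t i) const { return (bits[i / 64] >> (i % 64)) & 1; }
};
```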
Hodgman: In a culling situation usually 99% become inactive and the rest active. The sorting sounds like a good idea at first, but in reality I don't think there will be performance benefits if the gap between objects is larger than your CPU's cache line.

As those are obvious problems, I wonder why they are not answered in the Battlefield slides or in other DOD papers.
The sorting sounds like a good idea at first, but in reality I don't think there will be performance benefits if the gap between objects is larger than your CPU's cache line.
Performance benefits compared to what? Compared to grabbing the indexed data in a random order rather than in sorted order? Compared to typical OOP?

If we ignore pre-fetching etc and just look at the indexed iteration, the problem we've got is:
Whenever I dereference an index, I download more than one item into the cache.
How can I try to make use of these extra/free downloads?
The extra items that will be downloaded, will be either immediately before or after the index that was dereferenced.
So, by sorting indices, you improve the likelihood of making use of data that was previously downloaded to the cache.

A several-cache-line gap is fine, it's lots of sub-cache-line gaps that would cause you a problem (because you've spent the effort downloading that data, and then not used it).

"Hodgman: In a culling situation usually 99% become inactive and the rest active."
And the few that are active are likely to be allocated near each other in the array (or they should be...) -- the message of DOD is something like "design the data to fit the usage patterns". If your culling returns objects clustered by geographical location, then you should allocate those objects in the same physical region of memory (so that the access pattern matches the physical structure in RAM).

For example, say we've got an array of items, where each item takes up 1/4 of a cache line:
ABCD|EFGH|IJKL|MNOP|QRST|UVWX|YZ

Let's say you're looking at the "middle" of your world, and the culling job returns the indices for:
MNORSTU
Then when you fetch M, you'll pull into the cache:
MNOP
...giving you hits for N and O.
Let's also say that during the iteration, you're issuing a prefetch command once every 4 indices, for 4 indices ahead. So while you're processing M, you're pre-fetching S, which pulls down the line containing:
QRST
...giving you hits for R, S and T.
Hopefully by the time you're done processing T, the hardware pre-fetcher has kicked in, and also pulled down:
UVWX
...giving you a hit for U.
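The manual every-4-indices prefetch described above could be sketched like this (assuming a GCC/Clang-style `__builtin_prefetch`; on MSVC you'd use `_mm_prefetch` instead, and the distance of 4 is just the value from the example, not a tuned number):

```cpp
#include <cstddef>
#include <vector>

// Walk a sorted index list, prefetching the data for the index 4 slots
// ahead while processing the current one. The prefetch is only a hint;
// the computed result is identical with or without it.
inline float sumIndexed(const std::vector<float>& data,
                        const std::vector<std::size_t>& indices) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < indices.size(); ++i) {
        if (i + 4 < indices.size()) {
#if defined(__GNUC__) || defined(__clang__)
            __builtin_prefetch(&data[indices[i + 4]]);  // hint: read soon
#endif
        }
        sum += data[indices[i]];   // stand-in for the real per-item work
    }
    return sum;
}
```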

Alternatively, seeing as this entire example array fits into 7 cache lines (and your CPU cache holds, what, 1000 lines?), you could just greedily pre-fetch the whole data-set at the start of the function... Or you could greedily pre-fetch just the indices that you're interested in.

The specific approach taken is going to vary a lot depending on your target CPU, and the size and layout of your data-set.
On an SPU (which has 256 KB of private RAM), downloading the entire data-set from main RAM into the SPU's local store would definitely be an option. Alternatively, you could issue individual DMA commands for each index, giving you a tightly-packed version of the interesting items in local store.


In the above example, the worst result you could get would be where every item is on a new cache line:
A|E|I|M|Q|U|Y
However, having large gaps in there is fine, such as:
ABCD|UVWX

"As those are obvious problems I wonder why they are not answered in the Battlefield slides or in other DOD papers."
The Battlefield culling slides only cover the culling process, which outputs an array of indices. How those indices are consumed is outside their scope.
In general DOD papers, not every process outputs indices that reference into sparse arrays; often a compact data-buffer is output which can be directly consumed.
Christer Ericson's GDC 2003 presentation on memory optimisation might be of interest though.

This topic is closed to new replies.
