
Data Oriented Design: How to disable elements



#1 Many   Members   -  Reputation: 100


Posted 10 July 2011 - 01:48 PM

After some reading on Data Oriented Design (DOD) I have a question I can't find an answer for:

How to disable/cull elements in a Data Oriented flow?

Let's say you have a game populated with a few thousand 'objects'. Their data is stored in linear arrays so they can all be updated in one pass, nicely decoupled from calls or callbacks.
Now you need to cull your 'objects' against the view frustum; most of them are not visible and should no longer be processed.
How is this achieved using DOD (without resorting to an ugly 'active flag', which would make the whole DOD approach almost useless)?


#2 apatriarca   Crossbones+   -  Reputation: 1601


Posted 10 July 2011 - 02:38 PM

One way to do it is to use different arrays for input and output. To cull an object with this method, you simply do not write it to the output array.

EDIT: If you really want to do it in place, you can overwrite culled objects by "shifting" all the following elements down. This is quite simple, since you only need separate read and write indices.
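Both variants could be sketched like this (the `Sphere` type and `isVisible` test are placeholders for illustration, not code from the thread):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-object data used for illustration.
struct Sphere { float x, y, z, r; };

// Placeholder visibility test; a real one would check against frustum planes.
bool isVisible(const Sphere& s) { return s.r > 0.0f; }

// Variant 1: separate input and output arrays. Culled objects are simply
// never written to the output, so the output stays dense.
std::vector<Sphere> cullToOutput(const std::vector<Sphere>& in) {
    std::vector<Sphere> out;
    out.reserve(in.size());
    for (const Sphere& s : in)
        if (isVisible(s))
            out.push_back(s);
    return out;
}

// Variant 2: in-place compaction with separate read and write indices.
// Visible elements are shifted down over culled ones, then the array is
// truncated to the write index; no "active flag" is ever stored.
std::size_t cullInPlace(std::vector<Sphere>& data) {
    std::size_t write = 0;
    for (std::size_t read = 0; read < data.size(); ++read)
        if (isVisible(data[read]))
            data[write++] = data[read];
    data.resize(write);
    return write;
}
```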



#3 Many   Members   -  Reputation: 100


Posted 10 July 2011 - 03:15 PM

I can't think of a case where input and output could be the same. Aside from feedback effects, we usually want to transfer data, not overwrite the source.
Anyway, your reply doesn't address my actual question; it just describes how to skip elements in a loop. My question was how this is handled/avoided in a DOD approach.

Maybe this active/inactive concept of game elements doesn't work at all with DOD. Reading the "Culling the Battlefield" slides from DICE, they just ALWAYS PROCESS EVERYTHING in their DOD culling.

#4 _swx_   Members   -  Reputation: 883


Posted 10 July 2011 - 03:34 PM

The DICE slides clearly show, IIRC, that they have divided their world into a grid on which they then perform the culling. So for each visible grid cell, iterate over all the entities, culling them and either directly outputting rendering information or creating a list of visible entities for later use.
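A rough sketch of that coarse-then-fine structure might look like this (all names and the trivial visibility tests are illustrative guesses, not DICE's actual code):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout: the world is split into grid cells, each owning
// packed, parallel arrays of entity bounds and entity ids.
struct Bounds { float x, y, z, r; };

struct GridCell {
    std::vector<Bounds>   bounds;   // dense per-cell data
    std::vector<uint32_t> entity;   // parallel array of entity ids
};

// Placeholder tests; real versions intersect against frustum planes.
bool cellVisible(const GridCell&)    { return true; }
bool boundsVisible(const Bounds& b)  { return b.r > 0.0f; }

// Coarse-then-fine culling: reject whole cells first, then cull the packed
// per-entity data inside each visible cell, emitting a list of visible ids.
std::vector<uint32_t> cullGrid(const std::vector<GridCell>& cells) {
    std::vector<uint32_t> visible;
    for (const GridCell& c : cells) {
        if (!cellVisible(c)) continue;   // whole cell skipped at once
        for (std::size_t i = 0; i < c.bounds.size(); ++i)
            if (boundsVisible(c.bounds[i]))
                visible.push_back(c.entity[i]);
    }
    return visible;
}
```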

#5 apatriarca   Crossbones+   -  Reputation: 1601


Posted 10 July 2011 - 03:43 PM

Only the occluders which passed the test should go to the next steps of the pipeline. I don't see any reason to output the state of all occluders. The result of the occlusion culling shouldn't modify the occluders.

#6 Katie   Members   -  Reputation: 1281


Posted 10 July 2011 - 04:09 PM

"they just ALWAYS PROCESS EVERYTHING in their DOD culling."

That sentence makes no sense.

#7 Many   Members   -  Reputation: 100


Posted 11 July 2011 - 02:16 AM

@_swx_: Thanks for pointing out the grid approach, but whether you use a grid is not the point. In the end you get back an array of visible indices from the culling process. The real problem is how to linearly update your visible elements (which together define the behaviour of a game object) based on the random access order you get back from culling. It seems like I'm really missing something obvious :)

#8 Hodgman   Moderators   -  Reputation: 27502


Posted 11 July 2011 - 02:21 AM

Sort the resulting indices, and then iterate through the (sorted) index array. This gives you a linear-but-with-holes iteration of the visible objects.

If objects in the same areas are allocated at the same time, then they're likely to have similar indices, and they're also likely to have similar culling results -- so you're likely to skip over large parts of the array, and fully consume other parts of the array.
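The sort-then-iterate idea could be sketched as follows (the `Object` type and the trivial per-object update are placeholders):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical per-object data used for illustration.
struct Object { float state; };

// Given the indices a culling pass returned (in arbitrary order), sort them
// and walk the object array forwards, "linear-but-with-holes".
void updateVisible(std::vector<Object>& objects,
                   std::vector<uint32_t>& visibleIndices) {
    // Sorting restores a forward, mostly-sequential memory access pattern,
    // so cache lines pulled in for one index tend to serve its neighbours.
    std::sort(visibleIndices.begin(), visibleIndices.end());
    for (uint32_t i : visibleIndices)
        objects[i].state += 1.0f;   // placeholder per-object update
}
```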


Also, ugly 'Active Flag' approaches don't make DOD completely useless... they still have their uses -- though in some cases you'd probably want to make a separate bit-array for storing them, so you can iterate the flags independently of the objects they're controlling. It all depends on your data -- if 1% of objects might get disabled, then active flags are probably ok. If 99% are likely to be disabled, then they're probably not a good idea. Know your data, know the statistics!
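A separate bit-array of active flags might be sketched like this (a minimal version using the GCC/Clang `__builtin_ctzll` intrinsic; on other compilers a portable bit-scan would be substituted):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Active flags stored apart from the objects, so the flags can be scanned
// without pulling any object data into the cache.
struct ActiveFlags {
    std::vector<uint64_t> words;

    explicit ActiveFlags(std::size_t n) : words((n + 63) / 64, 0) {}

    void set(std::size_t i) { words[i / 64] |= uint64_t(1) << (i % 64); }
    bool test(std::size_t i) const {
        return (words[i / 64] >> (i % 64)) & 1;
    }
};

// Visit only the set bits; a whole word of 64 inactive objects is skipped
// with a single compare, which is why this pays off when most are disabled.
template <typename Fn>
void forEachActive(const ActiveFlags& f, Fn&& fn) {
    for (std::size_t w = 0; w < f.words.size(); ++w) {
        uint64_t bits = f.words[w];
        while (bits) {
            std::size_t bit = static_cast<std::size_t>(__builtin_ctzll(bits));
            fn(w * 64 + bit);
            bits &= bits - 1;   // clear the lowest set bit
        }
    }
}
```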

#9 Many   Members   -  Reputation: 100


Posted 11 July 2011 - 04:11 AM

Hodgman: In a culling situation, usually 99% become inactive and the rest active. The sorting sounds like a good idea at first, but in reality I don't think there will be performance benefits if the gap between objects is larger than your CPU's cache line.

As those are obvious problems, I wonder why they are not answered in the Battlefield slides or in other DOD papers.

#10 Hodgman   Moderators   -  Reputation: 27502


Posted 11 July 2011 - 04:49 AM

The sorting sounds like a good idea at first, but in reality I don't think there will be performance benefits if the gap between objects is larger than your CPU's cache line.

Performance benefits compared to what? Compared to grabbing the indexed data in a random order rather than in sorted order? Compared to typical OOP?

If we ignore pre-fetching etc and just look at the indexed iteration, the problem we've got is:
Whenever I dereference an index, I download more than one item into the cache.
How can I try to make use of these extra/free downloads?
The extra items that will be downloaded, will be either immediately before or after the index that was dereferenced.
So, by sorting the indices, you improve the likelihood of making use of data that was previously downloaded to the cache.

A several-cache-line gap is fine, it's lots of sub-cache-line gaps that would cause you a problem (because you've spent the effort downloading that data, and then not used it).

Hodgman: In a culling situation, usually 99% become inactive and the rest active.

And the few that are active are likely to be allocated nearby each other in the array (or they should be...) -- the message of DOD is something like "design the data to fit the usage patterns". If your culling returns objects clustered by geographical location, then you should allocate those objects in the same physical region (so that the access pattern matches the physical structure in RAM).

For example, say we've got an array of items, where each item takes up 1/4 of a cache line:
ABCD|EFGH|IJKL|MNOP|QRST|UVWX|YZ

Let's say you're looking at the "middle" of your world, and the culling job returns the indices for:
MNORSTU
Then when you fetch M, you'll pull into the cache:
MNOP
...giving you hits for N and O.
Let's also say that during the iteration, you're issuing a prefetch command once every 4 indices, for 4 indices ahead. So while you're processing M, you're pre-fetching S, which pulls down the line containing:
QRST
...giving you hits for R, S and T.
Hopefully by the time you're done processing T, the hardware pre-fetcher has kicked in, and also pulled down:
UVWX
...giving you a hit for U.

Alternatively, seeing as this entire example array fits into 7 cache lines (and your CPU cache holds, what, 1000 lines?), you could just greedily pre-fetch the whole data-set at the start of the function... Or you could greedily pre-fetch just the indices that you're interested in.

The specific approach taken is going to vary a lot depending on your target CPU, and the size and layout of your data-set.
On a SPU (which has 256 KB of private RAM), downloading the entire data-set from main RAM into the SPU's local store would definitely be an option. Alternatively, you could issue individual DMA commands for each index, giving you a tightly-packed version of the interesting items in local store.


In the above example, the worst result you could get would be where every item is on a new cache line:
A|E|I|M|Q|U|Y
However, having large gaps in there is fine, such as:
ABCD|UVWX
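The prefetch-ahead iteration described above could be sketched like this (using the GCC/Clang `__builtin_prefetch` intrinsic; the 16-byte `Item` and the look-ahead distance of 4 are illustrative, not tuned values):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Item { float v[4]; };   // 16 bytes: four items per 64-byte cache line

// Walk sorted indices, software-prefetching a fixed distance ahead so the
// line containing a future item is (hopefully) resident when we reach it.
float sumIndexed(const std::vector<Item>& items,
                 const std::vector<uint32_t>& sorted) {
    constexpr std::size_t kAhead = 4;   // illustrative prefetch distance
    float total = 0.0f;
    for (std::size_t k = 0; k < sorted.size(); ++k) {
        if (k + kAhead < sorted.size())
            __builtin_prefetch(&items[sorted[k + kAhead]]);
        total += items[sorted[k]].v[0];   // placeholder per-item work
    }
    return total;
}
```

The prefetch is only a hint, so the function's result is identical with or without it; the right distance depends on how much work each iteration does.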

As those are obvious problems, I wonder why they are not answered in the Battlefield slides or in other DOD papers.

The battlefield culling slides only cover the culling process, which outputs an array of indices. How those indices are consumed is outside the scope.
In general DOD papers, not every process outputs indices that reference into sparse arrays, often a compact data-buffer is output which can be directly consumed.
Christer Ericson's GDC 2003 presentation on memory optimisation might be of interest though.

#11 Many   Members   -  Reputation: 100


Posted 11 July 2011 - 06:01 AM

Thanks, Hodgman, for the ideas. It seems I was looking for an unrealistically 'optimal' performance pattern.

#12 Daniel Collin   Members   -  Reputation: 102


Posted 16 July 2011 - 04:24 AM

Hi,

I ran across this thread, so I might be able to give you some more info on the culling we use in BF3 (I implemented it). Please shoot some more questions.
As mentioned earlier in the thread, the presentation doesn't go into detail on how the systems that use the data work, but from the culling system's perspective that isn't interesting.

The system (as in most DOD) has one task: given the following input data: spheres (+ some extra data) and frustums, produce a list of visible objects (or entities, as we call them). The list can be kept very simple; each entry is really just: entityHandle (a pointer to the entity containing the data -- the culling system never reads this data, it's just a value that gets passed on), etc., and visibleMask (a bitmask of which frustums the entity is visible in).

After that it's up to the other systems to decide what to do with the data. In our case we actually do some more culling before producing the very final output (software occlusion), but the idea is still the same.
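Based on that description, the record the culling job emits might look roughly like this (field names and the consumer function are guesses for illustration, not BF3's actual code):

```cpp
#include <cstdint>
#include <vector>

// Sketch of a culling-output entry: an opaque handle the culler never
// dereferences, plus a bitmask of which frustums the entity is visible in.
struct VisibleEntity {
    void*    entityHandle;   // passed through untouched by the culler
    uint32_t visibleMask;    // bit i set => visible in frustum i
};

// A downstream system might consume the list like this, e.g. selecting
// the entities visible in one particular frustum:
std::vector<VisibleEntity> visibleIn(const std::vector<VisibleEntity>& all,
                                     unsigned frustum) {
    std::vector<VisibleEntity> out;
    for (const VisibleEntity& e : all)
        if (e.visibleMask & (1u << frustum))
            out.push_back(e);
    return out;
}
```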


