# Culling dynamic objects effectively

## Recommended Posts

DirectXFreak    166
Greetings, I've been puzzling recently over an effective method for culling dynamic objects... By this, I could mean 1,000 asteroids randomly moving around the camera, or other similar instances. I've considered using an octree with uniform leaves, but this means I'd have to update every object with it's new cell as it's moved, which doesn't sound all that efficient to me. What algorithms can solve this problem effectively? Thanks!

##### Share on other sites
paic    645
Hi,

Google for "occlusion culling" It's a hardware accelerated method to cull dynamic object (can be used for static object but those can be precomputed into octrees / bsp / pvs / ...)
It basically draw a simplified version of each object (usually a bounding box) and get back the number of pixels drawn on screen. If this number is less than a threshold (1 or 2 pixels for example), you can say that the object is not visible and not draw it.

Sorry I can't really help more, I never implemented this.

##### Share on other sites
zedzeek    529
Quote:
 By this, I could mean 1,000 asteroids randomly moving around the camera

stick a BS around each asteroid and each frame check each asteroid against the camera frustum
u can also enable occlusion culling as well to refine the numbers (though with asteroids there prolly not gonna be much occlusion)

##### Share on other sites
GOCP    122
Quote:
 Original post by paicGoogle for "occlusion culling"

Not wanting to be picky, but did you mean Occlusion Querys?

Occlusion culling can cover software and hardware algorithms.

There are many differeing ways, however the "simplist" to start you off I guess are....

a) if the scene is small enough, use a regular grid accross it. Linear time lookups, trivial to implement, but obviously not adaptive to the number of items in the grid.

b) As mentioned bounding sphere around each asteroid. Be aware you might not see much performance increase if you're not geometry limited and the asteroids are simple. Of course instead of sphere you could use a bounding box or anything else you have a fast and easy Object in View Frustum test for.

c) Occlusion Querys though I doubt they'll save you much if anything on a per asteroid bases.

As mentioned there are many many more options out there, but to start I'd have a go with option b) and see if it makes any difference. If it does then look into more efficient methods (or post back here and we'll pull up something a bit more orientated to your problem)

Hope this helps,

GOCP

##### Share on other sites
paic    645
Oups, yes, you're right, I meant "hardware occlusion query" ^^
Sorry ^^

##### Share on other sites
eq    654
I'd recommend something like a loose dynamic aabb tree, I'm very happy with the performance of my implementation (even though it's not optiomized at all).

##### Share on other sites
DirectXFreak    166
So I take it just testing bounding spheres against the frustum is fast enough, even with that many objects?

##### Share on other sites
zedzeek    529
i just done a few tests
1,000,000 spheres against the frustum
15 MSEC best case (early out)
170 MSEC worse case
though u can prolly optimize even better than this

athlon64 2.0ghz

##### Share on other sites
eq    654
zedzeek: Those figures doesn't say much, since there's no info on how many spheres where visible by average.

I agree that if you ONLY need to do culling once per frame AND most of the objects are visible say 50% or more, then brute force is the winner.
If you have a smaller number of objects visible every frame < 10% OR you need to do several frustum checks OR collision queries, then a hierarchial data structure is the winner.
In the samples I provided in the thread referred to above, I did three frustum tests of 10000 moving boxes in 0.5 ms (Intel P4 @ 2.4 GHz), granted the number of visible boxes is probably only around 1%. Furthermore it's slower to update the position in a more advanced data structure (than an array of transforms).
I still thinks it's an invaluable tool to have a dynamic spatial data structure that can be used for different queries when creating real games.
It provides for very fast overlap tests (used for collision detection), raycasts, frustum culling, occlusion culling etc.
I've worked on games were we wanted to have dynamically updating envmaps for the player car, but since we didn't have a fast enough way of determining which objects are visible in each of the faces frustums we had to abandon the idea (the rendering was actually fast enough, though simplified).

Just my 2c

##### Share on other sites
zedzeek    529
Quote:
 Original post by eqzedzeek: Those figures doesn't say much, since there's no info on how many spheres where visible by average.

the stat of interest is the worse case (where all objs are within frustum) the OP mentioned 1,000 objs
on my computer athlon64 2.0 this will take 0.00017 of a second (most likely it will be even quicker cause A/ objs will also be outside the frustum B/ the method i used could be optimized further eg inlining funcs, sticking the far frustum plane test at the end etc

my intention to the OP is to save him/herself the heartache of implementing some spatial sceme (which takes time + adds complexity) and then get the shock of his/her life when the app doesnt run any quicker. sure there are places where spatial division will make a difference eg raytacing/coldet

FWIW in my game i have spatial divsion implemented but have it turned off cause in reality it makes no differnce to speed

##### Share on other sites
DirectXFreak    166
Thanks for the advise guys. I think I'll brute force it for now and optimize later (possibly using eq's method).

Thanks! =)
rating++ for both of you

##### Share on other sites
hplus0603    11356
Think about this: If the object moves every frame, and thus has to be updated in some data structure (octree, or what have you), then visiting the object in a vector of objects wouldn't be so bad. In fact, it may be more efficient than having to manage a dynamic tree. If you store an object as x,y,z position, r radius, and a pointer to the real object, then there's only 20 bytes per object, so 3 objects fit in a single cache line. Put these in a linear array, and it should cache and pre-fetch really well (you only need to take a cache miss on items you actually need to dereference the pointer for).

##### Share on other sites
eq    654
hplus0603: I agree with you.
If you move every object every frame and are only doing ONE frustum cull, no CD, raycasts etc, then it's impossible to beat a brute force approach.
I'd probably do something like:
Object* objects[10000]; // Max objectsfor every object  update object transform (+sphere centre)  if sphere isn't visible, continue loop  add object ptr to array of objectsendfor every visible object   draw objectend

The reasoning behind having two loops is that for the first one it's very easy to predict all memory access, thus caching it in an effective way is easy.
Doing the drawing inside this loop might bust the code cache and would certainly add a lot of memory accesses, thus trashing the caches.
Writing a bunch of pointers linearly into an array that's probably in the cache already is very fast.
When using this array in the second loop it's most probably in the L2 cache and maybe the L1 if the array is small enough (not that it matters since the draw call would probably mess the caches up).
Using the stack to store the pointers is not really necessary but the stack tends to be very near the cpu since it's used frequently and if you know the maximum number of objects at compile time it's quite nice.

My 2c

zedzeek: I guess I extrapolated the OP's needs to much. I was under the impression that he/she needed CD since I'm having a hard time coming up with an algorithm that could move asteroids randomly without them intersecting eachother (which IMO would look quite weird, but it might be fine for the OP's needs).
I also don't think that writing a dynamic spatial container isn't such a big problem, I wrote and debugged mine for about 3 days and as I proclaimed earlier is a tool that's a must have for making full 3d games (with CD, raycasts etc).

##### Share on other sites
DirectXFreak    166
eq: So about the section of code stating: add object ptr to array objects;
Would it be efficient to keep that array as a dynamic vector? (std::vector)
Or would the dynamic allocation trash the whole optimization?

Edit: *smacks forehead* I just saw the declaration of Object *objects at the top... that answered my question =P

##### Share on other sites
eq    654
Using a std::vector would work quite ok, I think (just use push_back).
You will probably get some more cache misses since every time an allocation is needed, the newly allocated memory is filled (using a copy of the previous).
It also depends on the stl implementations re-allocation strategy.
If you're lucky stl will try to grow the existing memory and if that succedes the performance is more or less the same (with respect of the cache).
But it seems impossible to me that ANY push_back implementation is faster than:

visibleObjects[visibleIndex ++] = items[itemIndex].m_object;

##### Share on other sites
Trienco    2555
Now I'm curious:
Can I expect any half decent compiler to do that anyway, or would I have to suspect it to end up more like *(visObj+visIndex++) = (items+itIndex)->object?

##### Share on other sites
WillC    548
Hi DirectXFreak,

In these sorts of situations where you have a lot of dynamic objects, it’s a good idea to try and avoid doing much work per object.

For object culling, I would definitely advise using some kind of spatial partitioning (be it an Octree, a fixed size cell, or some other method), so that you can do the culling tests at the cell level first. If a cell is rejected, then you have saved doing tests on All objects within that cell.

As you pointed out this requires maintaining a list of which objects are in which cells, but you can use frame coherence to simplify this. I.e. from one frame to the next an object is only likely to move within a cell, or to one of its neighbours (providing you balance the cell size with object sizes and speeds). The overhead of this is likely to be out-weighted by the gains you get by reducing the number of (comparatively expensive) visibility tests.

Also at some point you're going to need to start thinking about collisions (assuming you don't want all your asteroids to pass through each other). The naive brute force approach for this involves testing all objects against all others which is O(n^2) (i.e. bad for 1000 objects!), but you can apply the same cell partitioning and frame coherence techniques to collision tests too. One technique, which I believe was first presented in one of David Baraff’s papers on rigid body dynamics, is called ‘sort and sweep’ which maintains overlapping bounding boxes to produce a list of potentially colliding pairs of object.

In general these techniques are referred to as ‘Broad Phase’, and serve to minimise the number of objects that need to enter the more expensive ‘Narrow Phase’ of individual visibility or collision tests.

Hope that helps a bit. Google will give you more details on this if you need it.

##### Share on other sites
ToohrVyk    1596
What about the case where you have a lot of moving objects that
- are not updated every frame
- always move between frames

How can you perform culling without having to evaluate the position of objects (and comparing it to the frustum) if you can make certain hypotheses on the movement (linear, for instance) ?

##### Share on other sites
Basiror    241
Quote:
 Original post by eqUsing a std::vector would work quite ok, I think (just use push_back).You will probably get some more cache misses since every time an allocation is needed, the newly allocated memory is filled (using a copy of the previous).It also depends on the stl implementations re-allocation strategy.If you're lucky stl will try to grow the existing memory and if that succedes the performance is more or less the same (with respect of the cache).But it seems impossible to me that ANY push_back implementation is faster than:visibleObjects[visibleIndex ++] = items[itemIndex].m_object;

just reserve enough elements once before you use the vector

and btw. you don t check the array boundaries which might work in 99% of the time if the array is large enough
push_back does this for you and you save the time writing that code

@freak: a hierarchical approach seems to be the most efficient appraoch as long as you asteroids are evenly distributed in space and the camera frustrum doesn t cover the whole space at once