Culling dynamic objects effectively

17 comments, last by Basiror 18 years, 4 months ago
Thanks for the advice guys. I think I'll brute force it for now and optimize later (possibly using eq's method).

Thanks! =)
rating++ for both of you
--m_nPostCount++
Think about this: If the object moves every frame, and thus has to be updated in some data structure (octree, or what have you), then visiting the object in a vector of objects wouldn't be so bad. In fact, it may be more efficient than having to manage a dynamic tree. If you store an object as x,y,z position, r radius, and a pointer to the real object, then there's only 20 bytes per object, so 3 objects fit in a single cache line. Put these in a linear array, and it should cache and pre-fetch really well (you only need to take a cache miss on items you actually need to dereference the pointer for).
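For concreteness, here's a minimal sketch of that record layout (CullEntry is an illustrative name, not from this thread; the 20-byte figure assumes 32-bit floats and a 32-bit pointer):

struct Object; // the full game object, defined elsewhere

// Compact culling record as described above: 20 bytes per entry, so three
// entries fit in a 64-byte cache line, and a linear array of them streams
// through the cache and prefetches well.
struct CullEntry
{
    float   x, y, z; // bounding-sphere centre
    float   r;       // bounding-sphere radius
    Object* object;  // only dereferenced for entries that pass the cull
};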
enum Bool { True, False, FileNotFound };
hplus0603: I agree with you.
If you move every object every frame and are only doing ONE frustum cull (no CD, raycasts, etc.), then it's impossible to beat a brute-force approach.
I'd probably do something like:
Object* objects[10000]; // Max objects

for every object
    update object transform (+ sphere centre)
    if sphere isn't visible, continue loop
    add object ptr to array of objects
end

for every visible object
    draw object
end

The reasoning behind having two loops is that in the first one all memory accesses are very easy to predict, so caching them effectively is easy.
Doing the drawing inside this loop might bust the code cache and would certainly add a lot of memory accesses, thrashing the data caches.
Writing a bunch of pointers linearly into an array that's probably in the cache already is very fast.
When this array is used in the second loop it's most probably in the L2 cache, and maybe in L1 if the array is small enough (not that it matters much, since the draw calls would probably mess the caches up anyway).
Using the stack to store the pointers is not strictly necessary, but stack memory tends to be hot in the cache since it's used so frequently, and if you know the maximum number of objects at compile time it's quite convenient.
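To make that concrete, here's a minimal C++ sketch of the two-pass scheme (Object, sphereVisible, and the other names are illustrative stand-ins, not from this thread):

#include <cstddef>

// Hypothetical types standing in for the thread's objects and frustum test.
struct Sphere { float x, y, z, r; };
struct Object
{
    Sphere sphere;
    void updateTransform(); // refreshes position and sphere centre
    void draw();
};
bool sphereVisible(const Sphere& s); // sphere-vs-frustum test (assumed)

const std::size_t MAX_OBJECTS = 10000;
Object  objects[MAX_OBJECTS];
Object* visible[MAX_OBJECTS]; // pointers to visible objects

void cullAndDraw(std::size_t objectCount)
{
    // Pass 1: update and cull. Accesses are sequential and predictable,
    // so the array streams through the cache and prefetches well.
    std::size_t visibleCount = 0;
    for (std::size_t i = 0; i < objectCount; ++i)
    {
        objects[i].updateTransform();
        if (!sphereVisible(objects[i].sphere))
            continue;
        visible[visibleCount++] = &objects[i];
    }

    // Pass 2: draw. The pointer array was just written, so it is most
    // likely still in L2 (or L1, if small enough); the draw calls no
    // longer disturb pass 1's tight access pattern.
    for (std::size_t i = 0; i < visibleCount; ++i)
        visible[i]->draw();
}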

My 2c

zedzeek: I guess I extrapolated the OP's needs too much. I was under the impression that he/she needed CD, since I'm having a hard time coming up with an algorithm that could move asteroids randomly without them intersecting each other (which IMO would look quite weird, but it might be fine for the OP's needs).
I also don't think that writing a dynamic spatial container is such a big problem; I wrote and debugged mine in about 3 days, and as I said earlier it's a must-have tool for making full 3D games (with CD, raycasts, etc.).
eq: So, about the line of pseudocode that says "add object ptr to array of objects":
Would it be efficient to keep that array as a dynamic vector? (std::vector)
Or would the dynamic allocation trash the whole optimization?

Edit: *smacks forehead* I just saw the declaration of Object *objects at the top... that answered my question =P
--m_nPostCount++
Using a std::vector would work quite OK, I think (just use push_back).
You will probably get some more cache misses, since every time a reallocation is needed the existing elements are copied into the newly allocated memory.
It also depends on the STL implementation's reallocation strategy.
If you're lucky, the STL will try to grow the existing memory in place, and if that succeeds the performance is more or less the same (with respect to the cache).
But it seems impossible to me that ANY push_back implementation is faster than:

visibleObjects[visibleIndex++] = items[itemIndex].m_object;



Now I'm curious:
How about *(visObjptr++) = itemsPtr->object;
Can I expect any half-decent compiler to do that anyway, or should I expect it to end up more like *(visObj + visIndex++) = (items + itIndex)->object?
f@dz
http://festini.device-zero.de
Hi DirectXFreak,

In these sorts of situations, where you have a lot of dynamic objects, it's a good idea to avoid doing much work per object.

For object culling, I would definitely advise using some kind of spatial partitioning (be it an octree, a fixed-size grid of cells, or some other method), so that you can do the culling tests at the cell level first. If a cell is rejected, then you have saved doing tests on all objects within that cell.

As you pointed out, this requires maintaining a list of which objects are in which cells, but you can use frame coherence to simplify this: from one frame to the next, an object is only likely to move within its cell or to one of its neighbours (provided you balance the cell size against object sizes and speeds). The overhead of this is likely to be outweighed by the gains from reducing the number of (comparatively expensive) visibility tests.
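As a rough illustration of the cell idea, here's a minimal fixed-size grid sketch (all names, the cell size, and the 2D xz-plane layout are illustrative assumptions, not from the thread; objects are assumed to be registered in a valid cell before the first update):

#include <algorithm>
#include <cmath>
#include <vector>

const float CELL_SIZE = 64.0f;
const int   GRID_DIM  = 32;

struct GridObject { float x, y, z; int cell; };

// One object list per cell; a 2D grid over the xz plane keeps this short.
std::vector<GridObject*> cells[GRID_DIM * GRID_DIM];

int cellIndex(float x, float z)
{
    int cx = (int)std::floor(x / CELL_SIZE);
    int cz = (int)std::floor(z / CELL_SIZE);
    return cz * GRID_DIM + cx; // caller must ensure the position is in range
}

void updateCell(GridObject& obj)
{
    int newCell = cellIndex(obj.x, obj.z);
    if (newCell == obj.cell)
        return;                        // frame coherence: usually a no-op
    // Moved cells (rare per frame): remove from the old list, add to the new.
    auto& old = cells[obj.cell];
    old.erase(std::find(old.begin(), old.end(), &obj));
    cells[newCell].push_back(&obj);
    obj.cell = newCell;
}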

Also, at some point you're going to need to start thinking about collisions (assuming you don't want all your asteroids to pass through each other). The naive brute-force approach tests all objects against all others, which is O(n^2) (i.e. bad for 1000 objects!), but you can apply the same cell partitioning and frame coherence techniques to collision tests too. One technique, which I believe was first presented in one of David Baraff's papers on rigid body dynamics, is called 'sort and sweep': it keeps the objects' bounding-box extents sorted along an axis and sweeps along them to produce a list of potentially colliding pairs of objects.
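Here's a minimal one-axis sort-and-sweep sketch to make the idea concrete (a hedged illustration, not Baraff's exact formulation; AABB and sweepX are made-up names, and a real implementation would also test the y and z extents and exploit frame coherence by keeping the list nearly sorted between frames):

#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical axis-aligned bounding box, reduced to x extents plus an id.
struct AABB { float minX, maxX; int id; };

// Returns pairs whose x extents overlap; these are the candidates that
// would go on to further axis tests and then the narrow phase.
std::vector<std::pair<int, int>> sweepX(std::vector<AABB> boxes)
{
    std::sort(boxes.begin(), boxes.end(),
              [](const AABB& a, const AABB& b) { return a.minX < b.minX; });

    std::vector<std::pair<int, int>> candidates;
    for (std::size_t i = 0; i < boxes.size(); ++i)
        for (std::size_t j = i + 1; j < boxes.size(); ++j)
        {
            if (boxes[j].minX > boxes[i].maxX)
                break; // list is sorted, so no later box can overlap box i
            candidates.emplace_back(boxes[i].id, boxes[j].id);
        }
    return candidates;
}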

In general these techniques are referred to as the 'Broad Phase', and serve to minimise the number of objects that need to enter the more expensive 'Narrow Phase' of individual visibility or collision tests.
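For the narrow phase, the per-pair test can be as simple as an exact sphere-vs-sphere check on the candidate pairs the broad phase produced (a small illustrative sketch; Sphere is a made-up type):

struct Sphere { float x, y, z, r; };

bool spheresIntersect(const Sphere& a, const Sphere& b)
{
    float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    float rsum = a.r + b.r;
    // Compare squared distances to avoid the sqrt.
    return dx * dx + dy * dy + dz * dz <= rsum * rsum;
}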

Hope that helps a bit. Google will give you more details on this if you need it.


What about the case where you have a lot of moving objects that
- are not updated every frame
- always move between frames

How can you perform culling without having to evaluate each object's position (and compare it to the frustum), if you can make certain hypotheses about the movement (linear, for instance)?

Quote: Original post by eq
Using a std::vector would work quite OK, I think (just use push_back).
You will probably get some more cache misses, since every time a reallocation is needed the existing elements are copied into the newly allocated memory.
It also depends on the STL implementation's reallocation strategy.
If you're lucky, the STL will try to grow the existing memory in place, and if that succeeds the performance is more or less the same (with respect to the cache).
But it seems impossible to me that ANY push_back implementation is faster than:

visibleObjects[visibleIndex++] = items[itemIndex].m_object;


just reserve enough elements once before you use the vector

and btw. you don't check the array boundaries, which might work 99% of the time if the array is large enough
push_back does this for you, and you save the time spent writing that code
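A minimal sketch of that suggestion (Object is a hypothetical type, and 10000 matches eq's earlier maximum):

#include <vector>

struct Object; // hypothetical game-object type, as in the earlier sketches

std::vector<Object*> visibleObjects;

void beginFrame()
{
    // One allocation up front: with the capacity reserved, push_back never
    // reallocates (as long as the estimate holds) and stays cache-friendly,
    // while still growing safely if it's ever exceeded.
    visibleObjects.reserve(10000);
    visibleObjects.clear(); // clearing keeps the capacity for the next frame
}

void addVisible(Object* obj)
{
    visibleObjects.push_back(obj); // bounds-safe, unlike the raw-array store
}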

@freak: a hierarchical approach seems to be the most efficient, as long as your asteroids are evenly distributed in space and the camera frustum doesn't cover the whole space at once
http://www.8ung.at/basiror/theironcross.html

This topic is closed to new replies.
