Octree Frustrum Culling

Started by
14 comments, last by ????????? ????? 8 years, 7 months ago

Hello, times came when i had to implement some culling into my engine. I started by reading some stuff and so far so good. I managed to create an octree which can effectivly divide my geometry's vertices (per mesh) into an octree. But here i have some Questions:

Currently i am using immediate mode to test if the culling works. And it does work, i am recursivly traversing the tree's nodes and testing against the node's bounding box if the frustum intersects with it. Repeat untill leaf is found and render that leaf's geometry (with immediate mode for testing) But now i would like to use VBO/VAO's to render the geometry in the leaves and here i see two options:

- 1.Each time the i traverse the tree push the leaves that intersect with the frustrum into a list then get the vertices out of them and construct a VBO and send it to be rendered.

- 2.Pre build a VBO for each leaf then just rebind for each visible leaf and render.

Anyway, which is the better option rebuilding a new vbo each time the frustum changes, or just rebinding multiple vbos (which is also expensive).

Also is this even the right approach , maybe i am missing something.


 
bool view_Culling(CFrustum& frusrum, COctree *node)
{
if (!node)
return false;

if (frusrum.CubeInFrustum(node->GetCenter().x, node->GetCenter().y, node- >GetCenter().z,(node->GetWidth()/2.0f)))
{
if (node->IsSubDivided())
{
view_Culling(frusrum, node->OctreeNodes()[TOP_LEFT_FRONT]);
view_Culling(frusrum, node->OctreeNodes()[TOP_LEFT_BACK]);
view_Culling(frusrum, node->OctreeNodes()[TOP_RIGHT_BACK]);
view_Culling(frusrum, node->OctreeNodes()[TOP_RIGHT_FRONT]);
view_Culling(frusrum, node->OctreeNodes()[BOTTOM_LEFT_FRONT]);
view_Culling(frusrum, node->OctreeNodes()[BOTTOM_LEFT_BACK]);
view_Culling(frusrum, node->OctreeNodes()[BOTTOM_RIGHT_BACK]);
view_Culling(frusrum, node->OctreeNodes()[BOTTOM_RIGHT_FRONT]);
}
else
{
//use prebuilded vbo here
//or push these vertices here in a list later build a dynamic vbo
//from them
node->DrawOctree(node);
return true;
}
}
return false;
}
Advertisement


- 2.Pre build a VBO for each leaf then just rebind for each visible leaf and render.

This, rebuilding a VBO all the time is not necessary (and slow). You need to save your mesh somewhere and a VBO means only that the mesh is (most properly) stored in videomemory. So, keep in there and rebind it, then render it.

Okay , but the mesh is devided between the leaves. so for each leaf of the octree i will have 1 VBO with some vertices which will represent part of the mesh. And ill only render the parts that are visible thus binding multiple vbos. And its better right ?

Why do your objects need to be polygon-divided by the octree? This sounds like a less-efficient variation on a BSP tree.


L. Spiro

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
You have misunderstood how culling with an octree works with rendering.
VBO’s have nothing to do with the octree.

First you update all objects in the scene. If an object has not moved, you don’t need to update it inside the octree. If it has, you reposition it inside the octree.
Then you use the octree against your frustum to find nodes of the octree that are visible.
For each visible node, you then test each object in that node (AABB against the frustum).
For each visible object, add it to the list to be drawn.


The VBO’s are part of the objects. There is no relationship between the octree and the VBO(’s) used to draw the object. The octree is only there to accelerate the process of selecting objects to draw.


L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid


but the mesh is devided between the leaves. so for each leaf of the octree i will have 1 VBO with some vertices
You can store multiple meshes within one VBO. A VBO is just a memory allocation -- like malloc or new char[].

You don't have to draw the entire contents of a VBO -- you can specify an offset to start reading triangles from, and how many triangles to draw.

Also, you can use an index buffer.

In your situation, you can have the entire mesh stored in one VBO, and then every frame you would produce a new index buffer, which only contains the indices of the triangles that you would like to draw, and then issue one draw-call using all of those indices.

Alternatively, you don't need to dynamically create any data each frame -- just put all the triangles into one VBO, and all the indices into one IBO -- but with each 'leaf' occupying a different sub-section of that IBO. You then issue one draw-call per leaf, which specifies the IBO offset for that leaf, and number of vertices in the leaf.

@L. Spiro

I think you misunderstood what i meant. You are assuming that the octree is divided by AABB when its not.Even then if a Terrain AABB intersects the frustum i will have to draw the entire thing which can be humongus, still drawing invisible parts of it. Further more I dont want to determine if An object is visible i want to determine which parts of it are visible if its too big to render completely. Also I realize that the entire scene can be stuffed inside one octree and culled from there , but that adds rather complicated problems (with materials,textures,shaders etc). My idea and approach will be different

The idea is that i have some kind of a BVH(maybe octree of AABBs) which will contain the Bounding Volumes of each object in a given scene. Then i will perfrom frustum tests on that structure to determine the nodes (and respectivly the models in that node) that are visible or not. Those that are not visible are simply not drawn, Those that are visible have 2 options:

- They are Partly visible (their bounding volume is bigger than the frusum but still visible - Terrains,Big Buildings etc...) - Here is where the Octree from above comes into play. They will be tested against an Octree to determine exaclty which parts of them are visible. I wont go deep into my implementation but to clarify i can say that each object has modeldata property , modeldata can be reused among many objects. Each modeldata contains - vbo,ibo, vertices. Now it will contain prebuilt octree from the vertices so i can use it when i need to determine what to draw when the model is partly visible. I will probably test in object space since the vertices are already there and i will need to translate the frustum also in object space.

- They are entirely visible (their bounding volume is smaller than the frusum) - Directly sent to be drawn

Currently that is going to be applied to static , non-moving objects.

@Hodgman

I think thats what i am looking for. Using 1 VBO (which i already have prebuilt) and only extract the vertices i currently need at each test against the tree. Can you explain further , maybe with some codeblocks how would that work ? Thanks !

Edit: What structure should i use to store the bounding boxes of the objects . I assume i can group the boxes in huge chunks and test against that and discard the entire chunk if not visible . But how would one build that structure ?

I think you misunderstood what i meant.

With your further explanation it seems clearer that you are taking a wrong turn.

but that adds rather complicated problems (with materials,textures,shaders etc).

Materials (which includes textures and shaders) are no more a problem when using octrees than if you simply had a linear array of objects and rendered them naïvely. They have absolutely no relationship with the culling process.

Even then if a Terrain AABB intersects the frustum i will have to draw the entire thing which can be humongus, still drawing invisible parts of it.

Terrain is always divided into chunks (usually of, say, 257×257 vertices, smaller with LOD) and visible chunks are drawn.

Also I realize that the entire scene can be stuffed inside one octree and culled from there

This is the correct route to take. Your complications related to materials are fictional.

They are Partly visible (their bounding volume is bigger than the frusum but still visible - Terrains,Big Buildings etc...)

All large objects are subdivided. Terrain into chunks, buildings into rooms and halls, interiors and exteriors, each with LOD’s, etc. Further special-case culling takes place inside buildings to avoid drawing the exterior altogether (if you can’t see outside of the building then the sky, terrain, and all outdoor objects are immediately skipped and a second, smaller octree is used which spans only the inside of the building).

The idea is that i have some kind of a BVH(maybe octree of AABBs)

A BVH and an octree are separate structures—an octree is not a kind of BVH.



There is nothing wrong with how you plan to structure objects using multiple smaller objects. A model is made up of multiple meshes. Each can have its own AABB, plus the AABB for the entire model. And if the model’s AABB is entirely inside the frustum there is no need to test the smaller mesh AABB’s. This is standard.

But culling itself is not a free series of instructions. If your culling method is too complex then it becomes slower than not culling at all. It is better to check mesh AABB’s (the smaller parts of models) linearly one-by-one than to maintain, update, and traverse yet another octree.

There exists the scene octree (or BVH) and nothing more (unless you add the interior/exterior special case for being inside buildings mentioned above). There are not mini-octrees inside all of the models, as such a costly culling method for so few objects doesn’t make sense. They are however culled. Just one-by-one, no secondary acceleration structures (a BVH here might make sense only for static objects, but again you would need to test that it actually helps, because you are entering a granularity where it becomes questionable).

And finally, none of this has anything to do with materials. I don’t know why you even mentioned them. Culling and rendering are disjointed and separate operations. You build a list of renderables via the culling pass, then you sort them with a render queue, then you render them. Culling uses an object’s AABB, and an AABB has no idea what materials are, it simply defines a geometric space.



I think thats what i am looking for. Using 1 VBO (which i already have prebuilt) and only extract the vertices i currently need at each test against the tree. Can you explain further , maybe with some codeblocks how would that work ? Thanks !

This is unrelated to culling, but a model has a single VBO and each mesh has an IBO which references vertices inside of it. This isn’t a culling issue, but simply a good idea to avoid rebinding VBO’s etc. Based on your explanation, it doesn’t seem worth it to rebuild the IBO(’s) in any way related to culling. Subdividing meshes at run-time is overkill and simply not worth the cost just to avoid drawing a few off-screen triangles.



Edit: What structure should i use to store the bounding boxes of the objects . I assume i can group the boxes in huge chunks and test against that and discard the entire chunk if not visible . But how would one build that structure ?

Models and meshes have AABB’s. The AABB’s inside the models (the mesh AABB’s) are too numerous and small and should be ignored for this part of the discussion, as including them in the scene’s octree or BVH would cause update hell (unless they are purely static).
The scene’s spacial-partitioning scheme can be either a BVH or an octree, but care must be taken in how either is implemented for there to be any benefit. In either case, the finest granularity is at the model level, not the mesh level.

In the case of octrees, the nodes in the tree must be linear in memory. Cache misses while traversing the tree will otherwise become so significant that the octree itself causes performance to drop overall, even if half the objects in the scene are ultimately culled.
Additionally, octrees are memory-consuming and in a naïve implementation you can only afford to have a few levels in the tree. This is why a hybrid approach is more common, using an Efficient Instant-Insertion Quadtrees for the X and Z (horizontal plane) and a stack for the Y (vertical plane).

In the case of BVH’s performance will depend heavily on your construction strategy, and they can’t be rebuilt at run-time efficiently (can only contain static objects).
https://en.wikipedia.org/wiki/Bounding_volume_hierarchy#BVH_design_issues


Note that it is perfectly acceptable to have terrain be a special case and be culled differently from the rest of the scene. In fact terrain culling can be so wildly different that it can define the terrain itself.
http://research.microsoft.com/en-us/um/people/hoppe/gpugcm.pdf
http://www.flipcode.com/archives/article_geomipmaps.pdf


L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid

Okay, well. Correct me if i am wrong your idea is to use each object's bounding box as a culling mechanism. And just yesterday i implemented something like tha. Using bullet physics (which i use in the engine for other purposes) i managed to implement frustum culling (it also supports occlusion culling but i wont dive into that just for now). So bullet maintains a dynamic AABB tree which contains the aabb of each model so i managed to make it cull the geometry on object basis.Bullet retuns a List of visible objects and the rest is just looping over them (here i may want to sort on some criteria to avoid state changes) Works nice and neat for about 6000 objects (about 2 500 000 triangles) getting stedy 500 FPS (and about a few millsecs per frame rendering). I guess i can easily get reasonable fps with x4 or even x5 times individual objects. Also perfromed a profiling and the most expensive call is to DrawElements, cause they are all individual objects (not batched), guess that's expected.

"Simply not worth the cost just to avoid drawing a few off-screen triangles"

I dont think that's very true. You consider only a terrain as a type of shape that can have dense amount of triangles , and that's just not true. Imagine i wanted to load a model type (for example the Sponza) it's heavy and huge and doesnt have "just a few off-screen triangles" (if you decide to use it as a building type). And sponza (and similarly huge geometry) is worth going deeper into some kind of spacial part. That was my itial idea , to keep another level of tree for models similar to that.

Why i said that it will present problems with materials and shaders ? Well , because what i meant is to stuff the entire scene into the tree based on the vertices not a AABBs. Then each node will become a mash between multiple objects and pain. But i think that's unnecessary. And besides the scope of the Q

Correct me if i am wrong

You are not wrong.

here i may want to sort on some criteria to avoid state changes

I said that. Step 2 is to sort via a render-queue.

You consider only a terrain as a type of shape that can have dense amount of triangles

That’s a bold statement.
96% of my post was written with standard models in mind, and mostly excluding terrain.

If you were reading it, thinking it was mostly aimed at terrain, you should read it again.

It is aimed at the most common and severe standard case of having a model with lots of meshes inside it. When I write my game engines my test data is usually vehicles not intended for use in games.

LSBRDF19.png

The car is 1 model with about 120 meshes. The hubcaps, tires, seats, engine, exhaust pipes, seat belts, etc.

This sounds exactly like your problematic cases, and what I described above is exactly how to approach it.

When the whole car is in view, don’t cull all the parts. 120 draw calls, but needless culling is skipped.

When only part of the car is in view, the 120 meshes that make up the car also need to be frustum-culled.

I tried many ways of doing this, and after many tests with many models settled upon what I have described in previous posts. BVH’s only work for static objects, but the car’s wheels can move. Octrees are overkill because when you have so many small parts crammed into such a space you’re going to end up drawing most of them, with the overhead of traversing and updating the octree.

A linear array of 120 parts was the best solution overall. It suits the first case (just drawing all 120 objects without culling) perfectly, nothing needs to be updated when models are animated, and there is virtually no overhead/cache-missing associated with octrees.

Sponza should be broken up into multiple meshes in the same way and the exact same system should work just as well.

However, as you noted, draw calls are the most expensive. This is why I said it is sometimes better to draw off-screen triangles than to cull. If Sponza were entirely a single draw call, it would likely be faster to draw the whole model all in one call, from any viewpoint, than to break it into parts, cull, and use multiple draw calls.

L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid

Well currently i am using assimp. And when loading the model i am loading all meshes in that model and grouping them all under 1 VBO and 1 IBO . So for the example of Sponza all meshes within the model are in 1 VBO for me. But what i am worrying about is sending vertices thru' the pipeline that are not visible at all. If your target machine is GPU bound suddently all those invisible vertices you send to the shader for calculations are becoming a problem.

Your approach is to devide the the model into meshes , cull the meshes if visible - thats a possibility , mine was about deviding the enire VBO into chunks in an octree.

What i understand you are saying is the following:

Draw the entire model if its completely visible (1 draw call)

if only parts of the model are visible - Draw those parts(meshes) of the model , rest is culled

That's what i am saying for the last 2 posts. The only difference is that my model wont be split into meshes but rather chunks in the octree.

....Octrees are overkill because when you have so many small parts crammed into such a space you’re going to end up drawing most of them

That's why you'r subdivisoun will stop at say certain(reasonable amount) amount of vertices

...

, with the overhead of traversing and updating the octree.
Currently that is going to be applied to static , non-moving objects.

What updates on the octree when i clearly stated it's not the case.

That’s a bold statement.
96% of my post was written with standard models in mind, and mostly excluding terrain.

Actually i was talking about your statement from before , not your post as a whole

"Simply not worth the cost just to avoid drawing a few off-screen triangles" - THAT

I do not agree. You cant say that because you may be sending 100% of the model when you are actually seeing 5%. If the calculations per vertex are expensive or you are GPU bound (saying that cause my GPU is not the greatest and doing extra stuff seems to be unnecessary

This topic is closed to new replies.

Advertisement