Is my frustum culling slow?

lipsryme · 2013-04-02T01:24:24

Doing frustum culling on 10,000 objects takes about 0.85 ms. Is that slow or fast? I'm using AABBs to frustum cull my objects. See the code here:

    bool RendererD3D11::FrustumCull(SceneEntityDescription* entity, XMFLOAT4* frustumPlanes)
    {
        // If there even was a rejection last time
        if(entity->lastRejectedFrustumPlane != 999999)
        {
            // Check last rejected planeId first
            XMVECTOR planeNormal = XMVectorSet(frustumPlanes[entity->lastRejectedFrustumPlane].x,
                                               frustumPlanes[entity->lastRejectedFrustumPlane].y,
                                               frustumPlanes[entity->lastRejectedFrustumPlane].z,
                                               0.0f);
            float planeConstant = frustumPlanes[entity->lastRejectedFrustumPlane].w;

            // Check each axis (x, y, z) to get the AABB vertex furthest along the direction
            // the plane is facing (plane normal)
            XMFLOAT3 axisVert;
            XMFLOAT3 aabb_min = this->contentManager->GetPrimitiveFromPool(entity->ID)->GetBoundingVolume()->Min();
            XMFLOAT3 aabb_max = this->contentManager->GetPrimitiveFromPool(entity->ID)->GetBoundingVolume()->Max();
            XMFLOAT3 objPos = entity->worldPosition;

            // x-axis
            if(frustumPlanes[entity->lastRejectedFrustumPlane].x < 0.0f)
            {
                axisVert.x = aabb_min.x + objPos.x; // min x + obj position's x
            }
            else
            {
                axisVert.x = aabb_max.x + objPos.x; // max x + obj position's x
            }

            // y-axis
            if(frustumPlanes[entity->lastRejectedFrustumPlane].y < 0.0f)
            {
                axisVert.y = aabb_min.y + objPos.y; // min y + obj position's y
            }
            else
            {
                axisVert.y = aabb_max.y + objPos.y; // max y + obj position's y
            }

            // z-axis
            if(frustumPlanes[entity->lastRejectedFrustumPlane].z < 0.0f)
            {
                axisVert.z = aabb_min.z + objPos.z; // min z + obj position's z
            }
            else
            {
                axisVert.z = aabb_max.z + objPos.z; // max z + obj position's z
            }

            // Now we get the signed distance from the AABB's vertex that's furthest down the frustum plane's normal,
            // and if the signed distance is negative, then the entire bounding box is behind the frustum plane,
            // which means that it should be culled
            if(XMVectorGetX(XMVector3Dot(planeNormal, XMLoadFloat3(&axisVert))) + planeConstant < 0.0f)
            {
                return true;
            }
        }

        // Loop through each frustum plane
        for(int planeID = 0; planeID < 6; ++planeID)
        {
            if(entity->lastRejectedFrustumPlane != 999999)
            {
                if(planeID == entity->lastRejectedFrustumPlane)
                {
                    // skip last rejected frustum plane since we've checked it before this loop
                    continue;
                }
            }

            XMVECTOR planeNormal = XMVectorSet(frustumPlanes[planeID].x, frustumPlanes[planeID].y, frustumPlanes[planeID].z, 0.0f);
            float planeConstant = frustumPlanes[planeID].w;

            // Check each axis (x, y, z) to get the AABB vertex furthest along the direction
            // the plane is facing (plane normal)
            XMFLOAT3 axisVert;
            XMFLOAT3 aabb_min = this->contentManager->GetPrimitiveFromPool(entity->ID)->GetBoundingVolume()->Min();
            XMFLOAT3 aabb_max = this->contentManager->GetPrimitiveFromPool(entity->ID)->GetBoundingVolume()->Max();
            XMFLOAT3 objPos = entity->worldPosition;

            // x-axis
            if(frustumPlanes[planeID].x < 0.0f)
            {
                axisVert.x = aabb_min.x + objPos.x; // min x + obj position's x
            }
            else
            {
                axisVert.x = aabb_max.x + objPos.x; // max x + obj position's x
            }

            // y-axis
            if(frustumPlanes[planeID].y < 0.0f)
            {
                axisVert.y = aabb_min.y + objPos.y; // min y + obj position's y
            }
            else
            {
                axisVert.y = aabb_max.y + objPos.y; // max y + obj position's y
            }

            // z-axis
            if(frustumPlanes[planeID].z < 0.0f)
            {
                axisVert.z = aabb_min.z + objPos.z; // min z + obj position's z
            }
            else
            {
                axisVert.z = aabb_max.z + objPos.z; // max z + obj position's z
            }

            // Now we get the signed distance from the AABB's vertex that's furthest down the frustum plane's normal,
            // and if the signed distance is negative, then the entire bounding box is behind the frustum plane,
            // which means that it should be culled
            if(XMVectorGetX(XMVector3Dot(planeNormal, XMLoadFloat3(&axisVert))) + planeConstant < 0.0f)
            {
                entity->lastRejectedFrustumPlane = planeID; // store rejected frustum plane ID for the next frames
                return true;
            }
        }

        return false;
    }

Keep in mind these 10,000 objects are all inside the view frustum, so they all go through all 6 planes and pass (worst-case scenario, I know). I've commented out the draw call, so the rendering itself has no influence on these numbers.
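The per-axis vertex selection in the post can also be written without branches if the box is kept as a centre plus half-extents. The following is only a sketch of that alternative formulation, not the poster's code: `Vec3`, `Plane`, `Box`, `BoxBehindPlane` and `FrustumCullBox` are hypothetical names, plain structs stand in for the DirectXMath types, and the box is assumed to be already in world space.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical minimal types for illustration (the post uses DirectXMath).
struct Vec3  { float x, y, z; };
struct Plane { Vec3 n; float d; };              // inside when dot(n, p) + d >= 0
struct Box   { Vec3 center; Vec3 halfExtent; }; // world-space AABB

// Returns true when the whole box lies behind the plane, i.e. it can be culled.
inline bool BoxBehindPlane(const Box& b, const Plane& p)
{
    assert(b.halfExtent.x >= 0.0f && b.halfExtent.y >= 0.0f && b.halfExtent.z >= 0.0f);

    // Signed distance from the box centre to the plane.
    const float dist = p.n.x * b.center.x + p.n.y * b.center.y + p.n.z * b.center.z + p.d;

    // Projection of the half-extents onto the plane normal. Using |n| picks,
    // per axis, the same min/max corner the if/else chain selects.
    const float radius = std::fabs(p.n.x) * b.halfExtent.x
                       + std::fabs(p.n.y) * b.halfExtent.y
                       + std::fabs(p.n.z) * b.halfExtent.z;

    // dist + radius is the signed distance of the vertex furthest along the normal.
    return dist + radius < 0.0f;
}

// Returns true when the box is outside at least one of the six planes.
inline bool FrustumCullBox(const Box& b, const Plane (&planes)[6])
{
    for (const Plane& p : planes)
    {
        if (BoxBehindPlane(b, p))
        {
            return true;
        }
    }
    return false;
}
```

Whether this is actually faster than the branching version depends on the compiler and the data layout, so it would need to be profiled in the same way as the original.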


Started by lipsryme March 29, 2013 05:02 PM

44 comments, last by lipsryme 11 years ago

lipsryme

1,522

Author

April 01, 2013 11:18 AM

Oh my god, I've finally done it... using btAlignedObjectArray from the Bullet Physics SDK.

And it is blazingly fast. With just the culling itself it takes about 0.03ms for 10k AABBs.

50k goes to around 0.53ms. 100k AABBs in 1.25ms.

Big thanks to everyone in here !

Portfolio/Blog: http://marcel-schindler.weebly.com

galop1n

1,046

April 01, 2013 11:33 AM

Remember, even the fastest frustum culling loop will be no match for hierarchical culling.

Most of a world is made of static geometry, so clustering it and merging the cluster AABBs into some kind of hierarchical structure is only a preprocessing step. An octree can be a good choice: each node splits into 8 sub-nodes (neatly, that is two rounds of 4-box SIMD culling). You then get useful information during traversal: a node is fully visible, invisible, or partially culled. For example, you can then push the primitives of a fully visible node without culling them at all.
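The three traversal states described above could be sketched roughly as follows. This is an illustration only: `Visibility`, `Classify`, `Node`, `CollectAll` and `CollectVisible` are made-up names, the node layout is an assumption rather than any particular engine's octree, and primitives stored in a partially visible node are accepted without the individual test a real implementation would run.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical types for illustration only.
struct Vec3  { float x, y, z; };
struct Plane { Vec3 n; float d; };              // inside when dot(n, p) + d >= 0
struct Box   { Vec3 center; Vec3 halfExtent; };

enum class Visibility { Outside, Intersecting, Inside };

// Classifies a box against six planes into the three states from the post.
inline Visibility Classify(const Box& b, const Plane (&planes)[6])
{
    assert(b.halfExtent.x >= 0.0f && b.halfExtent.y >= 0.0f && b.halfExtent.z >= 0.0f);
    bool fullyInside = true;
    for (const Plane& p : planes)
    {
        const float dist = p.n.x * b.center.x + p.n.y * b.center.y + p.n.z * b.center.z + p.d;
        const float radius = std::fabs(p.n.x) * b.halfExtent.x
                           + std::fabs(p.n.y) * b.halfExtent.y
                           + std::fabs(p.n.z) * b.halfExtent.z;
        if (dist + radius < 0.0f) return Visibility::Outside; // entirely behind this plane
        if (dist - radius < 0.0f) fullyInside = false;        // straddles this plane
    }
    return fullyInside ? Visibility::Inside : Visibility::Intersecting;
}

struct Node
{
    Box bounds;
    std::vector<int> primitives; // ids of primitives stored directly in this node
    std::vector<Node> children;  // up to 8 for an octree, empty for a leaf
};

// Pushes every primitive below a node without any further plane tests.
inline void CollectAll(const Node& n, std::vector<int>& out)
{
    out.insert(out.end(), n.primitives.begin(), n.primitives.end());
    for (const Node& c : n.children) CollectAll(c, out);
}

inline void CollectVisible(const Node& n, const Plane (&planes)[6], std::vector<int>& out)
{
    switch (Classify(n.bounds, planes))
    {
    case Visibility::Outside:
        return;                  // skip the whole subtree
    case Visibility::Inside:
        CollectAll(n, out);      // fully visible: no per-primitive culling needed
        return;
    case Visibility::Intersecting:
        // Simplification: a fuller version would test each primitive's own box here.
        out.insert(out.end(), n.primitives.begin(), n.primitives.end());
        for (const Node& c : n.children) CollectVisible(c, planes, out);
        return;
    }
}
```

The saving comes from the `Inside` and `Outside` cases, where one box test stands in for every primitive below that node.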

You also still seem to misunderstand the difference between type size and memory alignment, so study them again and again. Also add a bunch of asserts wherever you do an aligned load or write, to catch bugs as soon as possible.
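To make the size-versus-alignment distinction concrete, here is a small sketch. `Float4`, `AlignedFloat4`, `IsAligned` and `CheckedAlignedPtr` are hypothetical names, and a 4-byte `float` is assumed. The two structs have the same size, but only one is guaranteed to sit on a 16-byte boundary. In DirectXMath the same gap exists between `XMFLOAT4` (4-byte aligned) and `XMVECTOR` (16-byte aligned).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Same size, different alignment requirement (assuming a 4-byte float).
struct Float4 { float x, y, z, w; };
struct alignas(16) AlignedFloat4 { float x, y, z, w; };

static_assert(sizeof(Float4) == sizeof(AlignedFloat4), "both hold four floats");
static_assert(alignof(Float4) == alignof(float), "only float alignment is guaranteed");
static_assert(alignof(AlignedFloat4) == 16, "safe target for aligned SIMD loads");

// True when the address is a multiple of `alignment` bytes.
inline bool IsAligned(const void* p, std::size_t alignment)
{
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Hypothetical guard to call right before an aligned SIMD load or store.
inline const float* CheckedAlignedPtr(const void* p)
{
    assert(IsAligned(p, 16) && "aligned load from an address that is not 16-byte aligned");
    return static_cast<const float*>(p);
}
```

An array of `Float4` allocated with plain `new` or held in a default `std::vector` may or may not land on a 16-byte boundary, which is why a container with an aligned allocator matters and why an assert of this kind catches the failure early in debug builds.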

kalle_h

2,470

April 01, 2013 07:31 PM

Quote from galop1n:
"Remember, even the fastest frustum culling loop will be no match for hierarchical culling.
Most of a world is made of static geometry, so clustering it and merging the cluster AABBs into some kind of hierarchical structure is only a preprocessing step. An octree can be a good choice: each node splits into 8 sub-nodes (neatly, that is two rounds of 4-box SIMD culling). You then get useful information during traversal: a node is fully visible, invisible, or partially culled. For example, you can then push the primitives of a fully visible node without culling them at all.

You also still seem to misunderstand the difference between type size and memory alignment, so study them again and again. Also add a bunch of asserts wherever you do an aligned load or write, to catch bugs as soon as possible."

Hierarchical culling can be overkill in many cases, and the added overhead and complexity can then be a net loss.
In the Battlefield 3 culling paper, their parallel brute-force algorithm was 3 times faster than the hierarchical one, the code size was 80% smaller, and because of this simplicity further optimizations were easier.
http://dice.se/publications/culling-the-battlefield-data-oriented-design-in-practice/
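A flat, brute-force cull of the kind mentioned above might look roughly like this. It is a single-threaded sketch under stated assumptions, not the code from the linked talk: `SphereSoA` and `CullRange` are hypothetical names, bounding spheres in a structure-of-arrays layout are assumed, and threading is left out. Each call writes only to its own slice of `visible`, so separate ranges could be handed to separate worker threads.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical types for illustration only.
struct Plane { float nx, ny, nz, d; }; // inside when dot(n, p) + d >= 0

// Structure-of-arrays storage: element i of each array describes sphere i.
struct SphereSoA
{
    std::vector<float> x, y, z, r;
};

// Tests spheres [begin, end) against all six planes and writes
// 1 (visible) or 0 (culled) into visible[i]. No hierarchy, no early-outs.
inline void CullRange(const SphereSoA& s, const Plane (&planes)[6],
                      std::size_t begin, std::size_t end,
                      std::vector<std::uint8_t>& visible)
{
    assert(begin <= end && end <= s.x.size() && end <= visible.size());
    for (std::size_t i = begin; i < end; ++i)
    {
        bool inside = true;
        for (const Plane& p : planes)
        {
            const float dist = p.nx * s.x[i] + p.ny * s.y[i] + p.nz * s.z[i] + p.d;
            // The sphere is culled only if it lies entirely behind a plane.
            inside = inside && (dist >= -s.r[i]);
        }
        visible[i] = inside ? 1 : 0;
    }
}
```

The linear arrays keep memory access predictable and make the inner loop a natural candidate for SIMD, which is the trade-off being weighed against a hierarchy in this thread.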

galop1n

1,046

April 01, 2013 09:59 PM

I know that talk perfectly well, but the 360/PS3 way is no longer current. Those consoles' in-order RISC processors were a pain in the ass for code size and branching. Operations that frustum culling needs all the time, like float compares, were really costly too. Add the need for a simple data layout so as not to saturate DMA communication with the SPUs, and the Battlefield choice may be legitimate in their context.

But today's and tomorrow's engines will again scale up the number of primitives to manage and draw by a good amount. Hierarchical layouts will easily strike back on more modern hardware that handles branching more efficiently.

And being hierarchical does not mean we have to go down to single-primitive leaves; it is all a balance between the overhead of the hierarchy and the raw tests. Maybe instead of storing hundreds of primitives in leaves, we will store thousands and dispatch each leaf to a separate thread. In the end, only profiling sessions and testing will give the answer, and that answer depends on the context of each game.

And strangely, on a previous game I worked on, with a lot of instances to manage, one optimisation I made was to stop the hierarchical split of the culling once nodes became too small, because that was more efficient. So no, I am not pro-hierarchical, I am just pro-performance :)

Sock5

162

April 02, 2013 01:20 AM

@OP - Could you share what hardware you are running this on? Those are some awesome results.

>removed<

lipsryme

1,522

Author

April 02, 2013 01:24 AM

Core i5, 2.8 GHz quad core (I think it's a Nehalem).
Mind you, the frustum cull was basically the only thing inside my render loop, and the results might not have been measured 100% accurately.

