• 12
• 12
• 9
• 10
• 13

# Is my frustum culling slow ?

This topic is 1815 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

## Recommended Posts

Doing frustum culling on 10.000 objects it takes about 0.85ms...is that slow/fast ?

I'm using AABBs to frustum cull my objects.

See code here:

bool RendererD3D11::FrustumCull(SceneEntityDescription* entity,
XMFLOAT4* frustumPlanes)
{
// If there even was a rejection last time
if(entity->lastRejectedFrustumPlane != 999999)
{
// Check last rejected planeId first
XMVECTOR planeNormal = XMVectorSet(frustumPlanes[entity->lastRejectedFrustumPlane].x, frustumPlanes[entity->lastRejectedFrustumPlane].y,
frustumPlanes[entity->lastRejectedFrustumPlane].z, 0.0f);

float planeConstant = frustumPlanes[entity->lastRejectedFrustumPlane].w;

// Check each axis (x, y, z) to get the AABB vertex furthest away from the direction
// the plane is facing (plane normal)
XMFLOAT3 axisVert;
XMFLOAT3 aabb_min = this->contentManager->GetPrimitiveFromPool(entity->ID)->GetBoundingVolume()->Min();
XMFLOAT3 aabb_max = this->contentManager->GetPrimitiveFromPool(entity->ID)->GetBoundingVolume()->Max();
XMFLOAT3 objPos = entity->worldPosition;

// x-axis
if(frustumPlanes[entity->lastRejectedFrustumPlane].x < 0.0f)
{
axisVert.x = aabb_min.x + objPos.x; // min x + obj position's x
}
else
{
axisVert.x = aabb_max.x + objPos.x; // max x + obj position's x
}

// y-axis
if(frustumPlanes[entity->lastRejectedFrustumPlane].y < 0.0f)
{
axisVert.y = aabb_min.y + objPos.y; // min y + obj position's y
}
else
{
axisVert.y = aabb_max.y + objPos.y; // max y + obj position's y
}

// z-axis
if(frustumPlanes[entity->lastRejectedFrustumPlane].z < 0.0f)
{
axisVert.z = aabb_min.z + objPos.z; // min z + obj position's z
}
else
{
axisVert.z = aabb_max.z + objPos.z; // min z + obj position's z
}

// Now we get the signed distance from the AABB's vertex that's furthest down the frustum planes normal,
// and if the signed distance is negative, then the entire bounding box is behind the frustum plane, which means
// that it should be culled
if(XMVectorGetX(XMVector3Dot(planeNormal, XMLoadFloat3(&axisVert))) + planeConstant < 0.0f)
{
return true;
}
}

// Loop through each frustum plane
for(int planeID = 0; planeID < 6; ++planeID)
{
if(entity->lastRejectedFrustumPlane != 999999)
{
if(planeID == entity->lastRejectedFrustumPlane)
{
// skip last rejected frustum plane since we've checked it before this loop
continue;
}
}

XMVECTOR planeNormal = XMVectorSet(frustumPlanes[planeID].x, frustumPlanes[planeID].y,
frustumPlanes[planeID].z, 0.0f);

float planeConstant = frustumPlanes[planeID].w;

// Check each axis (x, y, z) to get the AABB vertex furthest away from the direction
// the plane is facing (plane normal)
XMFLOAT3 axisVert;
XMFLOAT3 aabb_min = this->contentManager->GetPrimitiveFromPool(entity->ID)->GetBoundingVolume()->Min();
XMFLOAT3 aabb_max = this->contentManager->GetPrimitiveFromPool(entity->ID)->GetBoundingVolume()->Max();
XMFLOAT3 objPos = entity->worldPosition;

// x-axis
if(frustumPlanes[planeID].x < 0.0f)
{
axisVert.x = aabb_min.x + objPos.x; // min x + obj position's x
}
else
{
axisVert.x = aabb_max.x + objPos.x; // max x + obj position's x
}

// y-axis
if(frustumPlanes[planeID].y < 0.0f)
{
axisVert.y = aabb_min.y + objPos.y; // min y + obj position's y
}
else
{
axisVert.y = aabb_max.y + objPos.y; // max y + obj position's y
}

// z-axis
if(frustumPlanes[planeID].z < 0.0f)
{
axisVert.z = aabb_min.z + objPos.z; // min z + obj position's z
}
else
{
axisVert.z = aabb_max.z + objPos.z; // min z + obj position's z
}

// Now we get the signed distance from the AABB's vertex that's furthest down the frustum planes normal,
// and if the signed distance is negative, then the entire bounding box is behind the frustum plane, which means
// that it should be culled
if(XMVectorGetX(XMVector3Dot(planeNormal, XMLoadFloat3(&axisVert))) + planeConstant < 0.0f)
{
entity->lastRejectedFrustumPlane = planeID; // store rejected frustum plane ID for the next frames
return true;
}
}

return false;
}


Keep in mind these 10.000 objects are all inside the view frustum so they'll all go through all 6 planes and pass (worst case scenario I know).

I've commented out the draw call itself so the rendering itself has no influence on these numbers.

Edited by lipsryme

##### Share on other sites

Sounds resonable, at least my frustum culling implementation I implemented today takes approximately 0,53 ms to cull 10000 objects that are all visible. I didn't implement the last rejection optimization though. Thats is my code:

bool Plane::Inside(const BoundingBox& box) const
{
D3DXVECTOR3 vMin(box.m_vCenter.x - box.m_size, box.m_vCenter.y - box.m_size, box.m_vCenter.z - box.m_size), vMax(box.m_vCenter.x + box.m_size, box.m_vCenter.y + box.m_size, box.m_vCenter.z + box.m_size);
D3DXVECTOR4 vPos;
if(m_vNormal.x >= 0.0f)
{
vPos.x=vMax.x;
}
else
{
vPos.x=vMin.x;
}
if(m_vNormal.y >= 0.0f)
{
vPos.y=vMax.y;
}
else
{
vPos.y=vMin.y;
}
if(m_vNormal.z >= 0.0f)
{
vPos.z=vMax.z;
}
else
{
vPos.z=vMin.z;
}

vPos.w = 1.0f;

float posDot = D3DXPlaneDot(&m_plane, &vPos);

if(posDot < 0.0f)
return false;
else
return true;
}

bool Frustum::TestBox(const BoundingBox& box) const
{
for ( int i = 0; i < 6; i++ )
{
if ( !m_frustums.Inside(box) )
return false;
}
return true;
}


##### Share on other sites

The question miss details for a good answer. 10K box culled by a frustum in 0.5ms on PS3 is probably good, on a high end CPU is probably not :) If your final scene is below 10K primitives, it is good, if your final scene will contains millions of primitives, it will not be. If you can run the culling on a thread and doing something long enough before needing the result, even if it would last for 3ms, it is not a problem.  You may even run the frustum culling on GPU and do some draw indexed indirect without cpu sync.

The only thing i can tell is that your code is far from optimal by itself when considering batching a same process for a lot of entry ! Everything will depend on the data layout. Two paradigm exist, Array of structure (each box in the array is made of two points each made of x, y and z coordinate) and Structure of Array ( a container of N box split in 6 arrays of float ). By choosing the second one, you can do some SIMD programing, and divide the frustum test count by 4 for almost free. You can also try to remove branching with some value masking, ...

The C++ way with a Plane API to test a single point distance looks nice, but most of the times, if you want to test one point, you wants to test N points and there is certainly part of the computation mergeable.

Good luck :)

##### Share on other sites

I would suggest reading through this presentation.

##### Share on other sites
Well it's just for the pc, so x86 architecture. And I don't think I'll ever have even close to that many entities in my scene. I guess particles would be culled differently?

##### Share on other sites

Well it's just for the pc, so x86 architecture. And I don't think I'll ever have even close to that many entities in my scene. I guess particles would be culled differently?

If doing cpu simulated particles it is quite fast to cull them using some sort of sphere frustum test. The the test is just couple of dots and sometimes you can skip near and far planes.
I implemented this for all my particles and this give me substantial performance gain with ios. And code is not yet even using any SIMD's.

##### Share on other sites

Great article. I'm gonna have a look at that.

Edited by lipsryme

##### Share on other sites

I've got it implemented now but somehow my "negativeMask" always results to 0.

Do I have to transform center and extent to world space first ? I tried doing that using += objPos but still 0. Or do I need to do a multiply by the world matrix ?

UPDATE: Ah got it working! Of course I should not have added the world position to the extent

Well its down to ~0.4ms

bool RendererD3D11::FrustumCull(SceneEntityDescription* entity,
_Plane* frustumPlanes,
_Plane* absFrustumPlanes)
{
_Vector3f center = static_cast<AABB*>(this->contentManager->GetPrimitiveFromPool(entity->ID)->GetBoundingVolume())->Volume().center;
_Vector3f extent = static_cast<AABB*>(this->contentManager->GetPrimitiveFromPool(entity->ID)->GetBoundingVolume())->Volume().extent;
_Vector3f objPos;
objPos.x = entity->worldPosition.x;
objPos.y = entity->worldPosition.y;
objPos.z = entity->worldPosition.z;

center.x += objPos.x;
center.y += objPos.y;
center.z += objPos.z;

// this was wrong...
//**************************
//* extent.x += objPos.x;
//* extent.y += objPos.y;
//* extent.z += objPos.z;
//**************************

for(unsigned int iPlane = 0;iPlane < 6;++iPlane)
{
__m128 xmm_d = _mm_mul_ss(xmm_aabbCenter_x, xmm_frustumPlane_Component);

__m128 xmm_r = _mm_mul_ss(xmm_aabbExtent_x, xmm_absFrustumPlane_Component);

// Shuffle d_p_r and d_m_r in order to perform only one _mm_movmask_ps
__m128 xmm_d_p_r__d_m_r = _mm_shuffle_ps(xmm_d_p_r, xmm_d_m_r, _MM_SHUFFLE(0, 0, 0, 0));

// Bit 0 holds the sign of d + r and bit 2 holds the sign of d - r
{
return true; // Outside
}
{
return false; // Intersect
}
}

return false;
}

Edited by lipsryme