# D3D11 frustum culling on the GPU

## Recommended Posts

I'm going to implement frustum culling on the GPU (not sure whether to use a geometry shader or a compute shader).

The plan: first fill a consume structured buffer with the objects, then cull them and output the survivors to an append structured buffer.

I'm not sure whether this is efficient, and I don't know the details of how to use append/consume buffers.

Any samples to learn from?
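From what I've read so far, an append/consume buffer is just a structured buffer plus a hidden counter that the GPU bumps atomically on every `Append()`/`Consume()`. A CPU-side analogy of the semantics (hypothetical helper type, not the real D3D11 API):

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

// CPU analogy of HLSL append/consume buffer semantics (hypothetical type,
// not the D3D11 API): a plain array plus a hidden atomic counter.
struct GpuStyleBuffer
{
    std::vector<uint32_t> data;
    std::atomic<uint32_t> count{0};

    // AppendStructuredBuffer::Append -- atomically claim the next slot.
    void append(uint32_t v)
    {
        uint32_t slot = count.fetch_add(1);
        data[slot] = v;
    }

    // ConsumeStructuredBuffer::Consume -- atomically take the last element.
    uint32_t consume()
    {
        uint32_t slot = count.fetch_sub(1) - 1;
        return data[slot];
    }
};
```

Many threads can append/consume concurrently because only the counter is contended; on the real API, the counter is exposed to the CPU via `ID3D11DeviceContext::CopyStructureCount`, which is how you later learn how many objects survived.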

##### Share on other sites

Um...

I think you may be missing the point of frustum culling...

##### Share on other sites

Um...

I think you may be missing the point of frustum culling...

what point?

##### Share on other sites

The world matrices of my static instances are stored in a cbuffer, and I don't want to frustum-cull these static objects on the CPU and modify that cbuffer every frame. So culling them on the GPU and drawing with an indirect draw may be better, I think.
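To spell out what I mean by indirect draw: in D3D11, `DrawInstancedIndirect` reads its four UINT arguments from a GPU buffer, so the culling shader (or `CopyStructureCount`) can patch the instance count without any CPU readback. A sketch of the layout (the struct name is my own; D3D11 only specifies the four UINTs):

```cpp
#include <cstdint>

// Argument layout read by ID3D11DeviceContext::DrawInstancedIndirect.
// The struct name is my own; D3D11 just expects four consecutive UINTs
// in a buffer created with D3D11_RESOURCE_MISC_DRAWINDIRECT_ARGS.
struct DrawInstancedIndirectArgs
{
    uint32_t VertexCountPerInstance;
    uint32_t InstanceCount;        // patched on the GPU with the visible count
    uint32_t StartVertexLocation;
    uint32_t StartInstanceLocation;
};

static_assert(sizeof(DrawInstancedIndirectArgs) == 4 * sizeof(uint32_t),
              "must be 16 bytes, tightly packed, to match what D3D11 reads");
```

After the culling dispatch, `CopyStructureCount(argsBuffer, offsetof(DrawInstancedIndirectArgs, InstanceCount), appendUAV)` copies the append buffer's hidden counter straight into `InstanceCount`.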

##### Share on other sites

Sorry, I'm just looking for a GPU implementation of AABB frustum culling.

Something like this:

```hlsl
struct AABB
{
    float3 center;
    float3 extents;
};

struct InstData
{
    matrix world_mat;
    AABB   aabb;
};

cbuffer CB_Inst
{
    InstData insts[1000];
};

AppendStructuredBuffer<uint>  id_of_visible;
ConsumeStructuredBuffer<uint> id_of_all_objs;

[numthreads(64, 1, 1)]
void cs_cull()
{
    uint id = id_of_all_objs.Consume();
    if (inFrustum(insts[id].aabb))   // inFrustum: AABB-vs-frustum test
        id_of_visible.Append(id);
}
```
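The `inFrustum` helper the sketch assumes would be the usual center/extents-vs-six-planes test. The same math in C++ (plane convention: the non-negative side is "inside"; names are mine):

```cpp
#include <cmath>

struct Vec3  { float x, y, z; };
struct Plane { Vec3 n; float d; };          // n.p + d >= 0 means "inside"
struct AABB  { Vec3 center, extents; };

// Conservative AABB-vs-frustum test: the box is culled only if it lies
// fully on the negative side of at least one of the six planes.
bool inFrustum(const AABB& box, const Plane planes[6])
{
    for (int i = 0; i < 6; ++i) {
        const Plane& p = planes[i];
        // Projected "radius" of the box onto the plane normal.
        float r = box.extents.x * std::fabs(p.n.x)
                + box.extents.y * std::fabs(p.n.y)
                + box.extents.z * std::fabs(p.n.z);
        float dist = p.n.x * box.center.x
                   + p.n.y * box.center.y
                   + p.n.z * box.center.z + p.d;
        if (dist < -r)
            return false;                    // fully outside this plane
    }
    return true;                             // intersects or is inside
}
```

"Conservative" means a box that straddles a plane is always kept, which is exactly what you want for culling.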



##### Share on other sites

> Um...
>
> I think you may be missing the point of frustum culling...
>
> what point?

Usually it's to save a ton of CPU work, to maintain streaming, and to manage memory. Frustum culling on the CPU is quite efficient, and you might not get any benefit from doing it on the GPU unless you have some very specific case.

##### Share on other sites

Wow!! I think I'm on the wrong track...

##### Share on other sites

Not sending a ton of unused data to the GPU is another part of it. It's less of an issue than it was a few years ago, but it's still pretty relevant. I guess I could see doing a sloppy CPU cull and then doing cleanup and clipping on the GPU, but I can't imagine that it would be more effective to upload render instructions for every object in the scene and then leave the GPU to sort it all out.

##### Share on other sites

I will try to find it if you are still interested: a while ago (about two months) I saw a very complete tutorial on terrain generation and frustum culling on the GPU (including LOD as well). It was written for C#/SlimDX, but I can't find it at the moment; I have too many bookmarks :)

But it is out there, I just have to find it... :)

##### Share on other sites

> I will try to find it if you are still interested: a while ago (about two months) I saw a very complete tutorial on terrain generation and frustum culling on the GPU (including LOD as well). It was written for C#/SlimDX, but I can't find it at the moment; I have too many bookmarks :)
>
> But it is out there, I just have to find it... :)

If you find it, please share it here!!

##### Share on other sites

So the link I found is http://richardssoftware.net/Home/Post/29 (whose author actually posts here a bit, from what I can see).

In there he has some HLSL code to test whether an AABB is inside or outside the frustum. The code itself is pretty clear; this is what I remember reading in relation to frustum culling.
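For completeness: the six planes such a test needs can be extracted directly from the view-projection matrix (the Gribb/Hartmann method). A sketch using D3D's row-vector convention (helper names are mine; note that D3D clip-space z runs 0..1, so the near plane uses only the third column):

```cpp
// Row-major 4x4 matrix, D3D convention: clip = v * M (row vector times matrix).
struct Mat4   { float m[4][4]; };
struct Plane4 { float a, b, c, d; };   // a*x + b*y + c*z + d >= 0 is "inside"

// Combine two columns of M into one plane's coefficients.
static Plane4 planeFromCols(const Mat4& M, int colA, float sA, int colB, float sB)
{
    Plane4 p;
    p.a = sA * M.m[0][colA] + sB * M.m[0][colB];
    p.b = sA * M.m[1][colA] + sB * M.m[1][colB];
    p.c = sA * M.m[2][colA] + sB * M.m[2][colB];
    p.d = sA * M.m[3][colA] + sB * M.m[3][colB];
    return p;
}

// Gribb/Hartmann frustum-plane extraction from a view-projection matrix.
// Planes are not normalized; that's fine for sign tests, normalize them
// if you need true distances.
void extractFrustumPlanes(const Mat4& M, Plane4 out[6])
{
    out[0] = planeFromCols(M, 3, 1.0f, 0,  1.0f);  // left:   w + x >= 0
    out[1] = planeFromCols(M, 3, 1.0f, 0, -1.0f);  // right:  w - x >= 0
    out[2] = planeFromCols(M, 3, 1.0f, 1,  1.0f);  // bottom: w + y >= 0
    out[3] = planeFromCols(M, 3, 1.0f, 1, -1.0f);  // top:    w - y >= 0
    out[4] = planeFromCols(M, 2, 1.0f, 2,  0.0f);  // near (D3D): z >= 0
    out[5] = planeFromCols(M, 3, 1.0f, 2, -1.0f);  // far:    w - z >= 0
}
```

You'd compute these six planes once per frame on the CPU and upload them in a cbuffer for the culling shader to read.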

##### Share on other sites

GPU frustum culling is great, and only recently has it become reasonable to do with new API features. You could do it pre-Vk/DX12, but it was kind of a headache in a lot of scenarios.

Since running the culling in either a CS or a GS is almost certainly going to mix up your pipeline state a little, it probably doesn't matter a ton which you choose. I'd assume compute would be faster, but vendors might have made GS filter operations really fast? Probably not, though.

Reducing latency can also be a big part of it. Imagine your command list builds a frustum from the most recent user input in GPU-readable memory rather than being given a frustum by the host. Normally input is going to be at least as stale as however many frames were queued up in front of it; instead, the input will only be as stale as the time between when it was polled on the GPU and when it's presented, which is less than a frame. But the only way this can work is if culling is also done on the GPU, since the GPU won't know what it's looking at until the command list is actually being executed.
