D3D11 frustum culling on the GPU

This topic is 468 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.


I'm going to implement frustum culling on the GPU (not sure whether to use a geometry shader or a compute shader).

First, fill a consume structured buffer with the object IDs, cull them, and output the survivors to an append structured buffer.

I'm not sure whether this is efficient, and I don't know the details of how to use append/consume buffers.

Any samples to learn from?
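One way to get a feel for the append/consume pattern before touching the API is to emulate the pass on the CPU. A minimal sketch, assuming a made-up `Instance` type and a placeholder visibility test (neither is a D3D11 construct):

```cpp
#include <cassert>
#include <vector>

// Stand-in instance data: a bounding sphere per object (hypothetical type).
struct Instance { float x, y, z, radius; };

// CPU emulation of the GPU pass: "consume" every id from the input list,
// run a visibility test, and "append" survivors to the output list.
std::vector<unsigned> cull_pass(const std::vector<Instance>& instances,
                                const std::vector<unsigned>& all_ids,
                                float near_z, float far_z)
{
    std::vector<unsigned> visible;    // plays the AppendStructuredBuffer role
    for (unsigned id : all_ids) {     // plays the ConsumeStructuredBuffer role
        const Instance& inst = instances[id];
        // Trivial placeholder test: keep objects whose sphere overlaps [near_z, far_z].
        if (inst.z + inst.radius >= near_z && inst.z - inst.radius <= far_z)
            visible.push_back(id);
    }
    return visible;
}
```

On the GPU, each loop iteration becomes one compute thread, the input vector becomes a `ConsumeStructuredBuffer` (each thread pulls one id with `Consume()`), and the output becomes an `AppendStructuredBuffer` whose hidden counter the hardware increments atomically on each `Append()`.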

 


The world matrices of the static instances are stored in a cbuffer, and I don't want to frustum cull these static objects on the CPU and modify that cbuffer every frame, so culling them on the GPU with an indirect draw may be better, I think.
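For the indirect-draw side, the arguments buffer is just four UINTs matching the parameters of `DrawInstanced`. A sketch of that layout as a plain struct (the field names are descriptive, not an official D3D11 type):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Mirrors the 4-UINT argument layout consumed by
// ID3D11DeviceContext::DrawInstancedIndirect.
struct DrawInstancedArgs {
    uint32_t vertex_count_per_instance; // fixed per mesh, written by the CPU
    uint32_t instance_count;            // patched on the GPU from the append counter
    uint32_t start_vertex_location;
    uint32_t start_instance_location;
};
static_assert(sizeof(DrawInstancedArgs) == 16, "must match 4 UINTs");
```

After dispatching the cull shader, `ID3D11DeviceContext::CopyStructureCount` can copy the append buffer's hidden counter into `instance_count` (byte offset 4 of the args buffer), so the visible count never has to be read back to the CPU.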


Sorry, I'm just looking for a GPU implementation of AABB frustum culling.

Something like this:

struct InstData
{
    matrix world_mat;
    float3 aabb_min;   // local-space AABB corners
    float3 aabb_max;
};

cbuffer CB_Inst
{
    InstData insts[1000];
};

AppendStructuredBuffer<uint> id_of_visible;
ConsumeStructuredBuffer<uint> id_of_all_objs;

[numthreads(64, 1, 1)]
void cs_cull()
{
    uint id = id_of_all_objs.Consume();
    if (inFrustum(insts[id].world_mat, insts[id].aabb_min, insts[id].aabb_max))
        id_of_visible.Append(id);
}
  
Edited by poigwym


 

Um...

I think you may be missing the point of frustum culling...

what point?

Usually it's to save tons of CPU work, to maintain streaming, and to manage memory. Frustum culling on the CPU is quite efficient, and you might not see any benefit from doing it on the GPU unless you have a very specific use case.
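For reference, the CPU-side version really is only a handful of lines. A sketch of the standard Gribb/Hartmann plane extraction for a D3D-style row-vector view-projection matrix with clip-space z in [0, 1] (the function and type names are my own, not from any library):

```cpp
#include <array>
#include <cassert>

// A plane a*x + b*y + c*z + d >= 0 for points inside the frustum.
struct Plane { float a, b, c, d; };

// Extract the six clip planes from a row-major matrix m[row][col] used
// with row vectors (v * M), D3D convention: clip z in [0, 1].
std::array<Plane, 6> extract_frustum(const float m[4][4])
{
    auto col = [&](int j) { return Plane{ m[0][j], m[1][j], m[2][j], m[3][j] }; };
    auto add = [](Plane p, Plane q, float s) {
        return Plane{ p.a + s*q.a, p.b + s*q.b, p.c + s*q.c, p.d + s*q.d };
    };
    Plane c0 = col(0), c1 = col(1), c2 = col(2), c3 = col(3);
    std::array<Plane, 6> out = {{
        add(c3, c0, +1.0f),  // left:   w + x >= 0
        add(c3, c0, -1.0f),  // right:  w - x >= 0
        add(c3, c1, +1.0f),  // bottom: w + y >= 0
        add(c3, c1, -1.0f),  // top:    w - y >= 0
        c2,                  // near:   z >= 0 (D3D clip range)
        add(c3, c2, -1.0f),  // far:    w - z >= 0
    }};
    return out;
}

// A point is inside only if it is on the positive side of all six planes.
bool point_inside(const std::array<Plane, 6>& f, float x, float y, float z)
{
    for (const Plane& p : f)
        if (p.a*x + p.b*y + p.c*z + p.d < 0.0f) return false;
    return true;
}
```

With the identity matrix, this yields the D3D clip cube (x, y in [-1, 1], z in [0, 1]), which makes it easy to sanity-check.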


Not sending a ton of unused data to the GPU is another part of it. It's less of an issue than it was a few years ago, but it's still pretty relevant. I guess I could see doing a sloppy CPU cull and then doing cleanup and clipping on the GPU, but I can't imagine that it would be more effective to upload render instructions for every object in the scene and then leave the GPU to sort it all out.


I will try to find it if you are still interested, but a while ago (2 months) I saw a very complete tutorial on terrain generation and frustum culling on the GPU (including LOD as well). It was written for C#/SlimDX, but I can't find it at the moment and I have too many bookmarks :)

But it is out there, just have to find it... :)



If you find it, please share it here!


So the link I found was http://richardssoftware.net/Home/Post/29 (the author actually posts on here a bit, from what I can see).

In there he has some HLSL code to test whether an AABB is inside or outside the frustum. The code itself is pretty clear. This is what I remember reading about in relation to frustum culling.
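The usual AABB-vs-frustum test picks, per plane, the box corner furthest along the plane normal (the "positive vertex"); if even that corner is behind the plane, the whole box is outside. A sketch in C++, assuming the convention that a plane satisfies a*x + b*y + c*z + d >= 0 on its inside:

```cpp
#include <cassert>

struct Plane { float a, b, c, d; };   // inside where a*x + b*y + c*z + d >= 0
struct Aabb  { float min[3], max[3]; };

// Positive-vertex test: for each plane, take the box corner furthest along the
// plane normal; if that corner is behind the plane, the whole box is outside.
bool aabb_in_frustum(const Plane planes[6], const Aabb& box)
{
    for (int i = 0; i < 6; ++i) {
        const Plane& p = planes[i];
        float x = (p.a >= 0.0f) ? box.max[0] : box.min[0];
        float y = (p.b >= 0.0f) ? box.max[1] : box.min[1];
        float z = (p.c >= 0.0f) ? box.max[2] : box.min[2];
        if (p.a*x + p.b*y + p.c*z + p.d < 0.0f)
            return false;              // fully outside this plane
    }
    return true;                       // inside or intersecting the frustum
}
```

The same branchless select translates almost line-for-line into HLSL for the compute-shader version; note it is conservative (a box outside the frustum but not fully behind any single plane is still reported visible), which is normally acceptable for culling.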


GPU frustum culling is great, and only recently has it become reasonable to do with new API features. You could do it pre-Vk/DX12, but it was kind of a headache in a lot of scenarios.

 

Since running the culling in either a CS or a GS is almost certainly going to change your pipeline state a little, it probably doesn't matter much which you choose -- I'd assume compute would be faster, but vendors might have made GS filter operations really fast? Probably not, though.

 

 

Not sending a ton of unused data to the GPU is another part of it. It's less of an issue than it was a few years ago, but it's still pretty relevant. I guess I could see doing a sloppy CPU cull and then doing cleanup and clipping on the GPU, but I can't imagine that it would be more effective to upload render instructions for every object in the scene and then leave the GPU to sort it all out.

 

Reducing latency can also be a big part of it. Imagine your command list builds a frustum from the most recent user input in GPU-readable memory, rather than being given a frustum by the host. Normally input is going to be at least as stale as however many frames were queued up in front of it. Instead, the input will only be as stale as the time between when it was polled on the GPU and when it's presented, which is less than a frame. But the only way this can work is if culling is also done on the GPU, since the GPU won't know what it's looking at until the command list is actually being executed.

