Do GPUs perform frustum clipping of triangles/pixels in hardware?

4 comments, last by Hodgman 4 years, 11 months ago

If the GPU performs frustum clipping (top, bottom, left, right) in hardware, is it performed at the pixel level, or does it reconstruct triangles?
Is it performed after the pixel shader has completely executed, or is the pixel discarded before PS_Main?

Basically, my question reduces to this:

Is it better to first perform triangle clipping in compute*, and only then have the graphics pipeline rasterize the triangles that "survived"?
Or will the GPU always do this for me, and presumably faster than I could?



*Whether in Compute or in a Geometry Shader is irrelevant for my question.


On a conventional desktop GPU, pixels that are not on screen (more specifically, not in the viewport/scissor region) will not have their pixel shader executed at all. The triangles are clipped to the screen edge as part of rasterization. However, this still requires complete vertex shader execution and rasterization, so it can be beneficial to cull groups of polygons before ever submitting them to the GPU. Note also that pixels can be computed fully and then covered up by geometry in front of them that arrives later.
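The CPU-side culling of groups of polygons mentioned above is often done by testing each mesh's bounding sphere against the six frustum planes before issuing the draw call. Here is a minimal C++ sketch, assuming inward-facing plane normals; the `Vec3`/`Plane`/`Sphere` types and the `sphereInFrustum` name are illustrative, not from any particular engine:

```cpp
#include <array>

// Illustrative types; a real engine would use its own math library.
struct Vec3   { float x, y, z; };
struct Plane  { Vec3 n; float d; };   // inward-facing: n·p + d >= 0 is "inside"
struct Sphere { Vec3 center; float radius; };

float dot(const Vec3& a, const Vec3& b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

// Keep the mesh's draw call if its bounding sphere is not
// completely behind any one of the six frustum planes.
bool sphereInFrustum(const Sphere& s, const std::array<Plane, 6>& frustum) {
    for (const Plane& p : frustum) {
        if (dot(p.n, s.center) + p.d < -s.radius)
            return false;   // completely outside this plane: cull the batch
    }
    return true;            // intersects or fully inside: submit it
}
```

This is conservative on purpose: a sphere outside the frustum but not fully behind any single plane is still submitted, which is acceptable because the GPU clips it anyway.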

On a tile based GPU (common in mobile devices), the complete set of opaque polygons is processed as a batch for a given render target and the polys are clipped not only against the screen edge but against each other. This creates close to zero percent overdraw (near perfect pixel shader utilization) regardless of what order things are drawn in.


I had to deal with frustum culling recently...

For me it was better to do as much culling as I could on the CPU (I know that's not what you were asking, but anyway) before sending draw commands to the GPU. All my meshes were on the GPU side of course, because you can spin the camera around quickly, but just the act of asking the GPU to render a lot of off-screen meshes was a huge performance penalty.
 

Right now, I am reading the buffer of vertex positions in Compute. In Compute I project them using the camera matrix and frustum-clip them. I push the triangles that survived into an index buffer, and for each surviving triangle I write the projected vertices to a second vertex buffer. The index buffer tells the vertex shader what to read from the new vertex buffer. The new vertex buffer is not reordered; the vertices at the positions of the triangles that are not visible contain garbage data.
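The trivial-reject test a compute pass like this typically performs can be expressed with Cohen–Sutherland-style outcodes in clip space: a triangle can be discarded when all three projected vertices lie outside the same frustum plane. A hedged C++ mock of that logic follows (names are illustrative, and it assumes a Direct3D-style depth range of z in [0, w]; the real version would live in the compute shader):

```cpp
#include <cstdint>

// Illustrative clip-space vertex (position after the camera matrix,
// before the perspective divide).
struct Vec4 { float x, y, z, w; };

// Bitmask of the clip planes a vertex is outside of.
uint32_t outcode(const Vec4& v) {
    uint32_t code = 0;
    if (v.x < -v.w) code |= 1;    // left
    if (v.x >  v.w) code |= 2;    // right
    if (v.y < -v.w) code |= 4;    // bottom
    if (v.y >  v.w) code |= 8;    // top
    if (v.z <  0.f) code |= 16;   // near (D3D convention: 0 <= z <= w)
    if (v.z >  v.w) code |= 32;   // far
    return code;
}

// The triangle is trivially rejected only if all three vertices are
// outside the same plane (the AND of the outcodes is non-zero).
bool triangleSurvives(const Vec4& a, const Vec4& b, const Vec4& c) {
    return (outcode(a) & outcode(b) & outcode(c)) == 0;
}
```

Surviving triangles would then be appended to the index buffer; partially visible triangles pass through and are left for the hardware clipper to handle.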

I think I should add frustum culling at the bounding-box level in Compute.

And I could keep the old per-triangle frustum clipping, because I have to completely read the vertex buffer in Compute anyway (I need it for other reasons unrelated to this thread).

Not doing the frustum clipping myself would just mean deleting the code I already wrote in Compute. The Compute pass that reads the whole vertex buffer anyway... hmmm...

...OK, talking to myself here made me decide to keep the frustum clipping in Compute, because I will be pushing less to the index buffer and writing less to the second vertex buffer.

And thanks to your answers, I will add frustum culling at the bounding-box level to speed up the Compute pass.
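Bounding-box culling at that level is commonly implemented with the "positive vertex" trick: for each frustum plane, test only the AABB corner farthest along the plane normal. If even that corner is behind the plane, every triangle in the box can be skipped at once. A small C++ sketch, again assuming inward-facing plane normals (types and names are illustrative):

```cpp
#include <array>

// Illustrative types; a real engine would use its own math library.
struct Vec3  { float x, y, z; };
struct Plane { Vec3 n; float d; };   // inward-facing: n·p + d >= 0 is "inside"

float dot(const Vec3& a, const Vec3& b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

// Reject the whole box (and all triangles inside it) if it is fully
// behind any one frustum plane; otherwise keep it for per-triangle work.
bool aabbInFrustum(const Vec3& mn, const Vec3& mx,
                   const std::array<Plane, 6>& frustum) {
    for (const Plane& p : frustum) {
        // Pick the corner farthest along the plane normal (the "p-vertex").
        Vec3 v{ p.n.x >= 0 ? mx.x : mn.x,
                p.n.y >= 0 ? mx.y : mn.y,
                p.n.z >= 0 ? mx.z : mn.z };
        if (dot(p.n, v) + p.d < 0)
            return false;   // even the farthest corner is outside
    }
    return true;            // intersects or fully inside
}
```

Run once per mesh (or per chunk) in the compute pass, this gates the per-triangle test above and so reduces how much the pass writes to the index and vertex buffers.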

If you have a large number of triangles that are going to get culled, you can end up bottlenecked by the fixed-function culling hardware, as it can only process a fixed number, N, of triangles per clock cycle. In that situation, you can actually speed things up a bit by performing culling yourself in a compute shader before the vertex shader. You probably only need to be doing this if you're using ridiculously high poly counts and/or not doing decent culling on the CPU. I would imagine this kind of thing is more popular in GPU-driven scene-submission designs than in the typical CPU-driven scene-submission style.

 

