Per Triangle Culling (GDC Frostbite)

Adam Pawlowski · 2016-06-07T09:43:31

I came across this presentation, which talks about using compute for per-triangle culling (standard backface culling, standard Hi-Z). I'm not sure what exactly they mean. Is this meant to replace that part of the pipeline completely, so that when it comes time to draw you just disable backface culling and the other built-in pipeline stages? I don't get why you would write a compute shader to determine which triangles are visible when the pipeline does that already. Even if you turn that stuff off, is this really that much better? Can you even tell the GPU to turn off Hi-Z culling? Slides 41-44: http://www.wihlidal.ca/Presentations/GDC_2016_Compute.pdf
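For readers new to the idea, here is a minimal CPU-side sketch of the kind of back-face test such a compute pass could run. It is not taken from the slides. It assumes vertices are already projected to 2D screen space, that counter-clockwise winding means front-facing, and that all vertices are in front of the camera; the function names are illustrative only. The point is that the pass writes a compacted index list, so culled triangles never reach the fixed-function pipeline at all.

```python
def signed_area_2d(a, b, c):
    """Signed area of a 2D triangle; positive when a, b, c wind counter-clockwise."""
    return 0.5 * ((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]))


def cull_backfaces(vertices, indices):
    """Return a compacted index list holding only front-facing triangles.

    Assumption: counter-clockwise is front-facing. Zero-area triangles are
    dropped as well, since they cover no pixels.
    """
    kept = []
    for i in range(0, len(indices), 3):
        i0, i1, i2 = indices[i:i + 3]
        if signed_area_2d(vertices[i0], vertices[i1], vertices[i2]) > 0.0:
            kept.extend((i0, i1, i2))
    return kept


# Two triangles over the same three vertices, with opposite winding.
screen_vertices = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
index_buffer = [0, 1, 2, 0, 2, 1]
surviving = cull_backfaces(screen_vertices, index_buffer)
```

Only the first triangle survives here; on a GPU the surviving indices would then be drawn with an indirect draw call.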


Started by dpadam450 March 21, 2016 10:05 PM

22 comments, last by Anteru 7 years, 10 months ago

Matias Goldberg

9,637

March 27, 2016 09:02 PM

I don't know why you insist that much on bandwidth.

Alright, I ran your numbers and you've convinced me it isn't as big an issue as I thought it was... but I'm hazy on one figure of yours.

edit - also, didn't you forget to take into account the Hi-Z buffer bandwidth for per-triangle depth culling?

Yes I did. I don't know the exact memory footprint, but 33.33% overhead (like in mipmapping) sounds like a reasonable estimate.
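The 33.33% estimate is the usual mip-chain geometric series (1/4 + 1/16 + ... approaches 1/3). A minimal sketch to check it at a real resolution, assuming each level halves both dimensions and rounds down, as ordinary mipmaps do; an actual Hi-Z pyramid may round differently or skip levels, so treat the result as a rough bound:

```python
def mip_chain_overhead(width: int, height: int) -> float:
    """Texels in all mip levels below the base, as a fraction of the base level."""
    base = width * height
    extra = 0
    w, h = width, height
    while w > 1 or h > 1:
        # Each level halves both dimensions, rounding down, never below 1.
        w = max(1, w // 2)
        h = max(1, h // 2)
        extra += w * h
    return extra / base


overhead_1080p = mip_chain_overhead(1920, 1080)
overhead_pow2 = mip_chain_overhead(1024, 1024)
```

Both values come out just under one third, so the estimate holds for non-power-of-two targets too.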

How did you get the 309MB per frame figure? When I did it I got completely different numbers.
edit - specifically the 305MB number.

Thanks for pointing it out.

1,000,000 * 32 bytes = 30.51MB... dammit, I added a 0 and considered 10 million vertices.
The 305MB came from 10 million vertices, not 1 million.
Well... crap.

For 10 million vertices it's 35MB of index data, not 3.5MB. But for 1 million vertices, it's 30.51MB, not 305.5MB.

It only makes it easier to prove. Like I said, at 1920x1080 there shouldn't be much more than 2 million vertices (since there would be one vertex per pixel). Maybe 3 million? Profiling would be needed.
So if you provide a massive amount of input vertices (such as 10 million vertices), the culler will end up discarding a lot of vertices.
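The corrected figures above are easy to re-check. A minimal sketch, assuming the 32 bytes per vertex used in the post and that "MB" means 2^20 bytes (which is what makes 32,000,000 bytes come out at about 30.5); the names are illustrative only:

```python
BYTES_PER_VERTEX = 32      # per-vertex size used in the post
BYTES_PER_MB = 1024 * 1024 # assumption: "MB" in the post means 2**20 bytes


def vertex_data_mb(vertex_count: int) -> float:
    """Size of the vertex data in MB for the given vertex count."""
    return vertex_count * BYTES_PER_VERTEX / BYTES_PER_MB


one_million = vertex_data_mb(1_000_000)    # about 30.5
ten_million = vertex_data_mb(10_000_000)   # about 305
pixels_1080p = 1920 * 1080                 # 2073600, the "2 million" in the post
```

The tenfold gap between the two results is exactly the extra zero described in the post.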

Twitter: @matiasgoldberg

Distant Souls · Alliance AirWar · My Free Royalty-Free Music Library

Infinisearch

3,058

March 27, 2016 09:23 PM

Yes I did. I don't know the exact memory footprint, but 33.33% overhead (like in mipmapping) sounds like a reasonable estimate.

Are you talking about the generation of the Hi-Z buffer or the per-triangle fetch from the Hi-Z buffer? The generation, I presume... the per-triangle lookup is 32 bytes minimum.

edit - might be wrong about that, might be 16 bytes per lookup.

It only makes it easier to prove.

I know, that's why I came around to thinking bandwidth wouldn't be a problem except, as you said earlier, in games with tons of geometry.

Like I said, at 1920x1080 there shouldn't be much more than 2 million vertices (since there would be one vertex per pixel)

2 million tops visible... but with overlapping geometry... it all depends on how much coarse culling you're doing.

Thanks again. I had a knee-jerk reaction; running the numbers grounded me.

-potential energy is easily made kinetic-

Infinisearch

3,058

May 18, 2016 08:36 PM

For those interested, it looks like AMD has updated their GeometryFX compute triangle filtering library with cluster-based culling as well. FYI, it uses AMD's proprietary AGS (AMD GPU Services) library, so it's platform specific, but it's a nice source to learn from.

edit - forgot link - https://github.com/GPUOpen-Effects/GeometryFX/

and just in case http://gpuopen.com/gaming-product/geometryfx/

-potential energy is easily made kinetic-

Anteru

148

June 07, 2016 09:43 AM

Hi, the author of GeometryFX here. Cluster culling is a win, but it totally depends on how well you can cluster triangles. In some cases, it can drastically cut down the geometry, but often, cluster culling only fixes the "most obvious" clusters and then the back-face culling will get rid of 40+% of the triangles. I've implemented a simple clustering algorithm for GeometryFX which is described here: http://gpuopen.com/geometryfx-1-2-cluster-culling/ It works well enough if you have large swaths of planar geometry, but it doesn't adapt nor does it re-order the input triangles.

Ideally, you'd like some clustering which tries to find planar patches with similar orientation on the source mesh, but that's much more complicated. On the other hand, if you integrate it with a mesh preprocessor which optimizes for vertex cache usage, it could happen for "free" as long as you take orientation into account when optimizing for cache. I haven't had time to investigate this further.
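To illustrate why similar orientation within a cluster matters, here is a minimal sketch of a normal-cone test, one common way to reject a whole cluster as back-facing. It is not claimed to be what GeometryFX does. It assumes unit-length triangle normals and a single unit direction from the cluster to the camera for the whole cluster (reasonable for a distant camera; a perspective-correct test also has to account for the cluster's extent). All names are illustrative only.

```python
import math


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def normal_cone(normals):
    """Bound unit normals by a cone: returns (axis, sin_half_angle) or None.

    The axis is the normalized mean normal, which is a valid bound but not
    the tightest one. None means the normals spread too far to ever cull.
    """
    s = [sum(n[i] for n in normals) for i in range(3)]
    length = math.sqrt(dot(s, s))
    if length == 0.0:
        return None
    axis = tuple(c / length for c in s)
    min_cos = min(dot(axis, n) for n in normals)
    if min_cos <= 0.0:
        return None  # half-angle of 90 degrees or more
    return axis, math.sqrt(max(0.0, 1.0 - min_cos * min_cos))


def cluster_is_backfacing(cone, to_camera):
    """True when every normal inside the cone faces away from the camera."""
    if cone is None:
        return False
    axis, sin_half_angle = cone
    # All normals are at least 90 degrees from to_camera exactly when the
    # axis is at least (90 + half-angle) degrees away from it.
    return dot(axis, to_camera) <= -sin_half_angle


planar = normal_cone([(0.0, 0.0, 1.0)] * 3)
curved = normal_cone([(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)])
```

A planar cluster has a zero-width cone and is rejected for any view from behind it, while the wider cone of the curved cluster survives many more view directions, which matches the observation that planar patches cull best.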

Ubisoft uses a different scheme where they pre-compute visibility per cube-map face for clusters of 64 triangles and store that in a bit mask, which works well, especially as the clusters are small and chances are high that a whole cluster gets culled. For GeometryFX, I've found that 64-triangle clusters are too small to get really good efficiency, so GeometryFX works with clusters of 256+ triangles - but this is also partly due to the GeometryFX design.
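A rough sketch of how such a precomputed mask could be consumed at runtime, under stated assumptions: each cluster stores six 64-bit masks (one per cube-map face, bit i set when triangle i can be visible from that side), and the face is picked by the dominant axis of the direction from the cluster center to the camera. The face-selection rule and all names here are assumptions for illustration, not Ubisoft's actual implementation.

```python
def cube_face(direction):
    """Index of the cube-map face a direction points through.

    Order is +x, -x, +y, -y, +z, -z, chosen by the dominant axis.
    """
    axis = max(range(3), key=lambda i: abs(direction[i]))
    return 2 * axis + (0 if direction[axis] >= 0.0 else 1)


def visible_triangles(face_masks, cluster_center, camera_pos):
    """Triangle indices (0..63) whose bit is set for the camera's side of the cluster."""
    to_camera = tuple(c - p for c, p in zip(camera_pos, cluster_center))
    mask = face_masks[cube_face(to_camera)]
    return [i for i in range(64) if (mask >> i) & 1]


# Hypothetical cluster: triangles 0 and 2 can be seen from +x, nothing from other sides.
masks = [0b101, 0, 0, 0, 0, 0]
from_plus_x = visible_triangles(masks, (0.0, 0.0, 0.0), (10.0, 1.0, 0.0))
from_minus_x = visible_triangles(masks, (0.0, 0.0, 0.0), (-10.0, 0.0, 0.0))
```

A mask of zero for the selected face means the whole cluster can be skipped without touching any triangle data, which is where small clusters pay off.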
