
gwihlidal

Member Since 23 Apr 2003
Offline Last Active Apr 26 2016 08:07 PM

#5286108 D3D alternative for OpenGL gl_BaseInstanceARB

Posted by gwihlidal on 10 April 2016 - 03:47 AM

Hi, in my GDC 2016 talk I discuss the approach MJP mentioned: passing the draw index through to the indirect args as a root constant:

http://www.frostbite.com/2016/03/optimizing-the-graphics-pipeline-with-compute/
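For reference, here is a minimal sketch of that plumbing in D3D12: a command signature that writes one 32-bit root constant (the draw index) ahead of each indirect draw, which the shader can then read much like GL's gl_DrawIDARB/gl_BaseInstanceARB. The root parameter index and variable names are illustrative, not from the talk; the root signature is assumed to expose a single 32-bit constant at that parameter.

#include <d3d12.h>

// One record in the indirect argument buffer: the draw index, consumed by
// the root constant below, followed by the standard indexed-draw arguments.
struct IndirectDraw
{
    UINT drawIndex;
    D3D12_DRAW_INDEXED_ARGUMENTS draw;
};

const UINT kDrawIndexRootParam = 0;  // hypothetical root parameter index

ID3D12CommandSignature* CreateDrawIndexSignature(ID3D12Device* device,
                                                 ID3D12RootSignature* rootSignature)
{
    // Command signature: per record, first write the root constant, then draw.
    D3D12_INDIRECT_ARGUMENT_DESC args[2] = {};
    args[0].Type = D3D12_INDIRECT_ARGUMENT_TYPE_CONSTANT;
    args[0].Constant.RootParameterIndex = kDrawIndexRootParam;
    args[0].Constant.DestOffsetIn32BitValues = 0;
    args[0].Constant.Num32BitValuesToSet = 1;
    args[1].Type = D3D12_INDIRECT_ARGUMENT_TYPE_DRAW_INDEXED;

    D3D12_COMMAND_SIGNATURE_DESC sigDesc = {};
    sigDesc.ByteStride = sizeof(IndirectDraw);
    sigDesc.NumArgumentDescs = 2;
    sigDesc.pArgumentDescs = args;

    // The root signature must be supplied because this signature changes a root argument.
    ID3D12CommandSignature* cmdSig = nullptr;
    device->CreateCommandSignature(&sigDesc, rootSignature, IID_PPV_ARGS(&cmdSig));
    return cmdSig;
}

// Later, once compute has written the argument (and optional count) buffers:
// commandList->ExecuteIndirect(cmdSig, maxDrawCount, argBuffer, 0, countBuffer, 0);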

 

Cheers,

Graham




#5283303 Per Triangle Culling (GDC Frostbite)

Posted by gwihlidal on 24 March 2016 - 08:07 PM

Hi!

 

I'm the author/presenter of this research. Hodgman is correct here. Triangle culling is definitely worth it, as evidenced by my initial slides comparing peak primitive rate against available ALU. I mention cluster culling just to show we have it, but previous research (like the SIGGRAPH 2015 GPU-Driven Rendering Pipelines work) already covers cluster culling, so I wanted to take it further and detail per-triangle culling instead. Combining per-cluster with per-triangle culling is significantly better than per-cluster alone: you can have lots of surviving clusters within your frustum that contain tiny triangles which will never be removed, and the same goes for depth and frustum culling. Additionally, I cover techniques like blend shapes, cloth, and even voxelization that can't operate per-cluster; because this approach already iterates efficiently per triangle, it enables improved use of those techniques.
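To make the narrow phase concrete, here is a minimal CPU-side sketch of a per-triangle backface cull plus index compaction, mirroring what a compute kernel does per triangle. This is not the exact test from the talk (which works in homogeneous coordinates and also covers small-primitive and depth culling); it's one common post-divide formulation, assuming counter-clockwise front faces, and all names are illustrative.

#include <cstdint>
#include <vector>

struct Float4 { float x, y, z, w; };  // clip-space position

// Backface test after perspective divide: the triangle is culled when the
// signed area of its screen-space projection is <= 0 (CCW front faces).
static bool IsTriangleVisible(const Float4& c0, const Float4& c1, const Float4& c2)
{
    // Conservatively keep triangles crossing the near plane; rejecting those
    // correctly requires the homogeneous-coordinate test.
    if (c0.w <= 0.0f || c1.w <= 0.0f || c2.w <= 0.0f)
        return true;

    const float x0 = c0.x / c0.w, y0 = c0.y / c0.w;
    const float x1 = c1.x / c1.w, y1 = c1.y / c1.w;
    const float x2 = c2.x / c2.w, y2 = c2.y / c2.w;

    // 2D cross product of two edges = twice the signed triangle area.
    return (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0) > 0.0f;
}

// Narrow-phase pass: compact the index buffer down to surviving triangles.
// A GPU kernel would replace this push_back with wave/ballot-based compaction.
void CullTriangles(const std::vector<Float4>& clipPositions,
                   const std::vector<uint32_t>& indices,
                   std::vector<uint32_t>& survivingIndices)
{
    for (size_t i = 0; i + 2 < indices.size(); i += 3)
    {
        const uint32_t i0 = indices[i], i1 = indices[i + 1], i2 = indices[i + 2];
        if (IsTriangleVisible(clipPositions[i0], clipPositions[i1], clipPositions[i2]))
        {
            survivingIndices.push_back(i0);
            survivingIndices.push_back(i1);
            survivingIndices.push_back(i2);
        }
    }
}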

 

NV has different bottlenecks when it comes to primitive rate, so async compute isn't the showstopper here (I can't get into the specifics due to NDA, but these techniques can be implemented differently on NV for a big win, e.g. fast passthrough geometry shaders on Maxwell+).

 

It comes down to the ratio between primitive setup rate and rasterizer rate. If they run at the same rate, culling in compute will be faster. If setup runs at twice the rasterizer's rate, you need more than 50% of triangles to be backfacing for culling to be effective, and the gains will be smaller.
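To spell that out with illustrative numbers (my own arithmetic, not figures from the slides): if setup can process two triangles per clock while the rasterizer consumes one, the fixed-function front end can already reject one backfacing triangle per rasterized triangle without stalling the rasterizer, so a compute pre-pass only starts winning once more than half the triangles are culled. At equal setup and raster rates there is no such slack, so every triangle removed in compute directly relieves the fixed-function path.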

 

In a future blog post, I may show more details of the cluster culling that we're doing - though no promises yet :)

 

In summary, per-triangle culling is currently absolutely beneficial on AMD. As with most algorithms, you can do a coarse/broad-phase cull pass and then a fine/narrow-phase cull pass. This talk/research is about how to perform the absolute fastest narrow-phase cull on GCN.

 

-Graham




#5095869 Anyone here a self-taught graphics programmer?

Posted by gwihlidal on 21 September 2013 - 11:49 PM

I'm super busy working on Battlefield 4, but this is an awesome thread, so I thought I'd link my own story (published recently on BioWare's blog).

 

http://blog.bioware.com/2013/07/25/staff-blog-graham-wihlidal-senior-software-engineer/

 

Cheers!

Graham



