Member Since 23 Apr 2003
Offline Last Active Yesterday, 02:41 AM

Posts I've Made

In Topic: D3D alternative for OpenGL gl_BaseInstanceARB

10 April 2016 - 03:47 AM

Hi, in my GDC 2016 talk, I discuss using the approach MJP mentioned to pass the draw index through to the indirect args using a root constant:





In Topic: Per Triangle Culling (GDC Frostbite)

24 March 2016 - 08:07 PM



I'm the author/presenter of this research. Hodgman is correct here. Triangle culling is definitely worth it, as evidenced by my initial slides showing peak primitive rate per triangle vs. available ALU. I mention cluster culling just to show we have it, but previous research (like the SIGGRAPH 2015 GPU-Driven Rendering Pipelines work) already covers cluster culling, so I wanted to take it further and detail per-triangle culling instead. Combining per-cluster with per-triangle culling is significantly better than doing per-cluster culling alone: you can have lots of surviving clusters within your frustum that contain tiny triangles, and cluster culling will not remove them. The same goes for depth and frustum. Additionally, I go over some algorithms like blend shapes, cloth, or even voxelization which can't be done per cluster, so this technique efficiently iterates per triangle, which lets these algorithms be used more effectively.


NV has different bottlenecks when talking about primitive rate, so async compute isn't the showstopper here (I can't get into the specifics due to NDA, but these techniques can be implemented differently on NV for a big win - i.e. fast passthrough geometry shaders on Maxwell+).


It comes down to the primitive rate of setup vs. the rasterizer. If they run at the same rate, culling in compute will be faster. If setup runs at 2x the rate of the rasterizer, you need more than 50% backfacing triangles for it to be effective, and the gains will be smaller.


In a future blog post, I may show more details of the cluster culling that we're doing - though no promises yet :)


In summary, per-triangle culling is currently absolutely beneficial on AMD. Sure, just like with most algorithms, you can do a coarse/broad phase cull pass and then a fine/narrow phase cull pass. This talk/research is about how to perform the absolute fastest narrow phase cull on GCN.
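
For readers unfamiliar with what a narrow phase test looks like, here is a minimal CPU-side sketch of one per-triangle filter: backface and zero-area rejection in screen space, followed by compaction of the surviving indices. This is not the implementation from the talk, which runs in compute on GCN and covers more tests. The names `Vec2`, `signedArea2`, and `cullTriangles` are hypothetical, and the sketch assumes counter-clockwise front faces with positions already projected to 2D.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical screen-space position (already projected to 2D).
struct Vec2 { float x, y; };

// Twice the signed area of triangle (a, b, c).
// Positive for counter-clockwise winding, negative for clockwise, zero if degenerate.
inline float signedArea2(Vec2 a, Vec2 b, Vec2 c) {
    return (b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y);
}

// Returns a compacted index list holding only the triangles that survive.
// Assumption: counter-clockwise triangles are front facing.
std::vector<std::uint32_t> cullTriangles(const std::vector<Vec2>& positions,
                                         const std::vector<std::uint32_t>& indices) {
    std::vector<std::uint32_t> survivors;
    survivors.reserve(indices.size());
    for (std::size_t i = 0; i + 2 < indices.size(); i += 3) {
        const Vec2 a = positions[indices[i]];
        const Vec2 b = positions[indices[i + 1]];
        const Vec2 c = positions[indices[i + 2]];
        // Reject backfacing (negative area) and zero-area (degenerate) triangles.
        if (signedArea2(a, b, c) > 0.0f) {
            survivors.push_back(indices[i]);
            survivors.push_back(indices[i + 1]);
            survivors.push_back(indices[i + 2]);
        }
    }
    return survivors;
}
```

The compacted index list is what would then be handed to the draw, so the fixed-function hardware never sees the rejected triangles.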



In Topic: Anyone here a self-taught graphics programmer?

21 September 2013 - 11:49 PM

I'm super busy working on Battlefield 4, but this is an awesome thread, so I thought I'd link my own story (published recently on BioWare's blog).






In Topic: Asynchronous Asset Loading (data streaming)

17 November 2010 - 09:28 PM

Some great replies here so far. One suggestion I wanted to add is to avoid doing lookups into your data manager by string. This pattern never scales (huge performance hit with a large catalog), uses a lot of memory, and often causes fragmentation. Instead, I would generate a hash of your asset names (32-bit or 64-bit, possibly using a bucketed hash to handle collisions) and make your requests against hash values instead.
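
A minimal sketch of the hashed-lookup idea, assuming 64-bit FNV-1a as the hash function (the post does not name one). `AssetId`, `hashAssetName`, `Asset`, and `AssetCatalog` are hypothetical names used only for illustration. Collisions are detected at insert time here rather than resolved with buckets.

```cpp
#include <cstdint>
#include <string>
#include <string_view>
#include <unordered_map>
#include <utility>

// Hypothetical id type: the hash of the asset name replaces the string key.
using AssetId = std::uint64_t;

// 64-bit FNV-1a over the bytes of the asset name.
// constexpr, so ids for literal names can be computed at compile time.
constexpr AssetId hashAssetName(std::string_view name) {
    AssetId hash = 0xcbf29ce484222325ull;      // FNV-1a 64-bit offset basis
    for (char c : name) {
        hash ^= static_cast<unsigned char>(c);
        hash *= 0x100000001b3ull;              // FNV-1a 64-bit prime
    }
    return hash;
}

// Hypothetical asset record.
struct Asset { std::string path; };

class AssetCatalog {
public:
    // Returns false if the id is already present (duplicate name or hash collision).
    bool add(std::string_view name, Asset asset) {
        return assets_.emplace(hashAssetName(name), std::move(asset)).second;
    }

    // Lookup by precomputed id: no string hashing or comparison at request time.
    const Asset* find(AssetId id) const {
        auto it = assets_.find(id);
        return it == assets_.end() ? nullptr : &it->second;
    }

private:
    std::unordered_map<AssetId, Asset> assets_;
};
```

Callers would typically compute ids once, for example `constexpr AssetId rockId = hashAssetName("textures/rock.dds");`, and pass only the id to the streaming system.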


In Topic: Making a certain color in a texture 'clear' ?

05 January 2009 - 04:56 AM

What you want is color keying. For the most part, color keying is not hardware accelerated, and generally textures are preprocessed to remove keyed transparency pixels before being submitted to the GPU. Doing this on the GPU in hardware with a pixel shader is super easy and fast as well.
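
The preprocessing step mentioned above could look roughly like the sketch below. It assumes the texture has already been decoded into 8-bit RGBA pixels on the CPU. `Rgba8` and `applyColorKey` are hypothetical names, not part of any graphics API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 8-bit RGBA pixel layout.
struct Rgba8 { std::uint8_t r, g, b, a; };

// Makes every pixel whose RGB matches the key fully transparent.
// The key's own alpha is ignored. Returns the number of pixels that were keyed out.
std::size_t applyColorKey(std::vector<Rgba8>& pixels, Rgba8 key) {
    std::size_t keyed = 0;
    for (Rgba8& p : pixels) {
        if (p.r == key.r && p.g == key.g && p.b == key.b) {
            p.a = 0;   // keyed pixel becomes "clear"
            ++keyed;
        }
    }
    return keyed;
}
```

After this pass the texture can be uploaded as a normal RGBA texture and drawn with alpha blending or alpha testing enabled.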

Check out the color keying sample on this page:


Hope that helps!