Quat

Compute Shader

This topic is 3271 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

So I was looking at a presentation where they did a Gaussian blur and showed that a compute shader (CS) approach saved memory bandwidth compared with a pixel shader (PS) implementation. However, they ignored the fact that the texture cache also saves memory bandwidth in the PS solution. Besides potential memory bandwidth savings, does anyone know other reasons the CS is useful for 3D graphics (I'm not talking about its general-purpose applications)?

I'm thinking of porting a 2D wave equation solver from a pixel shader to a compute shader, but I'm wondering how significant the advantages will be. Is it just the texture bandwidth saved by not having to fetch the same texel multiple times when doing the finite difference?

Also, is there extra overhead when writing to a texture with a compute shader and then feeding it back into the rendering pipeline, or is it as fast as rendering to a texture and then binding that texture as a shader resource?

Last question: it seems there is one thread per pixel when using the compute shader. Is this always the case?
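
To see where the repeated texel fetches in a finite-difference wave solver come from, the per-texel update can be sketched on the CPU. This is a minimal Python sketch, assuming the standard explicit scheme with a 5-point Laplacian and fixed (zero) boundaries; `wave_step` and `k` are hypothetical names, not taken from the presentation or from Quat's solver.

```python
def wave_step(u, u_prev, k):
    """One explicit time step of the 2D wave equation on a grid.

    Mimics a "one thread per texel" shader update: every interior texel
    reads itself and its 4 neighbours from u, plus its own value from
    u_prev. Boundary texels are held at 0 (fixed edges).
    k is (c*dt/dx)**2; this explicit scheme is only stable for k <= 0.5.
    Returns the new grid and a per-texel count of how often u was read.
    """
    h, w = len(u), len(u[0])
    u_next = [[0.0] * w for _ in range(h)]
    reads = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # 5-point stencil: these are the fetches a shader would issue
            taps = [(y, x), (y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)]
            for ty, tx in taps:
                reads[ty][tx] += 1
            lap = (u[y - 1][x] + u[y + 1][x] + u[y][x - 1] + u[y][x + 1]
                   - 4.0 * u[y][x])
            u_next[y][x] = 2.0 * u[y][x] - u_prev[y][x] + k * lap
    return u_next, reads
```

On a 5x5 grid, an interior texel whose neighbours are all interior is fetched five times per step (once by its own thread and once by each neighbour's thread). A compute shader that first loads a tile into group shared memory could fetch each texel from memory roughly once per group instead, although in a pixel shader the texture cache may already absorb much of that repetition.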

Just thought of one more question:

Does one need to balance the number of groups against the number of threads per group? In other words, is it better to have fewer groups with more threads each, or more groups with fewer threads each?

More threads per group should scale better across hardware generations. For each piece of hardware there is some minimum number of threads below which you leave part of the hardware idle. Since that number isn't exposed to you, it's best to use a large number of threads per group whenever possible. You can look into the ATI and NVIDIA hardware designs, described in the CUDA and ATI Stream documentation, to get an idea of how high this number should be; it's probably higher than you might expect. Beyond this suggestion, the number of groups doesn't matter.
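
One concrete reason small groups waste hardware is that threads are scheduled in fixed-width batches. The sketch below assumes batch widths of 32 (NVIDIA warps) and 64 (AMD wavefronts on hardware of this era) and the Direct3D 11 cs_5_0 limit of 1024 threads per group; `group_utilization` is a hypothetical helper, and it only models lane rounding within a group, not overall occupancy or latency hiding.

```python
import math

# Assumed SIMD batch widths; treat these as illustrative values.
SIMD_WIDTH = {"nvidia_warp": 32, "amd_wavefront": 64}
MAX_THREADS_PER_GROUP = 1024  # Direct3D 11 cs_5_0 limit per thread group


def group_utilization(threads_per_group, simd_width):
    """Fraction of SIMD lanes doing useful work for one thread group.

    The group is split into ceil(threads / simd_width) hardware batches;
    any lanes left over in the last batch sit idle.
    """
    if not 1 <= threads_per_group <= MAX_THREADS_PER_GROUP:
        raise ValueError("thread group size outside the cs_5_0 limit")
    batches = math.ceil(threads_per_group / simd_width)
    return threads_per_group / (batches * simd_width)


for size in (32, 48, 64, 256):
    print(size, {name: group_utilization(size, w)
                 for name, w in SIMD_WIDTH.items()})
```

For example, a 32-thread group fills a 32-wide warp exactly but only half of a 64-wide wavefront, whereas a 64-thread group such as `numthreads(8, 8, 1)` fills both. Keeping the group size a multiple of 64 avoids this particular waste on either vendor.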

Any memory/resource access caching available to the pixel shader is also available to the compute shader. I'm not familiar with that sample, but if they loaded the data into group shared memory before performing all of the accumulated blur reads, performance could be very high, since shared memory should be extremely fast.
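
The shared-memory idea can be illustrated with a CPU simulation. This is a hypothetical sketch, not the presentation's code: it uses a 1D box blur instead of a Gaussian to stay short, assumes clamp-to-edge addressing, and the function names are made up. Each simulated group copies its tile plus a `radius`-wide apron into a list standing in for HLSL `groupshared` memory, and every thread then reads only from that copy.

```python
def box_blur_direct(row, radius):
    """Pixel-shader style: every output texel fetches 2*radius+1 inputs."""
    n = len(row)
    out, fetches = [], 0
    for x in range(n):
        acc = 0.0
        for dx in range(-radius, radius + 1):
            acc += row[min(max(x + dx, 0), n - 1)]  # clamp addressing
            fetches += 1
        out.append(acc / (2 * radius + 1))
    return out, fetches


def box_blur_tiled(row, radius, group_size):
    """Compute-shader style: each group loads tile + apron once, then
    each thread averages its window from the shared copy."""
    n = len(row)
    out, fetches = [0.0] * n, 0
    for start in range(0, n, group_size):
        # Stand-in for a groupshared array: tile plus apron on both sides.
        shared = []
        for x in range(start - radius, start + group_size + radius):
            shared.append(row[min(max(x, 0), n - 1)])
            fetches += 1
        # A real shader would call GroupMemoryBarrierWithGroupSync() here.
        for tid in range(group_size):
            x = start + tid
            if x >= n:
                break  # out-of-range threads in the last group do nothing
            window = shared[tid:tid + 2 * radius + 1]
            out[x] = sum(window) / (2 * radius + 1)
    return out, fetches
```

For a 16-texel row with radius 2 and 8-thread groups, the direct version issues 80 fetches and the tiled version 24 (two groups of 8 + 4). How much this gains on real hardware depends on how well the texture cache was already absorbing the repeated reads, which is the caveat raised in the original question.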

Compute shaders are very good for operations like editing memory in place. There are several samples that do per-pixel sorting of transparent objects with a compute shader, or FFT-based post-processing with a compute shader. There is also a benefit in avoiding the work needed to render a quad for post-processing: you tell the compute shader the exact number of groups and threads up front instead of drawing a quad, rasterizing it, and so on.
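
Telling the compute shader the exact amount of work usually comes down to a ceiling division when choosing the group counts for a call such as `ID3D11DeviceContext::Dispatch`. The helpers below are a hypothetical sketch assuming 2D thread groups that cover a render target; they also show why the shader normally needs an `if (x < width && y < height)` style bounds check.

```python
def dispatch_size(width, height, group_x, group_y):
    """Thread-group counts (x, y) so that group_x by group_y groups cover
    every pixel of a width by height target."""
    # Ceiling division: a partially covered tile still needs a whole group.
    return (-(-width // group_x), -(-height // group_y))


def thread_counts(width, height, group_x, group_y):
    """Return (threads with a pixel to process, threads launched)."""
    gx, gy = dispatch_size(width, height, group_x, group_y)
    return width * height, gx * group_x * gy * group_y


print(dispatch_size(1920, 1080, 16, 16))
print(thread_counts(1920, 1080, 16, 16))
```

For a 1920x1080 target with 16x16 groups this gives 120 x 68 groups; the last row of groups covers 8 rows past the bottom edge, so those threads must exit early.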

I've seen demos where a compute shader drives the lighting of a scene with several thousand point lights, using deferred rendering techniques, with excellent performance.
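
The post doesn't say how those demos work, but one common approach (an assumption here) is tiled deferred shading: the screen is split into tiles, each thread group builds the list of lights that touch its tile, and then shades its pixels using only that list. This CPU sketch shows just the binning step, assuming lights are already projected to screen-space `(x, y, radius)` in pixels; `bin_lights` is a hypothetical name, and real implementations typically also cull against each tile's depth range.

```python
def bin_lights(lights, width, height, tile=16):
    """Assign each point light to the screen tiles its bounding square touches.

    lights: list of (x, y, radius) in pixels (projection assumed done).
    Returns a dict mapping (tile_x, tile_y) -> list of light indices.
    """
    tiles_x = -(-width // tile)
    tiles_y = -(-height // tile)
    bins = {}
    for i, (lx, ly, r) in enumerate(lights):
        # Clamp the light's screen-space bounding box to the tile grid;
        # a light entirely off screen yields an empty range.
        x0 = max(int((lx - r) // tile), 0)
        x1 = min(int((lx + r) // tile), tiles_x - 1)
        y0 = max(int((ly - r) // tile), 0)
        y1 = min(int((ly + r) // tile), tiles_y - 1)
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                bins.setdefault((tx, ty), []).append(i)
    return bins


print(bin_lights([(8, 8, 4), (32, 8, 10)], 64, 32))
```

In the GPU version each group would do this test for its own tile in parallel and keep the surviving indices in group shared memory, so each pixel loops over a handful of lights rather than thousands.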

The compute shader is like any other stage in the pipeline, so there is no hidden cost simply because it's a compute shader. Using the resource produced by a compute shader as input for the next task should behave like any other such scenario.

Since the compute shader is programmable, you decide how many threads it takes to do the calculation for a single pixel; it depends entirely on how you want your algorithm to operate. With a pixel shader, there is only ever one thread per pixel (or per sample, depending on the scenario). So, no real difference here.
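
As an example of using more than one thread per output, a whole thread group could cooperatively produce one texel, say averaging an 8x8 block of 64 texels into one texel of a downsampled image. This is a hypothetical sketch of the usual shared-memory tree reduction, simulated sequentially on the CPU and assuming a power-of-two group size; `group_reduce_sum` is a made-up name.

```python
def group_reduce_sum(values):
    """Simulate one thread group of len(values) threads cooperatively
    summing its inputs in shared memory with a tree reduction.

    Each thread writes one value into the shared array; in every pass the
    active threads (i < stride) add the element `stride` slots away,
    halving the active threads until thread 0 holds the total.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("group size must be a power of two")
    shared = list(values)
    stride = n // 2
    while stride > 0:
        for i in range(stride):  # the threads active in this pass
            shared[i] += shared[i + stride]
        # A real shader would call GroupMemoryBarrierWithGroupSync() here.
        stride //= 2
    return shared[0]


block = [float(v) for v in range(64)]  # an 8x8 block, flattened
print(group_reduce_sum(block) / 64)    # average -> one output texel
```

Each pass only reads from the upper half of the active range and writes to the lower half, so the threads within a pass never touch the same slot; the barrier between passes is what keeps the next pass from reading stale values.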

Keep in mind that you are using the same hardware whether you are running a pixel shader, a compute shader, or any other shader. Because of the unified design, things like texture reads should perform the same regardless of shader stage; only fixed-function features that are provided exclusively in a specific pipeline stage will have stage-specific performance implications.
