wh1sp3rik

Compute Shader

4 posts in this topic

Hello,

I am new to compute shader programming. Unfortunately, there are no good books or other material on this topic, so I have lots of questions.

Because a compute shader uses lots of threads, is it possible to use a compute shader for post effects instead of rendering a full-screen quad? Will this technique be faster, since it's multithreaded?

I tried a few demos that use one thread per pixel, usually with 16x16 threads per group. This configuration seems to be the fastest. Why? If I try 32x32, 8x8, or any other group size, it's always slower.

Could you recommend a good website or book on compute shaders (mainly DirectCompute)?

Thank you very much


You can certainly use a compute shader to implement a technique that's normally performed in a pixel shader using a full-screen quad. However, rendering a full-screen quad also spawns a thread for each pixel, so the pixel shader version is just as massively multithreaded. In fact, if the exact same shader is implemented both as a pixel shader and as a compute shader, the pixel shader version will almost always be faster, since there is some overhead associated with using compute shaders. In general, the compute shader version will only be faster if you make use of an optimization that is only possible with compute shaders (usually shared memory).
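
One practical difference: with a full-screen quad the rasterizer creates one invocation per pixel for you, whereas with a compute shader the application has to pick the number of thread groups passed to Dispatch. A minimal sketch of that arithmetic in Python, assuming a hypothetical [numthreads(16, 16, 1)] declaration; the ceiling division means the last row or column of groups can run past the edge of the image, so the shader itself should skip threads whose ID falls outside the texture:

```python
# Assumed group size, mirroring a hypothetical [numthreads(16, 16, 1)] in HLSL.
GROUP_SIZE_X = 16
GROUP_SIZE_Y = 16


def dispatch_size(width, height, group_x=GROUP_SIZE_X, group_y=GROUP_SIZE_Y):
    """Group counts (x, y, z) to pass to Dispatch so every pixel gets one thread."""
    # Ceiling division: a partially filled group is still needed for leftover pixels.
    groups_x = (width + group_x - 1) // group_x
    groups_y = (height + group_y - 1) // group_y
    return groups_x, groups_y, 1


# 1080 / 16 = 67.5, so 68 rows of groups are needed; the last 8 thread rows are off-screen.
print(dispatch_size(1920, 1080))
```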

Choosing the optimal number of threads in a thread group is a balancing act. On one hand, you need enough threads in a thread group to allow the hardware to hide the latency of memory accesses. On the other hand, having more thread groups can allow the shader to better saturate the many cores present on a GPU. The best balance depends on the shader, the hardware, and what else is currently executing on the GPU. You should also keep in mind that the hardware always launches threads in batches known as warps (Nvidia) or wavefronts (AMD). A warp has 32 threads, while a wavefront has 64. If you pick a thread group size that isn't an exact multiple of the warp/wavefront size, the hardware will round the number of threads up to the next multiple of the warp/wavefront size, and the extra threads are wasted. Sticking with a multiple of 64 ensures that you won't waste threads on either architecture, but if you only run on Nvidia you can consider using a multiple of 32.
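
To make the rounding concrete, here is a small Python sketch that counts how many threads are launched and how many are wasted for a few group sizes, using the 32-thread warp and 64-thread wavefront figures given above. The helper names are made up for illustration. A 16x16 group (256 threads) and an 8x8 group (64 threads) waste nothing on either vendor, while something like 10x10 (100 threads) pays for 128:

```python
WARP_SIZE = 32       # Nvidia warp, as described above
WAVEFRONT_SIZE = 64  # AMD wavefront, as described above


def scheduled_threads(group_threads, batch_size):
    """Threads actually launched once a group is rounded up to whole warps/wavefronts."""
    batches = (group_threads + batch_size - 1) // batch_size
    return batches * batch_size


def wasted_threads(group_threads, batch_size):
    """Launched threads that do no useful work."""
    return scheduled_threads(group_threads, batch_size) - group_threads


for gx, gy in [(8, 8), (16, 16), (10, 10), (16, 2)]:
    n = gx * gy
    print(f"{gx}x{gy} = {n} threads: "
          f"{wasted_threads(n, WARP_SIZE)} wasted per warp batch, "
          f"{wasted_threads(n, WAVEFRONT_SIZE)} wasted per wavefront batch")
```

Note that avoiding waste is only one factor; as explained above, the fastest size still depends on latency hiding and occupancy, so it is worth measuring on the target hardware.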

The book in my signature has a lot of material regarding compute shaders. You can also consider reading CUDA or OpenCL resources, since the overall concepts are very similar between the three platforms.


Hello, thanks for the information :)

I tried my first compute shader. It just outputs some colours to the backbuffer based on the thread ID and group ID, and it's working fine :)
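
For reference, Direct3D defines how these IDs relate: SV_DispatchThreadID = SV_GroupID * numthreads + SV_GroupThreadID, and SV_GroupIndex is the thread's flattened index inside its group. The Python functions below only mirror that arithmetic (they are not part of any API), assuming a hypothetical [numthreads(16, 16, 1)] group, which is handy for checking which backbuffer pixel a given thread writes:

```python
# Hypothetical group size, as if declared with [numthreads(16, 16, 1)].
NUM_THREADS = (16, 16, 1)


def dispatch_thread_id(group_id, group_thread_id, num_threads=NUM_THREADS):
    """Mirror of SV_DispatchThreadID = SV_GroupID * numthreads + SV_GroupThreadID."""
    return tuple(g * n + t for g, n, t in zip(group_id, num_threads, group_thread_id))


def group_index(group_thread_id, num_threads=NUM_THREADS):
    """Mirror of SV_GroupIndex: the thread's flattened index within its group."""
    x, y, z = group_thread_id
    nx, ny, _ = num_threads
    return z * nx * ny + y * nx + x


# Thread (3, 5) of group (2, 1) writes backbuffer pixel (35, 21).
print(dispatch_thread_id((2, 1, 0), (3, 5, 0)))
print(group_index((3, 5, 0)))
```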

I also found out that I can only write to the backbuffer using CS 5.0. How can I write to the backbuffer using CS 4.0? Should I use only RWBuffer instead of RWTexture2D?


You can't directly write to the backbuffer with CS4.0, since it can't write to textures. You'd have to write to a buffer, and then use a pixel shader to write the data to a texture.
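
The key detail in that two-pass approach is that both shaders agree on how a pixel maps to a buffer element and how a colour is packed into it. The Python sketch below simulates the two passes on a tiny 4x2 image, assuming a hypothetical layout of one 32-bit uint per pixel in row-major order with red in the lowest byte. In HLSL, the compute shader might for example write to a RWStructuredBuffer<uint> indexed by SV_DispatchThreadID, and the pixel shader would read the same element using SV_Position; treat those specifics as assumptions to verify against your target profile:

```python
# Hypothetical layout: one 32-bit uint per pixel, row-major, R in bits 0-7.


def pixel_to_index(x, y, width):
    """Linear buffer element for pixel (x, y); both passes must use the same mapping."""
    return y * width + x


def pack_rgba8(r, g, b, a=255):
    """Pack four 0-255 channels into one uint (R in bits 0-7, A in bits 24-31)."""
    return (a << 24) | (b << 16) | (g << 8) | r


def unpack_rgba8(v):
    """Inverse of pack_rgba8."""
    return v & 0xFF, (v >> 8) & 0xFF, (v >> 16) & 0xFF, (v >> 24) & 0xFF


width, height = 4, 2
buffer = [0] * (width * height)

# Pass 1 (compute shader role): each "thread" writes its own pixel into the buffer.
for y in range(height):
    for x in range(width):
        buffer[pixel_to_index(x, y, width)] = pack_rgba8(x * 60, y * 120, 0)

# Pass 2 (pixel shader role): each pixel reads its element back and outputs a colour.
texture = [[unpack_rgba8(buffer[pixel_to_index(x, y, width)]) for x in range(width)]
           for y in range(height)]
print(texture[1][3])
```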
