Compute Shader

Graphics and GPU Programming

Started by wh1sp3rik March 28, 2013 10:51 PM

3 comments, last by MJP 11 years ago

wh1sp3rik

March 28, 2013 10:51 PM

Hello,

I am new to compute shader programming. Unfortunately, there aren't many good books or other materials on this topic, so I have lots of questions.

Because a compute shader uses lots of threads, is it possible to use a compute shader for post effects instead of rendering a full-screen quad? Will this technique be faster, since it's multithreaded?

I tried a few demos that use one thread per pixel, usually with 16x16 threads per group. This configuration seems to be the fastest. Why? If I try 32x32, 8x8, or other group sizes, it's always slower.

Could you recommend a good website or book on compute shaders (mainly DirectCompute)?

Thank you very much

DirectX 11, C++

Steve_Segreto

March 29, 2013 04:27 AM

Have you tried https://developer.nvidia.com/directcompute?

Love DAOC? Try out my DAOC clone: https://dl.dropboxusercontent.com/u/8974528/VON_Dist.zip

MJP

March 29, 2013 05:31 AM

You can certainly use a compute shader to implement a technique that's normally performed in a pixel shader using a full-screen quad. Rendering a full-screen quad will also spawn a thread for each pixel, so it is massively multithreaded too. In fact, if the exact same shader is implemented both as a pixel shader and as a compute shader, the pixel shader version will almost always be faster, since there is some overhead associated with using compute shaders. In general, you have to make use of an optimization that is only possible with compute shaders (usually shared memory) for the compute shader version to come out ahead.

Choosing the optimal number of threads in a thread group is a balancing act. On one hand, you need enough threads in a thread group to allow the hardware to hide memory access latency. On the other hand, having more thread groups can allow the shader to better saturate the many cores present on a GPU. The best balance depends on the shader, the hardware, and what else is currently executing on the GPU. You should also keep in mind that the hardware always launches threads in batches known as warps (Nvidia) or wavefronts (AMD). A warp has 32 threads, while a wavefront has 64. If you pick a thread group size that isn't an exact multiple of the warp/wavefront size, the hardware will round the number of threads up to the next multiple of the warp/wavefront size. Sticking with a multiple of 64 ensures that you won't waste threads on either architecture, but if you only run on Nvidia you can consider using a multiple of 32.

The book in my signature has a lot of material regarding compute shaders. You can also consider reading CUDA or OpenCL resources, since the overall concepts are very similar between the three platforms.

The Blog | The Book

wh1sp3rik

March 29, 2013 10:07 AM

Hello, thanks for the information :)

I tried my first CS shader: it just writes some colours based on threadID and groupID to the backbuffer, and it's working fine :)

I also found out that I can write to the backbuffer only when using CS 5.0. How can I write to the backbuffer using CS 4.0? Should I use RWBuffer instead of RWTexture2D?

MJP

March 29, 2013 06:10 PM

You can't directly write to the backbuffer with CS4.0, since it can't write to textures. You'd have to write to a buffer, and then use a pixel shader to write the data to a texture.

This topic is closed to new replies.
