Compute Shader

Started by
3 comments, last by MJP 11 years ago

Hello,

I am new to compute shader programming. Unfortunately, there are no good books or other materials on this topic, so I have lots of questions.

Because a compute shader uses lots of threads, is it possible to use a compute shader for post effects instead of rendering a full-screen quad? Will this technique be faster, since it's multithreaded?

I tried a few demos where they use one thread per pixel, usually with 16x16 threads per group. This configuration seems to be the fastest. Why? If I try 32x32, 8x8, or other group sizes, it's always slower.

Could you recommend some good websites or books on compute shaders (mainly DirectCompute)?

Thank you very much

DirectX 11, C++

Have you tried https://developer.nvidia.com/directcompute?

You can certainly use a compute shader to implement a technique that's normally performed in a pixel shader using a full-screen quad. Rendering a full-screen quad will spawn a thread for each pixel, so it is also massively multithreaded. In fact, if the exact same shader is implemented both as a pixel shader and as a compute shader, the pixel shader version will almost always be faster, since there is some overhead associated with using compute shaders. In general, the compute shader version only comes out ahead when you make use of an optimization that is only possible with compute shaders (usually shared memory, which lets the threads in a group reuse data that other threads in the group have already loaded).

Choosing the optimal number of threads in a thread group is a balancing act. On one hand, you need enough threads in a thread group to allow the hardware to hide the latency of memory accesses. On the other hand, having more thread groups can allow the shader to better saturate the many cores present on a GPU. The best balance depends on the shader, the hardware, and whatever else is executing on the GPU at the same time. You should also keep in mind that the hardware always launches threads in batches known as warps (Nvidia) or wavefronts (AMD). A warp has 32 threads, while a wavefront has 64. If you pick a thread group size that isn't a multiple of the warp/wavefront size, the hardware rounds the number of threads up to the next multiple of the warp/wavefront size, and the extra threads do no useful work. Sticking with a multiple of 64 ensures that you won't waste threads on either architecture (both 8x8 = 64 and 16x16 = 256 qualify), but if you only run on Nvidia you can consider using a multiple of 32.

The book in my signature has a lot of material regarding compute shaders. You can also consider reading CUDA or OpenCL resources, since the overall concepts are very similar between the three platforms.

Hello, thanks for the information :)

I tried my first compute shader. It just outputs some colours to the back buffer based on the thread ID and group ID, and it's working fine :)

I also found out that I can only write to the back buffer with CS 5.0. How can I write to the back buffer using CS 4.0? Should I use only RWBuffer instead of RWTexture2D?

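You can't write directly to the back buffer with CS 4.0, since it can't write to textures. In cs_4_x the only unordered access view types available are RWStructuredBuffer and RWByteAddressBuffer, so RWBuffer won't work either. You'd have to write to one of those buffers, and then use a pixel shader to read the data back and write it to a texture such as the back buffer.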
