Hi all,
I am writing a shader for SSR (screen-space reflections), and a value can be input to let the user increase or decrease the ray count.
That value is then used to drive a while loop, and at each iteration the ray steps along. I am using [unroll(100)] for the 100 loop steps, and so on.
When I hard-coded some values, I got these results:
50 loop steps took 40 seconds to compile
100 loop steps took 7 minutes to compile
I am just wondering whether this is normal. Once I have finished working on this, I want to put it into its own shader, compile it to a .cso, and load that at runtime rather than compiling at runtime, to avoid the long compile times.
But if my main shader includes another shader file (where I aim to put the loop code), I can't see how I could compile that file as a .cso and include it. Or does the compiler go into the include at compile time and process it all again?
I'm happy to share code if needed. I'm running on an NVIDIA 980M laptop.
Thanks
Shader Slow Compile Times
It does not really make sense to unroll a loop like that. At best, you may want a dynamic loop around a small [unroll(4)] inner loop. Unrolling a huge loop will frustrate any compiler and is unlikely to give you a performance improvement anyway.
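A minimal sketch of the "dynamic loop over a small unroll" idea, assuming a screen-space ray stored as (uv.x, uv.y, depth). The resource names, the SSRParams constant buffer, and the hit test are hypothetical, not taken from the original shader:

```hlsl
// Hypothetical resources and constants; names are illustrative only.
Texture2D<float> DepthTex   : register(t0);
SamplerState     PointClamp : register(s0);

cbuffer SSRParams : register(b0)
{
    uint  StepCount;  // user-controlled number of ray steps
    float Thickness;  // depth tolerance that counts as a hit
};

// Marches a ray in screen space (uv in xy, depth in z). Returns true on a hit.
bool TraceRay(float3 rayPos, float3 rayStep, out float2 hitUV)
{
    hitUV = float2(0.0, 0.0);
    bool hit = false;

    // The outer loop stays a real loop, so the generated code does not grow with StepCount.
    [loop]
    for (uint i = 0; i < StepCount && !hit; i += 4)
    {
        // Only this small, fixed-size inner block is unrolled.
        [unroll]
        for (uint j = 0; j < 4; ++j)
        {
            if (!hit)
            {
                rayPos += rayStep;
                // SampleLevel uses an explicit mip level, so it needs no implicit derivatives.
                float sceneDepth = DepthTex.SampleLevel(PointClamp, rayPos.xy, 0);
                float diff = rayPos.z - sceneDepth;
                if (diff > 0.0 && diff < Thickness)
                {
                    hit = true;
                    hitUV = rayPos.xy;
                }
            }
        }
    }
    return hit;
}
```

Note that this version rounds the step count up to the next multiple of 4. Because StepCount comes from a constant buffer, changing it at runtime does not require recompiling the shader.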
The tricky part is getting the sampling right, as branching can cause issues with derivatives. You will either have to rely on SampleGrad or, in the future, use a ballot (wave intrinsics) in SM6 to keep all the threads alive for the UV computation while only some of them are active for sampling.
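To illustrate the SampleGrad route, here is a rough sketch that computes the UV derivatives once, before the loop, and passes them in explicitly. ColorTex, LinearClamp, and SampleInLoop are made-up names, and reusing the starting gradients for every step is an assumption that only holds exactly when uvStep is the same for neighbouring pixels:

```hlsl
// Hypothetical resources; names are illustrative only.
Texture2D    ColorTex    : register(t1);
SamplerState LinearClamp : register(s1);

float4 SampleInLoop(float2 uv, float2 uvStep, uint stepCount)
{
    // Derivatives are taken here, in uniform control flow,
    // before any per-pixel divergent branch or loop.
    float2 dx = ddx(uv);
    float2 dy = ddy(uv);

    float4 sum = float4(0.0, 0.0, 0.0, 0.0);

    [loop]
    for (uint i = 0; i < stepCount; ++i)
    {
        uv += uvStep;
        // SampleGrad takes explicit gradients, so it does not depend on
        // neighbouring pixels still executing this iteration.
        sum += ColorTex.SampleGrad(LinearClamp, uv, dx, dy);
    }

    // Average the samples; max() avoids a divide by zero when stepCount is 0.
    return sum / (float)max(stepCount, 1u);
}
```

If only a fixed mip level is needed (for example when reading a depth buffer), SampleLevel or Load avoids the derivative problem in the same way.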
And no, you cannot pre-compile your function. Every shader that uses it will take the same time to compile.
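A small sketch of why that is: #include is a textual paste, so the loop code is compiled again as part of every shader that includes it. The two file names and the MarchRay function below are hypothetical, and the two files are shown in one listing only for brevity:

```hlsl
// ---- SSRRayMarch.hlsli (hypothetical include holding the loop code) ----
float3 MarchRay(float3 rayPos, float3 rayStep, uint stepCount)
{
    [loop]
    for (uint i = 0; i < stepCount; ++i)
    {
        rayPos += rayStep;
    }
    return rayPos;
}

// ---- SSRMain.hlsl (hypothetical main shader, a separate file) ----
// The preprocessor pastes the contents of the .hlsli here, so MarchRay
// is compiled as part of this shader, not linked in from a .cso.
#include "SSRRayMarch.hlsli"

float4 main(float4 pos : SV_Position, float2 uv : TEXCOORD0) : SV_Target
{
    float3 end = MarchRay(float3(uv, 0.0), float3(0.01, 0.0, 0.01), 16);
    return float4(end, 1.0);
}

// Offline build of the whole shader, assuming fxc.exe from the Windows SDK:
//   fxc /T ps_5_0 /E main /Fo SSRMain.cso SSRMain.hlsl
```

What can be precompiled is the final shader: the resulting SSRMain.cso can be loaded at runtime, so the long compile happens once at build time rather than on every launch.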
> 50 loop steps it took 40 seconds to compile
> 100 loops steps took 7 mins to compile
That's a known bug in some versions of the HLSL compiler :(
As above, FWIW, looping is very slow on DX9 level hardware, and somewhat slow on DX10 level hardware, but it's extremely fast on most DX11 level hardware. The only slowness comes from divergence in the loop condition between threads, which is not fixed by unrolling.
E.g. AMD GPUs have a separate scalar unit per shader core which is used to perform branching tasks in parallel with the main workloads, so branching often has no cost at all thanks to dual-issued instructions.
I will put my code up later, after I have slept, in case it is possible to get some advice on the mistakes I'm making within the shader that could slow it down.
Thanks