Shader Slow Compile Times

3 comments, last by Aaron Smith 7 years ago

Hi all,

I am writing a shader for SSR (screen-space reflections), and a value can be input to let the user increase or decrease the ray count.

This value then drives a while loop, and at each iteration the ray steps along. I am using [unroll(100)] for the 100 loop steps, etc.

Now, when I hard-coded some values, I got these results:

50 loop steps: took 40 seconds to compile
100 loop steps: took 7 minutes to compile

I am just wondering whether this is normal. Once I have finished working on this, I want to put it into its own shader, compile it to a .cso, and load that at runtime rather than compiling at runtime, to avoid the long compile times.

But if my main shader includes another shader file (where I aim to put the loop code), I can't see how I could compile that as a .cso and include it. Or does the compiler go into the include at compile time and process it all again?

I'm happy to share code if needed. I'm running on an NVIDIA 980M laptop.

Thanks

It does not really make sense to unroll code like that. At best you may want a dynamic loop around a small [unroll(4)] inner loop. Unrolling a huge loop will frustrate any compiler and is unlikely to give you a performance improvement anyway.
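A minimal sketch of that "dynamic loop over a small unroll" shape might look like the following. Everything named here (the SsrParams constant buffer, g_StepCount, kInnerSteps, StepRay, MarchRay) is hypothetical and only for illustration, and the hit test is a stand-in, not the original poster's code.

```hlsl
// Hypothetical constant buffer holding the user-selected step count.
cbuffer SsrParams : register(b0)
{
    uint g_StepCount;
};

// Size of the small inner loop that the compiler fully unrolls.
static const uint kInnerSteps = 4;

// Placeholder for one ray-march step; returns true on a hit.
// The depth comparison is a stand-in for a real intersection test.
bool StepRay(inout float3 rayPos, float3 rayStep)
{
    rayPos += rayStep;
    return rayPos.z >= 1.0f;
}

bool MarchRay(float3 rayPos, float3 rayStep, out float3 hitPos)
{
    hitPos = rayPos;
    bool hit = false;

    // Round up, so the ray may take up to kInnerSteps - 1 extra steps.
    uint outerCount = (g_StepCount + kInnerSteps - 1) / kInnerSteps;

    // Outer loop stays a real loop: iteration count is only known at runtime.
    [loop]
    for (uint i = 0; i < outerCount && !hit; ++i)
    {
        // Inner loop has a compile-time bound, so unrolling it is cheap.
        [unroll]
        for (uint j = 0; j < kInnerSteps; ++j)
        {
            if (!hit)
            {
                hit = StepRay(rayPos, rayStep);
                hitPos = rayPos;
            }
        }
    }
    return hit;
}
```

The compiler only ever expands four iterations here, so compile time should no longer scale with the step count the user picks.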

The tricky part is getting the sampling right, as branching can cause issues with derivatives. You will either have to rely on SampleGrad or, in the future, use a ballot in SM6 to keep all the threads alive for the UV computation if some of them are active for sampling.
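As a rough sketch of the SampleGrad route: the gradients are computed once, before the loop, and passed in explicitly, so the sample inside the divergent loop no longer needs implicit derivatives. The resource names (g_Depth, g_PointClamp, g_StepCount) and the function TraceScreenSpace are assumptions made up for this example, and the depth comparison is a simplified placeholder.

```hlsl
// Hypothetical resources; names and registers are illustrative only.
Texture2D<float> g_Depth      : register(t0);
SamplerState     g_PointClamp : register(s0);

cbuffer SsrParams : register(b0)
{
    uint g_StepCount;   // user-selected number of ray steps
};

// Intended to be called from a pixel shader, outside any divergent branch,
// because ddx/ddy need neighbouring pixels to be executing the same code.
float2 TraceScreenSpace(float2 startUV, float2 stepUV,
                        float startDepth, float stepDepth,
                        out bool hit)
{
    // Compute the UV gradients once, before entering the loop.
    float2 gradX = ddx(startUV);
    float2 gradY = ddy(startUV);

    float2 uv       = startUV;
    float  rayDepth = startDepth;
    hit = false;

    [loop]
    for (uint i = 0; i < g_StepCount; ++i)
    {
        uv       += stepUV;
        rayDepth += stepDepth;

        // Explicit gradients: no implicit derivative is needed here,
        // so the compiler has no reason to force an unroll.
        float sceneDepth = g_Depth.SampleGrad(g_PointClamp, uv, gradX, gradY);

        if (rayDepth >= sceneDepth)
        {
            hit = true;
            break;
        }
    }
    return uv;
}
```

Plain Sample relies on derivatives taken across neighbouring pixels, which are not well defined once threads have diverged inside the loop; supplying the gradients by hand sidesteps that.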

And no, you cannot pre-compile your function. Every shader that uses it will take the same time to compile!

"50 loop steps it took 40 seconds to compile
100 loop steps took 7 mins to compile"

That's a known bug in some versions of the HLSL compiler :(

As above. FWIW, looping is very slow on DX9-level hardware and somewhat slow on DX10-level hardware, but it's extremely fast on most DX11-level hardware. The only slowness comes from divergence in the loop condition between threads, which is not fixed by unrolling.
E.g. AMD GPUs have a separate scalar unit per shader core, which performs branching tasks in parallel with the main workload, so branching often has no cost at all thanks to dual-issued instructions.

Ahh, you see, when I took the unroll down to a smaller value like 2, 4 or 10, I got an error saying it cannot unroll because it took too many iterations. But when I put 100, it is slow but works.

I will put my code up later, after I have slept, to see if it is possible to get some advice on the mistakes I'm making within the shader that could be slowing it down.

Thanks

OK, it turns out I was using Sample within the dynamic loops, which was taking too much time and causing the errors. I swapped them out for SampleGrad and it all works fine now, and very fast :D Thanks

