# Loop Compilation - Bad Performance

Hey all!

I'm having issues with the performance (as in, time taken) of compiling a relatively simple loop (DX9):

```hlsl
for (int i = 0; i < pointsnum; i++)
{
    Color.rgb = samplepoint(OrigColor, Color, uv, ov, points[i].xy, points[i].z);
    Color.rgb = samplepoint(OrigColor, Color, uv, ov, float2(points[i].y, points[i].x), points[i].z);
}
```

I'll spare you the details here. Basically, the compiler consistently takes around 2.7 seconds with 18 iterations, but 7.8 seconds with 36. I would have expected compilation time to increase linearly with the number of iterations, so doubling the iterations should take about twice the time.

Does the complexity of the called samplepoint function (which itself calls other custom functions) affect compilation performance this much? Or why else does the compiler need so much more time for just twice the iteration count?

Edit:

If I disable one of the two lines in the loop (it doesn't matter which one), compilation time is the same as when using 18 iterations instead of 36 with both lines.

If I split it into two loops, each containing one of the two lines, compilation time is the same as with the single loop.

So it's definitely a compilation issue, not a problem with my code in particular!

Any hints would be appreciated.

Edited by Meltac

---
The compiler might find more optimization opportunities when there is more code in the generated binary. For example, twice the amount of code means twice the amount of possibly duplicated code to check: even if each query to find redundant code took just O(log n), you would now run it for twice as much code, each query at O(log 2n), so the total grows like n log n rather than linearly.

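Using the natural log, with n as an abstract amount of work (not the iteration count) fitted to the 18-iteration timing:

$$2.7\,\text{s} = n \ln n \;\Rightarrow\; n \approx 2.7$$

$$2n \ln(2n) = 2 \cdot 2.7 \cdot \ln(2 \cdot 2.7) \approx 9.1\,\text{s}$$

Your 7.8 s is not that far off, I'd say.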

Anyway, if the compilation time is causing you trouble, try to force looping with the [loop] attribute; at least during development it might save you some time:
http://msdn.microsoft.com/en-us/library/windows/desktop/bb509602(v=vs.85).aspx
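A minimal sketch of forced looping applied to the loop from the first post. Only `pointsnum`, `points`, `samplepoint`, `OrigColor`, `Color`, `uv` and `ov` come from the thread; their types, the bound `MAX_POINTS`, the sampler `ColorSampler`, the texel size and the placeholder body of `samplepoint()` are assumptions made so the shader stands on its own (a ps_3_0 target is assumed).

```hlsl
// Sketch only: names mirror the original post, but the types, the array
// bound and the body of samplepoint() are hypothetical placeholders.
static const int MAX_POINTS = 36;   // hypothetical upper bound for points[]
float3    points[MAX_POINTS];       // xy = offset, z = weight (assumed layout)
int       pointsnum;                // number of entries actually used
sampler2D ColorSampler;             // hypothetical scene texture

// Placeholder for the real samplepoint(), which is more complex.
float3 samplepoint(float3 origColor, float3 color, float2 uv, float2 ov,
                   float2 offset, float weight)
{
    // tex2Dlod needs no screen-space gradients, so it is legal inside a
    // dynamic loop; origColor is unused in this placeholder.
    float3 s = tex2Dlod(ColorSampler, float4(uv + offset * ov, 0, 0)).rgb;
    return lerp(color, s, weight);
}

float4 main(float2 uv : TEXCOORD0) : COLOR0
{
    float3 OrigColor = tex2Dlod(ColorSampler, float4(uv, 0, 0)).rgb;
    float4 Color     = float4(OrigColor, 1.0);
    float2 ov        = float2(1.0 / 1024.0, 1.0 / 768.0); // hypothetical texel size

    // [loop] asks the compiler to emit real flow control for this loop
    // instead of unrolling it.
    [loop]
    for (int i = 0; i < pointsnum; i++)
    {
        Color.rgb = samplepoint(OrigColor, Color.rgb, uv, ov, points[i].xy, points[i].z);
        // .yx is the same as float2(points[i].y, points[i].x)
        Color.rgb = samplepoint(OrigColor, Color.rgb, uv, ov, points[i].yx, points[i].z);
    }
    return Color;
}
```

Whether the attribute actually shortens compilation is shader-dependent, so it is worth timing both variants.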

---

Mmmkay, it took me a while to realize this was about compilation time, not the execution time of the shader. Are you compiling your shaders every frame? Or why do you expect the compilation time to be lower?

Compilation is supposed to be run "offline", and it is definitely not meant to be a real-time process. It's the compiled code's performance that ultimately matters, not the compiler's performance. That means that sometimes the compiler will take extra (slower) steps to make sure your compiled code is faster.

In your case, the compiler will decide to "unwrap" the loop if there are sufficient instruction slots available (as is the case when you use only 18 iterations or when you remove one of the lines inside the loop). If unwrapping is not possible, it will still produce a loop. It prefers unwrapping because "wrapped" loops are slower on the GPU: each iteration costs an extra comparison and a "goto", whereas the unwrapped version contains instructions only for what's inside the loop.

So the time it takes the compiler to decide whether unwrapping is necessary or not is what's causing your compilation time differences.

So be warned: if you force looping as Krypt0n said, you might get better compile times, but worse execution times.
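To make the "wrapped" versus "unwrapped" distinction concrete, here is a hypothetical three-tap blur written both ways. The `taps` array layout and the `Tex` sampler are made up for illustration, and the hand-written version is only meant to approximate what `[unroll]` expands the loop into.

```hlsl
// Hypothetical 3-tap blur, written twice to show what unrolling does.
float4    taps[3];   // xy = offset, z = weight (assumed layout)
sampler2D Tex;       // hypothetical input texture

// Rolled: one copy of the body, plus a loop counter, a compare and a
// jump back to the top that run on every iteration.
float4 rolled(float2 uv : TEXCOORD0) : COLOR0
{
    float3 acc = 0;
    [loop]
    for (int i = 0; i < 3; i++)
        acc += taps[i].z * tex2Dlod(Tex, float4(uv + taps[i].xy, 0, 0)).rgb;
    return float4(acc, 1);
}

// Unrolled: roughly what [unroll] produces. No loop bookkeeping, but the
// body is copied once per iteration, so code size grows with the count.
float4 unrolled(float2 uv : TEXCOORD0) : COLOR0
{
    float3 acc = 0;
    acc += taps[0].z * tex2Dlod(Tex, float4(uv + taps[0].xy, 0, 0)).rgb;
    acc += taps[1].z * tex2Dlod(Tex, float4(uv + taps[1].xy, 0, 0)).rgb;
    acc += taps[2].z * tex2Dlod(Tex, float4(uv + taps[2].xy, 0, 0)).rgb;
    return float4(acc, 1);
}
```

Each function is a separate entry point, so they can be compiled and compared individually.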

Edited by tonemgub

---

Thanks to you both.

> Are you compiling your shaders every frame? Or why do you expect the compilation time to be lower?
>
> Compilation is supposed to be run "offline", and it is definitely not meant to be a real-time process. It's the compiled code's performance that ultimately matters, not the compiler's performance. That means that sometimes the compiler will take extra (slower) steps to make sure your compiled code is faster.

I'm developing a mod for a game that compiles its shaders upon startup, and again on every reload of a savegame. That's why compilation times actually *do* matter in my case.

I can't say for sure whether the longer compilation makes the shaders run faster, but the FPS drop with the "longer" loop seems to be about as large as the increase in compilation time. That's also why I'm a little concerned that compilation doesn't just take long because of optimization work, and that something weird might be going on when my loop is compiled.

> In your case, the compiler will decide to "unwrap" the loop if there are sufficient instruction slots available

By "unwrap" you mean unroll in the HLSL terminology, right?

> So be warned: if you force looping as Krypt0n said, you might get better compile times, but worse execution times.

Hmm, I've heard the opposite... are you sure about that? Some time ago I did some testing with another shader of mine using the [loop] attribute, and I didn't notice any difference in runtime performance with a huge loop of 160 iterations...

Edited by Meltac

---
It may cause a slowdown, but it may also run at the same speed. That's very shader-dependent, and you need to profile it case by case. But if compile time is the big issue and the slowdown is not too severe, you don't have much choice besides doing whatever speeds up compilation, i.e. [loop].

---

I checked: The loop attribute does not change much - neither regarding compilation time nor runtime performance. The differences with and without are only marginal.

I don't understand that log calculation thing. If compilation complexity is like O(log n), isn't n the number of iterations in my case? So

$$\log_{10}(18) \cdot x = 2.7\,\text{s} \;\Rightarrow\; x \approx 2.2$$

$$\log_{10}(36) \cdot 2.2 \approx 3.42\,\text{s}$$
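For comparison, here are the time ratios each scaling model predicts when going from 18 to 36 iterations, next to the measured one. The pure-log ratio is the same in any log base; for the n ln n row, n ≈ 2.7 is the fitted work unit from Krypt0n's reply, not the iteration count.

$$
\begin{aligned}
\text{measured:}\quad & \frac{7.8\,\text{s}}{2.7\,\text{s}} \approx 2.89 \\
\text{linear } (T \propto n):\quad & \frac{36}{18} = 2 \\
\text{log only } (T \propto \log n):\quad & \frac{\ln 36}{\ln 18} \approx 1.24 \;\Rightarrow\; T_{36} \approx 3.35\,\text{s} \\
n \ln n \ (n = 2.7):\quad & \frac{5.4 \ln 5.4}{2.7 \ln 2.7} \approx 3.4 \quad (5.4 \ln 5.4 \approx 9.1\,\text{s})
\end{aligned}
$$

The difference is whether the O(log n) lookup runs once or once per unit of code: a pure log model would make doubling the iterations almost free, while the measured ratio lies between linear and n ln n growth.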

Anyways, that's just speculation, right? Who can tell me what the compiler actually does, or tries to do?

What I need is a way to tell the compiler how to treat that loop. The [loop] attribute (or [unroll]) should do this, theoretically, but apparently that's not enough in this case. It seems something else is needed here. Maybe a change in code structure instead?

I wouldn't mind if it took the compiler a little longer than twice the time to process a loop of twice the iterations if runtime performance would be better, but the way it is - way too long compilation with no noticeable FPS benefit - is not acceptable for me.

Edited by Meltac

---

> Anyways, that's just speculation, right? Who can tell me what the compiler actually does, or tries to do?

It's not speculation. Without the compiler's source code, you can't tell for sure that what we said is true, but I can assure you that there's no explanation other than what we've already provided.

And you don't have to trust us; you can check for yourself: the effects compiler (or the D3DCompile functions) can also output the assembly code of the compiled shader. In the assembly code, instructions that are inside a loop are indented, so you can clearly tell whether the loop is being unrolled or not. Then you can draw your own conclusions about whether it really is the unrolling that's causing the compiler slowdown.
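One way to do that check, as a sketch: a tiny test shader that can be compiled once with `[loop]` and once with `[unroll]`, writing an assembly listing each time with fxc's `/Fc` option. The file name `loop_test.hlsl`, the `FORCE_UNROLL` macro, the `taps` array and the ps_3_0 profile are assumptions; substitute the game's actual shader and target profile.

```hlsl
// loop_test.hlsl (hypothetical file name): compare the assembly listing
// of a rolled loop against an unrolled one.
//
//   fxc /T ps_3_0 /E main /Fc rolled.asm loop_test.hlsl
//   fxc /T ps_3_0 /E main /D FORCE_UNROLL=1 /Fc unrolled.asm loop_test.hlsl
//
// /Fc writes the assembly listing; /D defines the macro tested below.

float4    taps[8];   // xy = offset, z = weight (hypothetical layout)
sampler2D Tex;       // hypothetical input texture

float4 main(float2 uv : TEXCOORD0) : COLOR0
{
    float3 acc = 0;
#ifdef FORCE_UNROLL
    [unroll]         // copy the body once per iteration
#else
    [loop]           // keep real flow control
#endif
    for (int i = 0; i < 8; i++)
    {
        acc += taps[i].z * tex2Dlod(Tex, float4(uv + taps[i].xy, 0, 0)).rgb;
    }
    return float4(acc, 1);
}
```

In the rolled listing the body should typically appear once, between ps_3_0 loop/rep and endloop/endrep instructions; in the unrolled one it should be repeated eight times. Timing the two fxc runs also shows directly how much of the compile time the unrolling costs.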

> I wouldn't mind if it took the compiler a little longer than twice the time to process a loop of twice the iterations if runtime performance would be better, but the way it is - way too long compilation with no noticeable FPS benefit - is not acceptable for me.

If you trust your FPS measurements, then go ahead and disable the loop optimizations. And don't worry: even if there is a performance loss with the forced-loop shaders, it will not be that noticeable.

> I'm developing a mod for a game that compiles its shaders upon startup, and again on every reload of a savegame. That's why compilation times actually *do* matter in my case.

Are you sure you can't just use pre-compiled shaders in your mod? Anyway, there's definitely no doubt that shader compilation is not meant to be a real-time process. You should be blaming the game's developers if they don't allow pre-compiled shaders, not the compiler for being slow. Microsoft has repeatedly stated that developers should always use pre-compiled shaders.

Edited by tonemgub

---

Thank you for the explanations.

> Are you sure you can't just use pre-compiled shaders in your mod? Anyway, there's definitely no doubt that shader compilation is not meant to be a real-time process. You should be blaming the game's developers if they don't allow pre-compiled shaders, not the compiler for being slow. Microsoft has repeatedly stated that developers should always use pre-compiled shaders.

Yes, I am. And yes, I do blame the game's developers.

Found another thing in the meantime. Using [unroll] instead of [loop] causes my shader to crash upon compilation. I don't get any specific compiler error in that case, but from what I've read on the topic, there are cases where unrolling a loop at compile time is not possible. So, given that [unroll] doesn't work and [loop] doesn't change anything, I'd assume my shader isn't being unrolled by the compiler, right? I'll check the assembly code nonetheless.

Edited by Meltac
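If the `[unroll]` failure comes from the compiler not knowing how many iterations to emit (an assumption: the thread never shows how `pointsnum` is declared), giving the loop a compile-time bound is one thing to try. Below is a sketch with a hypothetical `NUM_POINTS` constant and a simplified body in place of `samplepoint()`; the HLSL `for` statement documentation also describes an `[unroll(x)]` form where `x` caps the iteration count.

```hlsl
// Sketch, assuming the crash comes from [unroll] being applied to a loop
// whose trip count (pointsnum) is only known at runtime.
static const int NUM_POINTS = 18;  // hypothetical compile-time count
float3    points[NUM_POINTS];      // xy = offset, z = weight (assumed layout)
sampler2D Tex;                     // hypothetical input texture

float4 main(float2 uv : TEXCOORD0) : COLOR0
{
    float3 acc = 0;
    // A literal or static const bound tells the compiler exactly how many
    // copies of the body to emit when unrolling.
    [unroll]
    for (int i = 0; i < NUM_POINTS; i++)
        acc += points[i].z * tex2Dlod(Tex, float4(uv + points[i].xy, 0, 0)).rgb;
    return float4(acc, 1);
}
```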
