Can someone explain [loop] and [unroll] to me?

3 comments, last by MJP 10 years, 5 months ago

So I am having a hard time understanding the difference between [loop] and [unroll]. From what I understand, [unroll] causes the compiler to take the contents of a loop and repeatedly place its machine code in the output, once for each time the loop executes. I am unsure of how to use [loop], and what are the tradeoffs between the two? I have looked at the MSDN explanation, but it is brief and doesn't really give me enough information to understand it.

J.W.

If I'm not wrong, then:

When you unroll, as you said, the loop gets expanded, and if you look at the compiled code you'll easily see that. Bear in mind that this is only possible when the loop has a bound the compiler can work out, though you can specify the max number of iterations in the unroll attribute, e.g.


[unroll(5)] // unroll at most 5 iterations

Now the perhaps more interesting one is the loop attribute. When you enable it ([loop]), you enable flow control inside the loop: the shader assembly can now jump/branch to a different instruction, which is noticeable in the compiled code. This flow control can be static (decided at compile time), predicated, or dynamic, which lets you set the bound at run time and lets that bound change over time.

But a few challenges appear when you set the loop attribute, such as this compiler message:


Gradient-based operations must be moved out of flow control to prevent divergence. Performance may improve by using a non-gradient operation

Sometimes (I've always hit this inside loops) the HLSL compiler tells you that you are doing something illegal, which prevents compilation. So apparently you may NOT use gradient operations there. These operations are mainly used when sampling (and in some other cases, but I'm not sure which). For example, when you draw a full-screen quad you provide some vertices and some texture coordinates, the texture coordinates get interpolated across the quad, and the sampler also needs to calculate the right mipmap level for each pixel. But don't fear, there are alternatives to Sample, like SampleGrad and SampleLevel, that don't rely on implicit gradients.

If I'm not wrong, the loop attribute WILL be slower in most cases, since the jumping between statements costs a bit, and in most cases you will not be able to use gradient operations (like when sampling with Sample). The unroll attribute is faster and lets you use gradient operations, but you are limited to a fixed number of iterations (though there are some tricks...).

And guys, this topic isn't the clearest for me, so if you see anything wrong, please tell me, as I'd like to avoid giving jdub the wrong information. :)

Hope it helps.

-MIGI0027


If you know that your loop often quits early (well before the max iterations), you can hint this to the compiler with the [loop] attribute. The early exits you potentially gain this way can amortize the cost of dynamic looping.
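To make the early-exit case concrete, here is a minimal sketch of a loop that usually stops long before its maximum count. The resource names (HeightMap, LinearSampler), the function, and the step count of 64 are all made up for illustration; treat it as one possible shape rather than the recommended way to write it.

```hlsl
// Hypothetical resources; the names are assumptions for this example.
Texture2D HeightMap : register(t0);
SamplerState LinearSampler : register(s0);

// Upper bound on the number of steps; chosen arbitrarily for the sketch.
static const uint MaxSteps = 64;

// Steps along a ray in texture space and stops at the first step where the
// ray is at or below the height field. Returns the number of steps taken.
uint MarchHeightField(float2 startUV, float2 stepUV, float startHeight, float stepHeight)
{
    float2 uv = startUV;
    float rayHeight = startHeight;
    uint steps = 0;

    [loop] // ask for a real loop so that the early exit can skip work
    for (uint i = 0; i < MaxSteps; i++)
    {
        // SampleLevel takes an explicit mip level, so it needs no gradients
        // and is allowed inside dynamic flow control.
        float surface = HeightMap.SampleLevel(LinearSampler, uv, 0).r;
        if (rayHeight <= surface)
            break; // the assumption here is that most rays exit early

        uv += stepUV;
        rayHeight -= stepHeight;
        steps++;
    }
    return steps;
}
```

If most rays really did run all 64 steps, the early exit would buy nothing and the loop overhead would be pure cost, which is the tradeoff described above.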

Unrolling, on the other hand, has a static cost for the given number of iterations, but it tends to perform more efficiently because the system can determine your memory access pattern in advance. So if you know that most of your loop iterations are going to run to the finish, you likely want to unroll. Also, if you logically have a fixed number of iterations in your loops, then unrolling is almost always more efficient (unless the number of iterations is very high, in which case the length of the unrolled program itself would become the bottleneck.)

The reason you have to hint this yourself is that the compiler cannot guess anything about how the shader actually gets used. If the loop counter is based on some dynamic data, there is no way for the compiler to determine whether unrolling or actual looping is more effective, because that data is not available at compile time.

Finally, some shader profiles do not even support actual looping; the hint can still be specified, but it is ignored if it is incompatible with the profile. This way, you don't have to write multiple versions of the code for multiple shader profiles.

Niko Suni

The reason you can't use some sampling functions inside a dynamic-counter loop is that during sampling, the system needs to calculate the derivatives of the interpolants for the current pixel group (2x2) in order to determine the correct mip level for the current pixel. The derivatives are impossible to calculate if the pixels of the current group each potentially execute different logic (as is the nature of dynamic looping).

You can circumvent this by calculating the needed derivatives outside the loop (so that the basis of the derivatives effectively becomes non-varying across the pixel group), and then using those derivatives within the loop by calling purpose-built versions of the sampling functions that take them as parameters. The compiler will also try to do this automatically, but if the surrounding logic is complex it may fail to do so.
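A minimal sketch of that workaround, assuming a pixel shader with a run-time tap count. ColorMap, LinearSampler, the BlurParams constant buffer and its members are hypothetical names, not something from the posts above; ddx, ddy and SampleGrad are the standard HLSL intrinsics and method.

```hlsl
// Hypothetical resources and constants for this example.
Texture2D ColorMap : register(t0);
SamplerState LinearSampler : register(s0);

cbuffer BlurParams : register(b0)
{
    uint TapCount;     // only known at run time
    float2 TapOffset;  // UV offset between consecutive taps
};

float4 PSMain(float4 pos : SV_Position, float2 uv : TEXCOORD0) : SV_Target
{
    // Take the derivatives here, before the loop, where every pixel of the
    // 2x2 group still executes the same instructions.
    float2 uvDx = ddx(uv);
    float2 uvDy = ddy(uv);

    float4 sum = 0;

    [loop]
    for (uint i = 0; i < TapCount; i++)
    {
        // SampleGrad receives the derivatives explicitly, so no implicit
        // gradient computation is needed inside the dynamic loop.
        sum += ColorMap.SampleGrad(LinearSampler, uv + TapOffset * (float)i, uvDx, uvDy);
    }

    // Guard against a zero tap count when averaging.
    return sum / (float)max(TapCount, 1u);
}
```

Since the offset added per tap is the same for every pixel, the derivatives of uv itself are a reasonable stand-in for the derivatives of each tap coordinate in this particular sketch.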

Niko Suni

Your understanding is generally correct. Usually you want to unroll wherever possible, which requires knowing in advance exactly how many iterations are required for the loop. So for example if you have a for loop that always loops 4 times, it's a good candidate for unrolling. Unrolling will cause the compiler to simulate the code inside the loop, and basically duplicate the code for however many times the loop executes. The alternative would be to use a dynamic loop, where the compiler inserts assembly instructions for incrementing a loop counter, checking whether the counter has reached the desired count, and conditionally jumping back to the beginning of the loop. This is less efficient than unrolling the loop due to the extra instructions and the added constraints of having dynamic flow control. It also prevents the compiler from using certain optimizations that can be used when loops are unrolled. For instance, take this simple loop:


for(uint i = 0; i < 4; i++)
{
    if(i == 2)
        result += TextureA.Sample(TexSampler, uv + float2(i / 512.0f, 0.0f));
    else
        result += TextureB.Sample(TexSampler, uv + float2(i / 512.0f, 0.0f));
}

If this loop is unrolled, the compiler will just emit assembly where TextureB is sampled twice, then TextureA is sampled once, and then TextureB is sampled again. This is because it can simulate the loop and determine whether the branch will be taken for any iteration of the loop. If this were a dynamic loop, the compiler would have to emit instructions to evaluate the branch condition every iteration and make the choice as to which texture to sample. Also you wouldn't be able to use "Sample" inside of a dynamic loop, for the reasons Nik02 has mentioned. But of course the upside of using a dynamic loop is that you don't need to know the number of iterations at compile time, which means you can loop based on a value from a constant buffer or a texture if you want.
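Written out by hand, the unrolled form of that loop would look roughly like the sketch below. This is only source-level HLSL to illustrate the idea; the assembly the compiler actually emits will look different. The register assignments and the wrapper function name are assumptions added to make the snippet self-contained.

```hlsl
// Resource names match the loop above; the register slots are assumed.
Texture2D TextureA : register(t0);
Texture2D TextureB : register(t1);
SamplerState TexSampler : register(s0);

// Hand-unrolled equivalent of the 4-iteration loop, meant to be called
// from a pixel shader. The "if (i == 2)" test is gone because its outcome
// is known for every iteration.
float4 SampleUnrolled(float2 uv)
{
    float4 result = 0;
    result += TextureB.Sample(TexSampler, uv + float2(0.0f / 512.0f, 0.0f)); // i == 0
    result += TextureB.Sample(TexSampler, uv + float2(1.0f / 512.0f, 0.0f)); // i == 1
    result += TextureA.Sample(TexSampler, uv + float2(2.0f / 512.0f, 0.0f)); // i == 2
    result += TextureB.Sample(TexSampler, uv + float2(3.0f / 512.0f, 0.0f)); // i == 3
    return result;
}
```

The offsets also become compile-time constants here, which is another thing a dynamic loop would have to compute per iteration.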

Normally the compiler will look at your code and decide for you whether or not it should unroll a loop, based on heuristics. Usually it will unroll if it can easily determine that the number of iterations is static, and use dynamic loops otherwise. However, if you want, you can give the compiler a strong hint as to which behavior it should use, and that's where the loop and unroll attributes come in. They essentially let you force the compiler to unroll or to use dynamic looping. Note that any loop can be represented with dynamic loop constructs, but not every loop can be unrolled. If you have a loop where the number of iterations isn't known at compile time and you try to unroll it, the compiler won't be able to do it unless you specify the maximum number of iterations. When you do that, the compiler will unroll the loop and basically wrap each iteration in a branch that causes the iteration to be skipped if the loop count has been reached.
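Here is a small sketch of that last case: a loop whose count comes from a constant buffer, unrolled with an explicit maximum. LightParams, LightCount, LightColors and the maximum of 8 are hypothetical, and the expansion shown in the comment is only a conceptual picture of what the compiler does, not its literal output.

```hlsl
// Hypothetical inputs for this example.
cbuffer LightParams : register(b0)
{
    uint LightCount; // run-time value, assumed never to exceed 8
};
StructuredBuffer<float3> LightColors : register(t0);

float3 SumLights()
{
    float3 total = 0;

    // The count is not known at compile time, so give unroll a maximum.
    [unroll(8)]
    for (uint i = 0; i < LightCount; i++)
    {
        total += LightColors[i];
    }

    // Conceptually, the loop above becomes something like:
    //   if (0 < LightCount) total += LightColors[0];
    //   if (1 < LightCount) total += LightColors[1];
    //   ... and so on, up to index 7
    return total;
}
```

Note that with a maximum of 8, any lights past the eighth would not be processed, so the maximum has to cover the largest count you intend to pass in.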
