# Any known issues with loop limits in HLSL SM5?

This topic is 2156 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

## Recommended Posts

Hi. First post so I hope I'm asking this in the right place.

I was wondering if there are any known issues with loop limit conditions in HLSL shader model 5?

The reason I ask is because I had an issue today with precisely this, and it appears to be some kind of compiler issue. Unfortunately this is part of a piece of university coursework and so I can't really post all the code up here but the basic idea was that I was passing in the number of lights I was using (with a maximum of 5) using a cbuffer. When I wrote the for loop to iterate over the lights using the light count in the cbuffer, I got a static/noise effect on the screen. However when the limit was set using a literal with the same value, the issue disappeared.

That is to say, this caused an error:

for (uint c = 0; c < lightCount; c++)
{
// lighting calculations here
}

whilst this didn't

for (uint c = 0; c < 1; c++)
{
// lighting calculations here
}

The final solution I settled on was something like this, which works fine

for (uint c = 0; c < MAX_LIGHTS; c++)
{
if (c >= lightCount)
break;
// lighting calculations here
}

Incidentally I had the exact same problem when restructured to use while loops like so

uint c = 0;
while (c < lightCount)
{
// lighting calculations here
c++;
}

So yes basically, I wondered, is this a known issue? Or is it not even allowed by the language? All I know is that it compilers without an error. I spoke to some other people about this issue and they said that such problems were known, but only in much lower shader models (like 2 or something). However I'm definitely compiling with shader model 5 ("ps_5_0" is the parameter is the compile shader from file function).

Anyhow if anyone has any thoughts on what might be happening here, it would be greatly appreciated. Thanks!

##### Share on other sites

Are you sure lightCount is valid? The fact that any code that uses lightCount is broken seems to indicate the problem is with lightCount rather than HLSL or the compiler.

EDIT: except for the third code example. Reading comprehension... I lack it.

Edited by Samith

##### Share on other sites
I haven't experimented in SM5, but I did some experiments with this in SM3 ---
When I wrote a loop like your first one (for i less than limit, ...), the generated assembly code actually looked like your second loop (for i less than 255, if i greater/equal to limit, break, else ...)!

Try writing a minimal example (smallest shader with this loop), compiling it with FXC.exe and getting it to output the resulting assembly as text to see what's really going on with both loops.

Also, try adding the '[loop]' attribute right before 'for' and seeing how it affects the code-gen.

##### Share on other sites

Hi guys. Thanks for your thoughts.

@Samith

Yes as you've already spotted, the problem is definitely not with the value in lightCount. In fact I believed that to be the problem too but I've done enough tests to convince myself that this is not the case. For example I did things like

if (lightCount == 5)
lightColour = float4(0.0, 1.0, 0.0, 1.0);

when I knew the value I was passing in was five, and sure enough the whole screen turned green.

@Hodgeman

Following your suggestion, I've been going through the assembly the shader compiler produces and I must say I can't see any problem at all. For example, when I used the first structure (the one that directly compares lightCount in the for loop), the code produced is like this:

mov r2.y, l(0) // r2.y is c
loop
ilt r2.z, r2.y, cb0[23].x // cb0[23].x is lightCount
breakc_z r2.z
...

When I used the second structure (with the literal MAX_LIGHTS and the if break test), the code produced is like this:

mov r2.y, cb0[23].x  // cb0[23].x is lightCount
mov r2.z, l(0)  // r2.z is c
loop
ilt r2.w, r2.z, l(5)
breakc_z r2.w
ige r2.w, r2.z, r2.y
if_nz r2.w
break
endif
...

Both increment the loop variable correctly. So far so good as far as I can see. However I wondered if the problem was using the cbuffer directly in the comparison. Therefore I did some more tests with moving the lightCount into a local variable (read register). The results of this were very puzzling.

The first code, using a local variable to read in lightCount, still produced the error. It looked like so

int localLightCount = lightCount;
for (int c = 0; c < localLightCount; c++)
{
// lighting calculations here
}

and in assembly like so

mov r2.x, cb0[23].x // r2.x is localLightCount
mov r2.z, l(0) // r2.z is cloop
ilt r2.w, r2.z, r2.x
breakc_z r2.w
...

OK so this is basically the same as the first code and so you wouldn't expect it to make any difference. However when I assign a literal to localLightCount, it works! The code looks like so

int localLightCount = 5;
for (int c = 0; c < localLightCount; c++)
{
// lighting calculations here
}

and the assembly like so

mov r2.x, l(5) // r2.x is localLightCount
mov r2.z, l(0) // r2.z is cloop
ilt r2.w, r2.z, r2.x
breakc_z r2.w
...

Just as you would expect, the only thing that changes is that localLightCount is assigned to a literal. All the code for the loop limiter remains exactly the same. To condense it even more, this line makes it work

mov r2.x, l(5)

while this line makes it fail

mov r2.x, cb0[23].x

However, we can't say that there is an error reading in the value to the register, because I know the values is correct. This is because I have run if localLightCount == 5 on it and sure enough, localLightCount is 5. But equally, it can't be an issue with the comparison because by the time the comparisons happen, it is the same. That is to say, the ilt command is being run on a pair of registers in both of the last two examples, and yet one breaks and the other does not!

So does anyone have any idea what may be going on here? One thing I would say is that I've only been able to test this on a single graphics card, and so I can't say whether or not the error is hardware specific. That said, unless I've missed something in the assembly, I don't see how it can be an issue with the compiler. As far as I can tell, the compiler is doing pretty much what it should be.

I'm not sure how much time I'm going to be able to spend on this to be honest, but I'd be very interested to find out what is going on so any other thoughts or observations would be much appreciated. Thanks!

##### Share on other sites
I don’t know what your issue is, but I can confirm it should not be happening, and is likely a driver bug.

I have the following code in my “ps_5_0” shader:

cbuffer cb3:register(b3){
int g_iTotalDirLights:packoffset(c0.x);
};
	for ( int I = 0; I < g_iTotalDirLights; I++ ) {
LSE_COLOR_PAIR cpThis = GetDirLightColorsAshikhminShirley( vNormalizedNormal, vViewPosToEye, I );
cpLightColors.cDiffuse += cpThis.cDiffuse;
cpLightColors.cSpecular += cpThis.cSpecular;
}
Make sure you use “int” and not “uint” for anything.
My g_iTotalDirLights value can go up to at least 7 (I can test higher but I have to add lights manually).

If you end up unable to find any discernible differences then it is probably a driver bug.

L. Spiro Edited by L. Spiro

##### Share on other sites

Hi L. Spiro, thanks for the feedback. I am using ints now as I saw something somewhere else about not using uints. However could you explain why it is that I should use ints instead? After all, the value is never going to become negative..

From what you've said, I'll assume for now that it's a driver bug and be happy with my bodge fix. However if I do get the chance to try it on some other graphics cards, I'll be sure to post the results here. Thanks again everyone for all your thoughts  .

##### Share on other sites

The GPU works with floats no matter what type you use.  int’s are just floats with no fractional part, and since they can be negative they are much more compliant with floats than uint types.

Compilers also have special-case compiling of loops specifically aimed at optimizing them.

For example, you can’t do “for ( int i = lightCount; --i >= 0; )”; the compiler will tell you to put the loop in “normal” format (this may depend on the shader model), and extra special-case code will have the compiler trying to figure out if it is optimal to unroll and how far should it be unrolled.

Adding uint to all of that just increases the risk of some obscure overlooked bug in the compiler.

You should also mention which graphics card you are using.  And when I get home I will try with a uint loop on my end.

L. Spiro

##### Share on other sites

Ah thanks for the explanation. That makes more sense now.

The graphics card I'm using is an NVIDIA GeForce GTX 650 and the driver version is 320.57.

Also I forgot to mention but I did also try Hodgman's suggestion of adding the [loop] attribute before the for loop but the assembly produced did not seem to change as a result and it didn't affect the glitch.

##### Share on other sites

The GPU works with floats no matter what type you use.  int’s are just floats with no fractional part, and since they can be negative they are much more compliant with floats than uint types.

Are you certain about that? As of GL_EXT_gpu_shader4 (or OpenGL 3.0), full integer support is required and int is defined as signed 32bit two's complement integer. This corresponds to shader model 4 hardware, the OP is using shader model 5 hardware, so it must certainly support integers natively, too.

Compute capability 1.2 even mandates native support for 64bit integers (compute capability 1.0 mandates native 32bit, and emulated 64bit integers).

How do you fit a 32/64bit integer into a register of the same size as a float value without fractional part when there are only 23/52 bits you could use?

##### Share on other sites

Not anymore.
Perhaps things have changed more recently.

I tried uint for my counter and had no problems.
GeForce GTX 680. Should be a driver bug.

L. Spiro

• ### Game Developer Survey

We are looking for qualified game developers to participate in a 10-minute online survey. Qualified participants will be offered a \$15 incentive for your time and insights. Click here to start!

• 16
• 9
• 15
• 9
• 11