branching in SM3.0 ??

10 comments, last by rpsathe 17 years, 9 months ago
I am using SM3.0 to write a pixel shader. I have a for loop that looks something like this:

bool flag = false;
for (Index = 0; Index < max; Index++)
{
    if (bool == false)
    {
        :
        :
        a = something;
        b = something else;
        if (a > b)
        {
            flag = true;
        }
    }
}

I get different results when I run it on REF and on real hardware. I do not believe this is a precision or rounding error. I am running this on an Nvidia 7800. Does anyone have any pointers on how to go about debugging this?

On real hardware, 'flag' is set to true sooner than on REF. That is, the value of Index at which flag becomes true is lower on real hardware than on REF.

-Rahul
Some more info.

I tried dumping the values of a and b on both REF and real hardware.
The values started diverging around index = 0x20. Up to index = 0x1f the values
were exactly the same; after that they started diverging.

Is there a limit on how big your for loop can be? If so, is the limit = 20?

-Rahul
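
One way to pin down where the two runs part ways is to dump the per-iteration values from both and look for the first index at which they differ. A minimal C++ sketch, assuming the values of a (or b) have already been read back into two arrays; firstDivergence and its tolerance parameter are hypothetical names for illustration, not part of any API:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Returns the first index at which the two traces differ by more than
// `tolerance`, or -1 if they agree over their common length.
// `ref` and `hw` are hypothetical per-iteration dumps from REF and hardware.
long firstDivergence(const std::vector<float>& ref,
                     const std::vector<float>& hw,
                     float tolerance = 0.0f)
{
    const std::size_t n = ref.size() < hw.size() ? ref.size() : hw.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (std::fabs(ref[i] - hw[i]) > tolerance) {
            return static_cast<long>(i);
        }
    }
    return -1;
}
```

A tolerance of zero demands bit-for-bit agreement; a small positive tolerance helps separate ordinary rounding differences from a real divergence.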


"
bool flag = false;
.
.
.
.
.
if (bool == false)
{
}

"


In C++ I think something like:
if(bool() == false)
{
}

would evaluate to true, since bool() is a value-initialized false, but I'm not sure about shaders...

Are you sure you didn't mean:
"if(flag == false)
{
....
}

"

??

Quote: Original post by rpsathe
Some more info.

I tried dumping the values of a and b on both REF and real hardware.
The values started diverging around index = 0x20. Up to index = 0x1f the values
were exactly the same; after that they started diverging.

Is there a limit on how big your for loop can be? If so, is the limit = 20?

-Rahul


0x20 = 32, not 20. :)
And since you haven't told us what a and b are, or how you adjust them between iterations, it's impossible to really say anything other than that dynamic branching *should* work with any number of iterations. But typically you won't want to run more than a couple of iterations in any case, so who knows whether the implementation exploited that and only works properly with fewer than 32 iterations?

I think it's more likely to be your code though. Which means we'd have to *see* your code.
Whilst only bad compilers should break the semantics of a program, there is no guarantee that the HLSL compiler will actually implement your shader in branched form. Based on its various rules it may well choose to rearrange things and flatten or skip branches if it thinks that will generate more efficient code.

Compile your shader on the command line using fxc.exe and inspect the generated assembly. You should be able to spot the branching and looping instructions (for ps_3_0, things like if, loop, rep and break).

hth
Jack

Jack Hoxley [ Forum FAQ | Revised FAQ | MVP Profile | Developer Journal ]

Okay folks, sorry for the delay in responding,

yes...I meant
if (flag == false)...thanks for spotting that.

In the loop, I am doing this --

float4 temp;
temp = texCUBE(sampler,dir);
a = temp.x;
b = distance(foo,bar);

Nothing unusual. Nothing that can have cross-iteration dependencies or the like. Is there a hard limit on how big the loop termination count can be?


-Rahul
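
To have something to compare REF and the hardware against, the loop's control flow can be mirrored on the CPU. A rough C++ sketch, assuming the per-iteration values of a and b are available as arrays (in the real shader they come from texCUBE and distance); firstFlagIndex is a hypothetical helper name:

```cpp
#include <cstddef>
#include <vector>

// Mirrors the shader loop: returns the first Index at which a > b sets
// `flag`, or -1 if the flag is never set.
long firstFlagIndex(const std::vector<float>& a, const std::vector<float>& b)
{
    bool flag = false;
    long hit = -1;
    const std::size_t max = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t index = 0; index < max; ++index) {
        if (flag == false) {               // the corrected test (flag, not bool)
            if (a[index] > b[index]) {
                flag = true;
                hit = static_cast<long>(index);
            }
        }
    }
    return hit;
}
```

If the shader writes the same index out as its result, the value can be compared directly between REF, the hardware and this reference.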
It's still not completely clear what you're doing (I guess on purpose), but it seems that a and b would behave differently (unless b also depends on a texture read and you just didn't mention that). It's possible, for example, that after 32 samples a different texture slice is read, and some small difference in behaviour makes for a difference in value. It's hard for me to speculate since I haven't seen what differences you're getting.

There are hardware limits on how many instructions a shader will execute per vertex/pixel, so you might be running up against those.

Have a look at the MaxPShaderInstructionsExecuted cap. Looking at the graphics capabilities chart in the DirectX SDK, the 7800 has a cap of 65535 on the number of pixel shader instructions that can be executed. I imagine that REF has no limit, so this could be what's going on. Though that would have to be a huge shader.
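
As a rough sanity check against that cap, the number of instructions a loop executes can be estimated as a setup cost plus the iteration count times the per-iteration cost. A small C++ sketch; the 65535 figure is the one quoted above, while the setup and per-iteration counts are made-up placeholders that would have to come from the actual fxc listing:

```cpp
#include <cstdint>

// Cap quoted above for the 7800 (MaxPShaderInstructionsExecuted).
constexpr std::uint32_t kMaxExecuted = 65535;

// Estimated number of instructions executed by a shader with one loop.
// All three inputs are placeholders to be read off the compiled listing.
constexpr std::uint64_t estimateExecuted(std::uint32_t setup,
                                         std::uint32_t perIteration,
                                         std::uint32_t iterations)
{
    return static_cast<std::uint64_t>(setup) +
           static_cast<std::uint64_t>(perIteration) * iterations;
}

// True if the estimate goes past the quoted cap.
constexpr bool exceedsCap(std::uint32_t setup,
                          std::uint32_t perIteration,
                          std::uint32_t iterations)
{
    return estimateExecuted(setup, perIteration, iterations) > kMaxExecuted;
}
```

With placeholder numbers such as 10 setup instructions and 20 per iteration, 32 iterations come to 650 executed instructions, nowhere near the cap, which fits the remark that it would take a huge shader to hit it.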

Yes, that is correct. I am intentionally not revealing the details
as there are IP issues involved. Sorry about that.

I am using a sampler to pass all the vertices of an object as a texture.
b is the actual distance in world space between two points (foo, bar).
a is read from a cube map. It's a cube map of distances.

Quote: Original post by rpsathe
I am using a sampler to pass all the vertices of an object as a texture.

Do you mind if I ask how the performance is? I recently did something similar on a 6600GT and the performance was roughly 1/4 of what I expected.

Previously "Krohm"
