rpsathe

branching in SM3.0 ??


I am using SM3.0 to write a pixel shader. I have a for loop that looks something like this:

bool flag = false;
for (Index = 0; Index < max; Index++)
{
    if (bool == false)
    {
        :
        :
        a = something;
        b = something else;
        if (a > b)
        {
            flag = true;
        }
    }
}

I get different results when I run it on REF and on real hardware (an Nvidia 7800). I do not believe this is a precision or rounding error. Does anyone have any pointers on how to go about debugging this?

On real hardware, 'flag' is set to true sooner, i.e. at a lower value of Index, than on REF.

-Rahul
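
A CPU-side mirror of the loop can help when comparing REF and hardware, since it gives an independent answer for the index at which 'flag' should first be set. The sketch below is Python rather than HLSL, assumes the inner test was meant to check 'flag', and takes the per-iteration values of a and b as plain lists. The function name and the sample values are made up for illustration.

```python
def first_trigger_index(a_values, b_values):
    """Mirror the shader loop: return the first index where a > b, or None.

    Once flag is set, later iterations skip the body, so only the first
    index with a > b matters.
    """
    flag = False
    trigger = None
    for index, (a, b) in enumerate(zip(a_values, b_values)):
        if not flag:
            if a > b:
                flag = True
                trigger = index
    return trigger


# Made-up sample data: a first exceeds b at index 3.
a_values = [0.10, 0.20, 0.30, 0.90, 0.95]
b_values = [0.50, 0.50, 0.50, 0.50, 0.50]
print(first_trigger_index(a_values, b_values))  # prints 3
```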

Some more info.

I tried dumping the values of a and b on REF and on real hardware.
The values were exactly the same up to index = 0x1f and started
diverging around index = 0x20.

Is there a limit on how big your for loop can be? If so, is the limit = 20?

-Rahul
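
One way to pin down where the two runs part ways is to compare the dumped values iteration by iteration. The sketch below is Python and assumes the (a, b) pairs from each run have already been read back into two lists; the function name, the tolerance parameter and the sample traces are all made up for illustration.

```python
def first_divergence(ref_trace, hw_trace, tolerance=0.0):
    """Return the first index where the two traces differ by more than tolerance.

    Each trace is a list of (a, b) tuples, one per loop iteration.
    Returns None if the traces agree over their common length.
    """
    for index, ((ref_a, ref_b), (hw_a, hw_b)) in enumerate(zip(ref_trace, hw_trace)):
        if abs(ref_a - hw_a) > tolerance or abs(ref_b - hw_b) > tolerance:
            return index
    return None


# Made-up traces that agree for 32 iterations and then drift apart.
ref_trace = [(0.25, 0.5)] * 40
hw_trace = [(0.25, 0.5)] * 32 + [(0.75, 0.5)] * 8

index = first_divergence(ref_trace, hw_trace)
print(index, hex(index))  # prints: 32 0x20
```

A non-zero tolerance is useful if small numerical differences between the two runs should be ignored while looking for the first large jump.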

"
bool flag = false;
...
if (bool == false)
{
}
"

In C++, I think something like:
if (bool() == false)
{
}

would evaluate to true, since bool() is a value-initialized false, but I'm not sure about shaders...

Are you sure you didn't mean:
"
if (flag == false)
{
    ....
}
"

??

Quote:
Original post by rpsathe
Some more info.

I tried getting values of a and b in ref and real hardware.
The values were exactly the same up to index = 0x1f and started
diverging around index = 0x20.

Is there a limit on how big your for loop can be? If so, is the limit = 20?

-Rahul


0x20 = 32, not 20. :)
And since you haven't told us what a and b are, or how you adjust them between iterations, it's impossible to say much beyond that dynamic branching *should* work with any number of iterations. Typically, though, you won't want to run more than a couple of iterations, so who knows whether the implementation exploited that and only works properly with fewer than 32 iterations?

I think it's more likely to be your code though. Which means we'd have to *see* your code.

Whilst only bad compilers should break the semantics of a program, there is no guarantee that the HLSL compiler will actually implement your shader in branched form. Based on its various rules it may well choose to rearrange stuff and skip branches if it thinks it'll generate more efficient/optimal code.

Compile your shader on the command line using fxc.exe and inspect the code - you should be able to spot branching/looping instructions.

hth
Jack
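
As a rough aid for that inspection, the sketch below scans an assembly listing (for example one written out with fxc's /Fc option) for flow-control instructions. It is Python and not part of fxc. The mnemonics are the ps_3_0 flow-control instructions, while the function name and the sample listing are hand-written assumptions for illustration, not real compiler output.

```python
# ps_3_0 flow-control mnemonics; comparison forms such as if_gt or break_lt
# carry a suffix, so only the part before the first underscore is matched.
FLOW_CONTROL = {"if", "else", "endif", "loop", "endloop",
                "rep", "endrep", "break", "breakp", "call", "callnz", "ret"}


def count_flow_control(listing):
    """Count flow-control instructions in a shader assembly listing."""
    counts = {}
    for line in listing.splitlines():
        code = line.split("//")[0].strip()  # drop comments and whitespace
        if not code:
            continue
        mnemonic = code.split()[0]
        base = mnemonic.split("_")[0]
        if base in FLOW_CONTROL:
            counts[base] = counts.get(base, 0) + 1
    return counts


# Hand-written listing for illustration only (not real fxc output).
sample_listing = """\
ps_3_0
dcl_cube s0
rep i0                  // loop kept as a real loop
  texld r0, v0, s0
  if_gt r0.x, r1.x      // dynamic branch
    mov r2.x, c0.y
  endif
endrep
mov oC0, r2
"""
print(count_flow_control(sample_listing))
# prints {'rep': 1, 'if': 1, 'endif': 1, 'endrep': 1}
```

An empty result for a shader whose source contains a loop would suggest that the compiler unrolled or flattened it.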

Okay folks, sorry for the delay in responding,

yes...I meant
if (flag == false)...thanks for spotting that.

In the loop, I am doing this --

float4 temp;
temp = texCUBE(sampler,dir);
a = temp.x;
b = distance(foo,bar);

Nothing unusual. Nothing that can have cross-loop dependencies or the like. Is there a hard limit on how big the loop termination count can be?


-Rahul

It's still not completely clear what you're doing (I guess on purpose), but it seems that a and b would behave differently (unless b is also dependent on texture read, and you just don't mention that). It's possible, for example, that after 32 samples a different texture slice is read, and some small difference in behaviour makes for a difference in value. It's hard for me to speculate since I haven't seen what differences you're getting.

There are hardware limits on how many instructions a shader will execute per vertex/pixel, so you might be running up against those.

Have a look at the MaxPShaderInstructionsExecuted cap. According to the graphics capabilities chart in the DX SDK, the 7800 can execute at most 65535 pixel shader instructions. I imagine that REF has no limit, so this could be what's going on, though that would have to be a huge shader.
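
To put that cap in perspective, here is a back-of-the-envelope check in Python. The 65535 figure is the cap quoted above; the per-iteration instruction counts are made-up assumptions and the function name is hypothetical. It estimates how many loop iterations fit before the executed-instruction limit is reached.

```python
def max_iterations(instruction_cap, per_iteration, overhead=0):
    """Largest whole number of iterations whose executed instructions fit the cap."""
    if per_iteration <= 0:
        raise ValueError("per_iteration must be positive")
    return max(0, (instruction_cap - overhead) // per_iteration)


CAP = 65535  # cap quoted above for the 7800

# Assumed loop-body sizes, purely for illustration.
for per_iteration in (20, 200, 2048):
    print(per_iteration, max_iterations(CAP, per_iteration))
# prints:
# 20 3276
# 200 327
# 2048 31
```

Running out at around 32 iterations would take a loop body of roughly 2048 executed instructions, which fits the remark that it would have to be a huge shader.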

Yes, that is correct. I am intentionally not revealing the details,
as there are IP issues involved. Sorry about that.

I am using a sampler to pass all the vertices of an object as a texture.
b is the actual distance in world space between two points (foo, bar).
a is read from a cube map. It's a cube map of distances.

Quote:
Original post by rpsathe
I am using sampler to pass all the vertices of an object as a texture.

Do you mind if I ask how the performance is? I recently did something similar on a 6600GT and the performance was roughly 1/4 of what I expected.
