Strange HLSL variable issue

I'm having a very strange issue with a global variable in a shader I'm writing, and I'm curious what the cause might be. My shader implements a filtering method by sampling a number of textures (4 to 10) and summing the weighted results. All of the textures are packed into a single large texture, with a 1-pixel border surrounding each sub-texture to allow correct bilinear filtering. One of the things the shader does is convert a set of texture coordinates so that the correct sub-texture (within the large texture) is indexed. This is accomplished using a function that relies on a global variable, "g_iTexWidth", which contains the width of the smaller packed textures.

This works great for traditional images, but I ran into a very strange issue when using the same approach for Perlin noise: noise octaves higher than one produced vertical bands (as if the v texture coordinate were fixed). If I manually set the "g_iTexWidth" variable from within the shader (rather than using SetInt() externally), the issue disappeared! After much frustration, I finally found that I could change a single (seemingly trivial) line of code to fix the issue. If I change:

float4 indexedCoords = packWidth*indexedCoords + texCoords*g_iTexWidth + 1;

to:

float4 indexedCoords = packWidth*indexedCoords + 1;
indexedCoords += texCoords*g_iTexWidth;

then it works fine. What could possibly cause this behavior? Am I hitting some unwritten limit in HLSL or the hardware? If so, how does defining that variable from within the shader have any effect?

Basic pseudocode for my pixel shader may help (the vertex shader is trivial):

1) Calculate partial derivatives with ddx and ddy
2) for (var1 = 0; var1 < 3; ++var1)   //static outer loop
3)     A few basic arithmetic ops
4)     for (var2 = 0; var2 < octaves; ++var2)   //static inner loop
5)         Calculate texture coords (based on var1 and several global vars set from the application, including g_iTexWidth)
6)         Sample the texture at the calculated coords

So there are 6 total texture reads for 2 octaves of noise (the minimum at which the issue starts to occur). I seem to have found a 'fix' for now, but I have no idea what to avoid or whether other strange issues will turn up later. This is running on an ATI X1600, and I had similar (but not quite identical) problems with a much beefier GeForce 6800 Ultra. Any help would be greatly appreciated, as I'm at a loss. Thanks!
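In case it helps, here's a rough sketch of the kind of indexing function I'm describing; indexPackedTexture, packIndex, and g_fInvAtlasWidth are placeholder names rather than my actual code, but the idea is the same:

// Placeholder declarations; only g_iTexWidth is named above.
int   g_iTexWidth;        // width of each packed sub-texture in pixels, set via SetInt()
float g_fInvAtlasWidth;   // 1 / (width of the large packed texture)

float2 indexPackedTexture(float2 texCoords, float2 packIndex)
{
    // Each sub-texture occupies its width plus a 1-pixel border on each side.
    int packWidth = g_iTexWidth + 2;

    // Step to the first interior pixel of the selected sub-texture (+1 skips the border)...
    float2 packedCoords = packWidth*packIndex + 1;

    // ...then offset within that sub-texture.
    packedCoords += texCoords*g_iTexWidth;

    // Finally, convert from pixel units to the [0,1] range of the large texture.
    return packedCoords*g_fInvAtlasWidth;
}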

Look at the output the fxc compiler generates before and after changing that line of code, and see whether there's a difference.

If you're not using fxc directly, but creating the shader in your app through an Effect, use D3DXDisassembleEffect or Effect.Disassemble to look at the effect compiler's asm version of your shader.

Thanks for the advice.

I've been working on this issue for the past couple of days and have eliminated a number of possibilities. I also came up with a much simpler demonstration of the problem.

Here's a test ps_3_0 pixel shader that demonstrates the issue:

PS_OUTPUT SubbandsNoiseSamplePS(VS_OUTPUT In)
{
    PS_OUTPUT Output = (PS_OUTPUT)0;
    //g_iSubsPerEdge = 3;
    float2 textureUV = In.TextureUV;
    float noiseVal = 0;
    float4 texCoords = 0.0;

    for (int subband = 0; subband < 6; ++subband) {
        texCoords.xy = indexSubband(In.TextureUV, subband);
        noiseVal += (2.0*tex2Dlod(SubSampler, texCoords) - 1.0);
    }
    Output.RGBColor = noiseVal*.5 + .5;
    return Output;
}

And here's the indexSubband function being called:

float2 indexSubband(float2 texCoords, int subband)
{
    int packWidth = g_iTexWidth + 2;    //width of each packed texture (including border)
    float2 indexedCoords = float2(subband % g_iSubsPerEdge, (g_iSubsPerEdge - 1) - subband/g_iSubsPerEdge);    //set indexes for this texture
    indexedCoords = packWidth*indexedCoords + 1;    //index correct pixel from the pixel at texture start (+1 pixel for border)
    indexedCoords += texCoords*g_iTexWidth;
    indexedCoords *= g_fInvSubbandsWidth;    //convert to [0,1] range

    return indexedCoords;
}

If I uncomment the "g_iSubsPerEdge = 3" line, everything works fine. Otherwise, the image looks like one or more calls to indexSubband() are returning incorrect texture coordinates. For this example, that variable is being set with the SetInt function immediately prior to calling BeginPass().
I'm compiling the shader within the app from an effect file using D3DXCreateEffectFromFile(), passing the D3DXSHADER_SKIPOPTIMIZATION and D3DXSHADER_DEBUG flags (so compiler optimization isn't an issue).

If I render with the D3DDEVTYPE_REF device and set the D3DXSHADER_FORCE_PS_SOFTWARE_NOOPT flag, then it also works fine. The asm code generated for the software PS is EXACTLY the same as for the unoptimized hardware PS, except that the version token is ps_3_sw instead of ps_3_0.

I should further mention that the asm generated with the uncommented line is much different, as the compiler precomputes and hardwires a number of values to avoid ever using a uniform register for the g_iSubsPerEdge variable.

The hardware used for this shader is an ATI X1400, which has identical caps to the ref device except for a few of the supported texture formats (which isn't the issue here). I'd almost chalk it up to a hardware issue, except that I've had similarly strange issues with a GeForce 6800 card on a different machine.

Any suggestions you guys might have would be greatly appreciated. I'm pretty much at wit's end here and can't think of anything else to try. The last thing I should mention is that both computers that had the issue are using the June 2006 release of the DirectX SDK.

Thanks!

This sounds like the driver's shader optimizer is compiling the code incorrectly, and somehow not updating internal values when you change the global register.

A couple ideas:

1) Try passing the value as a parameter to the function, instead of using it as a global register. This seems like the most likely place for the issue to arise (see the sketch after this list).

2) Try separating the code into multiple lines. This might cause the compiler to generate slightly different code and perhaps avoid the bug. Try placing each calculation on its own line; even the slightest change in the asm output might fix the issue.

3) The newer SDKs include a beta HLSL compiler. While it isn't recommended for actual use, it'd be worth trying to compile with it just to see if the results are different.

4) Make sure you're using up-to-date drivers, and avoid beta drivers if you're using any.
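As a rough sketch of ideas 1 and 2, your indexSubband function could be restructured so the globals arrive as parameters and each calculation sits on its own line. This is only an illustration of the idea, not tested code, and the names here are made up:

float2 indexSubbandParam(float2 texCoords, int subband,
                         int texWidth, int subsPerEdge, float invSubbandsWidth)
{
    // Width of each packed texture, including its 1-pixel border on each side.
    int packWidth = texWidth + 2;

    // Column and row of this subband within the packed texture.
    float col = subband % subsPerEdge;
    float row = (subsPerEdge - 1) - subband/subsPerEdge;
    float2 indexedCoords = float2(col, row);

    // Step to the first interior pixel of the subband (+1 for the border)...
    indexedCoords = packWidth*indexedCoords + 1;

    // ...then offset within the subband.
    indexedCoords += texCoords*texWidth;

    // Convert from pixel units to the [0,1] range of the packed texture.
    indexedCoords *= invSubbandsWidth;

    return indexedCoords;
}

The call in the pixel shader would then become something like indexSubbandParam(In.TextureUV, subband, g_iTexWidth, g_iSubsPerEdge, g_fInvSubbandsWidth), which keeps the math identical but may push the compiler and driver down a different code path.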

Hope this helps.

