
Kick me. Please.


OK, by way of a more detailed follow-up on yesterday's semi-rant stamp-my-feet post.

As you'll see at the end of this post I deserve a public kicking for being such a muppet. I could've just denied this whole story and saved any credibility I had, but I figured you might like a chuckle at my expense [smile]

My current problem can be expressed thusly:

Oren-Nayar.fx - Pixel Shader 4.0

float C;

if( UseLookUpTexture )
{
    // Map the -1.0..+1.0 dot products to
    // a 0.0..1.0 range suitable for a
    // texture look-up.
    float tc = float2
    (
        (VdotN + 1.0f) / 2.0f,
        (LdotN + 1.0f) / 2.0f
    );

    C = texSinTanLookup.Sample( DefaultSampler, tc ).r;
}
else
{
    float alpha = max( acos( VdotN ), acos( LdotN ) );
    float beta  = min( acos( VdotN ), acos( LdotN ) );

    C = sin( alpha ) * clamp( tan( beta ), -1.0f, 1.0f );
}
Should be pretty obvious - based on a compile-time flag it'll either take the look-up texture branch or the ALU-heavy branch. The simple idea is that two acos() calls, a sin() and a tan() will be more expensive than a single R32F texture fetch.

For the arithmetic path there are two inputs - VdotN (the view vector dotted with the normal) and LdotN (the light vector dotted with the normal). With the source vectors normalized, both dot products lie in the [-1..+1] range, which makes them perfect candidates for a texture look-up - you can see the trivial remapping from [-1..+1] to [0..1] in the fragment above.
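For reference, the value each texel of texSinTanLookup should hold is just the ALU branch evaluated at those remapped coordinates. A hypothetical helper (the name and signature are mine, purely for illustration) makes the relationship explicit:

// Expected contents of the look-up at coordinate (u, v), where u and v
// are the remapped [0..1] versions of VdotN and LdotN respectively.
float SinTanLookupValue( float u, float v )
{
    float VdotN = u * 2.0f - 1.0f;   // undo the [-1..+1] -> [0..1] remap
    float LdotN = v * 2.0f - 1.0f;

    float alpha = max( acos( VdotN ), acos( LdotN ) );
    float beta  = min( acos( VdotN ), acos( LdotN ) );

    return sin( alpha ) * clamp( tan( beta ), -1.0f, 1.0f );
}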



LEFT: Using a 512x512 look-up, RIGHT: Using arithmetic

Difference of the two source images


Now, at a casual glance they look pretty much the same. It's the highlights that differ, and in the context of lighting models that's a big thing. In a lot of cases it's how a model handles these highlights and grazing reflections that really defines it as a whole.

Breaking down the shader code to output just the fragment I initially posted gives the following images:



LEFT: Using a 512x512 look-up, RIGHT: Using arithmetic

Difference of the two source images


Should be a little more obvious now - this is quite literally the difference between the look-up and 'pure' approaches. They should be identical.

To follow up on Ysaneya's comments on my last entry:
Quote:
set the filtering to closest/none (no filtering) and check that you now get the correct (but blocky) results..
This is a good test for debugging individual values as the CPU is inserting discrete values into an array, but as the arithmetic is on continuous functions you'd hope that default linear filtering wouldn't really affect things...

To save on bandwidth you can take my word on it that point filtering doesn't really reveal anything useful visually. However we'll come back to that one later I suspect [wink]
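For reference, forcing point sampling for a debug run is just a different sampler declaration in the effect file - a sketch assuming the D3D10 effects framework (DebugPointSampler is a name I've made up here, not what's in the project):

SamplerState DebugPointSampler
{
    Filter = MIN_MAG_MIP_POINT;
};

// ...then swap it in for the look-up:
C = texSinTanLookup.Sample( DebugPointSampler, tc ).r;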

Quote:
- change the resolution of your lookup table to something incredibly big (4096) or incredibly small (16), and check how the incorrect results you get are affected. If this is indeed a texture addressing problem, going to larger resolutions should make the problem become less noticeable, as 0.5/16 is much bigger than 0.5/4096.
This is a good suggestion, as it is possible that the errors are simply due to a low-resolution look-up texture. If that were the real cause then you'd expect the shape/tone of the highlights (which is what the previous set of images showed as being wrong) to converge as the look-up texture got increasingly accurate:


(click to enlarge)


There is convergence in the above images, but they're only converging on the same error.


Okay, so the next stop is to validate the inputs. At this point I'm reasonably confident that the actual arithmetic performed by both the CPU and GPU is equivalent. I wasted a bit of time playing around with GP-GPU style debugging in this journal entry.

As was previously mentioned, the key variables are LdotN and VdotN. So if I generate an identity texture (one where each texel simply stores its own U coordinate) and modify the shader like so:
float C;

if( UseLookUpTexture )
{
    // Map the -1.0..+1.0 dot products to
    // a 0.0..1.0 range suitable for a
    // texture look-up.
    float tc = float2
    (
        (VdotN + 1.0f) / 2.0f,
        0.0f
    );

    C = texSinTanLookup.Sample( DefaultSampler, tc ).r;
}
else
{
    C = (VdotN + 1.0f) / 2.0f;
}
I would be hoping that what goes in is exactly what comes out and that both branches are identical:




Okay, so WTF is going on here?? No, really, this isn't what I was expecting.

I played around with a few of the variables toggling between outputting VdotN and LdotN. It basically boiled down to this:

// Works:
float tc = float2
(
    (VdotN + 1.0f) / 2.0f,
    0.0f
);

float tc = float2
(
    (LdotN + 1.0f) / 2.0f,
    0.0f
);

// Doesn't work:
float tc = float2
(
    0.0f,
    (VdotN + 1.0f) / 2.0f
);

float tc = float2
(
    0.0f,
    (LdotN + 1.0f) / 2.0f
);


I first mentioned this particular bug about 11 days ago and I've only just spotted what the problem is. Whilst writing this journal entry I've copy-and-pasted the fragment with the mistake in it twice and still not spotted it.

Can you spot it?


6 Comments



float tc = float2 ???
Surely the compiler would have caught that one? You can't take the blame for it if your compiler misses it too!

Under .NET or C++ that'd be classed as a warning, but the HLSL compiler doesn't have warnings - only errors.

Going from a float2 literal down to a float is legal, and then padding it back up to a float2 for .Sample() is also legal. There isn't technically an error in that code.

Makes me wonder if there's any way of creating some sort of static analysis tool for this...
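For anyone skimming the thread, here's the offending declaration and the intended one side by side - a sketch of the one-character difference rather than the code as it now stands in the project:

// What was written - 'tc' is declared as a scalar float, so the float2
// constructor is immediately truncated down to its first component:
float tc = float2( (VdotN + 1.0f) / 2.0f, (LdotN + 1.0f) / 2.0f );

// What was intended - keep both remapped look-up coordinates:
float2 tc = float2( (VdotN + 1.0f) / 2.0f, (LdotN + 1.0f) / 2.0f );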

Interesting. So it means that your code used VdotN twice (due to the replication going back from a float to the implicit float2 used in the texture sampler) instead of VdotN + LdotN.

In OpenGL, I think the GLSL compiler would have treated that as an error.

Always good to see that everything has a logical explanation, and that no, you're not going crazy :)

Quote:
Original post by Ysaneya
In OpenGL, I think the GLSL compiler would have treated that as an error.


Correct, no implicit down-casts are performed in GLSL.

Yes, it is interesting how most other languages require an explicit cast when down-sizing a variable...

Quote:
So it means that your code used VdotN twice
I suspect not. Going from float a = float2(u,v) will leave 'a' equal to 'u', but going from 'a' back to a float2 implicitly will give float2(a,0) rather than float2(a,a)...

This would explain why I had to force a CLAMP addressing mode to resolve some artifacts - if it was always sampling from V=0 then bilinear filtering would also have pulled in some texels from V=1...
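In code terms, the behaviour I'm suggesting would look something like this (a sketch - I haven't checked the compiler output to confirm the promotion really is float2(a,0)):

float a = float2( u, v );                            // legal: 'a' ends up holding just 'u'
C = texSinTanLookup.Sample( DefaultSampler, a ).r;   // 'a' implicitly widened back to a float2
// If that widening gives float2( a, 0 ) then every sample comes from the
// V=0 row of the look-up, and with WRAP addressing bilinear filtering will
// also blend in texels from the V=1 row - hence the artifacts until CLAMP
// was forced.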

Cheers,
Jack

