Grim42

DX11 HLSL unexpected dot product results



Hi,

I wrote a deferred renderer a few years ago, and I recently picked the project up again to fix some outstanding bugs and extend some features. The project is written in C++ with DirectX 11 and HLSL.

While fixing the bugs I stumbled across a strange behavior in one of my shader files that took me some time to track down. At first I thought it had to do with my depth reconstruction algorithm in the point light shader, but after implementing alternative algorithms based on MJP's code snippets I ruled that out. It appears as if the dot function inside the shader sometimes (but reproducibly) yields wrong results. Also, when switching from D3D_DRIVER_TYPE_HARDWARE to D3D_DRIVER_TYPE_WARP the problem completely disappears, so to me it seems like some kind of HLSL/DX11 or driver issue. I am using a GTX 980 for rendering with the latest NVIDIA driver installed, and I also tried an older laptop with an NVIDIA card, which gave the same strange results.

Here are some images that show the problem:

The final scene rendered with D3D_DRIVER_TYPE_HARDWARE:

[image: U39aMuY.png]

Visualization of the light composition render target with D3D_DRIVER_TYPE_HARDWARE:

[image: 97DuaU9.png]

The final scene rendered with D3D_DRIVER_TYPE_WARP (this is how it should always look!):

[image: ExBpTax.png]

So when debugging the affected pixels with the Visual Studio Graphics Analyzer, I found that the HLSL dot function returns unexpected, wrong values during my point light computations:

[image: RLV9uR2.png]

And the dot product of (-0.51, 0.78, 0.36) and (0, 1, 0) obviously should not be 0; it should be -0.51·0 + 0.78·1 + 0.36·0 = 0.78.

I am no expert in HLSL assembly output, but the compiled shader code looks like this (the last line is the dot product of lightVec and normal):

[image: xcs9h6U.png]

Does anyone have an idea how to fix this issue, or how to avoid the strange dot product behavior?


It's a complete shot in the dark, but how about implementing the dot product yourself and seeing what that yields? I once ran into a similar issue with a pow() call in a mobile environment, where one device would give erroneous results and another correct ones. Though, haha, it was a mobile environment.

But, yeah. The dot product is a relatively simple operation to implement, and other than tooling around with your drivers, you can eliminate it as a variable. Though, tbh, if you are experiencing this same issue on different generations of GPUs, I'm not sure it's the right direction either. But hey, who knows.
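For reference, a handwritten replacement could look something like this (just a minimal sketch; lightVec and normal stand in for the float3 values from your shader):

float myDot(float3 a, float3 b)
{
    // component-wise multiply and sum, equivalent to the dot() intrinsic
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

float diffuseFactor = myDot(lightVec, normal);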

 


WARP working while hardware doesn't can indeed be an indication of a driver error. But a dot? I'd also bet the compiler will issue a dot instruction even if you handwrite it :P
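For illustration: a 3-component dot in HLSL typically compiles to a single dp3 instruction in the SM4/SM5 assembly, whether you call the intrinsic or write out the multiplies and adds yourself (the register names below are made up):

float d = dot(lightVec, normal);
// typical compiled output:
// dp3 r0.x, r1.xyzx, r2.xyzx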

That debug view is suspicious, but I for one wouldn't trust it. I've never had much luck with shader debugging, particularly because of such behavior. But since you get output, color debugging it is. Dump the dot result directly afterwards, "wrapped lighting" style:

return float4(diffuseFactor.xxx * 0.5 + 0.5, 1.0);

I rather suspect NaNs coming from those pows or something. Check the shader compiler output; it might spit out warnings.
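As an aside on why pow is a likely NaN source (this is an assumption about the compiled code, but it's the usual pattern): pow(x, y) is generally lowered to exp2(y * log2(x)), and log2 of a negative number is NaN.

// pow is typically compiled as exp2(y * log2(x)),
// so even a slightly negative base poisons the result:
float bad = pow(-0.001, 2.0); // NaN, although (-0.001)^2 is mathematically well-defined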


Thank you guys ;)
I rarely use shader debugging myself but in this case I didn't know what else to do.

Well, I finally found the reason for this strange behavior:

After implementing the dot product myself like this:

float diffuseFactorXY = lightVec.x * normal.x + lightVec.y * normal.y;
float diffuseFactorZ = lightVec.z * normal.z;

float diffuseFactor = diffuseFactorXY + diffuseFactorZ;

I noticed that only diffuseFactorZ was causing the issues, and specifically it was the normal.z value. So I took a closer look at where it came from.

I am using compressed normals in my g-buffer, so I only store the x and y components and reconstruct the z component with sqrt(1 - normal.x^2 - normal.y^2), recovering the sign from another g-buffer entry.
However, I forgot to normalize my normal before writing it into the g-buffer, which resulted in negative arguments to the sqrt function. So yeah, sometimes the simplest mistakes can cause really strange issues in a totally different place.
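For anyone hitting the same thing, here is a minimal sketch of the pack/unpack with two defensive fixes (normalize before packing, clamp the sqrt argument); the function names and the sign handling are illustrative, not my exact g-buffer code:

float2 PackNormal(float3 n)
{
    n = normalize(n); // the missing normalize that caused my bug
    return n.xy;      // the sign of n.z is stored in another g-buffer entry
}

float3 UnpackNormal(float2 xy, float zSign)
{
    // saturate clamps to [0, 1], so rounding errors can no longer
    // push the sqrt argument below zero and produce NaN
    float z = sqrt(saturate(1.0 - dot(xy, xy)));
    return float3(xy, z * zSign);
}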

I still find it confusing that WARP ignored this issue and seemed to return 0 instead of NaN from sqrt. Also, the debugger didn't show the normal.z value as NaN, but simply as 0. There still is a NaN value in HLSL, isn't there?


WARP runs on the CPU, so floats should behave IEEE-compliant there. Don't rely on that on the GPU; at least assume they can behave differently. The debugger probably emulates the instructions on the CPU, too.

You can check for NaNs in HLSL, though:

if(isnan(whatever))
    return float4(1.0, 0.0, 0.0, 1.0); // Red alert, this is not a drill

Final note: Instrumenting your HLSL code can of course rearrange the instructions and give different results. And then hide the bug :( 

Edited by unbird

