Very weird behavior of the HLSL min function

10 comments, last by kink 13 years, 9 months ago
Hi guys,

I was writing a hull shader for terrain rendering with LOD today. In the patch-constant part of the hull shader, I have to calculate the tessellation amount according to the camera position.

Using this:

float v0DistCamera = distance( g_vCameraPosWorld, ip[0].vPosition );
float v1DistCamera = distance( g_vCameraPosWorld, ip[1].vPosition );
float v2DistCamera = distance( g_vCameraPosWorld, ip[2].vPosition );
float v3DistCamera = distance( g_vCameraPosWorld, ip[3].vPosition );

float minDistEdgeV0V1 = min(v0DistCamera,v1DistCamera);
float minDistEdgeV1V2 = min(v1DistCamera,v2DistCamera);
float minDistEdgeV2V3 = min(v2DistCamera,v3DistCamera);
float minDistEdgeV0V3 = min(v0DistCamera,v3DistCamera);

Is there ANY chance that:

float minDistEdgeV0V3 = min(v0DistCamera,v3DistCamera);

AND

float minDistEdgeV0V3 = min(v3DistCamera,v0DistCamera);

would give different results for the minDistEdgeV0V3 variable?

Because, I don't know why, changing the order of the parameters also changes the behavior.

Thanks a lot.


btw, using four if's instead of the min function gives me the correct result.
Sounds bizarre!

Maybe there's something in the min function acting weird. Does it definitely take two floating point variables? And if so is there any way it could cut part of one/both of the variables off to do the output calculation?

I'm not sure if this is possible in an HLSL script, but can you put a (float) cast before the min function call so it definitely returns a float, or even a higher-precision type?

e.g. :

(float)min(v0DistCamera,v3DistCamera);

or even:

(double)min(v0DistCamera,v3DistCamera);

I might be giving useful information here or I might just be teaching my grandmother to suck eggs. I've had so much help from this forum lately, though, that I feel it's time to start *trying* to put something back in ;o)

Can you write your own min function that will definitely have the required accuracy and result without a performance hit?
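
For what it's worth, a hand-written minimum is only a couple of lines. The sketch below is an assumption about what such a helper could look like (the name MinExplicit is made up and does not appear in the original shader). The optimizer may still turn it back into a min instruction, so it is worth checking the disassembly rather than trusting it:

```hlsl
// Hypothetical helper, not from the original shader: an explicit minimum
// built from a comparison instead of the min() intrinsic.
float MinExplicit( float a, float b )
{
    // Returns a when a < b, otherwise b.
    // If either input is NaN the comparison is false, so b is returned.
    return ( a < b ) ? a : b;
}
```

Usage would be a drop-in replacement, for example MinExplicit( v0DistCamera, v3DistCamera ). As far as I know, the min instruction in Shader Model 4 and later returns the non-NaN operand when exactly one input is NaN, so this helper and the intrinsic can differ for NaN inputs.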

Have a look at the compiled assembly to see what's actually going on.

How do I see the assembly?

Either compile with fxc.exe or look at the shader in PIX.

Is there PIX support for hull and domain shaders already?

There is still no PIX support for tessellation shaders or compute shaders. Also, PIX doesn't provide any asm debugging for D3D10+.

You'll have to use fxc on the command line to see the assembly dump. Try different optimization levels and see if that affects your result.

The min function is selected based on the input type, so you don't need a cast if your data is already float. The asm will probably be min for float data, imin for int data, or umin for uint data.
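
To see which instruction each overload maps to, a tiny shader like the one below can be compiled and disassembled. The entry point and its input are invented purely for illustration, and the instruction names in the comments are what I would expect from the Shader Model 4/5 instruction set, not output copied from fxc:

```hlsl
// Hypothetical test shader, e.g. compiled as ps_5_0 with entry point main.
// The inputs come from an interpolant so the compiler cannot fold them away.
float4 main( float4 v : TEXCOORD0 ) : SV_Target
{
    float f = min( v.x, v.y );                       // expected: min
    int   i = min( asint( v.z ),  asint( v.w ) );    // expected: imin
    uint  u = min( asuint( v.z ), asuint( v.w ) );   // expected: umin

    // Use every result so none of the min calls is eliminated as dead code.
    return float4( f, (float)i, (float)u, 1.0f );
}
```

If the float path shows anything other than a single min instruction, that is a good place to start looking.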

Let us know what you find. Perhaps there is a compiler bug.
With optimizations disabled, the same code is output for both cases (with or without the min parameters swapped).

With optimizations enabled, swapping v0DistCamera and v3DistCamera produces totally different output for the part that involves the swapped parameters.

But I can't figure out whether it is a sneaky optimization or a compiler bug. It really seems to me to be a compiler bug.

I'm posting the code and the optimized assembly for both versions. My problem is solved, since I'm using if's instead of the min function, but it might help the community if a compiler bug gets reported. Maybe someone with good shader assembly skills can work out what the problem is.

VERSION 1 OF HULL SHADER CONSTANT PART
HS_CONSTANT_DATA_OUTPUT BezierConstantHS( InputPatch<VS_CONTROL_POINT_OUTPUT, 4> ip,
                                          uint PatchID : SV_PrimitiveID )
{
    HS_CONSTANT_DATA_OUTPUT Output;

    float v0DistCamera = distance( g_vCameraPosWorld, ip[0].vPosition );
    float v1DistCamera = distance( g_vCameraPosWorld, ip[1].vPosition );
    float v2DistCamera = distance( g_vCameraPosWorld, ip[2].vPosition );
    float v3DistCamera = distance( g_vCameraPosWorld, ip[3].vPosition );

    float minDistEdgeV0V1 = min(v0DistCamera,v1DistCamera);
    float minDistEdgeV1V2 = min(v1DistCamera,v2DistCamera);
    float minDistEdgeV2V3 = min(v2DistCamera,v3DistCamera);
    float minDistEdgeV0V3 = min(v3DistCamera,v0DistCamera );

    float3 midPoint = lerp(ip[0].vPosition,ip[2].vPosition,0.5);
    float midDist = distance( g_vCameraPosWorld, midPoint );

    Output.Edges[0] = calculateTessFactor( minDistEdgeV0V3 );
    Output.Edges[1] = calculateTessFactor( minDistEdgeV0V1 );
    Output.Edges[2] = calculateTessFactor( minDistEdgeV1V2 );
    Output.Edges[3] = calculateTessFactor( minDistEdgeV2V3 );
    Output.Inside[0] = Output.Inside[1] = calculateTessFactor( midDist );

    return Output;
}


VERSION 2 OF HULL SHADER CONSTANT PART
HS_CONSTANT_DATA_OUTPUT BezierConstantHS( InputPatch<VS_CONTROL_POINT_OUTPUT, 4> ip,
                                          uint PatchID : SV_PrimitiveID )
{
    HS_CONSTANT_DATA_OUTPUT Output;

    float v0DistCamera = distance( g_vCameraPosWorld, ip[0].vPosition );
    float v1DistCamera = distance( g_vCameraPosWorld, ip[1].vPosition );
    float v2DistCamera = distance( g_vCameraPosWorld, ip[2].vPosition );
    float v3DistCamera = distance( g_vCameraPosWorld, ip[3].vPosition );

    float minDistEdgeV0V1 = min(v0DistCamera,v1DistCamera);
    float minDistEdgeV1V2 = min(v1DistCamera,v2DistCamera);
    float minDistEdgeV2V3 = min(v2DistCamera,v3DistCamera);
    float minDistEdgeV0V3 = min(v0DistCamera,v3DistCamera);

    float3 midPoint = lerp(ip[0].vPosition,ip[2].vPosition,0.5);
    float midDist = distance( g_vCameraPosWorld, midPoint );

    Output.Edges[0] = calculateTessFactor( minDistEdgeV0V3 );
    Output.Edges[1] = calculateTessFactor( minDistEdgeV0V1 );
    Output.Edges[2] = calculateTessFactor( minDistEdgeV1V2 );
    Output.Edges[3] = calculateTessFactor( minDistEdgeV2V3 );
    Output.Inside[0] = Output.Inside[1] = calculateTessFactor( midDist );

    return Output;
}


Assembly of version 1
//
// Generated by Microsoft (R) HLSL Shader Compiler 9.29.952.3111
//
//
//   fxc /Zi /E BezierHS /T hs_5_0 /Fx teste.fxo teste.hlsl
//
//
// Buffer Definitions:
//
// cbuffer cbPerFrame
// {
//
//   float4x4 g_mViewProjection;        // Offset:    0 Size:    64 [unused]
//   float3 g_vCameraPosWorld;          // Offset:   64 Size:    12
//   float g_fTessellationFactor;       // Offset:   76 Size:     4 [unused]
//
// }
//
//
// Resource Bindings:
//
// Name                                 Type  Format         Dim Slot Elements
// ------------------------------ ---------- ------- ----------- ---- --------
// cbPerFrame                        cbuffer      NA          NA    0        1
//
//
//
// Patch Constant signature:
//
// Name                 Index   Mask Register SysValue Format   Used
// -------------------- ----- ------ -------- -------- ------ ------
// SV_TessFactor            0   x           0 QUADEDGE  float   x
// SV_TessFactor            1   x           1 QUADEDGE  float   x
// SV_TessFactor            2   x           2 QUADEDGE  float   x
// SV_TessFactor            3   x           3 QUADEDGE  float   x
// SV_InsideTessFactor      0   x           4  QUADINT  float   x
// SV_InsideTessFactor      1   x           5  QUADINT  float   x
//
//
// Input signature:
//
// Name                 Index   Mask Register SysValue Format   Used
// -------------------- ----- ------ -------- -------- ------ ------
// POSITION                 0   xyz         0     NONE  float   xyz
// TEXCOORD                 0   xy          1     NONE  float   xy
//
//
// Output signature:
//
// Name                 Index   Mask Register SysValue Format   Used
// -------------------- ----- ------ -------- -------- ------ ------
// BEZIERPOS                0   xyz         0     NONE  float   xyz
// TEXCOORD                 0   xy          1     NONE  float   xy
//
// Tessellation Domain   # of control points
// -------------------- --------------------
// Quadrilateral                           4
//
// Tessellation Output Primitive  Partitioning Type
// ------------------------------ ------------------
// Clockwise Triangles            Integer
//
hs_5_0
hs_decls
dcl_input_control_point_count 4
dcl_output_control_point_count 4
dcl_tessellator_domain domain_quad
dcl_tessellator_partitioning partitioning_integer
dcl_tessellator_output_primitive output_triangle_cw
dcl_globalFlags refactoringAllowed
dcl_constantbuffer cb0[5], immediateIndexed

#line 112 "C:\Program Files (x86)\Microsoft DirectX SDK (June 2010)\Utilities\bin\x86\teste.hlsl"
hs_fork_phase
dcl_hs_fork_phase_instance_count 3
dcl_input vForkInstanceID
dcl_input vicp[4][0].xyz
dcl_output_siv o0.x, finalQuadUeq0EdgeTessFactor
dcl_output_siv o1.x, finalQuadVeq0EdgeTessFactor
dcl_output_siv o2.x, finalQuadUeq1EdgeTessFactor
dcl_output_siv o3.x, finalQuadVeq1EdgeTessFactor
dcl_temps 2
dcl_indexrange o0.x 3
iadd r0.x, vForkInstanceID.x, l(3)
and r0.x, r0.x, l(3)
add r0.xyz, cb0[4].xyzx, -vicp[r0.x + 0][0].xyzx

#line 109
dp3 r0.x, r0.xyzx, r0.xyzx
mov r0.y, vForkInstanceID.x
add r1.xyz, cb0[4].xyzx, -vicp[r0.y + 0][0].xyzx

#line 112
dp3 r0.z, r1.xyzx, r1.xyzx  // v3DistCamera<0:NaN:Inf>, v0DistCamera<0:NaN:Inf>

#line 117
sqrt r0.xz, r0.xxzx  // minDistEdgeV0V3<0:NaN:Inf>

#line 96
min r0.z, r0.z, r0.x
mul r0.z, r0.z, r0.z  // BezierConstantHS[r0.y/2]<0:NaN:Inf>

#line 111
div o[r0.y + 0].x, l(125000.000000), r0.z
add r0.yzw, cb0[4].xxyz, -vicp[2][0].xxyz
dp3 r0.y, r0.yzwy, r0.yzwy  // v2DistCamera<0:NaN:Inf>

#line 116
sqrt r0.y, r0.y  // minDistEdgeV2V3<0:NaN:Inf>

#line 96
min r0.x, r0.x, r0.y
mul r0.x, r0.x, r0.x  // BezierConstantHS<3:NaN:Inf>

#line 133
div o3.x, l(125000.000000), r0.x

#line 121
ret
hs_fork_phase   // midPoint<0:Inf,1:Inf,2:Inf>
dcl_hs_fork_phase_instance_count 2
dcl_input vForkInstanceID
dcl_input vicp[4][0].xyz
dcl_output_siv o4.x, finalQuadUInsideTessFactor
dcl_output_siv o5.x, finalQuadVInsideTessFactor
dcl_temps 1
dcl_indexrange o4.x 2
add r0.xyz, -vicp[0][0].xyzx, vicp[2][0].xyzx
mad r0.xyz, r0.xyzx, l(0.500000, 0.500000, 0.500000, 0.000000), vicp[0][0].xyzx

#line 96
add r0.xyz, -r0.xyzx, cb0[4].xyzx
dp3 r0.x, r0.xyzx, r0.xyzx  // BezierConstantHS[r0.y/2]<4:NaN:Inf>

#line 133
mov r0.y, vForkInstanceID.x
// incorrect instruction offset in debug info
div o[r0.y + 4].x, l(125000.000000), r0.x
// incorrect instruction offset in debug info
// incorrect instruction offset in debug info
ret
// incorrect instruction offset in debug info
// Approximately 25 instruction slots used


Assembly of version 2
//
// Generated by Microsoft (R) HLSL Shader Compiler 9.29.952.3111
//
//
//   fxc /Zi /E BezierHS /T hs_5_0 /Fx teste2.fxo teste2.hlsl
//
//
// Buffer Definitions:
//
// cbuffer cbPerFrame
// {
//
//   float4x4 g_mViewProjection;        // Offset:    0 Size:    64 [unused]
//   float3 g_vCameraPosWorld;          // Offset:   64 Size:    12
//   float g_fTessellationFactor;       // Offset:   76 Size:     4 [unused]
//
// }
//
//
// Resource Bindings:
//
// Name                                 Type  Format         Dim Slot Elements
// ------------------------------ ---------- ------- ----------- ---- --------
// cbPerFrame                        cbuffer      NA          NA    0        1
//
//
//
// Patch Constant signature:
//
// Name                 Index   Mask Register SysValue Format   Used
// -------------------- ----- ------ -------- -------- ------ ------
// SV_TessFactor            0   x           0 QUADEDGE  float   x
// SV_TessFactor            1   x           1 QUADEDGE  float   x
// SV_TessFactor            2   x           2 QUADEDGE  float   x
// SV_TessFactor            3   x           3 QUADEDGE  float   x
// SV_InsideTessFactor      0   x           4  QUADINT  float   x
// SV_InsideTessFactor      1   x           5  QUADINT  float   x
//
//
// Input signature:
//
// Name                 Index   Mask Register SysValue Format   Used
// -------------------- ----- ------ -------- -------- ------ ------
// POSITION                 0   xyz         0     NONE  float   xyz
// TEXCOORD                 0   xy          1     NONE  float   xy
//
//
// Output signature:
//
// Name                 Index   Mask Register SysValue Format   Used
// -------------------- ----- ------ -------- -------- ------ ------
// BEZIERPOS                0   xyz         0     NONE  float   xyz
// TEXCOORD                 0   xy          1     NONE  float   xy
//
// Tessellation Domain   # of control points
// -------------------- --------------------
// Quadrilateral                           4
//
// Tessellation Output Primitive  Partitioning Type
// ------------------------------ ------------------
// Clockwise Triangles            Integer
//
hs_5_0
hs_decls
dcl_input_control_point_count 4
dcl_output_control_point_count 4
dcl_tessellator_domain domain_quad
dcl_tessellator_partitioning partitioning_integer
dcl_tessellator_output_primitive output_triangle_cw
dcl_globalFlags refactoringAllowed
dcl_constantbuffer cb0[5], immediateIndexed

#line 109 "C:\Program Files (x86)\Microsoft DirectX SDK (June 2010)\Utilities\bin\x86\teste2.hlsl"
hs_fork_phase
dcl_hs_fork_phase_instance_count 4
dcl_input vForkInstanceID
dcl_input vicp[4][0].xyz
dcl_output_siv o0.x, finalQuadUeq0EdgeTessFactor
dcl_output_siv o1.x, finalQuadVeq0EdgeTessFactor
dcl_output_siv o2.x, finalQuadUeq1EdgeTessFactor
dcl_output_siv o3.x, finalQuadVeq1EdgeTessFactor
dcl_temps 1
dcl_indexrange o0.x 4

#line 112
ult r0.x, vForkInstanceID.x, l(1)

#line 109
iadd r0.yz, vForkInstanceID.xxxx, l(0, -1, 2, 0)

#line 112
movc r0.x, r0.x, l(0), r0.y
udiv null, r0.y, r0.z, l(3)
add r0.yzw, cb0[4].xxyz, -vicp[r0.y + 1][0].xxyz

#line 109
dp3 r0.y, r0.yzwy, r0.yzwy
add r0.xzw, cb0[4].xxyz, -vicp[r0.x + 0][0].xxyz

#line 112
dp3 r0.x, r0.xzwx, r0.xzwx  // v0DistCamera<0:NaN:Inf>, v3DistCamera<0:NaN:Inf>

#line 117
sqrt r0.xy, r0.xyxx  // minDistEdgeV0V3<0:NaN:Inf>

#line 96
min r0.x, r0.y, r0.x
mul r0.x, r0.x, r0.x
mov r0.y, vForkInstanceID.x  // BezierConstantHS[r0.y/2]<0:NaN:Inf>

#line 133
div o[r0.y + 0].x, l(125000.000000), r0.x

#line 121
ret
hs_fork_phase   // midPoint<0:Inf,1:Inf,2:Inf>
dcl_hs_fork_phase_instance_count 2
dcl_input vForkInstanceID
dcl_input vicp[4][0].xyz
dcl_output_siv o4.x, finalQuadUInsideTessFactor
dcl_output_siv o5.x, finalQuadVInsideTessFactor
dcl_temps 1
dcl_indexrange o4.x 2
add r0.xyz, -vicp[0][0].xyzx, vicp[2][0].xyzx
mad r0.xyz, r0.xyzx, l(0.500000, 0.500000, 0.500000, 0.000000), vicp[0][0].xyzx

#line 96
add r0.xyz, -r0.xyzx, cb0[4].xyzx
dp3 r0.x, r0.xyzx, r0.xyzx  // BezierConstantHS[r0.y/2]<4:NaN:Inf>

#line 133
mov r0.y, vForkInstanceID.x
// incorrect instruction offset in debug info
div o[r0.y + 4].x, l(125000.000000), r0.x
// incorrect instruction offset in debug info
// incorrect instruction offset in debug info
ret
// incorrect instruction offset in debug info
// Approximately 21 instruction slots used
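
For anyone reading the listings without the full source: both versions finish each edge computation with a mul of the distance by itself followed by a div from the literal 125000, and the inside factor divides 125000 by the dp3 result directly. That suggests calculateTessFactor, whose source is not posted above, looks roughly like the sketch below. This is a guess reconstructed from the disassembly, not the original function, and the parameter name is made up:

```hlsl
// Guess reconstructed from the disassembly; the real source was not posted.
float calculateTessFactor( float distToCamera )
{
    // Matches the pattern:
    //   mul r0.x, r0.x, r0.x
    //   div o[...].x, l(125000.000000), r0.x
    return 125000.0f / ( distToCamera * distToCamera );
}
```

Having that in mind makes it easier to map the mul and div instructions in each fork phase back to the Output.Edges and Output.Inside assignments.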
Hey there,

I think I may have encountered a related compiler bug. I have a fairly complex pixel shader performing tricubic volume ray-casting of a k-d tree hierarchy (involving stack-based tree traversal).
I noticed issues with the ray-node intersections and tracked them down to a weird compiler bug, using the February 2010 SDK (the shader won't compile with the June SDK, which complains about not being able to unroll a loop, but that's a different story, as I already had this annoying problem with the August 2009 SDK). Using PIX for debugging, I noticed that the value of a uniform float3 variable changes during the execution of the pixel shader! Digging further, I checked the unoptimized assembly (the problem persists at optimization level 3 too, but I haven't checked the other levels) and came across this:

HLSL:
uniform float3 Extents; // should obviously be constant during execution
// later on, inside a dynamic loop and some nested if's
float3 discreteMax = ...;
float3 boxMax = min(Extents, discreteMax);

assembly:
// interesting: compiler copies uniform Extents (in cb0[4].xyz) to register 25
mov r25.xyz, cb0[4].xyzx
// inside loop and if: compiler uses r25 for Extents in min() and
// stores the boxMax result again in r25
min r25.xyz, r25.xyzx, r12.xyzx
// a few lines below, boxMax in r25 is used as input
add r30.xyz, r11.yzwy, r25.xyzx
// a few lines below, right before "end if":
// incredibly smart, compiler now aims at restoring r25 from boxMax to Extents
// for further loop iterations, but doesn't use cb0[4], but its current value -
// copying itself to itself :D
mov r25.xyz, r25.xyzx

PIX seems to display r25's value for the Extents variable as well, not the real value in cb0[4]. The displayed value therefore changes after the min instruction, if any dimension of discreteMax (in r12.xyz) is smaller than Extents. Please note that the variable is not being hidden by a local one with the same name.
I'll try to produce a minimal faulting shader and keep on investigating.

This topic is closed to new replies.
