xnunes

Very weird behavior of HLSL min function

Hi guys,

I was programming a hull shader for terrain rendering with LOD today. In the patch-constant part of the hull shader, I have to calculate the tessellation amount according to the camera position.

Using this:

float v0DistCamera = distance( g_vCameraPosWorld, ip[0].vPosition );
float v1DistCamera = distance( g_vCameraPosWorld, ip[1].vPosition );
float v2DistCamera = distance( g_vCameraPosWorld, ip[2].vPosition );
float v3DistCamera = distance( g_vCameraPosWorld, ip[3].vPosition );

float minDistEdgeV0V1 = min(v0DistCamera,v1DistCamera);
float minDistEdgeV1V2 = min(v1DistCamera,v2DistCamera);
float minDistEdgeV2V3 = min(v2DistCamera,v3DistCamera);
float minDistEdgeV0V3 = min(v0DistCamera,v3DistCamera);

Is there ANY chance that:

float minDistEdgeV0V3 = min(v0DistCamera,v3DistCamera);

AND

float minDistEdgeV0V3 = min(v3DistCamera,v0DistCamera);

would store different results in the minDistEdgeV0V3 variable?

I ask because, for reasons I can't explain, changing the order of the parameters also changes the behavior.

Thanks a lot.


Sounds bizarre!

Maybe there's something in the min function acting weird. Does it definitely take two floating point variables? And if so, is there any way it could truncate one or both of the variables when computing the result?

I'm not sure if this is possible in an HLSL shader, but can you put a (float) cast before the min function call so it definitely returns a float, or even a higher-precision type?

e.g. :

(float)min(v0DistCamera,v3DistCamera);

or even:

(double)min(v0DistCamera,v3DistCamera);

I might be giving useful information here or I might be just teaching my Grandmother to suck eggs. I've had so much help from this forum lately, though, that I feel it's time to start *trying* to put something back in ;o)

Can you write your own min function that will definitely have the required accuracy and result without a performance hit?

Have a look at the compiled assembly to see what's actually going on.

Either compile with fxc.exe or look at the shader in PIX.

There still is no PIX support for tessellation shaders or compute shaders. Also, PIX doesn't provide any asm debugging for D3D10+.

You'll have to use fxc on the command line to see the assembly dump. Try different optimization levels and see if that affects your result.

The min function is selected based on the input type, so you don't need to cast if your data is already float. The asm will probably be either min for float data or imin/umin for int/uint data.
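
As a small illustration of the point about input types, here is a sketch of `min` applied to `float`, `int` and `uint` arguments. The wrapper names are hypothetical, and the instruction names in the comments are what fxc typically emits for Shader Model 4/5 targets, so treat them as expectations to confirm in an assembly listing (for example one written with fxc's /Fc option) rather than guarantees.

```hlsl
// Sketch only: these wrapper names are hypothetical, not from the thread.
// Each call resolves to the min() overload that matches its argument type.

float minOfFloats( float a, float b )
{
    return min( a, b );   // float overload; typically the `min` instruction
}

int minOfInts( int a, int b )
{
    return min( a, b );   // signed integer overload; typically `imin`
}

uint minOfUints( uint a, uint b )
{
    return min( a, b );   // unsigned integer overload; typically `umin`
}
```

Since the distances in the original question are already `float`, an extra `(float)` cast around the call should not change which overload is chosen.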

Let us know what you find. Perhaps there is a compiler bug.

With optimizations disabled, the same code is output for both cases (whether or not the min parameters are swapped).

With optimizations enabled, swapping v0DistCamera and v3DistCamera produces totally different output for the part affected by the swap.

But I can't figure out if it is a sneaky optimization or a compiler bug. It really seems to me to be a compiler bug.

I'm posting the source and the optimized assembly of both versions. My problem is solved since I'm using ifs instead of the min function, but it might help the community if a compiler bug is reported. Maybe someone with good shader assembly skills can figure out what the problem is.
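
The `if`-based workaround mentioned above is not shown in the post. A minimal sketch of what it could look like, assuming a hypothetical helper named `minByBranch`:

```hlsl
// Hypothetical helper, not from the original post: selects the smaller
// distance with an explicit branch instead of calling the min() intrinsic.
float minByBranch( float a, float b )
{
    float result = b;
    if ( a < b )
    {
        result = a;
    }
    return result;
}

// Possible use in the patch-constant function, replacing the min() call:
//     float minDistEdgeV0V3 = minByBranch( v0DistCamera, v3DistCamera );
```

The optimizer may still fold the branch into a select or a `min` instruction, so the generated assembly is worth re-checking. Note also that if either input is NaN the comparison is false and `b` is returned.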

VERSION 1 OF HULL SHADER CONSTANT PART


HS_CONSTANT_DATA_OUTPUT BezierConstantHS( InputPatch<VS_CONTROL_POINT_OUTPUT, 4> ip,
                                          uint PatchID : SV_PrimitiveID )
{
    HS_CONSTANT_DATA_OUTPUT Output;

    float v0DistCamera = distance( g_vCameraPosWorld, ip[0].vPosition );
    float v1DistCamera = distance( g_vCameraPosWorld, ip[1].vPosition );
    float v2DistCamera = distance( g_vCameraPosWorld, ip[2].vPosition );
    float v3DistCamera = distance( g_vCameraPosWorld, ip[3].vPosition );

    float minDistEdgeV0V1 = min(v0DistCamera,v1DistCamera);
    float minDistEdgeV1V2 = min(v1DistCamera,v2DistCamera);
    float minDistEdgeV2V3 = min(v2DistCamera,v3DistCamera);
    float minDistEdgeV0V3 = min(v3DistCamera,v0DistCamera );

    float3 midPoint = lerp(ip[0].vPosition,ip[2].vPosition,0.5);

    float midDist = distance( g_vCameraPosWorld, midPoint );

    Output.Edges[0] = calculateTessFactor( minDistEdgeV0V3 );
    Output.Edges[1] = calculateTessFactor( minDistEdgeV0V1 );
    Output.Edges[2] = calculateTessFactor( minDistEdgeV1V2 );
    Output.Edges[3] = calculateTessFactor( minDistEdgeV2V3 );

    Output.Inside[0] = Output.Inside[1] = calculateTessFactor( midDist );

    return Output;
}



VERSION 2 OF HULL SHADER CONSTANT PART

HS_CONSTANT_DATA_OUTPUT BezierConstantHS( InputPatch<VS_CONTROL_POINT_OUTPUT, 4> ip,
                                          uint PatchID : SV_PrimitiveID )
{
    HS_CONSTANT_DATA_OUTPUT Output;

    float v0DistCamera = distance( g_vCameraPosWorld, ip[0].vPosition );
    float v1DistCamera = distance( g_vCameraPosWorld, ip[1].vPosition );
    float v2DistCamera = distance( g_vCameraPosWorld, ip[2].vPosition );
    float v3DistCamera = distance( g_vCameraPosWorld, ip[3].vPosition );

    float minDistEdgeV0V1 = min(v0DistCamera,v1DistCamera);
    float minDistEdgeV1V2 = min(v1DistCamera,v2DistCamera);
    float minDistEdgeV2V3 = min(v2DistCamera,v3DistCamera);
    float minDistEdgeV0V3 = min(v0DistCamera,v3DistCamera);

    float3 midPoint = lerp(ip[0].vPosition,ip[2].vPosition,0.5);

    float midDist = distance( g_vCameraPosWorld, midPoint );

    Output.Edges[0] = calculateTessFactor( minDistEdgeV0V3 );
    Output.Edges[1] = calculateTessFactor( minDistEdgeV0V1 );
    Output.Edges[2] = calculateTessFactor( minDistEdgeV1V2 );
    Output.Edges[3] = calculateTessFactor( minDistEdgeV2V3 );

    Output.Inside[0] = Output.Inside[1] = calculateTessFactor( midDist );

    return Output;
}



Assembly of version 1

//
// Generated by Microsoft (R) HLSL Shader Compiler 9.29.952.3111
//
//
// fxc /Zi /E BezierHS /T hs_5_0 /Fx teste.fxo teste.hlsl
//
//
// Buffer Definitions:
//
// cbuffer cbPerFrame
// {
//
//   float4x4 g_mViewProjection;    // Offset:    0 Size:    64 [unused]
//   float3 g_vCameraPosWorld;      // Offset:   64 Size:    12
//   float g_fTessellationFactor;   // Offset:   76 Size:     4 [unused]
//
// }
//
//
// Resource Bindings:
//
// Name                                 Type  Format         Dim Slot Elements
// ------------------------------ ---------- ------- ----------- ---- --------
// cbPerFrame                        cbuffer      NA          NA    0        1
//
//
//
// Patch Constant signature:
//
// Name                 Index   Mask Register SysValue Format   Used
// -------------------- ----- ------ -------- -------- ------ ------
// SV_TessFactor            0   x           0 QUADEDGE  float   x
// SV_TessFactor            1   x           1 QUADEDGE  float   x
// SV_TessFactor            2   x           2 QUADEDGE  float   x
// SV_TessFactor            3   x           3 QUADEDGE  float   x
// SV_InsideTessFactor      0   x           4  QUADINT  float   x
// SV_InsideTessFactor      1   x           5  QUADINT  float   x
//
//
// Input signature:
//
// Name                 Index   Mask Register SysValue Format   Used
// -------------------- ----- ------ -------- -------- ------ ------
// POSITION                 0   xyz         0     NONE  float   xyz
// TEXCOORD                 0   xy          1     NONE  float   xy
//
//
// Output signature:
//
// Name                 Index   Mask Register SysValue Format   Used
// -------------------- ----- ------ -------- -------- ------ ------
// BEZIERPOS                0   xyz         0     NONE  float   xyz
// TEXCOORD                 0   xy          1     NONE  float   xy
//
// Tessellation Domain   # of control points
// -------------------- --------------------
// Quadrilateral                           4
//
// Tessellation Output Primitive  Partitioning Type
// ------------------------------ ------------------
// Clockwise Triangles            Integer
//
hs_5_0
hs_decls
dcl_input_control_point_count 4
dcl_output_control_point_count 4
dcl_tessellator_domain domain_quad
dcl_tessellator_partitioning partitioning_integer
dcl_tessellator_output_primitive output_triangle_cw
dcl_globalFlags refactoringAllowed
dcl_constantbuffer cb0[5], immediateIndexed

#line 112 "C:\Program Files (x86)\Microsoft DirectX SDK (June 2010)\Utilities\bin\x86\teste.hlsl"
hs_fork_phase
dcl_hs_fork_phase_instance_count 3
dcl_input vForkInstanceID
dcl_input vicp[4][0].xyz
dcl_output_siv o0.x, finalQuadUeq0EdgeTessFactor
dcl_output_siv o1.x, finalQuadVeq0EdgeTessFactor
dcl_output_siv o2.x, finalQuadUeq1EdgeTessFactor
dcl_output_siv o3.x, finalQuadVeq1EdgeTessFactor
dcl_temps 2
dcl_indexrange o0.x 3
iadd r0.x, vForkInstanceID.x, l(3)
and r0.x, r0.x, l(3)
add r0.xyz, cb0[4].xyzx, -vicp[r0.x + 0][0].xyzx

#line 109
dp3 r0.x, r0.xyzx, r0.xyzx
mov r0.y, vForkInstanceID.x
add r1.xyz, cb0[4].xyzx, -vicp[r0.y + 0][0].xyzx

#line 112
dp3 r0.z, r1.xyzx, r1.xyzx // v3DistCamera<0:NaN:Inf>, v0DistCamera<0:NaN:Inf>

#line 117
sqrt r0.xz, r0.xxzx // minDistEdgeV0V3<0:NaN:Inf>

#line 96
min r0.z, r0.z, r0.x
mul r0.z, r0.z, r0.z // BezierConstantHS[r0.y/2]<0:NaN:Inf>

#line 111
div o[r0.y + 0].x, l(125000.000000), r0.z
add r0.yzw, cb0[4].xxyz, -vicp[2][0].xxyz
dp3 r0.y, r0.yzwy, r0.yzwy // v2DistCamera<0:NaN:Inf>

#line 116
sqrt r0.y, r0.y // minDistEdgeV2V3<0:NaN:Inf>

#line 96
min r0.x, r0.x, r0.y
mul r0.x, r0.x, r0.x // BezierConstantHS<3:NaN:Inf>

#line 133
div o3.x, l(125000.000000), r0.x

#line 121
ret
hs_fork_phase // midPoint<0:Inf,1:Inf,2:Inf>
dcl_hs_fork_phase_instance_count 2
dcl_input vForkInstanceID
dcl_input vicp[4][0].xyz
dcl_output_siv o4.x, finalQuadUInsideTessFactor
dcl_output_siv o5.x, finalQuadVInsideTessFactor
dcl_temps 1
dcl_indexrange o4.x 2
add r0.xyz, -vicp[0][0].xyzx, vicp[2][0].xyzx
mad r0.xyz, r0.xyzx, l(0.500000, 0.500000, 0.500000, 0.000000), vicp[0][0].xyzx

#line 96
add r0.xyz, -r0.xyzx, cb0[4].xyzx
dp3 r0.x, r0.xyzx, r0.xyzx // BezierConstantHS[r0.y/2]<4:NaN:Inf>

#line 133
mov r0.y, vForkInstanceID.x

// incorrect instruction offset in debug info
div o[r0.y + 4].x, l(125000.000000), r0.x
// incorrect instruction offset in debug info


// incorrect instruction offset in debug info
ret
// incorrect instruction offset in debug info

// Approximately 25 instruction slots used
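
To make the version 1 listing above easier to follow, here is a hand transcription of its first fork phase back into HLSL. It is a reading aid under assumptions, not compiler output: the function and parameter names are hypothetical, `forkInstance` stands in for `vForkInstanceID` (0 to 2 in this phase), `cameraPos` for `cb0[4].xyz`, and `cp` for the `vicp` control-point positions.

```hlsl
// Hand transcription (hypothetical names) of the first hs_fork_phase in the
// version 1 listing. The comments name the instructions each line mirrors.
void Version1ForkPhaseSketch( float3 cameraPos, float3 cp[4], uint forkInstance,
                              inout float edges[4] )
{
    uint prev = ( forkInstance + 3 ) & 3;                       // iadd, and

    float distPrev = distance( cameraPos, cp[prev] );           // add, dp3, sqrt (r0.x)
    float distThis = distance( cameraPos, cp[forkInstance] );   // add, dp3, sqrt (r0.z)

    float edgeMin = min( distThis, distPrev );                  // min r0.z, r0.z, r0.x
    edges[forkInstance] = 125000.0f / ( edgeMin * edgeMin );    // mul, div o[r0.y + 0].x

    float dist2 = distance( cameraPos, cp[2] );                 // add, dp3, sqrt (r0.y)
    float lastMin = min( distPrev, dist2 );                     // min r0.x, r0.x, r0.y
    edges[3] = 125000.0f / ( lastMin * lastMin );               // mul, div o3.x
}
```

Read this way, `edges[3]` is written by all three instances, yet only instance 0 has `prev == 3`; instances 1 and 2 would pair control point 2 with points 0 and 1 instead. If the transcription is right, that would fit the compiler-bug suspicion, but it should be confirmed against the listing.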



Assembly of version 2

//
// Generated by Microsoft (R) HLSL Shader Compiler 9.29.952.3111
//
//
// fxc /Zi /E BezierHS /T hs_5_0 /Fx teste2.fxo teste2.hlsl
//
//
// Buffer Definitions:
//
// cbuffer cbPerFrame
// {
//
//   float4x4 g_mViewProjection;    // Offset:    0 Size:    64 [unused]
//   float3 g_vCameraPosWorld;      // Offset:   64 Size:    12
//   float g_fTessellationFactor;   // Offset:   76 Size:     4 [unused]
//
// }
//
//
// Resource Bindings:
//
// Name                                 Type  Format         Dim Slot Elements
// ------------------------------ ---------- ------- ----------- ---- --------
// cbPerFrame                        cbuffer      NA          NA    0        1
//
//
//
// Patch Constant signature:
//
// Name                 Index   Mask Register SysValue Format   Used
// -------------------- ----- ------ -------- -------- ------ ------
// SV_TessFactor            0   x           0 QUADEDGE  float   x
// SV_TessFactor            1   x           1 QUADEDGE  float   x
// SV_TessFactor            2   x           2 QUADEDGE  float   x
// SV_TessFactor            3   x           3 QUADEDGE  float   x
// SV_InsideTessFactor      0   x           4  QUADINT  float   x
// SV_InsideTessFactor      1   x           5  QUADINT  float   x
//
//
// Input signature:
//
// Name                 Index   Mask Register SysValue Format   Used
// -------------------- ----- ------ -------- -------- ------ ------
// POSITION                 0   xyz         0     NONE  float   xyz
// TEXCOORD                 0   xy          1     NONE  float   xy
//
//
// Output signature:
//
// Name                 Index   Mask Register SysValue Format   Used
// -------------------- ----- ------ -------- -------- ------ ------
// BEZIERPOS                0   xyz         0     NONE  float   xyz
// TEXCOORD                 0   xy          1     NONE  float   xy
//
// Tessellation Domain   # of control points
// -------------------- --------------------
// Quadrilateral                           4
//
// Tessellation Output Primitive  Partitioning Type
// ------------------------------ ------------------
// Clockwise Triangles            Integer
//
hs_5_0
hs_decls
dcl_input_control_point_count 4
dcl_output_control_point_count 4
dcl_tessellator_domain domain_quad
dcl_tessellator_partitioning partitioning_integer
dcl_tessellator_output_primitive output_triangle_cw
dcl_globalFlags refactoringAllowed
dcl_constantbuffer cb0[5], immediateIndexed

#line 109 "C:\Program Files (x86)\Microsoft DirectX SDK (June 2010)\Utilities\bin\x86\teste2.hlsl"
hs_fork_phase
dcl_hs_fork_phase_instance_count 4
dcl_input vForkInstanceID
dcl_input vicp[4][0].xyz
dcl_output_siv o0.x, finalQuadUeq0EdgeTessFactor
dcl_output_siv o1.x, finalQuadVeq0EdgeTessFactor
dcl_output_siv o2.x, finalQuadUeq1EdgeTessFactor
dcl_output_siv o3.x, finalQuadVeq1EdgeTessFactor
dcl_temps 1
dcl_indexrange o0.x 4

#line 112
ult r0.x, vForkInstanceID.x, l(1)

#line 109
iadd r0.yz, vForkInstanceID.xxxx, l(0, -1, 2, 0)

#line 112
movc r0.x, r0.x, l(0), r0.y
udiv null, r0.y, r0.z, l(3)
add r0.yzw, cb0[4].xxyz, -vicp[r0.y + 1][0].xxyz

#line 109
dp3 r0.y, r0.yzwy, r0.yzwy
add r0.xzw, cb0[4].xxyz, -vicp[r0.x + 0][0].xxyz

#line 112
dp3 r0.x, r0.xzwx, r0.xzwx // v0DistCamera<0:NaN:Inf>, v3DistCamera<0:NaN:Inf>

#line 117
sqrt r0.xy, r0.xyxx // minDistEdgeV0V3<0:NaN:Inf>

#line 96
min r0.x, r0.y, r0.x
mul r0.x, r0.x, r0.x
mov r0.y, vForkInstanceID.x // BezierConstantHS[r0.y/2]<0:NaN:Inf>

#line 133
div o[r0.y + 0].x, l(125000.000000), r0.x

#line 121
ret
hs_fork_phase // midPoint<0:Inf,1:Inf,2:Inf>
dcl_hs_fork_phase_instance_count 2
dcl_input vForkInstanceID
dcl_input vicp[4][0].xyz
dcl_output_siv o4.x, finalQuadUInsideTessFactor
dcl_output_siv o5.x, finalQuadVInsideTessFactor
dcl_temps 1
dcl_indexrange o4.x 2
add r0.xyz, -vicp[0][0].xyzx, vicp[2][0].xyzx
mad r0.xyz, r0.xyzx, l(0.500000, 0.500000, 0.500000, 0.000000), vicp[0][0].xyzx

#line 96
add r0.xyz, -r0.xyzx, cb0[4].xyzx
dp3 r0.x, r0.xyzx, r0.xyzx // BezierConstantHS[r0.y/2]<4:NaN:Inf>

#line 133
mov r0.y, vForkInstanceID.x

// incorrect instruction offset in debug info
div o[r0.y + 4].x, l(125000.000000), r0.x
// incorrect instruction offset in debug info


// incorrect instruction offset in debug info
ret
// incorrect instruction offset in debug info

// Approximately 21 instruction slots used
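
As a reading aid for the version 2 listing above, here is a hand transcription of its first fork phase back into HLSL. It is a sketch under assumptions, not compiler output: the function and parameter names are hypothetical, `forkInstance` stands in for `vForkInstanceID` (0 to 3 in this phase), `cameraPos` for `cb0[4].xyz`, and `cp` for the `vicp` control-point positions.

```hlsl
// Hand transcription (hypothetical names) of the first hs_fork_phase in the
// version 2 listing. The comments name the instructions each line mirrors.
void Version2ForkPhaseSketch( float3 cameraPos, float3 cp[4], uint forkInstance,
                              inout float edges[4] )
{
    // ult, iadd, movc: 0 for instance 0, otherwise forkInstance - 1
    uint indexB = ( forkInstance < 1 ) ? 0 : forkInstance - 1;

    // iadd, udiv (remainder), then the "+ 1" in vicp[r0.y + 1]
    uint indexA = ( ( forkInstance + 2 ) % 3 ) + 1;

    float distA = distance( cameraPos, cp[indexA] );            // add, dp3, sqrt (r0.y)
    float distB = distance( cameraPos, cp[indexB] );            // add, dp3, sqrt (r0.x)

    float edgeMin = min( distA, distB );                        // min r0.x, r0.y, r0.x
    edges[forkInstance] = 125000.0f / ( edgeMin * edgeMin );    // mul, div o[r0.y + 0].x
}
```

With these indices, instances 0 to 3 pair control points (3,0), (1,0), (2,1) and (3,2), which matches the four `min` calls in the version 2 source.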

Hey there,

I think I may have encountered a related compiler bug. I have a fairly complex pixel shader performing tricubic volume ray-casting of a k-d tree hierarchy (involving stack-based tree traversal).
I noticed issues with the ray-node intersections and tracked them down to a weird compiler bug, using the February 2010 SDK (the shader won't compile with the June SDK, which complains about not being able to unroll a loop, but that's a different story, as I already had this annoying problem with the August 2009 SDK). Using PIX for debugging, I noticed that the value of a uniform float3 variable changes during the execution of the pixel shader! Digging further, I checked the unoptimized assembly (the problem persists at optimization level 3 too, but I haven't checked the other levels) and came across this:

HLSL:
uniform float3 Extents; // should obviously be constant during execution
// later on, inside a dynamic loop and some nested if's
float3 discreteMax = ...;
float3 boxMax = min(Extents, discreteMax);

assembly:
// interesting: compiler copies uniform Extents (in cb0[4].xyz) to register 25
mov r25.xyz, cb0[4].xyzx
// inside loop and if: compiler uses r25 for Extents in min() and
// stores the boxMax result again in r25
min r25.xyz, r25.xyzx, r12.xyzx
// a few lines below, boxMax in r25 is used as input
add r30.xyz, r11.yzwy, r25.xyzx
// a few lines below, right before "end if":
// incredibly smart, compiler now aims at restoring r25 from boxMax to Extents
// for further loop iterations, but doesn't use cb0[4], but its current value -
// copying itself to itself :D
mov r25.xyz, r25.xyzx

PIX seems to display r25's value for the Extents variable as well, not the real value in cb0[4]. The displayed value therefore changes after the min instruction, if any dimension of discreteMax (in r12.xyz) is smaller than Extents. Please note that the variable is not being hidden by a local one with the same name.
I'll try to produce a minimal faulting shader and keep on investigating.
