Shnoutz

[DX12] Swapped component


Hmm.

 

So I have this shader that reduces a depth buffer into near/far depth tiles. It also finds the average (mid) depth between near and far, plus the "farthest before mid" and "nearest past mid" depths. It then stores a float4(near, far, farthest before mid, nearest past mid) into a texture.

 

The shader works in DirectX11 but does something very strange in DirectX12. The last two components of the float4 that I store in the texture are swapped: it is as if I had written output.xywz instead of output.xyzw. The shader is identical in DX11 and DX12.

 

When reading the output texture I do not use a custom component mapping; I use D3D12_DEFAULT_SHADER_4_COMPONENT_MAPPING.

 

Here's the shader (I'm sorry for the macros, I use them to have the option of using reversed depth buffers):

#define OX_NEAR_DEPTH 0.0f
#define OX_FAR_DEPTH 1.0f
#define OX_NEAREST_DEPTH(A,B) min((A),(B))
#define OX_FARTHEST_DEPTH(A,B) max((A),(B))
#define OX_IS_DEPTH_CLOSER(A,B) ((A)<(B))
#define OX_IS_DEPTH_FURTHER(A,B) ((A)>(B))

float fixDepth(float depth)
{
	return depth != OX_FAR_DEPTH ? depth : OX_NEAR_DEPTH;
}

groupshared float gs_groupDepths[64];
groupshared float gs_groupNearestDepths[32];
groupshared float gs_groupFarthestDepths[32];

Texture2D< float > SrcDepth : register(t0);
RWTexture2D< float4 > DepthTiles : register(u0);

[numthreads(8, 8, 1)]
void cs_computeDepthTiles(
	uint3 DTid : SV_DispatchThreadID,
	uint3 Gid : SV_GroupID,
	uint3 GTid : SV_GroupThreadID,
	uint Gidx : SV_GroupIndex)
{
	const uint threadIndex = GTid.y * 8 + GTid.x;

	// Initialize depths, keep in shared memory
	const float depth = SrcDepth[DTid.xy];
	gs_groupDepths[threadIndex] = depth;
	GroupMemoryBarrierWithGroupSync();

	// 2x downsample with far depth flagging
	if(threadIndex < 32)
	{
		const float d0 = gs_groupDepths[threadIndex];
		const float d1 = gs_groupDepths[threadIndex + 32];
		gs_groupNearestDepths[threadIndex] = OX_NEAREST_DEPTH(d0, d1);
		gs_groupFarthestDepths[threadIndex] = OX_FARTHEST_DEPTH(fixDepth(d0), fixDepth(d1));
	}
	GroupMemoryBarrierWithGroupSync();

	// Parallel reduction
	uint s;
	[unroll]
	for(s = 16; s > 0; s >>= 1)
	{
		if(threadIndex < s)
		{
			// Nearest
			{
				const float d0 = gs_groupNearestDepths[threadIndex];
				const float d1 = gs_groupNearestDepths[threadIndex + s];
				gs_groupNearestDepths[threadIndex] = OX_NEAREST_DEPTH(d0, d1);
			}
			// Farthest
			{
				const float d0 = gs_groupFarthestDepths[threadIndex];
				const float d1 = gs_groupFarthestDepths[threadIndex + s];
				gs_groupFarthestDepths[threadIndex] = OX_FARTHEST_DEPTH(d0, d1);
			}
		}
		GroupMemoryBarrierWithGroupSync();
	}

	// Tile nearest & farthest depth
	const float tileNearest = gs_groupNearestDepths[0];
	const float tileFarthest = gs_groupFarthestDepths[0];

	// Tile mid depth
	const float tileMid = (tileFarthest + tileNearest) * 0.5f;

	// Initialize mid depths
	if(threadIndex < 32)
	{
		const float d0 = gs_groupDepths[threadIndex];
		const float d1 = gs_groupDepths[threadIndex + 32];
		const bool c0 = OX_IS_DEPTH_CLOSER(d0, tileMid);
		const bool c1 = OX_IS_DEPTH_CLOSER(d1, tileMid);
		// Farthest before average depth
		{
			const float f0 = c0 ? d0 : tileNearest;
			const float f1 = c1 ? d1 : tileNearest;
			gs_groupFarthestDepths[threadIndex] = OX_FARTHEST_DEPTH(f0, f1);
		}
		// Nearest past average depth
		{
			const float n0 = c0 ? tileFarthest : d0;
			const float n1 = c1 ? tileFarthest : d1;
			gs_groupNearestDepths[threadIndex] = OX_NEAREST_DEPTH(n0, n1);
		}
	}
	GroupMemoryBarrierWithGroupSync();

	// Parallel reduction
	[unroll]
	for(s = 16; s > 0; s >>= 1)
	{
		if(threadIndex < s)
		{
			// Nearest
			{
				const float d0 = gs_groupNearestDepths[threadIndex];
				const float d1 = gs_groupNearestDepths[threadIndex + s];
				gs_groupNearestDepths[threadIndex] = OX_NEAREST_DEPTH(d0, d1);
			}
			// Farthest
			{
				const float d0 = gs_groupFarthestDepths[threadIndex];
				const float d1 = gs_groupFarthestDepths[threadIndex + s];
				gs_groupFarthestDepths[threadIndex] = OX_FARTHEST_DEPTH(d0, d1);
			}
		}
		GroupMemoryBarrierWithGroupSync();
	}

	// Tile mid depths
	const float tileFarthestBeforeMid = gs_groupFarthestDepths[0];
	const float tileNearestPastMid = gs_groupNearestDepths[0];

	// Output
	if(threadIndex == 0)
		DepthTiles[Gid.xy] = float4(tileNearest, tileFarthest, tileFarthestBeforeMid, tileNearestPastMid);
}

I have the feeling the shader itself is irrelevant to the problem, so here's a bit more information.

 

The DepthTiles texture is a 16-bit floating-point RGBA texture.

The problem is DX12 specific... DX11 is fine with SM5.0.

In DX12, the problem occurs with SM5.0 and SM5.1, on both WARP and hardware adapters.

 

If requested, I can share the whole program and/or VS Graphics Debugger captures.


Sounds a bit weird to me, especially if it still occurs on SM5.0 and WARP.

 

Probably easiest if you can share the whole program and I'll take a look.


Thanks to Adam and the Warp team at Microsoft, this issue is resolved.

 

The problem is (obviously) in my code. I was missing a GroupMemoryBarrierWithGroupSync() here:
 

// Tile nearest & farthest depth
const float tileNearest = gs_groupNearestDepths[0];
const float tileFarthest = gs_groupFarthestDepths[0];

GroupMemoryBarrierWithGroupSync(); // THAT WAS MISSING

// Tile mid depth
const float tileMid = (tileFarthest + tileNearest) * 0.5f;

The reason given by the WARP team is:

"Without this barrier, earlier dispatch groups can overwrite the GS memory before later groups read the values. The reason it would work on hardware is that most hardware has 32 or more parallel compute units, whereas Warp has only 4."
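
For anyone landing here later, the hazard can be boiled down to a few lines. This is only a minimal sketch, not the original shader: gs_values, Result and cs_hazardSketch are made-up names, and the output buffer is assumed to hold one float per thread group. It shows a shared slot being read by every thread and then reused for new data, which is the same read-then-overwrite pattern as in the depth tile shader.

```hlsl
// Hypothetical minimal compute shader; all names are illustrative
// and do not come from the original post.
groupshared float gs_values[64];

RWStructuredBuffer<float> Result : register(u0);

[numthreads(64, 1, 1)]
void cs_hazardSketch(uint Gidx : SV_GroupIndex, uint3 Gid : SV_GroupID)
{
	// Phase 1: every thread publishes a value.
	gs_values[Gidx] = (float)Gidx;
	GroupMemoryBarrierWithGroupSync();

	// Every thread reads the shared result of phase 1.
	const float first = gs_values[0];

	// Without this barrier, thread 0 may run ahead and perform the
	// phase 2 write below before slower threads have read gs_values[0].
	GroupMemoryBarrierWithGroupSync();

	// Phase 2: the same slot is reused for new data.
	gs_values[Gidx] = first + 1.0f;
	GroupMemoryBarrierWithGroupSync();

	if(Gidx == 0)
		Result[Gid.x] = gs_values[0];
}
```

Whether the missing barrier actually shows up as a bug depends on how the threads of a group happen to be scheduled, which is presumably why the shader looked fine in some configurations and not in others.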

 

Today I learned that reads and writes to shared memory must be protected by barriers even in less than obvious cases.
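
Another way to stay out of trouble, at the cost of more shared memory, is to not reuse the arrays at all. The sketch below assumes a simplified two-pass reduction (tile minimum, then maximum distance from that minimum); gs_pass1, gs_pass2, Src, Dst and cs_twoPassSketch are hypothetical names and this is not the shader from the first post. Because the second pass only writes to gs_pass2, nothing overwrites gs_pass1[0] while other threads may still be reading it.

```hlsl
// Hypothetical sketch; resource and function names are illustrative.
groupshared float gs_pass1[64];
groupshared float gs_pass2[64]; // separate storage for the second pass

Texture2D< float > Src : register(t0);
RWTexture2D< float2 > Dst : register(u0);

[numthreads(8, 8, 1)]
void cs_twoPassSketch(
	uint3 DTid : SV_DispatchThreadID,
	uint3 Gid : SV_GroupID,
	uint Gidx : SV_GroupIndex)
{
	const float value = Src[DTid.xy];

	// Pass 1: minimum over the 8x8 tile.
	gs_pass1[Gidx] = value;
	GroupMemoryBarrierWithGroupSync();

	uint s;
	[unroll]
	for(s = 32; s > 0; s >>= 1)
	{
		if(Gidx < s)
			gs_pass1[Gidx] = min(gs_pass1[Gidx], gs_pass1[Gidx + s]);
		GroupMemoryBarrierWithGroupSync();
	}

	// The barrier that ends the loop makes gs_pass1[0] visible to all threads.
	const float tileMin = gs_pass1[0];

	// Pass 2: writes go to gs_pass2 only, so gs_pass1[0] is never
	// overwritten and no barrier is needed between the read and the write.
	gs_pass2[Gidx] = value - tileMin;
	GroupMemoryBarrierWithGroupSync();

	[unroll]
	for(s = 32; s > 0; s >>= 1)
	{
		if(Gidx < s)
			gs_pass2[Gidx] = max(gs_pass2[Gidx], gs_pass2[Gidx + s]);
		GroupMemoryBarrierWithGroupSync();
	}

	if(Gidx == 0)
		Dst[Gid.xy] = float2(tileMin, gs_pass2[0]);
}
```

Adding the one barrier, as in the fix above, is cheaper in shared memory; separate arrays just make the dependency easier to see when reading the code.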

 

Cheers!
