Corillian

DX12 Anyone else having problems with D3D12 compute shaders on NVIDIA hardware?



I'm having an odd problem with D3D12 compute shaders. I have a very simple compute shader that does nothing but write the global ID of the current thread out to a buffer:

RWStructuredBuffer<uint> g_pathVisibility : register(u0, space1);

cbuffer cbPushConstants : register(b0)
{
	uint g_count;
};

[numthreads(32, 1, 1)]
void main(uint3 DTid : SV_DispatchThreadID)
{
	if(DTid.x < g_count)
	{
		g_pathVisibility[DTid.x] = DTid.x + 1;
	}
}

I'm allocating two buffers, each with space for 128 integers. One buffer is the output buffer for the shader above and the other is a copy-destination buffer for CPU readback. If I set numthreads() to any power of two (it's set to 32 above, for example), I get a device reset error, but only on NVIDIA hardware. If I set numthreads() to any non-power-of-2 value the shader works as expected. The exceptionally odd thing is that all of the compute shaders in the D3D12 samples work fine with numthreads() containing powers of 2. It doesn't matter whether I execute the compute shader on a graphics queue or a compute queue - the result is the same either way. I've tested this on a GTX 1080 and a GTX 1070 with identical results. AMD cards seem to work as expected.

Anyone have any idea what the hell could be going on? I tried asking NVIDIA on their boards, but as per usual they never responded. I'm using their latest drivers.

I've attached my sample application if anyone is interested. It's a UWP app, since Visual Studio provides a nice D3D12 app template that I use to play around with simple ideas. The shader in question in the project is TestCompute.hlsl, and the function where the magic happens is Sample3DSceneRenderer::TestCompute(), line 1006 in Sample3DSceneRenderer.cpp.

PathTransform_2.zip
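For anyone who doesn't want to dig through the attachment, the dispatch-and-readback path looks roughly like this. This is a simplified sketch of the pattern, not the exact code from the project; member names are placeholders, the pipeline, root signature, descriptor heap and both buffers are assumed to already exist, and it assumes the d3dx12.h helpers that ship with the VS template:

// Sketch only: m_outputBuffer is a DEFAULT-heap buffer created with
// D3D12_RESOURCE_FLAG_ALLOW_UNORDERED_ACCESS, m_readbackBuffer lives in a
// READBACK heap. Root parameter layout (0 = constants, 1 = UAV table) is assumed.
const UINT kElementCount = 128;
const UINT kThreadsPerGroup = 32;   // matches numthreads(32, 1, 1)

m_commandList->SetPipelineState(m_computePipeline.Get());
m_commandList->SetComputeRootSignature(m_computeRootSignature.Get());

ID3D12DescriptorHeap* heaps[] = { m_cbvSrvUavHeap.Get() };
m_commandList->SetDescriptorHeaps(_countof(heaps), heaps);
m_commandList->SetComputeRoot32BitConstant(0, kElementCount, 0);   // g_count
m_commandList->SetComputeRootDescriptorTable(1, m_uavGpuHandle);   // g_pathVisibility

// One thread per element, rounded up to whole thread groups.
m_commandList->Dispatch((kElementCount + kThreadsPerGroup - 1) / kThreadsPerGroup, 1, 1);

// Flush the UAV writes, then copy the results into the readback buffer.
CD3DX12_RESOURCE_BARRIER uavBarrier = CD3DX12_RESOURCE_BARRIER::UAV(m_outputBuffer.Get());
m_commandList->ResourceBarrier(1, &uavBarrier);

CD3DX12_RESOURCE_BARRIER toCopySrc = CD3DX12_RESOURCE_BARRIER::Transition(
    m_outputBuffer.Get(),
    D3D12_RESOURCE_STATE_UNORDERED_ACCESS,
    D3D12_RESOURCE_STATE_COPY_SOURCE);
m_commandList->ResourceBarrier(1, &toCopySrc);

m_commandList->CopyResource(m_readbackBuffer.Get(), m_outputBuffer.Get());
m_commandList->Close();

// ...execute the list, signal and wait on a fence, then read back on the CPU:
UINT* results = nullptr;
CD3DX12_RANGE readRange(0, kElementCount * sizeof(UINT));
m_readbackBuffer->Map(0, &readRange, reinterpret_cast<void**>(&results));
// results[i] should be i + 1 for every element the shader touched.
m_readbackBuffer->Unmap(0, nullptr);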


I've definitely run into a few Nvidia DX12 driver bugs (especially when DX12 was new), but I haven't personally seen anything with compute shaders. The driver and/or shader JIT is probably just trying to do something clever, and ends up doing something bad. 


Thanks, I figured it was likely a driver issue but wanted to make sure I wasn't crazy. I guess I'll continue waiting for the next major driver release.


I get no GPU hang here on a 980 Ti, but I do get a GPU-Based Validation error that you seem to have introduced:

D3D12 ERROR: GPU-BASED VALIDATION: Dispatch, Descriptor heap index out of bounds: Heap Index To DescriptorTableStart: [0], Heap Index From HeapStart: [0], Heap Type: D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV, Num Descriptor Entries: 0, Index of Descriptor Range: 0, Shader Stage: COMPUTE, Root Parameter Index: [1], Dispatch Index: [0], Shader Code: TestCompute.hlsl(13,3-40), Asm Instruction Range: [0xbc-0xdf], Asm Operand Index: [0], Command List: 0x000001F3C5E38C20:'m_testComputeList', SRV/UAV/CBV Descriptor Heap: 0x000001F3C5C824B0:'m_testComputeCBVHeap', Sampler Descriptor Heap: <not set>, Pipeline State: 0x000001F3C5973380:'m_testComputePipeline',  [ EXECUTION ERROR #936: GPU_BASED_VALIDATION_DESCRIPTOR_HEAP_INDEX_OUT_OF_BOUNDS]
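In case it matters, GPU-Based Validation isn't on by default; it has to be requested before the device is created. A minimal sketch (standard debug layer interfaces, error handling omitted):

#include <d3d12.h>
#include <d3d12sdklayers.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

void EnableGpuBasedValidation()
{
    // Enable the debug layer first, then opt in to GPU-Based Validation.
    ComPtr<ID3D12Debug> debugController;
    if (SUCCEEDED(D3D12GetDebugInterface(IID_PPV_ARGS(&debugController))))
    {
        debugController->EnableDebugLayer();

        ComPtr<ID3D12Debug1> debugController1;
        if (SUCCEEDED(debugController.As(&debugController1)))
        {
            debugController1->SetEnableGPUBasedValidation(TRUE);
        }
    }
    // Create the D3D12 device after this point.
}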


Turns out the hang wasn't 100%. It 'Succeeded' and rendered the cube after the test the first few times, but did hang on a later run. The GPU-Based Validation error is still there though.


@ajmiles Interesting, I don't get any GPU validation errors. Did you change anything in the code, or perhaps any global D3D12 or driver settings? I've tried removing the root constants, setting the UAV register space to 0, and hardcoding g_count to 128 in the shader so that there's only the UAV, but that had no effect. I also tried switching from a RWStructuredBuffer to just a RWBuffer, but that also had no effect. No matter what I do, numthreads() with 32 (or any power of 2) fails and numthreads() with 31 (or any non-power of 2) succeeds. I don't suppose there's any other insight you can provide on your end, given that I'm not getting the validation errors? Presumably if the descriptor heap and root descriptor settings were actually invalid it wouldn't be able to write successfully with a non-power-of-2 dispatch?


It's possible that the version I'm on (16251) has newer GPU Validation bits than what you're running.

What version of Windows 10 are you running? Run 'winver' at a command prompt and there should be an OS Build number in parentheses.


That could be it. I'm on build 15063.483 (Creators Update). It looks like you're using a July 26 Windows Insider Preview build. That still doesn't explain why it would be able to write successfully with a non-power-of-2 group size, but not with a power of 2, if the descriptor heap were corrupt. Do you see anything I'm doing wrong with my descriptor heap?
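For comparison, this is roughly what I'd expect the heap/UAV setup to look like for the shader above (a generic sketch for a RWStructuredBuffer<uint> at u0, space1 with 128 elements, using the d3dx12.h helpers; names are placeholders, not lifted verbatim from the attached project):

// Sketch only: m_outputBuffer is assumed to be a DEFAULT-heap buffer created
// with D3D12_RESOURCE_FLAG_ALLOW_UNORDERED_ACCESS.
const UINT kElementCount = 128;

// Shader-visible CBV/SRV/UAV heap with one slot for the UAV.
D3D12_DESCRIPTOR_HEAP_DESC heapDesc = {};
heapDesc.Type = D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV;
heapDesc.NumDescriptors = 1;
heapDesc.Flags = D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE;
m_device->CreateDescriptorHeap(&heapDesc, IID_PPV_ARGS(&m_uavHeap));

// UAV for a structured buffer: DXGI_FORMAT_UNKNOWN plus a structure stride.
D3D12_UNORDERED_ACCESS_VIEW_DESC uavDesc = {};
uavDesc.Format = DXGI_FORMAT_UNKNOWN;
uavDesc.ViewDimension = D3D12_UAV_DIMENSION_BUFFER;
uavDesc.Buffer.FirstElement = 0;
uavDesc.Buffer.NumElements = kElementCount;
uavDesc.Buffer.StructureByteStride = sizeof(UINT);
m_device->CreateUnorderedAccessView(
    m_outputBuffer.Get(), nullptr, &uavDesc,
    m_uavHeap->GetCPUDescriptorHandleForHeapStart());

// Root signature layout the shader expects: param 0 = one root constant at b0
// (g_count), param 1 = a descriptor table with a single UAV range at u0, space1.
CD3DX12_DESCRIPTOR_RANGE uavRange;
uavRange.Init(D3D12_DESCRIPTOR_RANGE_TYPE_UAV, 1, 0, 1);   // 1 descriptor, register u0, space 1

CD3DX12_ROOT_PARAMETER rootParams[2];
rootParams[0].InitAsConstants(1, 0);                        // g_count at b0
rootParams[1].InitAsDescriptorTable(1, &uavRange);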

