
DX12 Anyone else having problems with D3D12 compute shaders on NVIDIA hardware?

I'm having an odd problem with D3D12 compute shaders. I have a very simple compute shader that does nothing but write the global ID of the current thread out to a buffer:

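// Output buffer: one uint per thread.
RWStructuredBuffer<uint> g_pathVisibility : register(u0, space1);

cbuffer cbPushConstants : register(b0)
{
	uint g_count; // number of valid elements in g_pathVisibility
};

[numthreads(32, 1, 1)]
void main(uint3 DTid : SV_DispatchThreadID)
{
	// Skip threads that fall past the end of the buffer.
	if (DTid.x < g_count)
		g_pathVisibility[DTid.x] = DTid.x + 1;
}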

I'm allocating 2 buffers with space for 128 integers. One buffer is the output buffer for the shader above and the other is a copy destination buffer for CPU readback. If I set numthreads() to any power of two (for example, it's set to 32 above), I get a device reset error on NVIDIA hardware only. If I set numthreads() to any non-power-of-2 value the shader works as expected. The exceptionally odd thing is that all of the compute shaders in the D3D12 samples work fine with numthreads() containing powers of 2. It doesn't matter if I execute the compute shader on a graphics queue or a compute queue; it's the same result either way.

I've tested this on a GTX 1080 and a GTX 1070 with identical results. AMD cards seem to work as expected. Anyone have any idea what the hell could be going on? I tried asking NVIDIA on their boards but, per usual, they never responded. I'm using their latest drivers.

I've attached my sample application if anyone is interested. It's a UWP app, since Visual Studio provides a nice D3D12 app template that I use to play around with simple ideas. The shader in question in the project is TestCompute.hlsl and the function where the magic happens is Sample3DSceneRenderer::TestCompute(), line 1006 in Sample3DSceneRenderer.cpp.
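For reference, here is a rough CPU-side sketch of what the shader should leave in the 128-element buffer, whatever group size is used. It is not taken from the attached project: the function names are hypothetical, and it assumes the dispatch uses a rounded-up group count, since the post does not show the actual Dispatch() arguments.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of thread groups needed so that groups * groupSize >= count.
// Assumption: this is how the X argument of Dispatch() would be computed.
std::uint32_t GroupCount(std::uint32_t count, std::uint32_t groupSize)
{
    assert(groupSize > 0);
    return (count + groupSize - 1) / groupSize;
}

// CPU emulation of the shader: each thread whose ID is below count
// writes ID + 1 into its own slot; all other threads do nothing.
std::vector<std::uint32_t> EmulateDispatch(std::uint32_t count, std::uint32_t groupSize)
{
    std::vector<std::uint32_t> pathVisibility(count, 0);
    const std::uint32_t groups = GroupCount(count, groupSize);
    for (std::uint32_t group = 0; group < groups; ++group)
    {
        for (std::uint32_t local = 0; local < groupSize; ++local)
        {
            // Equivalent of SV_DispatchThreadID.x for a 1D dispatch.
            const std::uint32_t dtid = group * groupSize + local;
            if (dtid < count)
                pathVisibility[dtid] = dtid + 1;
        }
    }
    return pathVisibility;
}

// Returns true when a readback buffer holds exactly 1, 2, ..., N.
bool MatchesExpected(const std::vector<std::uint32_t>& readback)
{
    for (std::size_t i = 0; i < readback.size(); ++i)
    {
        if (readback[i] != i + 1)
            return false;
    }
    return true;
}
```

With 128 elements, a group size of 32 needs 4 groups and a group size of 31 needs 5, so the 31-wide case launches 155 threads and relies on the g_count guard for the last 27. Either way the readback buffer should hold 1 through 128, which is what MatchesExpected() checks.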



I've definitely run into a few Nvidia DX12 driver bugs (especially when DX12 was new), but I haven't personally seen anything with compute shaders. The driver and/or shader JIT is probably just trying to do something clever, and ends up doing something bad. 


Thanks, I figured it was likely a driver issue but wanted to make sure I wasn't crazy. I guess I'll continue waiting for the next major driver release.


I get no GPU hang here on a 980 Ti but I do get a GPU Based Validation error that you seem to have introduced:

D3D12 ERROR: GPU-BASED VALIDATION: Dispatch, Descriptor heap index out of bounds: Heap Index To DescriptorTableStart: [0], Heap Index From HeapStart: [0], Heap Type: D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV, Num Descriptor Entries: 0, Index of Descriptor Range: 0, Shader Stage: COMPUTE, Root Parameter Index: [1], Dispatch Index: [0], Shader Code: TestCompute.hlsl(13,3-40), Asm Instruction Range: [0xbc-0xdf], Asm Operand Index: [0], Command List: 0x000001F3C5E38C20:'m_testComputeList', SRV/UAV/CBV Descriptor Heap: 0x000001F3C5C824B0:'m_testComputeCBVHeap', Sampler Descriptor Heap: <not set>, Pipeline State: 0x000001F3C5973380:'m_testComputePipeline',  [ EXECUTION ERROR #936: GPU_BASED_VALIDATION_DESCRIPTOR_HEAP_INDEX_OUT_OF_BOUNDS]
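One way to read that message, sketched as a toy model: the access is flagged because the heap index it resolves to is not below the number of descriptor entries the validator reports. This is only an interpretation of the log fields, not the debug layer's actual logic, and the struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the two log fields that matter here.
struct DescriptorAccess
{
    std::uint32_t heapIndexFromHeapStart; // "Heap Index From HeapStart"
    std::uint32_t numDescriptorEntries;   // "Num Descriptor Entries"
};

// Assumed rule: an index is in bounds only if it is below the entry count.
bool IsOutOfBounds(const DescriptorAccess& access)
{
    return access.heapIndexFromHeapStart >= access.numDescriptorEntries;
}

// The values reported in the log: index 0 against 0 entries.
inline void CheckReportedCase()
{
    assert(IsOutOfBounds({0, 0}));
}
```

Under this reading, an entry count of 0 makes every index out of bounds, including index 0, so the thing to look at would be why the validator sees zero entries for root parameter 1.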


Turns out the hang wasn't 100%. It 'Succeeded' and rendered the cube after the test for the first few times, but did hang on a later run. The GPU-Based Validation error is still there though.


@ajmiles Interesting, I don't have any GPU validation errors. Did you change anything in the code, or perhaps global D3D12 or driver settings? I've tried removing the root constants, setting the UAV register space to 0, and hardcoding g_count to 128 in the shader so that there's only the UAV, but that had no effect. I also tried switching it from a RWStructuredBuffer to just RWBuffer, but that also had no effect. No matter what I do, numthreads() with 32 (or any power of 2) fails and numthreads() with 31 (or any non-power of 2) succeeds. I don't suppose there's any other insight you can provide on your end, given that I'm not getting the validation errors? Presumably if the descriptor heap and root descriptor settings were actually invalid, it wouldn't be able to successfully write with a non-power-of-2 dispatch?


It's possible that the version I'm on (16251) has newer GPU Validation bits than what you're running.

What version of Windows 10 are you running? Run 'winver' at a command prompt and there should be an OS Build number in parentheses.


That could be it. I'm on build 15063.483 (Creators Update). It looks like you're using a July 26 Windows Insider Preview build. That still doesn't explain why, if the descriptor heap were corrupt, it would be able to write successfully with a non-power of 2 but not with a power of 2. Do you see anything I'm doing wrong with my descriptor heap?

