
DX11 Compute shaders synchronization issue


Just a simple question about compute shaders (CS5, DX11).
Should atomic operations (InterlockedAdd in my case) work without any issues on a RWByteAddressBuffer and be globally coherent?
I've come back from the CUDA world and committed a fairly simple kernel that does some job; the pseudo-code is as follows:

(both kernels use the same RWByteAddressBuffer)

The first kernel does some job and sets Result[0] = 0;
(using Result.Store(0, 0))

I've checked with the debugger, and indeed the value stored at dword 0 is 0 ;)

Now my second kernel:

RWByteAddressBuffer Result;

[numthreads(8, 8, 8)]
void main()
{
    for (int i = 0; i < 5; i++)
    {
        uint4 v0 = DoSomeCalculations1();
        uint4 v1 = DoSomeCalculations2();
        uint4 v2 = DoSomeCalculations3();
        if (v0.w == 0 && v1.w == 0 && v2.w)
        {
            // increment the counter by 3 and get its previous value;
            // this should basically allocate space for 3 uint4 values in the buffer
            uint prev;
            Result.InterlockedAdd(0, 3, prev);
            // fill the buffer with 3 uint4 values (+1 because the first 16 bytes
            // are occupied by the DrawInstancedIndirect data)
            Result.Store4((prev+0+1)*16, v0);
            Result.Store4((prev+1+1)*16, v1);
            Result.Store4((prev+2+1)*16, v2);
        }
    }
}

Now I invoke it with Dispatch(4,4,4)
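
For reference, the reservation pattern the kernel relies on can be modelled on the CPU. The following is only a minimal sketch in C++, assuming `std::atomic::fetch_add` as a stand-in for `InterlockedAdd`. The names `AppendBuffer`, `appendTriple` and `runWorkers` are hypothetical and not part of the original code. It models the atomic slot allocation only, not GPU memory visibility between the Dispatch and the draw.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical CPU model of the RWByteAddressBuffer: slot 0 stands for the
// 16 bytes of indirect-draw arguments, slots 1.. hold the uint4 payload.
using UInt4 = std::array<std::uint32_t, 4>;

struct AppendBuffer {
    std::atomic<std::uint32_t> counter{0};  // models dword 0 of the GPU buffer
    std::vector<UInt4> slots;               // 16-byte slots, index 0 reserved
    explicit AppendBuffer(std::size_t payloadSlots) : slots(payloadSlots + 1) {}
};

// Mirrors the kernel body: reserve three consecutive slots atomically, then
// fill them. fetch_add returns the previous value, like the third argument
// of InterlockedAdd.
inline void appendTriple(AppendBuffer& buf, const UInt4& v0,
                         const UInt4& v1, const UInt4& v2) {
    const std::uint32_t prev = buf.counter.fetch_add(3, std::memory_order_relaxed);
    buf.slots[prev + 0 + 1] = v0;
    buf.slots[prev + 1 + 1] = v1;
    buf.slots[prev + 2 + 1] = v2;
}

// Runs `workers` threads, each appending `perWorker` triples, and returns
// the final counter value once all threads have joined.
inline std::uint32_t runWorkers(AppendBuffer& buf, unsigned workers, unsigned perWorker) {
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&buf, w, perWorker] {
            for (unsigned i = 0; i < perWorker; ++i) {
                const std::uint32_t tag = w * perWorker + i + 1;  // non-zero marker
                const UInt4 v{tag, tag, tag, tag};
                appendTriple(buf, v, v, v);
            }
        });
    }
    for (auto& t : pool) t.join();
    return buf.counter.load();
}
```

The buffer has to be created with at least `3 * workers * perWorker` payload slots. If the atomic behaves as intended, every reserved range is disjoint, so the final counter equals three times the number of appended triples and no payload slot is left empty.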

Then I use DrawInstancedIndirect to draw the buffer, but occasionally a triangle is missing here and there for a frame, as if the atomic counter does not work as expected :/
Do I need any additional synchronization there?
I've tried 'AllMemoryBarrierWithGroupSync' at the end of the kernel, but without effect.
If I do not use the atomic counter and instead just output empty vertices (which turn into degenerate triangles), all is OK - as if I'm missing some form of synchronization, but I do not see such a thing in DX11.
I've tested on both old and new NVIDIA hardware (680M and 1080); the behaviour is the same.


I've finally found why atomic operations DO NOT WORK on NVIDIA hardware ... after going home and running my program on a Radeon Vega ... well, to my big surprise it worked as expected ... no flickering on the mesh.

So I worked a little more on it, and it turned out that having the indirect draw call data at the beginning of the buffer is not the best idea, so I decoupled it into another buffer where more than one set of indirect draw data lives.

By doing this I removed the D3D11_RESOURCE_MISC_DRAWINDIRECT_ARGS flag from the buffer.
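
To make the layout of such a separate arguments buffer concrete, here is a small portable sketch in C++. It assumes the records are packed back to back and mirrors the four 32-bit fields in the order documented for `D3D11_DRAW_INSTANCED_INDIRECT_ARGS`. The struct name `DrawInstancedIndirectArgs` and the helpers `argsByteOffset` and `makeArgs` are hypothetical, and the sketch does not show how the counter value is transferred into the arguments buffer.

```cpp
#include <cstddef>
#include <cstdint>

// Portable mirror of the four 32-bit values that DrawInstancedIndirect reads,
// in the field order of D3D11_DRAW_INSTANCED_INDIRECT_ARGS.
struct DrawInstancedIndirectArgs {
    std::uint32_t vertexCountPerInstance;  // the atomic counter ends up here
    std::uint32_t instanceCount;
    std::uint32_t startVertexLocation;
    std::uint32_t startInstanceLocation;
};
static_assert(sizeof(DrawInstancedIndirectArgs) == 16, "one record is 16 bytes");

// Byte offset of record `drawIndex` in a buffer holding several records back
// to back; this is the kind of value passed as AlignedByteOffsetForArgs.
constexpr std::uint32_t argsByteOffset(std::uint32_t drawIndex) {
    return drawIndex * static_cast<std::uint32_t>(sizeof(DrawInstancedIndirectArgs));
}

// Hypothetical helper: arguments for a single, non-instanced draw of
// `vertexCount` vertices starting at vertex 0.
constexpr DrawInstancedIndirectArgs makeArgs(std::uint32_t vertexCount) {
    return DrawInstancedIndirectArgs{vertexCount, 1u, 0u, 0u};
}
```

With this layout only the arguments buffer needs D3D11_RESOURCE_MISC_DRAWINDIRECT_ARGS, and the buffer that receives the atomic writes can be created without that flag.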

Then, after some more work, I checked this on NVIDIA ... and to my surprise it worked as expected ... At first I thought that I had fixed something else and that was why it was working now, but no - I set D3D11_RESOURCE_MISC_DRAWINDIRECT_ARGS on that buffer again (that was the only real difference on the C++ side), and again the atomic operations did not work as expected; for a buffer without D3D11_RESOURCE_MISC_DRAWINDIRECT_ARGS all is fine.

Well ... I didn't find any note in the documentation that I should not mix D3D11_RESOURCE_MISC_DRAWINDIRECT_ARGS and atomic operations in one buffer ... NVIDIA bug?



That sounds like a bug in Nvidia's driver. In D3D11 the results of UAV writes should be visible to all other pipeline stages after the Dispatch completes, regardless of flags and whether or not you've used atomic operations.
