# DX11 Compute shaders synchronization issue

Hi

Should atomic operations (InterlockedAdd in my case) work without any issues on a RWByteAddressBuffer and be globally coherent?
I've come back from the CUDA world and written a fairly simple kernel that does some work. The pseudo-code is as follows:

(both kernels use the same RWByteAddressBuffer)

The first kernel does some work and sets Result[0] = 0
(using Result.Store(0, 0)).

I've checked with the debugger, and indeed the value stored at dword 0 is 0.

Now my second kernel:

    RWByteAddressBuffer Result;

    void main()
    {
        for (int i = 0; i < 5; i++)
        {
            uint4 v0 = DoSomeCalculations1();
            uint4 v1 = DoSomeCalculations2();
            uint4 v2 = DoSomeCalculations3();

            if (v0.w == 0 && v1.w == 0 && v2.w == 0)
                continue;

            // increment the counter by 3 and get its previous value;
            // this should basically allocate space for 3 uint4 values in the buffer
            uint prev;
            Result.InterlockedAdd(0, 3, prev);

            // fill the buffer with 3 uint4 values (+1 because the first 16 bytes are occupied by the DrawInstancedIndirect data)
            Result.Store4((prev + 0 + 1) * 16, v0);
            Result.Store4((prev + 1 + 1) * 16, v1);
            Result.Store4((prev + 2 + 1) * 16, v2);
        }
    }

Now I invoke it with Dispatch(4,4,4).

Then I use DrawInstancedIndirect to draw the buffer, but occasionally a triangle is missing here and there for a frame, as if the atomic counter does not work as expected.
Do I need any additional synchronization there?
I've tried 'AllMemoryBarrierWithGroupSync' at the end of the kernel, but without effect.
If I do not use the atomic counter and instead just output empty vertices (which turn into degenerate triangles), all is OK. It is as if I'm missing some form of synchronization, but I do not see such a thing in DX11.
I've tested on both old and new NVIDIA hardware (680M and 1080); the behaviour is the same.
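
The allocation step in the second kernel can be illustrated with a CPU-side analogue. This is only a sketch in C++, not D3D11 code: `std::atomic::fetch_add` stands in for `InterlockedAdd`, and the names `simulateAppend`, `allocationIsConsistent` and `AppendResult` are hypothetical. It shows what the pattern is expected to guarantee: every triangle gets its own group of three slots, and the final counter equals the number of vertices written.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <utility>
#include <vector>

// CPU analogue of the shader's allocation step (an illustration, not D3D11).
// 'counter' plays the role of dword 0 of Result; 'slots' plays the role of
// the uint4 vertex slots that follow the 16-byte indirect-args header.
struct AppendResult {
    std::uint32_t vertexCount;         // final counter value
    std::vector<std::uint32_t> slots;  // triangle id stored in each vertex slot
};

AppendResult simulateAppend(std::uint32_t threadCount,
                            std::uint32_t trianglesPerThread) {
    const std::uint32_t totalTriangles = threadCount * trianglesPerThread;
    std::atomic<std::uint32_t> counter{0};
    std::vector<std::uint32_t> slots(totalTriangles * 3, 0);  // 0 = unwritten

    std::vector<std::thread> workers;
    for (std::uint32_t t = 0; t < threadCount; ++t) {
        workers.emplace_back([&, t] {
            for (std::uint32_t i = 0; i < trianglesPerThread; ++i) {
                const std::uint32_t triangleId = t * trianglesPerThread + i + 1;
                // Same role as Result.InterlockedAdd(0, 3, prev):
                // reserve three slots and receive the previous count.
                const std::uint32_t prev = counter.fetch_add(3);
                slots[prev + 0] = triangleId;
                slots[prev + 1] = triangleId;
                slots[prev + 2] = triangleId;
            }
        });
    }
    for (auto& w : workers) w.join();
    return {counter.load(), std::move(slots)};
}

// True when the counter matches the slot count and every triangle id
// occupies exactly one aligned group of three slots.
bool allocationIsConsistent(const AppendResult& r) {
    if (r.vertexCount != r.slots.size()) return false;
    std::vector<bool> seen(r.slots.size() / 3 + 1, false);
    for (std::size_t base = 0; base < r.slots.size(); base += 3) {
        const std::uint32_t id = r.slots[base];
        if (id == 0 || id >= seen.size() || seen[id]) return false;
        if (r.slots[base + 1] != id || r.slots[base + 2] != id) return false;
        seen[id] = true;
    }
    return true;
}
```

Calling `simulateAppend(8, 1000)` and passing the result to `allocationIsConsistent` should report a consistent allocation; a missing triangle in the real draw would correspond to a group of slots that was never filled or a counter that is too small.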


I've finally found why atomic operations DO NOT WORK on NVIDIA hardware ... after going home and running my program on a Radeon Vega ... well, to my big surprise it worked as expected ... no flickering on the mesh.

So I worked a little more on it, and it turned out that having the indirect draw call data at the beginning of the buffer is not the best idea, so I decoupled it into another buffer where more than one set of indirect draw data lives.

By doing this I removed the D3D11_RESOURCE_MISC_DRAWINDIRECT_ARGS flag from the buffer.

Then after some more work I checked this on NVIDIA ... and to my surprise it worked as expected ... At first I thought that I had fixed something else and that was why it was working now, but no: I set D3D11_RESOURCE_MISC_DRAWINDIRECT_ARGS on that buffer again (that was the only real difference on the C++ side), and again atomic operations did not work as expected. For a buffer without D3D11_RESOURCE_MISC_DRAWINDIRECT_ARGS all is fine.

Well ... I didn't find any note in the documentation that I should not mix D3D11_RESOURCE_MISC_DRAWINDIRECT_ARGS and atomic operations in one buffer ... NVIDIA bug?
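
The two buffer layouts discussed in this thread can be written down as byte-offset arithmetic. The following is a rough C++ sketch under stated assumptions: the struct mirrors the four 32-bit fields of `D3D11_DRAW_INSTANCED_INDIRECT_ARGS`, while the function names are hypothetical and the choice to start vertex data at offset 0 in the decoupled buffer is an assumption, since the post does not give the exact layout.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the four 32-bit fields of D3D11_DRAW_INSTANCED_INDIRECT_ARGS.
struct DrawInstancedIndirectArgs {
    std::uint32_t vertexCountPerInstance;  // the atomic counter at dword 0
    std::uint32_t instanceCount;
    std::uint32_t startVertexLocation;
    std::uint32_t startInstanceLocation;
};
static_assert(sizeof(DrawInstancedIndirectArgs) == 16,
              "indirect args occupy the first 16 bytes in the combined layout");

constexpr std::uint32_t kVertexStride = 16;  // one uint4 per vertex

// Combined layout from the first post: args header first, vertices after it,
// matching the (prev + k + 1) * 16 addressing used in the kernel.
constexpr std::uint32_t combinedVertexByteOffset(std::uint32_t slot) {
    return (slot + 1) * kVertexStride;
}

// Decoupled layout (assumption: vertex data starts at byte 0 of its own buffer).
constexpr std::uint32_t decoupledVertexByteOffset(std::uint32_t slot) {
    return slot * kVertexStride;
}

// Byte offset of the drawIndex-th args record in a separate args buffer;
// this is the kind of value passed as the aligned byte offset argument
// of DrawInstancedIndirect.
constexpr std::uint32_t argsByteOffset(std::uint32_t drawIndex) {
    return drawIndex *
           static_cast<std::uint32_t>(sizeof(DrawInstancedIndirectArgs));
}
```

With the decoupled layout only the separate args buffer needs the D3D11_RESOURCE_MISC_DRAWINDIRECT_ARGS flag, which matches the configuration reported to work above.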


That sounds like a bug in Nvidia's driver. In D3D11 the results of UAV writes should be visible to all other pipeline stages after the Dispatch completes, regardless of flags and whether or not you've used atomic operations.
