# DX11 Compute shaders synchronization issue


Hi,

Should atomic operations (InterlockedAdd in my case) work without any issues on a RWByteAddressBuffer and be globally coherent?
I've come back from the CUDA world and written a fairly simple kernel that does some work. The pseudo-code is as follows:

(both kernels use the same RWByteAddressBuffer)

The first kernel does some work and sets Result[0] = 0
(using Result.Store(0, 0)).

I've checked with the debugger, and the value stored at dword 0 is indeed 0.

Now my second kernel:

    RWByteAddressBuffer Result;

    void main()
    {
        for (int i = 0; i < 5; i++)
        {
            uint4 v0 = DoSomeCalculations1();
            uint4 v1 = DoSomeCalculations2();
            uint4 v2 = DoSomeCalculations3();

            if (v0.w == 0 && v1.w == 0 && v2.w)
                continue;

            // Increment the counter at dword 0 by 3 and get its previous value.
            // This basically allocates space for 3 uint4 values in the buffer.
            uint prev;
            Result.InterlockedAdd(0, 3, prev);

            // Fill the buffer with 3 uint4 values (+1 because the first 16 bytes
            // are occupied by the DrawInstancedIndirect data).
            Result.Store4((prev + 0 + 1) * 16, v0);
            Result.Store4((prev + 1 + 1) * 16, v1);
            Result.Store4((prev + 2 + 1) * 16, v2);
        }
    }

I invoke it with Dispatch(4,4,4).

Then I use DrawInstancedIndirect to draw the buffer, but occasionally a triangle is missing here and there for a frame, as if the atomic counter does not work as expected.
Do I need any additional synchronization there?
I've tried AllMemoryBarrierWithGroupSync at the end of the kernel, but without effect.
If I do not use the atomic counter and instead just output empty vertices (which turn into degenerate triangles), everything is OK. It looks as if I'm missing some form of synchronization, but I do not see such a thing in DX11.
I've tested on both old and new NVIDIA hardware (680M and 1080); the behaviour is the same.
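
The allocation pattern from the second kernel can be modelled on the CPU. What follows is only a sketch in C++: it uses std::atomic and std::thread in place of InterlockedAdd and GPU threads, and the names AppendBuffer, reserveTriangle and simulateAppend are hypothetical. It says nothing about how a particular driver behaves; it only checks that reserving three slots per triangle with an atomic add leaves no gaps and no overlaps.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical CPU model of the append pattern; this is not D3D11 code.
struct AppendBuffer {
    std::atomic<uint32_t> counter{0};  // plays the role of dword 0 in Result
    std::vector<uint32_t> slots;       // one entry per 16-byte uint4 slot

    explicit AppendBuffer(std::size_t capacity) : slots(capacity, 0) {}

    // Analogue of Result.InterlockedAdd(0, 3, prev): reserve three slots
    // and return the index of the first one.
    uint32_t reserveTriangle() {
        return counter.fetch_add(3, std::memory_order_relaxed);
    }
};

// Runs `threads` workers, each appending `trianglesPerThread` triangles.
// Returns true if the counter matches the number of slots written and
// every slot was written exactly once.
bool simulateAppend(unsigned threads, unsigned trianglesPerThread) {
    const std::size_t total =
        static_cast<std::size_t>(threads) * trianglesPerThread * 3;
    AppendBuffer buf(total);

    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&buf, trianglesPerThread] {
            for (unsigned i = 0; i < trianglesPerThread; ++i) {
                const uint32_t prev = buf.reserveTriangle();
                // Each thread owns the three slots it reserved,
                // so plain (non-atomic) writes do not race.
                for (uint32_t k = 0; k < 3; ++k) {
                    buf.slots[prev + k] += 1;
                }
            }
        });
    }
    for (auto& th : pool) {
        th.join();
    }

    if (buf.counter.load() != total) {
        return false;
    }
    for (uint32_t s : buf.slots) {
        if (s != 1) {
            return false;  // a gap (0) or an overlap (>1)
        }
    }
    return true;
}
```

Calling simulateAppend(4, 1000) exercises four workers appending 1000 triangles each. If the pattern itself is sound on the CPU, missing triangles on the GPU point at visibility of the writes rather than at the allocation logic.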

##### Share on other sites

I've finally found why atomic operations DO NOT WORK on NVIDIA hardware ... after going home and running my program on a Radeon Vega. To my big surprise it worked as expected there, with no flickering on the mesh.

So I worked on it a little more, and it turned out that having the indirect draw call data at the beginning of the buffer is not the best idea, so I moved it to another buffer that holds the data for more than one indirect draw.

By doing this I removed the D3D11_RESOURCE_MISC_DRAWINDIRECT_ARGS flag from the buffer.

Then, after some more work, I checked this on NVIDIA, and to my surprise it worked as expected. At first I thought I had fixed something else along the way, but no: when I set D3D11_RESOURCE_MISC_DRAWINDIRECT_ARGS on that buffer again (the only real difference on the C++ side), the atomic operations once more do not work as expected. For a buffer without D3D11_RESOURCE_MISC_DRAWINDIRECT_ARGS everything is fine.

Well ... I didn't find any note in the documentation saying that I should not mix D3D11_RESOURCE_MISC_DRAWINDIRECT_ARGS and atomic operations in one buffer. NVIDIA bug?
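
To make the two layouts concrete, here is a small C++ sketch of the byte offsets involved. It is an illustration under assumptions, not the code from my project: DrawInstancedIndirectArgs is a hypothetical struct mirroring the four 32-bit values that DrawInstancedIndirect reads, and the helper names are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the four 32-bit values read by DrawInstancedIndirect.
struct DrawInstancedIndirectArgs {
    uint32_t vertexCountPerInstance;
    uint32_t instanceCount;
    uint32_t startVertexLocation;
    uint32_t startInstanceLocation;
};
static_assert(sizeof(DrawInstancedIndirectArgs) == 16,
              "one args record must be exactly 16 bytes");

constexpr uint32_t kSlotBytes = 16;  // size of one uint4 value

// Combined layout: the args record occupies slot 0 of the same buffer,
// so the data written with Store4 starts at slot 1.
constexpr uint32_t combinedSlotOffset(uint32_t slot) {
    return (slot + 1) * kSlotBytes;
}

// Decoupled layout: the data buffer holds only uint4 values.
constexpr uint32_t dataSlotOffset(uint32_t slot) {
    return slot * kSlotBytes;
}

// Decoupled layout: byte offset of the args record for draw `drawIndex`
// inside the separate args buffer (the value that would be passed as
// AlignedByteOffsetForArgs).
constexpr uint32_t argsOffset(uint32_t drawIndex) {
    return drawIndex *
           static_cast<uint32_t>(sizeof(DrawInstancedIndirectArgs));
}
```

With this split only the small args buffer needs the D3D11_RESOURCE_MISC_DRAWINDIRECT_ARGS flag, and the counter for each draw lives at argsOffset(drawIndex) instead of at byte 0 of the data buffer.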

##### Share on other sites

That sounds like a bug in NVIDIA's driver. In D3D11 the results of UAV writes should be visible to all other pipeline stages after the Dispatch completes, regardless of flags and whether or not you've used atomic operations.
