# What may cause atomic operations to behave non-atomically (?) in a pixel shader? [solved]

## Recommended Posts

Posted (edited)
Hi,

I am working in Unity, trying to create depth-buffer-like functionality using atomics on a UAV in a pixel shader.

I find though that it does not behave as expected. It appears as if the InterlockedMin call is not behaving atomically.

I say appears, because all I can see is that the conditional based on the original memory value returned by InterlockedMin does not behave correctly. Whatever causes incorrect values to be returned from InterlockedMin also occurs in the frame debugger - Unity's and RenderDoc - so when debugging a pixel it changes from under me!

By changing this conditional I can see that InterlockedMin is not returning random data. It returns values that the memory feasibly would contain, just not what should be the minimum.

Here is a video showing what I mean: https://vid.me/tUP8
Here is a video showing the same behaviour for a single capture in RenderDoc: https://vid.me/4Fir

(In that video the pixel shader is trying to use InterlockedMin to draw only the fragments with the lowest vertex colours encountered so far, and discard all others.)

Things I have tried:

• Different creation flags for ComputeBuffer (though since its Unity the options are limited and opaque)

• Using a RenderTexture instead of a ComputeBuffer

• Using the globallycoherent prefix

• Clearing the buffer in the pixel shader then syncing with a DeviceMemoryBarrier() call

• Clearing the buffer in the pixel shader every other frame with a CPU set flag

• Using a different atomic (InterlockedMax())

• Using a different slot and/or binding calls

Here is the minimum working example that created those videos: https://www.dropbox.com/s/3z2g85vcqw75d1a/Atomics%20Bug%20Minimum%20Working%20Example.zip?dl=0

I can't think of what else to try, I don't see how the issue could be anything other than the InterlockedMin call, and I don't see what else in my code could affect it...

Below is the relevant fragment shader:

float4 frag (v2f i) : SV_Target
{
// sample the texture
float4 col = i.colour;

float c_norm = clamp(col.x, 0, 1);    //one triangle is <=0 and the other is >=1

uint d_uint = (uint)c_norm;
uint d_uint_original = 0;

uint2 upos = i.screenpos * screenparams;
uint offset = (upos.y * screenparams.x) + upos.x;

InterlockedMin(depth[offset], d_uint, d_uint_original);
if (d_uint > d_uint_original)
{
clip(-1);    //we havent updated the depth buffer (or at least shouldnt have) so don't write the pixel
}

return col;
}

With the declaration of the buffer being:

RWStructuredBuffer<uint> depth : register (u1);

And here is how the buffer is being bound and used:

// Use this for initialization
void Start () {
int length = Camera.main.pixelWidth * Camera.main.pixelHeight;
depthbufferdata = new uint[length];
for(int i = 0; i < length; i++)
{
depthbufferdata[i] = 0xFFFFFFFF;
}
depthbuffer = new ComputeBuffer(length, sizeof(uint));
}

// Update is called once per frame
void OnRenderObject () {
depthbuffer.SetData(depthbufferdata); // clears the mask. in my actual project this is done with a compute shader.

Graphics.SetRandomWriteTarget(1, depthbuffer);

material.SetVector("screenparams", Camera.main.pixelRect.size);
material.SetPass(0);
Graphics.DrawMeshNow(mesh, transform.localToWorldMatrix);
}
Sj
Edited by sebjf

##### Share on other sites
Is your code so large it cannot be included in your post? The iPad I'm using right now isn't set up to see it, and I don't care enough to help that I'm going to change that. And I also prefer someone looking at this a year from now will be able to see the relevant details even if the file is removed from Dropbox or whatever.

My blind guess would be you copied data to a local variable, and are just atomically operating on that local variable. But the odds of that blind guess being right aren't great.

##### Share on other sites

@richardurich I originally tried to upload it to the forum but kept receiving errors. I've added the relevant code though, as actually as you say its not too long. I can't see anything I could change in it though - e.g. calling InterlockedMin like a method as one would with a RWByteAddress type just results in a compiler error.

##### Share on other sites
Posted (edited)

float c_norm = clamp(col.x, 0, 1);
d_uint = (uint)c_norm; 

Does that even work? Maybe you want to multiply by 8,388,608 (= 223) or a similarly large constant first? Maybe try a smaller constant, in case your GPU does not have full IEEE 754 precision.... 100,000 or such.

Cast from float to int rounds towards zero in HSLS if I'm not mistaken, and you are clamping to between 0.0 and 1.0.

Thus, unless col.x == 1.0 this is in my understanding equivalent to d_uint = 0.

Edited by samoth

##### Share on other sites

Hi samoth,

Yes it does, in this narrow case anyway - usually I use asuint() or as you say multiply by a large number then cast. Above I did a direct cast because it was easy to see which triangle wrote each value when checking the memory in RenderDoc. (I've tried all sorts of casts and scales to see if that was causing this issue however and none have an effect.)

##### Share on other sites

But how can that work?

Since clamp(whatever, 0, 1) will always be between 0.0 and 1.0, d_uint is always zero except when the input is exactly 1.0 (if I recall incorrectly about rounding, the value 1 will be more frequent, but still the only possible values for d_uint are 0 and 1).

So your depth buffer has only two possible values? That doesn't look like it can be right.

##### Share on other sites
Posted (edited)

The colour values are always  <=0 or >=1, I make sure of that by setting it when I made the test data. (I also check it in RenderDoc). Though currenly the shader is written as

float c_norm = clamp(col.x, 0, 1) * 100;
uint d_uint = (uint)c_norm;
uint d_uint_original = 0;
just to be sure.

I am using the colour values to make this minimal example as they are easy to control. In my real project the masked value is more complex, but as can be seen bug occurs even with something as simple as vertex colours.

So your depth buffer has only two possible values? That doesn't look like it can be right.

Yes that's right - it has three possible values: 0, 1 (from the fragments) or 0xFFFFFFFF, which is the initialisation value. I have confirmed this is the case using the conditional as well. Thats why I suspect its a timing issue rather than, say, reading the wrong part of memory or not binding anything, even though I can't fully trust the debugger.

This is meant to be the absolute simplest case I can come up with that still shows the issue.

Edited by sebjf

##### Share on other sites
If one is encountered before zero at the interlockedmin statement, both the zero and one successfully pass your clip test, but you do not know which order they will return col.

I don't know if that's the only issue, but it sure seems like something you didn't intend.

##### Share on other sites

Hi richardurich,

I think thats it.

I first put a DeviceMemoryBarrier() call between the InterlockedMin() and a re-read of the value. That didn't work. Though it may be to do with clip() (I recall reading about the effect this had on UAVs and will see if I can find the docs).

Then, I removed the test entirely and wrote a second shader to draw the contents of the depth buffer - and that appears to be very stable.

I will see if I can get it to work for a test in a single shader. Though I could probably refactor my project to just use a single uav, which would be more efficient.

Thank you very much for your insights. I have been working at that for 2 days!

Sj