Does 32bit RWTexture3D<uint> (or any 32bit RWTexture<uint> object) support atomic ops?

Started by
2 comments, last by Mr_Fox 7 years, 2 months ago

Hey Guys,

Does RWTexture3D<uint> (32-bit) support atomic ops? The MSDN page doesn't list any InterlockedXXX functions for RWTexture3D, yet the shader compiler is fine with:


RWTexture3D<uint> tex_uavFuseBlockVol : register(u2);
...
...
[numthreads(THREAD_DIM, THREAD_DIM, THREAD_DIM)]
void main(uint3 u3GID : SV_GroupID, uint3 u3GTID : SV_GroupThreadID,
    uint uGIdx : SV_GroupIndex)
{
    ....
    ....
    InterlockedAdd(tex_uavFuseBlockVol[u3BlockIdx], 1, uOrig);
    ...
    ...
    if (uGIdx == 0 && uThreadGroupIdxInBlock == 0) {
        InterlockedAnd(
            tex_uavFuseBlockVol[u3BlockIdx], ~BLOCKSTATEMASK_UPDATE);
    }
}
...
...

No errors, no warnings.

So I didn't pay much attention to it. However, I recently found a bug that seems to point directly at the atomicity of operations on that Texture3D. So I checked the docs; as mentioned above, they don't say that RWTexture3D supports atomic ops. They do, however, state that:

You can prefix RWTexture3D objects with the storage class globallycoherent. This storage class causes memory barriers and syncs to flush data across the entire GPU such that other groups can see writes. Without this specifier, a memory barrier or sync will flush a UAV only within the current group.

I was wondering whether the 'globallycoherent' keyword would fix my bug, but it didn't, so I'm a little confused about what this keyword actually does and in which cases we should use it.
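As a rough illustration of the case the quoted documentation describes, here is a minimal sketch in which one thread group has to read what other groups wrote during the same dispatch. Every name, size, and register slot below is hypothetical and not taken from the shader in this thread; the pattern is the common "last group to finish does the final step" idiom, assuming a 1D dispatch of NUM_GROUPS groups and a texture at least NUM_GROUPS texels wide.

```hlsl
// Sketch only: every name, size, and register slot here is hypothetical.
#define NUM_GROUPS 64u  // assumed to equal the X dimension of the Dispatch call
#define GROUP_SIZE 64

// These UAVs are written by one group and read by another group in the same
// dispatch, so they are declared globallycoherent.
globallycoherent RWTexture3D<uint> tex_uavPartial : register(u0);         // width >= NUM_GROUPS
globallycoherent RWStructuredBuffer<uint> buf_uavCounter : register(u1);  // element 0 cleared to 0 before the dispatch
RWStructuredBuffer<uint> buf_uavTotal : register(u2);

groupshared uint gs_uIsLastGroup;

[numthreads(GROUP_SIZE, 1, 1)]
void main(uint3 u3GID : SV_GroupID, uint uGIdx : SV_GroupIndex)
{
    if (uGIdx == 0) {
        // Stand-in for real per-group work: publish one value per group.
        tex_uavPartial[uint3(u3GID.x, 0, 0)] = u3GID.x + 1u;

        // Make the write above visible device-wide before signalling completion.
        DeviceMemoryBarrier();

        // Count finished groups; uFinishedBefore is the counter value before our add.
        uint uFinishedBefore;
        InterlockedAdd(buf_uavCounter[0], 1u, uFinishedBefore);
        gs_uIsLastGroup = (uFinishedBefore == NUM_GROUPS - 1u) ? 1u : 0u;
    }
    GroupMemoryBarrierWithGroupSync();

    // Only the last group to finish reads every other group's result.
    if (gs_uIsLastGroup != 0u && uGIdx == 0) {
        uint uTotal = 0u;
        for (uint i = 0u; i < NUM_GROUPS; ++i) {
            uTotal += tex_uavPartial[uint3(i, 0, 0)];
        }
        buf_uavTotal[0] = uTotal;
    }
}
```

If no group ever reads plain (non-atomic) writes made by another group within the same dispatch, the keyword is probably not what decides correctness, which would be consistent with it not changing the behaviour described here.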

Also, my shader suggests that for RWTexture3D<uint> (32-bit) InterlockedAdd works properly (maybe I'm just lucky?), but InterlockedAnd does not...

Any ideas, comments, or suggestions are greatly appreciated.

Thanks

Atomic ops should all be fine on an R32_UINT texture.

How have you verified that the InterlockedAnd isn't working?

Adam Miles - Principal Software Development Engineer - Microsoft Xbox Advanced Technology Group
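One possible way to answer the verification question is a small debug pass that counts how many voxels still have the bit set and writes the count to a buffer that can be read back on the CPU. This is only a sketch: the mask value, the counter buffer, the thread-group size, and the register slots are assumptions, and only tex_uavFuseBlockVol and BLOCKSTATEMASK_UPDATE are names taken from the original post.

```hlsl
// Hypothetical debug pass; run it after the pass that is supposed to clear the bit.
#define BLOCKSTATEMASK_UPDATE 0x1u  // assumed value; use the real mask from the shader

RWTexture3D<uint> tex_uavFuseBlockVol : register(u0);
RWByteAddressBuffer buf_uavStaleCount : register(u1);  // 4 bytes, cleared to 0 before the dispatch

[numthreads(4, 4, 4)]
void main(uint3 u3DTid : SV_DispatchThreadID)
{
    // Skip threads that fall outside the volume.
    uint uWidth, uHeight, uDepth;
    tex_uavFuseBlockVol.GetDimensions(uWidth, uHeight, uDepth);
    if (u3DTid.x >= uWidth || u3DTid.y >= uHeight || u3DTid.z >= uDepth) {
        return;
    }

    // Count every voxel whose update bit is still set.
    if ((tex_uavFuseBlockVol[u3DTid] & BLOCKSTATEMASK_UPDATE) != 0u) {
        uint uPrevious;  // value before the add; unused here
        buf_uavStaleCount.InterlockedAdd(0, 1u, uPrevious);
    }
}
```

A count of zero after the clearing pass would suggest the InterlockedAnd did its job and that the bit is being set again somewhere else.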

Adam Miles said:
"Atomic ops should all be fine on an R32_UINT texture. How have you verified that the InterlockedAnd isn't working?"

If that's the case, I need to dig more into my algorithm; the problem may well be on my side:

At the end of my frame I have a compute shader that does a bunch of computation and, at the very end, resets one bit of that volume. (I do have other interlocked instructions in the same shader that target the same volume without barriers, but that should be OK, right? This also needs the ops to be atomic across different thread groups, though...) So at the beginning of the next frame that bit should be 0 for all voxels, but it's not. When I instead run a separate compute shader after that pass, with UAV barriers, that just does tex_uavFuseBlockVol[u3BlockIdx] &= ~Flag; the bug disappears. Besides, the MSDN page doesn't list atomic ops, so I wanted to check with you guys first before digging further into my implementation...

Also, how do you use the globallycoherent keyword, and when do we need it?
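The workaround described above, a dedicated clearing pass that runs after the main pass with a UAV barrier in between, might look roughly like the sketch below. The mask value, thread-group size, and register slot are assumptions; the one-thread-per-voxel mapping is also assumed rather than taken from the original shader.

```hlsl
// Hypothetical clearing pass, dispatched after the main pass with a UAV barrier between them.
#define BLOCKSTATEMASK_UPDATE 0x1u  // assumed value; use the real mask from the shader

RWTexture3D<uint> tex_uavFuseBlockVol : register(u0);

[numthreads(4, 4, 4)]
void main(uint3 u3BlockIdx : SV_DispatchThreadID)
{
    // Skip threads that fall outside the volume.
    uint uWidth, uHeight, uDepth;
    tex_uavFuseBlockVol.GetDimensions(uWidth, uHeight, uDepth);
    if (u3BlockIdx.x >= uWidth || u3BlockIdx.y >= uHeight || u3BlockIdx.z >= uDepth) {
        return;
    }

    // Exactly one thread touches each voxel in this pass, so a plain
    // read-modify-write cannot race with another thread of the same dispatch.
    tex_uavFuseBlockVol[u3BlockIdx] &= ~BLOCKSTATEMASK_UPDATE;
}
```

The non-atomic form is safe here only because of the one-thread-per-voxel mapping and the barrier before the pass; if several threads could hit the same voxel, InterlockedAnd(tex_uavFuseBlockVol[u3BlockIdx], ~BLOCKSTATEMASK_UPDATE) would be the form to use.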

Thanks

Adam Miles said:
"How have you verified that the InterlockedAnd isn't working?"

It turned out the problem was caused by not having the debug layer turned on... please see my new post about that.

Thanks

This topic is closed to new replies.