Hey Guys,
Think about the following case:
You have multiple dispatch calls of a compute shader, each launching tons of threads, and each thread does an atomic add on the same memory location (the same address for all threads across all dispatch calls). Since DX12 allows dispatches to overlap, all the related documentation suggests putting a UAV barrier between the dispatches, since they all operate on the same buffer and have a data dependency.
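For reference, this is the pattern the docs describe, as a minimal sketch (I'm assuming `cmdList` is an already-recording `ID3D12GraphicsCommandList` and `buffer` is the UAV resource; these names are placeholders):

```cpp
// Dispatch A writes the buffer through a UAV.
cmdList->Dispatch(groupsX, 1, 1);

// UAV barrier: all UAV accesses from dispatch A must complete
// (and be visible) before dispatch B's UAV accesses begin.
D3D12_RESOURCE_BARRIER barrier = {};
barrier.Type = D3D12_RESOURCE_BARRIER_TYPE_UAV;
barrier.UAV.pResource = buffer;  // or nullptr to cover all UAV accesses
cmdList->ResourceBarrier(1, &barrier);

// Dispatch B reads/writes the same buffer.
cmdList->Dispatch(groupsX, 1, 1);
```

Dropping the `ResourceBarrier` call in the middle is exactly the experiment I'm describing below.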
I can understand the reasoning for normal reads/writes: they are cached at multiple levels, so to get correct results the caches have to be flushed and invalidated between dispatches. That's why we need a UAV barrier there. (Is my understanding correct?)
But what if all my writes are atomic, as in the case above? In my experience, an atomic write is immediately visible to all other threads in the same dispatch, even threads running on a different execution unit (I'm not entirely sure about that, since no documentation explicitly states it...), which, to me, means the data stays coherent across the entire GPU. And if that is the case, I think it would be safe to remove the UAV barrier between dispatches, since coherence is already taken care of and there is no stale data.
And in my experiments, it seems to work correctly on my GTX 680M without the barrier. But no official documentation confirms this, so I have no idea whether it is officially safe, whether it causes undefined behavior or corrupted data, or whether I'm just accidentally getting the correct result.
Please correct me if my assumption is wrong, and it would be great if someone could explain how atomic operations work on a GPU (especially the ones that return the original value, since it seems those atomic read/writes bypass all caches).
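By "the ones that return the original value" I mean the HLSL `InterlockedAdd` overload with an output parameter, roughly like this (sketch; `counterBuffer` is an assumed UAV binding):

```hlsl
RWStructuredBuffer<uint> counterBuffer; // assumed bound as a UAV

[numthreads(64, 1, 1)]
void CSMain(uint3 tid : SV_DispatchThreadID)
{
    uint original;
    // Atomically adds 1 to counterBuffer[0] and returns, in
    // 'original', the value that was there before the add.
    InterlockedAdd(counterBuffer[0], 1, original);
}
```

Getting a unique `original` back for every thread is what makes me think these operations must go through some globally coherent path rather than per-unit caches.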
Thanks in advance