Mr_Fox

DX12 UAV Counters

This topic is 864 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.


In DX12, CreateUnorderedAccessView now accepts a new optional ID3D12Resource parameter called the counter resource, and MSDN has a page that briefly mentions UAV counters. But online resources on this are very sparse, so I would greatly appreciate it if someone could elaborate: what is a UAV counter, in which scenarios would we want to associate a counter buffer with a UAV, which rendering techniques or GPGPU algorithms benefit most from UAV counters, and in terms of performance, how do UAV counters compare to the alternatives?

 

Sorry for asking a lot of questions, but this really bothers me

 

Thanks

 

Peng


I've asked this before and am similarly bothered.

 

I'm sort of assuming the point is that vendors can optionally implement some fancy technique to speed them up -- for example, a card with fast shared memory atomics might use a different implementation than a card without them. Without counters, if you're writing HLSL shaders and don't know which card they're going to run on, you'd need to write two different shaders and switch based on the GPU at runtime, which sucks.

 

But it's still kind of odd, because one might want different techniques depending on the use case -- e.g. every thread incrementing a counter vs. very few threads vs. a varying number of increments per thread -- and it seems like one built-in counter can't possibly address every situation.

 

Anyway having used them before I learned a few things:

  • You can use the same buffer for both the UAV data and its counter.
  • Counters have a really big alignment: the counter offset has to be a multiple of 4096 bytes. Putting other stuff in the remaining 4092 bytes seems fine. Also, the counter buffer can be just 4 bytes big if you want. I don't get it.
  • Putting the counter at the very beginning of a buffer works and is kind of convenient.
  • The counter value can be accessed like any other data.
  • It ran about as fast as InterlockedAdd for me, but again, on other cards it might be doing some optimization that beats a global interlocked add.

 

Most of this is speculation.
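To make the alignment point above concrete, here is a rough C++ sketch of the two layouts I mean: counter at the start of a shared buffer, or counter after the data. It only does the offset arithmetic on the CPU. The constant is assumed to match D3D12_UAV_COUNTER_PLACEMENT_ALIGNMENT (4096) from d3d12.h, and the struct and function names (CounterBufferLayout, counter_first, counter_last) are made up for illustration, not part of any API.

```cpp
#include <cassert>
#include <cstdint>

// Assumed to mirror D3D12_UAV_COUNTER_PLACEMENT_ALIGNMENT from d3d12.h.
constexpr std::uint64_t kCounterAlignment = 4096;
// The counter itself is a single 32-bit value.
constexpr std::uint64_t kCounterSize = 4;

// Round value up to the next multiple of alignment (alignment must be > 0).
constexpr std::uint64_t align_up(std::uint64_t value, std::uint64_t alignment) {
    return (value + alignment - 1) / alignment * alignment;
}

// Hypothetical description of one buffer holding both counter and elements.
struct CounterBufferLayout {
    std::uint64_t counter_offset_bytes;  // candidate for CounterOffsetInBytes
    std::uint64_t first_element;         // candidate for Buffer.FirstElement
    std::uint64_t total_bytes;           // size of the buffer to allocate
};

// Counter at byte 0; elements start at the first index that clears the counter.
inline CounterBufferLayout counter_first(std::uint64_t stride, std::uint64_t count) {
    assert(stride > 0);
    const std::uint64_t first = (kCounterSize + stride - 1) / stride;
    return {0, first, (first + count) * stride};
}

// Elements at byte 0; counter at the next 4096-byte boundary after them.
inline CounterBufferLayout counter_last(std::uint64_t stride, std::uint64_t count) {
    assert(stride > 0);
    const std::uint64_t offset = align_up(stride * count, kCounterAlignment);
    return {offset, 0, offset + kCounterSize};
}
```

With a 16-byte stride and 100 elements, counter_first wastes one element slot (16 bytes) while counter_last pads the data out to 4096 bytes, which is part of why putting the counter first feels convenient.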

 

---

 

To compare to other techniques, you can use a scan algorithm or a histopyramid to do a lot of the tasks that counters do: mainly compaction of sparse data in buffers, filtering, or even just counting occurrences of something. Off the top of my head, marching cubes can use any of these three techniques (counters, scan, histopyramid) to list occupied voxels in a contiguous array.
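As a rough illustration of counter-based vs. scan-based compaction, here is a CPU-side C++ sketch. It is an analogy, not GPU code: the loops stand in for one thread per element, std::atomic stands in for the UAV counter, and the function names are made up. One difference worth noting is that on a real GPU the counter version hands out slots in whatever order threads arrive, so output order is not preserved, while the scan version keeps the input order.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

// Counter-style compaction: each kept element reserves an output slot by
// bumping a shared counter (the role a UAV counter or InterlockedAdd plays).
std::vector<int> compact_with_counter(const std::vector<int>& in, bool (*keep)(int)) {
    std::vector<int> out(in.size());
    std::atomic<std::uint32_t> counter{0};
    for (int v : in) {  // on a GPU this would be one thread per element
        if (keep(v)) {
            const std::uint32_t slot = counter.fetch_add(1);
            out[slot] = v;
        }
    }
    out.resize(counter.load());
    return out;
}

// Scan-style compaction: an exclusive prefix sum over the keep flags gives
// every surviving element its output slot, with no shared counter.
std::vector<int> compact_with_scan(const std::vector<int>& in, bool (*keep)(int)) {
    std::vector<std::uint32_t> slot(in.size());
    std::uint32_t running = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        slot[i] = running;
        running += keep(in[i]) ? 1u : 0u;
    }
    std::vector<int> out(running);
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (keep(in[i])) out[slot[i]] = in[i];
    }
    return out;
}

// Example predicate: treat nonzero values as "occupied".
bool is_nonzero(int v) { return v != 0; }
```

For the marching cubes case, the input would be per-voxel occupancy and the output the contiguous list of occupied voxels; either function produces that list, they just get the slot indices differently.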

Edited by Dingleberry


Thanks, Dingleberry, for sharing your experience and thoughts on counter buffers. I think the thing I am most curious about is the design purpose of the counter buffer: since we can do counting and similar things with just a buffer and atomic ops, why does DirectX have specific UAV counter support at the API level? There must be some cases where a normal UAV and atomic ops cannot do the job...


Another poster on here told me some drivers will just implement it as an interlocked add. I'm nearly certain an interlocked add can functionally do everything a counter can. I'm guessing it just sometimes isn't done that way, as if Microsoft told vendors "the counter has these requirements but doesn't need to work in any specific way".

 

Thinking it through further, I don't think many people use them so it creates a chicken and egg problem where vendors aren't going to care about improving its performance and then no one uses them because they're not significantly faster than atomics.

 

Again, I could be really wrong, since GPUs can do a lot of things that aren't exposed directly to DX12.

Edited by Dingleberry

AFAIK it's equivalent to DX11's append/consume buffers, which had a magic hidden counter. This is the same, but the counter is no longer hidden.
Some GPUs might have special hardware for them, but most modern ones likely use general-purpose hardware to implement them, like you've guessed.
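
To show what "append/consume with a visible counter" amounts to, here is a small C++ stand-in. It assumes the documented HLSL behaviour that IncrementCounter() returns the pre-increment value and DecrementCounter() returns the post-decrement value; the CountedBuffer type itself is hypothetical, there is no bounds checking, and nothing here says how any driver actually implements it.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

// CPU stand-in for a structured buffer UAV with an attached 32-bit counter.
struct CountedBuffer {
    std::vector<int> data;
    std::atomic<std::uint32_t> counter{0};

    explicit CountedBuffer(std::size_t capacity) : data(capacity) {}

    // Pre-increment value, matching what IncrementCounter() is documented to return.
    std::uint32_t increment_counter() { return counter.fetch_add(1); }

    // Post-decrement value, matching what DecrementCounter() is documented to return.
    std::uint32_t decrement_counter() { return counter.fetch_sub(1) - 1; }

    // Append: reserve the next free slot, then write into it.
    void append(int value) { data[increment_counter()] = value; }

    // Consume: release the last used slot, then read from it.
    int consume() { return data[decrement_counter()]; }
};
```

Written this way, the counter is just an atomic add on a 32-bit value next to the data, which fits the guess that most hardware implements it with general-purpose atomics.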
