mark_braga

DX12 Performance implications of using RWStructuredBuffer for reading



I have a compute shader that writes to an RWStructuredBuffer, and a later pixel shader reads it through an SRV (StructuredBuffer).

For this I put a barrier to transition the buffer to the SRV state and another to transition it back to the UAV state after the pixel shader is done. I am trying to minimize the number of barriers, so one option is to bind the buffer as an RWStructuredBuffer in the pixel shader too, even though that shader only reads from it.

So my question is, does RWStructuredBuffer come with a hidden cost for reading which is greater than the cost of the two barriers?

 


Not that I know of, at least at a low level on the hardware that I'm familiar with. But I don't think this is a question that anyone could answer with any certainty, since it's very architecture-specific and driver-specific. It's entirely possible that on some hardware reading through the UAV path causes different cache protocols to be used, or that the driver skips an optimization it could apply if it knew the resource would only be read from and not written to.


So would you say that the best practice is to do the transition, or to just assume that reading from a UAV has the same cost as reading from an SRV?

Thanks

Edited by mark_braga


You still need a UAV->UAV barrier in between the two shaders to make sure that all the writes from the first have completed before the reads from the second one begin.


Yes. I already have the memory barrier. The question is not about synchronization but about whether we need to transition the UAV to SRV for optimal read performance.

Thanks


"Optimal read performance" is relative. Reading the data once and never again during the frame is not the same as reading it over and over a thousand times.

Your mileage may vary.

The best approach is to structure your engine's code so that toggling between strategies is easy.
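One way to make that toggle cheap is to drive both the shader declaration and the barrier logic from a single switch. The following is only a minimal sketch under assumptions: `ReadPath`, `PixelPassSetup`, `setupFor` and the `READ_AS_UAV` macro are hypothetical names invented for illustration and are not part of D3D12 or of the original poster's engine. Only the HLSL type names `StructuredBuffer` and `RWStructuredBuffer` come from the thread.

```cpp
#include <cassert>
#include <string>

// Hypothetical engine-side switch: how the pixel shader reads the buffer.
enum class ReadPath { Srv, Uav };

// Everything the pixel pass needs to know that depends on the switch.
struct PixelPassSetup {
    std::string shaderDefine;   // hypothetical macro handed to the shader compiler
    std::string hlslType;       // type used to declare the buffer in the pixel shader
    bool needsStateTransition;  // true if UAV->SRV and SRV->UAV transitions are required
};

// Single place where the two strategies differ; the rest of the engine
// only looks at the returned struct.
PixelPassSetup setupFor(ReadPath path) {
    switch (path) {
    case ReadPath::Srv:
        // Read-only view: needs the buffer transitioned out of the UAV state.
        return PixelPassSetup{"READ_AS_UAV=0", "StructuredBuffer", true};
    case ReadPath::Uav:
        // Same view type as the compute pass: only a UAV barrier is needed.
        return PixelPassSetup{"READ_AS_UAV=1", "RWStructuredBuffer", false};
    }
    assert(false && "unhandled ReadPath");
    return PixelPassSetup{};
}
```

With something like this in place, switching strategies is a one-line change, which makes it practical to profile both paths on each target GPU.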


The general advice for resource binding is: the fewer privileges, the better. If you only need to read a resource, don't bind it as read-write; if you only need it in the pixel shader, don't bind it for the other stages; if you don't need a depth buffer, don't bind one; if you don't need an mrtX, don't let a shader write to it; etc.

On 8/22/2017 at 8:34 AM, mark_braga said:

Yes. I already have the memory barrier. The question is not about synchronization but about whether we need to transition the UAV to SRV for optimal read performance.

Thanks

Ah, I kinda misread. As above, it's down to the HW. I'm not aware of any differences, but in theory, sure, a GPU might choose a different caching strategy for RO vs RW resources. Perhaps the path to RAM could be different between them, and they could be cached in different hardware (e.g. an RO cache with no write-back support, and a traditional RW cache with WB/WC).

So basically, I'm not aware of a perf difference, but there could be an impact (either good or bad) in theory...

So at the moment you have a choice between:
Use as UAV | UAV barrier | Use as UAV.
And:
Use as UAV | UAV->SRV transition | Use as SRV | SRV->UAV transition

And optionally after either of those, call DiscardResource if the data is regenerated next frame.

As far as I know (on HW that I'm familiar with, which is mostly AMD...), the UAV barrier and the UAV->SRV transition both do the same thing here, and the SRV->UAV transition is basically a no-op... so both methods should be pretty much the same in perf.

edit: actually, the SRV->UAV transition may perform a cache invalidation, so yeah, it is slightly more driver work than just leaving the buffer in the UAV state the whole time.
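The two options can be written out as per-frame barrier lists to make the difference concrete. This is a rough sketch, not real D3D12 code: `State`, `BarrierType`, `Barrier`, `FramePlan` and the functions below are stand-ins invented here so the example compiles without the Windows SDK. In an actual renderer each entry would presumably be a `D3D12_RESOURCE_BARRIER` passed to `ID3D12GraphicsCommandList::ResourceBarrier`, as noted in the comments. It says nothing about cost, only about how many barriers each option records.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-ins for D3D12_RESOURCE_STATE_UNORDERED_ACCESS and
// D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE.
enum class State { UnorderedAccess, PixelShaderResource };

// Stand-ins for D3D12_RESOURCE_BARRIER_TYPE_UAV and
// D3D12_RESOURCE_BARRIER_TYPE_TRANSITION.
enum class BarrierType { Uav, Transition };

struct Barrier {
    BarrierType type;
    State before;  // only meaningful for Transition barriers
    State after;   // only meaningful for Transition barriers
};

// Barriers recorded around the pixel pass for one frame.
struct FramePlan {
    std::vector<Barrier> beforePixelPass;
    std::vector<Barrier> afterPixelPass;
    State stateDuringPixelPass;
};

// Option 1: Use as UAV | UAV barrier | Use as UAV.
FramePlan planKeepUav() {
    return FramePlan{
        std::vector<Barrier>{Barrier{BarrierType::Uav, State::UnorderedAccess, State::UnorderedAccess}},
        std::vector<Barrier>{},
        State::UnorderedAccess};
}

// Option 2: Use as UAV | UAV->SRV transition | Use as SRV | SRV->UAV transition.
FramePlan planTransition() {
    return FramePlan{
        std::vector<Barrier>{Barrier{BarrierType::Transition, State::UnorderedAccess, State::PixelShaderResource}},
        std::vector<Barrier>{Barrier{BarrierType::Transition, State::PixelShaderResource, State::UnorderedAccess}},
        State::PixelShaderResource};
}

std::size_t barrierCount(const FramePlan& plan) {
    return plan.beforePixelPass.size() + plan.afterPixelPass.size();
}

// Applies the transitions in one list to the tracked state.
State applyBarriers(const std::vector<Barrier>& barriers, State current) {
    for (const Barrier& b : barriers) {
        if (b.type != BarrierType::Transition) continue;  // UAV barriers keep the state
        assert(b.before == current && "transition must start from the current state");
        current = b.after;
    }
    return current;
}

// State the buffer is left in at the end of the frame, starting from the
// UAV state used by the compute pass.
State stateAfterFrame(const FramePlan& plan) {
    State current = applyBarriers(plan.beforePixelPass, State::UnorderedAccess);
    return applyBarriers(plan.afterPixelPass, current);
}
```

Both plans leave the buffer in the UAV state, ready for the next frame's compute pass; the first records one barrier per frame and the second records two.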
