RWBuffer vs RWStructuredBuffer or RWByteAddressBuffer

Started by Mr_Fox April 18, 2016 06:56 AM

5 comments, last by Mr_Fox 8 years ago

Mr_Fox

806

Author

April 18, 2016 06:56 AM

Hey Guys,

Recently I was reading through some shader code and came across a shader resource type I haven't really looked into: RWBuffer. I've been using RWStructuredBuffer, RWTextureXD, and even RWByteAddressBuffer for a while, and I think there is nothing I can't do with those (maybe I'm wrong?). So my questions are: what is the difference between RWBuffer and RWStructuredBuffer or RWByteAddressBuffer? In which scenarios is RWBuffer preferred over the others? And what are the pros and cons of using RWBuffer?

The MSDN descriptions of these objects are almost identical, which isn't much help.

Thanks in advance.

Peng

Adam Miles

3,468

April 18, 2016 11:41 AM

The big pro for RWBuffer is the fact that it's the one that supports automatic hardware decompression and compression on reads and writes.

This feature is more often used with the normal SRV 'Buffer' type in HLSL where the vast majority of the DXGI Formats can be read and decompressed in the same way that compressed vertex attributes are decompressed on the fly. If the data you need to read needs only 8, 10, or 16 bits per channel then there's a good selection of formats that hardware can decompress and provide to the shader in the expanded 32 bit uint/float format.

The reason it's less often used with the 'RW' prefix is that this then becomes a UAV rather than an SRV. Up to D3D11.2 the only formats that supported loads (reads) from an RWBuffer were the single channel R32 formats (FLOAT, UINT and SINT). Loads from any other format would fail at runtime or shader compile time with an error about not supporting Typed UAV Loads. There was a workaround: it was possible to overlay an R32_UINT UAV on top of other formats that were also 32 bits in size but actually had multiple channels of varying bit depths. This meant you could do UAV loads on some extra formats that wouldn't otherwise be possible. This is explained here.
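To make the workaround concrete, here is a minimal C++ sketch (CPU-side, not shader code) of the arithmetic involved when an R8G8B8A8_UNORM element is read through an R32_UINT view and converted by hand. The function names are made up for illustration, and the rounding on the pack side is one reasonable choice rather than a statement of what any particular driver does.

```cpp
#include <array>
#include <cmath>
#include <cstdint>

// Decode one R8G8B8A8_UNORM element that was read as a raw 32-bit uint.
// Assumed channel order: R in the low byte, A in the high byte.
std::array<float, 4> unpack_rgba8_unorm(std::uint32_t packed) {
    std::array<float, 4> rgba{};
    for (int c = 0; c < 4; ++c) {
        std::uint32_t bits = (packed >> (8 * c)) & 0xFFu;
        rgba[c] = static_cast<float>(bits) / 255.0f;  // UNORM: [0, 255] -> [0, 1]
    }
    return rgba;
}

// Encode four floats back into one 32-bit uint:
// clamp to [0, 1], scale to [0, 255], round to nearest.
std::uint32_t pack_rgba8_unorm(const std::array<float, 4>& rgba) {
    std::uint32_t packed = 0;
    for (int c = 0; c < 4; ++c) {
        float v = std::fmin(std::fmax(rgba[c], 0.0f), 1.0f);
        std::uint32_t bits = static_cast<std::uint32_t>(v * 255.0f + 0.5f);
        packed |= bits << (8 * c);
    }
    return packed;
}
```

With a typed load on a supported format, the hardware does the equivalent of `unpack_rgba8_unorm` for you and the shader simply sees a float4.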

D3D11.3 however added optional support for Typed UAV Loads in the same way that D3D12 now does. There's a list of around 15 extra formats that can be used as the format for a UAV and support Loads in that format. The hardware can then optionally support a longer series of individual formats on a format by format basis. Typed UAV Loads are explained here.
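The three tiers described above can be modelled roughly as follows. This is only a sketch of the decision logic in portable C++: the `Fmt` enum is a hypothetical stand-in for a handful of DXGI formats (not the real enum values), and the two booleans stand in for what a real application would query from the device, which in D3D12 I believe means `CheckFeatureSupport` with `D3D12_FEATURE_D3D12_OPTIONS` (the `TypedUAVLoadAdditionalFormats` member) and `D3D12_FEATURE_FORMAT_SUPPORT` (the `D3D12_FORMAT_SUPPORT2_UAV_TYPED_LOAD` bit). Which formats fall in which tier is taken from my reading of the documentation, so treat the table as an assumption.

```cpp
// Hypothetical stand-ins for a few DXGI formats (not the real enum values).
enum class Fmt {
    R32_FLOAT, R32_UINT, R32_SINT,            // loadable since D3D11.0
    R8G8B8A8_UNORM, R16G16B16A16_FLOAT,       // part of the "all or nothing" set
    R32G32B32A32_FLOAT, R16_FLOAT, R8_UNORM,  // (also in that set)
    R16G16B16A16_UNORM, R11G11B10_FLOAT       // optional, queried one by one
};

enum class Tier { Always, AdditionalSet, PerFormat };

// Classify a format into one of the three support tiers.
Tier typed_load_tier(Fmt f) {
    switch (f) {
        case Fmt::R32_FLOAT:
        case Fmt::R32_UINT:
        case Fmt::R32_SINT:
            return Tier::Always;
        case Fmt::R8G8B8A8_UNORM:
        case Fmt::R16G16B16A16_FLOAT:
        case Fmt::R32G32B32A32_FLOAT:
        case Fmt::R16_FLOAT:
        case Fmt::R8_UNORM:
            return Tier::AdditionalSet;
        case Fmt::R16G16B16A16_UNORM:
        case Fmt::R11G11B10_FLOAT:
            return Tier::PerFormat;
    }
    return Tier::PerFormat;  // unknown formats: be conservative
}

// additionalFormats: device-wide flag for the extra set of formats.
// perFormatTypedLoad: result of the per-format query for this format.
bool can_typed_load(Fmt f, bool additionalFormats, bool perFormatTypedLoad) {
    switch (typed_load_tier(f)) {
        case Tier::Always:        return true;
        case Tier::AdditionalSet: return additionalFormats;
        case Tier::PerFormat:     return additionalFormats && perFormatTypedLoad;
    }
    return false;
}
```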

In summary,

RWStructuredBuffer is a UAV around structured data. That is data made up of 32-bit uints and floats laid out into structures with a variable number of channels per member. No format decompression or compression is supported by the hardware. If you want two 16 bit uints, you'll have to pack and unpack them yourself from a single uint. The format of the UAV is DXGI_FORMAT_UNKNOWN because the format is implied by the structure layout written in HLSL.
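The manual pack/unpack mentioned for two 16 bit uints is just shifts and masks. A minimal C++ sketch of the same arithmetic follows; the function names are hypothetical, and which value goes in the low half is an arbitrary choice made here.

```cpp
#include <cstdint>
#include <utility>

// Store two 16-bit values in one 32-bit word: 'lo' in bits 0-15, 'hi' in bits 16-31.
std::uint32_t pack_u16x2(std::uint16_t lo, std::uint16_t hi) {
    return static_cast<std::uint32_t>(lo) |
           (static_cast<std::uint32_t>(hi) << 16);
}

// Recover the two 16-bit values as {lo, hi}.
std::pair<std::uint16_t, std::uint16_t> unpack_u16x2(std::uint32_t packed) {
    return {static_cast<std::uint16_t>(packed & 0xFFFFu),
            static_cast<std::uint16_t>(packed >> 16)};
}
```

The same expressions carry over to HLSL more or less unchanged, operating on a `uint` member of the structure.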

RWBuffer is a UAV around an array of formatted (potentially compressed) data. There's no 'structure' to this data, it's just an array of one data type with the format specified by the UAV at UAV creation time. On newer hardware you can Load/Store to this buffer and the hardware will compress down to the UAV's format and decompress it again when you read it.

RWByteAddressBuffer, which you seem to be familiar with, is just a "bag of bits", a bit like RWStructuredBuffer<uint> in many ways (since ByteAddressBuffers can only be loaded from at 4 byte alignment). I believe the difference between RWStructuredBuffer<uint> and RWByteAddressBuffer stems from the fact that D3D11_RESOURCE_MISC_BUFFER_STRUCTURED cannot be used with many BIND_FLAGS such as Vertex Buffer and Index Buffer, whereas you can create a Vertex Buffer or Index Buffer that specifies the D3D11_RESOURCE_MISC_BUFFER_ALLOW_RAW_VIEWS flag.

As hardware matures and becomes even more general purpose there'll be less need for so many Buffer types, bind flags and misc flags. Every format will be able to be automatically decompressed and compressed and it won't matter whether it's a Constant Buffer, Vertex Buffer or Index Buffer, everything will "just work". For now though there are some not-always-obvious restrictions around how different buffers can be viewed and what operations are supported on what Buffer.
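The similarity to RWStructuredBuffer<uint> comes down to addressing: a 4-byte-aligned byte offset maps directly onto a 32-bit element index. A small C++ model of that mapping is below. `load_raw` is a made-up helper, and throwing on a misaligned offset is purely for illustration; it is not a claim about how the GPU reacts.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Model a raw buffer as an array of 32-bit words, i.e. what a
// structured buffer of uint would index directly.
std::uint32_t load_raw(const std::vector<std::uint32_t>& words,
                       std::uint32_t byteOffset) {
    if (byteOffset % 4 != 0) {
        // Illustration only: raw loads are expected to be 4-byte aligned.
        throw std::invalid_argument("byte offset must be a multiple of 4");
    }
    return words.at(byteOffset / 4);  // byte offset -> element index
}
```

So a load at byte offset 8 reads the same word as element 2 of the equivalent uint array.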

Adam Miles - Principal Software Development Engineer - Microsoft Xbox Advanced Technology Group

Dingleberry

924

April 18, 2016 06:23 PM

I'm so glad typed uav loads are here. How fast is the [de]compression? On a scale from "free" to "as long as it would take to bit shift stuff and convert manually"?

Adam Miles

3,468

April 18, 2016 07:02 PM

I can only speak to how it works on GCN, but it's pretty much free. It's almost identical to how compressed vertex attributes are decompressed. The only difference is that in the case of UAVs the format is embedded in the descriptor, whereas for vertex attributes the type of each attribute is patched directly into the Vertex Shader at compile time.

Adam Miles - Principal Software Development Engineer - Microsoft Xbox Advanced Technology Group

Mr_Fox

806

Author

April 18, 2016 08:33 PM

Thanks Adam for such a detailed explanation, I really appreciate it. I just want to double check with you that I understand it correctly in terms of use cases:

For example, before D3D11.3 we may have had to use RWStructuredBuffer<float4> buf in our shader; now with DX12 we could replace that with RWBuffer<float4>, which should be more efficient since the latter uses hardware [de]compression, which will save bandwidth.

So RWBuffer<MyDataStructure> is preferred over RWStructuredBuffer<MyDataStructure> if 'MyDataStructure' is one of the supported formats, which means it should only have 1 or 4 channels of 8, 16 or 32 bit data.

In summary, RWBuffer covers a subset of RWStructuredBuffer's use cases.

Adam Miles

3,468

April 18, 2016 09:00 PM

"For example, before d3d11.3 we may have to use RWStructuredBuffer<float4> buf in our shader, now with DX12 we could replace that with RWBuffer<float4> which should be more efficient since the latter one uses hardware [de]compression which will save bandwidth."

Almost. Before 11.3 if you tried to read from RWBuffer<float4> you would have got this error when you compiled the shader:

error X3676: typed UAV loads are only allowed for single-component 32-bit element types

On 11.3 and 12 this error has gone away, but you still need to make sure you only use Typed UAV Loads on hardware that reports support for the underlying format of the UAV. 'float4' in HLSL could be R8G8B8A8_UNORM, R8G8B8A8_SNORM, R16G16B16A16_FLOAT, R16G16B16A16_UNORM, R32G32B32A32_FLOAT, and several others I expect. If the Feature Support says it can do loads on that format then it's fine.

It'll save bandwidth over using RWStructuredBuffer<float4> so long as you don't pick R32G32B32A32_FLOAT. That format is already 128 bits (4 * 32 bits) so there's nothing for it to decompress, it's already the correct format.
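The saving is easy to put numbers on, since the size of an element is just channels times bits per channel. A back-of-the-envelope C++ sketch follows; the helper names are hypothetical, and it only counts raw element storage, ignoring caches and any other hardware effects.

```cpp
#include <cstddef>

// Bytes per element = channels * bits per channel / 8.
constexpr std::size_t bytes_per_element(std::size_t channels,
                                        std::size_t bitsPerChannel) {
    return channels * bitsPerChannel / 8;
}

// Total storage for 'count' elements of the given layout.
constexpr std::size_t buffer_bytes(std::size_t count,
                                   std::size_t channels,
                                   std::size_t bitsPerChannel) {
    return count * bytes_per_element(channels, bitsPerChannel);
}
```

For one million float4 elements, `buffer_bytes(1000000, 4, 32)` gives 16,000,000 bytes (the same as a structured buffer of float4), R16G16B16A16_FLOAT halves that to 8,000,000, and R8G8B8A8_UNORM quarters it to 4,000,000.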

"So RWBuffer<MyDataStructure> is preferred over RWStructuredBuffer<MyDataStructure> if your 'MyDataStructure' is one of the supported formats, which means it should only have 1 or 4 channel of 8,16,32 bit data."

Pretty much. It's always worth compressing your data if you can. There are some things you can only do on uncompressed data in a Structured Buffer (like atomic operations: Add, Min, Max, Or etc.). Of course if your data isn't 1-4 channels of data (float[1,2,3,4] or uint[1,2,3,4]) then you'll have to go back to using a Structured Buffer again and packing and unpacking the data manually (or using multiple RWBuffers).

Adam Miles - Principal Software Development Engineer - Microsoft Xbox Advanced Technology Group

Mr_Fox

806

Author

April 18, 2016 09:12 PM

Thanks Adam, that's really informative :-)

This topic is closed to new replies.
