Structured buffer float compression

19 comments, last by Hyunkel 11 years, 10 months ago
I always assumed that with this method it would be troublesome to reconstruct the correct sign for z.
But yes, I see how you can just store the sign in the x or y channel since both are biased to a positive range.

Now I'm confused though. This seems excessively easy. Why the need for all the much more complicated methods then?

[quote name='Nik02' timestamp='1338893126' post='4946380']
If you assume that the normal is a normalized vector, you can pack it to a 2-element vector:

n[sub]z[/sub] = 1.0 - n[sub]x[/sub] - n[sub]y[/sub]


Did you mean n[sub]z[/sub] = sqrt(1.0 - n[sub]x[/sub][sup]2[/sup] - n[sub]y[/sub][sup]2[/sup]) ?
[/quote]

Yea, sorry for the confusion.

Niko Suni


[quote]
Now I'm confused though. This seems excessively easy. Why the need for all the much more complicated methods then?
[/quote]


Because other methods may be cheaper and/or give you better precision.
I doubt they'll be much cheaper, considering the simplicity of the store-sign-and-reconstruct-z method.
However, I can see how other methods could provide more precision. In fact, I did notice precision issues using this very method.
Now that I think about it, the method also only works with signed types, since it needs a sign bit to store the sign of z.
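
For reference, here is a minimal CPU-side C++ sketch of the store-sign-and-reconstruct-z idea discussed above. It is only an illustration under assumptions: the names Float2, Float3, EncodeNormal and DecodeNormal are made up for this example, the input is assumed to be unit length, and the sign of z is carried by the sign of the biased x channel. A shader version would follow the same arithmetic.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper types standing in for HLSL float2/float3.
struct Float2 { float x, y; };
struct Float3 { float x, y, z; };

// Encode a unit-length normal into two floats.
// x and y are biased from [-1, 1] into [0, 1], which leaves the sign
// of the stored x free to carry the sign of z.
inline Float2 EncodeNormal(Float3 n)
{
    float ex = n.x * 0.5f + 0.5f;
    float ey = n.y * 0.5f + 0.5f;
    if (n.z < 0.0f)
        ex = -ex;
    return { ex, ey };
}

// Decode: undo the bias, rebuild |z| from the unit-length constraint,
// then reapply the stored sign.
inline Float3 DecodeNormal(Float2 e)
{
    bool zNegative = std::signbit(e.x);
    float x = std::fabs(e.x) * 2.0f - 1.0f;
    float y = e.y * 2.0f - 1.0f;
    // Clamp because rounding can push 1 - x^2 - y^2 slightly below zero.
    float z = std::sqrt(std::max(0.0f, 1.0f - x * x - y * y));
    return { x, y, zNegative ? -z : z };
}
```

Note that this needs a storage type with a sign bit, and that z is rebuilt through a square root, so small errors in x and y are amplified when z is close to 0.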
The optimal approach depends on the problem you're trying to solve. If lower precision is enough for your use case, a lower-precision method could be just fine. If the best quality is needed, then you should use float3 to store the data at full fidelity and accept the bandwidth/storage impact.

View-space normal storage is a different scenario from per-vertex normal storage. A vertex shader is typically invoked far less often than a pixel shader, so more complex calculations can be done there without hitting a bottleneck.

Niko Suni

Why not use DXGI_FORMAT_R11G11B10_FLOAT? There are plenty of nice formats. Or use a single R32_UINT and pack your normal manually, no big deal. There is no need to ever use the R32G32B32_FLOAT format for transferring normals! If you can have normals in screen space, definitely pack them according to one of the methods linked. I use #4 to my greatest pleasure in screen space (1 float3 to float2, stored as R16G16_FLOAT).

Regarding HLSL, you just write your float, float2, float3, whatever, and depending on the format of the bound target view (RTV, DSV, UAV?), the conversion happens automatically. There is no HLSL construct for "half", and there is no need for one.

There is no point in packing data between the shader stages (such as from vertex shader to hull shader). You can happily send e.g. R8_UNORM buffers to the input assembler (or bind them as SRVs) and your shaders see whatever type (such as float) automatically. The same applies at output.

I'd split your struct into separate streams of positions, normals, temperatures, etc., with formats R32G32B32_FLOAT, R11G11B10_FLOAT, R8_UNORM, etc., for example.
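
To illustrate the "single R32_UINT and pack your normal manually" option, here is a rough C++ sketch. It is an assumption-laden example, not the poster's code: the function names are invented, and it uses plain 11/11/10-bit fixed-point quantization rather than the small-float encoding of DXGI_FORMAT_R11G11B10_FLOAT (that format has no sign bits, so normals would have to be biased into a non-negative range before being written to it).

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Map v from [-1, 1] to an unsigned integer with the given number of bits.
inline uint32_t QuantizeUnorm(float v, int bits)
{
    const float maxValue = static_cast<float>((1u << bits) - 1u);
    float biased = std::min(1.0f, std::max(0.0f, v * 0.5f + 0.5f));
    return static_cast<uint32_t>(std::lround(biased * maxValue));
}

// Inverse of QuantizeUnorm: back to [-1, 1].
inline float DequantizeUnorm(uint32_t q, int bits)
{
    const float maxValue = static_cast<float>((1u << bits) - 1u);
    return (static_cast<float>(q) / maxValue) * 2.0f - 1.0f;
}

// Layout chosen for this sketch: x in bits 0..10, y in bits 11..21, z in bits 22..31.
inline uint32_t PackNormal111110(float x, float y, float z)
{
    return QuantizeUnorm(x, 11)
         | (QuantizeUnorm(y, 11) << 11)
         | (QuantizeUnorm(z, 10) << 22);
}

inline void UnpackNormal111110(uint32_t packed, float& x, float& y, float& z)
{
    x = DequantizeUnorm(packed & 0x7FFu, 11);
    y = DequantizeUnorm((packed >> 11) & 0x7FFu, 11);
    z = DequantizeUnorm(packed >> 22, 10);
}
```

The unpacked vector is only approximately unit length, so it would normally be renormalized before use.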
Have you considered data alignment?


struct PlanetVertex
{
    float3 Position;
    float3 Normal;
    float Temperature;
    float Humidity;
};

That's 2x 4-component vectors. Considering that you want the data aligned, using 16-bit instead of 32-bit values could be your best bet.

[quote]
Why not use DXGI_FORMAT_R11G11B10_FLOAT? There are plenty of nice formats. Or use a single R32_UINT and pack your normal manually, no big deal. There is no need to ever use the R32G32B32_FLOAT format for transferring normals! If you can have normals in screen space, definitely pack them according to one of the methods linked. I use #4 to my greatest pleasure in screen space (1 float3 to float2, stored as R16G16_FLOAT).

Regarding HLSL, you just write your float, float2, float3, whatever, and depending on the format of the bound target view (RTV, DSV, UAV?), the conversion happens automatically. There is no HLSL construct for "half", and there is no need for one.

There is no point in packing data between the shader stages (such as from vertex shader to hull shader). You can happily send e.g. R8_UNORM buffers to the input assembler (or bind them as SRVs) and your shaders see whatever type (such as float) automatically. The same applies at output.

I'd split your struct into separate streams of positions, normals, temperatures, etc., with formats R32G32B32_FLOAT, R11G11B10_FLOAT, R8_UNORM, etc., for example.
[/quote]


I think you misunderstand what I'm doing, though it is perfectly possible that I'm misunderstanding what you're suggesting.
The generated data comes from a compute shader, which is not part of the normal rendering pipeline.
It is stored in a structured buffer, which can only contain HLSL data types (afaik), and as you mentioned, half is not available.

If I were able to use formats such as R11G11B10_FLOAT there would be no issues.
For the view space normals in my geometry buffer I do in fact use an R16G16 format with packing method #4 and it works brilliantly.



[quote]
Have you considered data alignment?


struct PlanetVertex
{
    float3 Position;
    float3 Normal;
    float Temperature;
    float Humidity;
};

That's 2x 4-component vectors. Considering that you want the data aligned, using 16-bit instead of 32-bit values could be your best bet.
[/quote]


Well... no. I have not.
Do I have to consider this with structured buffers?
Right now I have this:


struct PlanetVertex
{
    float3 Position;
    uint2 PackedNormalTempHumidity;
};


With PackedNormalTempHumidity being packed as
X = float2(Normal.x, Normal.y)
Y = float2(Temperature, Humidity)
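
The post does not say how each pair of values is squeezed into one uint. In HLSL, one option would be the f32tof16/f16tof32 intrinsics. The C++ sketch below swaps in a different technique, 16-bit normalized integers, because standard C++ has no built-in half conversion. All names are hypothetical, and it assumes Temperature and Humidity have already been normalized to [0, 1], which the thread does not state. It also leaves the sign of the normal's z component unhandled.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// [-1, 1] -> signed 16-bit normalized value, returned as raw bits.
inline uint16_t ToSnorm16(float v)
{
    float c = std::max(-1.0f, std::min(1.0f, v));
    return static_cast<uint16_t>(static_cast<int16_t>(std::lround(c * 32767.0f)));
}

inline float FromSnorm16(uint16_t bits)
{
    return std::max(-1.0f, static_cast<int16_t>(bits) / 32767.0f);
}

// [0, 1] -> unsigned 16-bit normalized value.
inline uint16_t ToUnorm16(float v)
{
    float c = std::max(0.0f, std::min(1.0f, v));
    return static_cast<uint16_t>(std::lround(c * 65535.0f));
}

inline float FromUnorm16(uint16_t bits) { return bits / 65535.0f; }

// Two 16-bit values share one 32-bit word: lo in bits 0..15, hi in bits 16..31.
inline uint32_t Pack16x2(uint16_t lo, uint16_t hi)
{
    return static_cast<uint32_t>(lo) | (static_cast<uint32_t>(hi) << 16);
}

inline uint16_t Low16(uint32_t p)  { return static_cast<uint16_t>(p & 0xFFFFu); }
inline uint16_t High16(uint32_t p) { return static_cast<uint16_t>(p >> 16); }

// Stand-in for the uint2 member: x = (Normal.x, Normal.y), y = (Temperature, Humidity).
struct PackedNormalTempHumidity { uint32_t x, y; };

inline PackedNormalTempHumidity PackVertexAttributes(float nx, float ny,
                                                     float temperature, float humidity)
{
    return { Pack16x2(ToSnorm16(nx), ToSnorm16(ny)),
             Pack16x2(ToUnorm16(temperature), ToUnorm16(humidity)) };
}
```

Unpacking is the mirror image: split each word with Low16/High16 and run the halves through FromSnorm16 or FromUnorm16.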

There are some minor precision issues with the 2-channel normal vector when z is close to 0, though.
I might go for something like this instead:


struct PlanetVertex
{
    float3 Position;
    uint2 PackedNormalTemp;
    uint PackedHumiditySomethingelse;
};


With a 3-channel normal vector at 16 bits per channel.

[quote]
Well... no. I have not.
Do I have to consider this with structured buffers?
[/quote]


No, you don't. It's totally legal to access structures with a stride that's not a multiple of 16 bytes.
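
As a rough CPU-side check of the strides involved, the structs from this thread can be mirrored in C++ and measured with sizeof. This is only a sketch: the type names are invented, and the stride rule in IsValidStructuredStride (a non-zero multiple of 4, at most 2048 bytes) is my understanding of the Direct3D 11 structured buffer limits, so verify it against the documentation before relying on it.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical CPU mirrors of the HLSL types used in this thread.
struct Float3 { float x, y, z; };
struct UInt2  { uint32_t x, y; };

struct PlanetVertexFull     // original layout: 8 floats
{
    Float3 Position;
    Float3 Normal;
    float  Temperature;
    float  Humidity;
};

struct PlanetVertexPacked   // position + one uint2
{
    Float3 Position;
    UInt2  PackedNormalTempHumidity;
};

struct PlanetVertexPacked3  // position + uint2 + uint
{
    Float3   Position;
    UInt2    PackedNormalTemp;
    uint32_t PackedHumiditySomethingelse;
};

// Every member is 4-byte aligned, so no padding is expected on common compilers.
static_assert(sizeof(PlanetVertexFull) == 32, "expected 32-byte stride");
static_assert(sizeof(PlanetVertexPacked) == 20, "expected 20-byte stride");
static_assert(sizeof(PlanetVertexPacked3) == 24, "expected 24-byte stride");

// Assumed rule for a structured buffer stride; note there is no 16-byte requirement.
constexpr bool IsValidStructuredStride(std::size_t stride)
{
    return stride > 0 && stride <= 2048 && stride % 4 == 0;
}
```

The sizeof value of the mirrored struct is what would typically go into the StructureByteStride field of D3D11_BUFFER_DESC when creating the buffer.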
Hyunkel, yes, I'm familiar with DX11 compute shaders. You don't necessarily need to use a StructuredBuffer UAV. You can use several UAVs as outputs of your compute shaders. So instead of a stream (array) of packed, interleaved struct data, you might have streams (arrays) of individual struct members. Instead of 1 RWStructuredBuffer, you'd have 4 RWBuffers as targets of your compute shader. The main disadvantage I see is that you use 4 target slots instead of 1 (there should always be at least 8 supported, if I recall correctly). I believe you can have texture/buffer UAVs as well in cs_5_0 (unlike cs_4_1), but I've actually used RWStructuredBuffer just like you.

