# Structured buffer float compression

Old topic!

Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

20 replies to this topic

### #1Hyunkel  Members

Posted 05 June 2012 - 04:12 AM

I have a compute shader that generates procedural planetary terrain and stores vertices in a structured buffer with the following layout:

    struct PlanetVertex
    {
        float3 Position;
        float3 Normal;
        float Temperature;
        float Humidity;
    };


That's 8 floats at 4 bytes per float -> 32 bytes per vertex.
A terrain node or patch contains 33x33 = 1089 vertices, which is 34848 bytes.
At the highest quality setting, the compute shader will output up to 5000 nodes,
so the buffer needs to be 5000 * 34848 bytes, which is
174240000 bytes or ~166 MB.

Due to the way I handle load balancing between rendering and terrain generation, I need to have this buffer in memory twice, so I use
~332 MB only for vertex data.
This is okay I guess, since a planet is sort of the primary object, but I want to reduce this buffer size if possible.

For example the normal vector: It doesn't need 32bit precision per channel, 16 would be more than enough.
As for temperature and humidity, they could even fit in an 8bit unorm, but I doubt that's available here.

I found that there are f32tof16 and f16tof32 functions in hlsl, which I assume are what I need, but I cannot quite figure out how they're supposed to work:
http://msdn.microsoft.com/en-us/library/windows/desktop/ff471399(v=vs.85).aspx

It says here that f32tof16 returns a uint, but isn't that 32 bits as well?

Cheers,
Hyu

### #2Nik02  Members

Posted 05 June 2012 - 04:40 AM

The f16 is stored in the 16 lowest bits of the uint.

Niko Suni

### #3Hyunkel  Members

Posted 05 June 2012 - 04:45 AM

Oh, alright, now I get it.
So basically I have to run f32tof16 on the two floats that I want to store in a single uint (with 16-bit precision each), but I have to do the packing myself.

    uint Float2ToF1616(in float2 f)
    {
        uint packed;
        packed = asuint(f32tof16(f.x)) | (asuint(f32tof16(f.y)) << 16);
        return packed;
    }


Thanks!
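
To read the values back, a minimal sketch of the inverse, assuming the same layout as `Float2ToF1616` above (x in the low 16 bits, y in the high 16 bits). The name `F1616ToFloat2` is made up for illustration; `f16tof32` is the HLSL intrinsic that converts the half stored in the low 16 bits of its argument.

```hlsl
// Hypothetical inverse of Float2ToF1616: x sits in the low half, y in the high half.
float2 F1616ToFloat2(in uint bits)
{
    float2 f;
    f.x = f16tof32(bits & 0xFFFF); // low 16 bits
    f.y = f16tof32(bits >> 16);    // high 16 bits, shifted down
    return f;
}
```

A half has a 10-bit mantissa, so values that survive the round trip keep roughly three significant decimal digits.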

### #4Nik02  Members

Posted 05 June 2012 - 04:45 AM

If you assume that the normal is a normalized vector, you can pack it to a 2-element vector:

nz = 1.0 - nx - ny

Niko Suni

### #5Nik02  Members

Posted 05 June 2012 - 04:45 AM

> Oh, alright, now I get it.
> So basically I have to run f32tof16 on the two floats that I want to store in a single uint (with 16-bit precision each), but I have to do the packing myself.
>
>     uint Float2ToF1616(in float2 f)
>     {
>         uint packed;
>         packed = asuint(f32tof16(f.x)) | (asuint(f32tof16(f.y)) << 16);
>         return packed;
>     }
>
> Thanks!

Yes

Niko Suni

### #6Nik02  Members

Posted 05 June 2012 - 04:48 AM

The underlying reason for this is that modern hardware doesn't actually have 16-bit registers.

Niko Suni

### #7Hyunkel  Members

Posted 05 June 2012 - 05:31 AM

> If you assume that the normal is a normalized vector, you can pack it to a 2-element vector:
>
> nz = 1.0 - nx - ny

The only packing algorithms I know of are these:
http://aras-p.info/texts/CompactNormalStorage.html

and they are for view space normal vectors mostly.
I don't really see a quick way to do 2 channel packing, though it would be quite useful if I could.

> The underlying reason for this is that modern hardware doesn't actually have 16-bit registers.

Yeah, I'm aware, which is why I thought that f32tof16() would need to take 2 floats as input and got confused.

### #8Nik02  Members

Posted 05 June 2012 - 05:37 AM

Method #1 on that page is what I had in mind. You do need to bias x and y to 0...1 beforehand, which I forgot to mention.

Niko Suni

### #9Nik02  Members

Posted 05 June 2012 - 05:44 AM

...and since you bias the x and y to positive range, you can use either's sign to encode the z direction.

Niko Suni

Posted 05 June 2012 - 05:49 AM

> If you assume that the normal is a normalized vector, you can pack it to a 2-element vector:
>
> nz = 1.0 - nx - ny

Did you mean nz = sqrt(1.0 - nx² - ny²)?

### #11Hyunkel  Members

Posted 05 June 2012 - 05:50 AM

I always assumed that with this method it would be troublesome to reconstruct the correct sign for z.
But yes, I see how you can just store the sign in the x or y channel since both are biased to a positive range.

Now I'm confused though. This seems excessively easy. Why the need for all the much more complicated methods then?

### #12Nik02  Members

Posted 05 June 2012 - 05:51 AM

> > If you assume that the normal is a normalized vector, you can pack it to a 2-element vector:
> >
> > nz = 1.0 - nx - ny
>
> Did you mean nz = sqrt(1.0 - nx² - ny²)?

Yea, sorry for the confusion.

Edited by Nik02, 05 June 2012 - 05:55 AM.

Niko Suni
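
Putting posts #8 to #12 together, the two-channel scheme could be sketched as follows. This is only an illustration under stated assumptions: the function names are invented, the choice of x (not y) to carry the sign of z is arbitrary, and the input is assumed to be a unit-length normal.

```hlsl
// Encode a unit normal into two values. x and y are biased from [-1, 1] to [0, 1];
// the sign of z is carried by the sign of the stored x.
float2 EncodeNormalXY(float3 n)
{
    float2 e = n.xy * 0.5 + 0.5;
    if (n.z < 0.0)
        e.x = -e.x;
    return e;
}

// Rebuild the normal: recover the sign of z, undo the bias, then
// nz = sqrt(1 - nx^2 - ny^2).
float3 DecodeNormalXY(float2 e)
{
    float zSign = (e.x < 0.0) ? -1.0 : 1.0;
    float2 xy = float2(abs(e.x), e.y) * 2.0 - 1.0;
    // saturate guards against a slightly negative argument caused by quantisation.
    float z = zSign * sqrt(saturate(1.0 - dot(xy, xy)));
    return float3(xy, z);
}
```

The stored values need a signed format (for example two halves packed with f32tof16), and the reconstruction is least accurate where z is close to 0, because small errors in x and y are amplified by the square root there.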

### #13MJP  Moderators

Posted 05 June 2012 - 11:36 AM

> Now I'm confused though. This seems excessively easy. Why the need for all the much more complicated methods then?

Because other methods may be cheaper and/or give you better precision.

### #14Hyunkel  Members

Posted 05 June 2012 - 12:22 PM

I doubt they'll be much cheaper considering the simplicity of the store sign and reconstruct z method.
However I can see how other methods could provide more precision. In fact I did notice precision issues using this very method.
Now that I think about it, the method also only works on signed types in order to store the sign of z.

### #15Nik02  Members

Posted 05 June 2012 - 11:19 PM

The optimal approach depends on the problem you're trying to solve. If lower precision is enough for your use case, using lower-precision calculation could be just fine. If best quality is needed, then you should use float3 to store the data at full fidelity and suffer the bandwidth/storage impact.

View-space normal storage is a different scenario than per-vertex normal storage. A vertex shader is typically invoked far less often than a pixel shader, so more complex calculations can be done there without hitting a bottleneck.

Niko Suni

### #16pcmaster  Members

Posted 06 June 2012 - 06:16 AM

Why not use DXGI_FORMAT_R11G11B10_FLOAT? There's plenty of nice formats. Or a single R32_UINT and pack your normal manually, no big deal. No need to ever use R32G32B32_FLOAT format for normals transfer! If you can have normals in screen-space, definitely go pack them according to one of the methods linked. I use #4 to my greatest pleasure in screen space (1 float3 to float2 (stored as R16G16_FLOAT)).

Regarding HLSL, you just write your float, float2, float3, whatever happily and depending on the bound target view (RTV, DSV, UAV?) format, the conversion happens automatically. There is no HLSL construct for "half", there is no need.

There is no sense of packing data in between the shader stages (such as from vertex shader to hull shader or such). You can happily send i.e. R8_UNORM buffers to input assembler (or bind them as SRV) and your shaders see whatever type (such as float) automatically. The same at output.

I'd split your struct into separate streams of positions, normals, temperatures etc, with formats R32G32B32_FLOAT, R11G11B10_FLOAT, R8_UNORM, etc., for example.

Edited by pcmaster, 06 June 2012 - 06:26 AM.
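
As one possible reading of "a single R32_UINT and pack your normal manually", here is a hedged sketch that quantises each component to 10 bits. The 10:10:10 split and the function names are assumptions for illustration, not something specified in the post; the top 2 bits of the uint are left unused.

```hlsl
// Pack a unit normal into the low 30 bits of a uint, 10 bits per channel.
uint PackNormal101010(float3 n)
{
    // Bias [-1, 1] to [0, 1], then quantise each channel to 0..1023.
    uint3 q = (uint3)round(saturate(n * 0.5 + 0.5) * 1023.0);
    return q.x | (q.y << 10) | (q.z << 20);
}

// Inverse of PackNormal101010; renormalises to hide the quantisation error.
float3 UnpackNormal101010(uint p)
{
    float3 q = float3(p & 1023, (p >> 10) & 1023, (p >> 20) & 1023);
    return normalize(q / 1023.0 * 2.0 - 1.0);
}
```

With 10 bits per channel the step size is about 2/1023, or roughly 0.002 per component in the [-1, 1] range.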

### #17Ashaman73  Members

Posted 06 June 2012 - 06:59 AM

Have you considered alignment of data ?

struct PlanetVertex
{
float3 Position;
float3 Normal;
float Temperature;
float Humidity;
};

That's 2x 4-component vectors. Considering that you want the data aligned, using 16 instead of 32 bits could be your best bet.

Ashaman

### #18Hyunkel  Members

Posted 06 June 2012 - 09:21 AM

> Why not use DXGI_FORMAT_R11G11B10_FLOAT? There's plenty of nice formats. Or a single R32_UINT and pack your normal manually, no big deal. No need to ever use R32G32B32_FLOAT format for normals transfer! If you can have normals in screen-space, definitely go pack them according to one of the methods linked. I use #4 to my greatest pleasure in screen space (1 float3 to float2 (stored as R16G16_FLOAT)).
>
> Regarding HLSL, you just write your float, float2, float3, whatever happily and depending on the bound target view (RTV, DSV, UAV?) format, the conversion happens automatically. There is no HLSL construct for "half", there is no need.
>
> There is no sense of packing data in between the shader stages (such as from vertex shader to hull shader or such). You can happily send i.e. R8_UNORM buffers to input assembler (or bind them as SRV) and your shaders see whatever type (such as float) automatically. The same at output.
>
> I'd split your struct into separate streams of positions, normals, temperatures etc, with formats R32G32B32_FLOAT, R11G11B10_FLOAT, R8_UNORM, etc., for example.

I think you misunderstand what I'm doing, though it is perfectly possible that I'm misunderstanding what you're suggesting.
The generated data comes from a compute shader, which is not part of the normal rendering pipeline.
It is stored in a structured buffer, which can only contain hlsl data types (afaik), and as you mentioned, half is not available.

If I was able to use formats such as R11G11B10_FLOAT there would be no issues.
For the view space normals in my geometry buffer I do in fact use an R16G16 format with packing method #4 and it works brilliantly.

> Have you considered alignment of data ?
>
>     struct PlanetVertex
>     {
>         float3 Position;
>         float3 Normal;
>         float Temperature;
>         float Humidity;
>     };
>
> That's 2x 4-component vectors. Considering that you want the data aligned, using 16 instead of 32 bits could be your best bet.

Well... no. I have not.
Do I have to consider this with structured buffers?
Right now I have this:

    struct PlanetVertex
    {
        float3 Position;
        uint2 PackedNormalTempHumidity;
    };


With PackedNormalTempHumidity being packed as
X = float2(Normal.x, Normal.y)
Y = float2(Temperature, Humidity)

There are some minor precision issues with the 2-channel normal vector when z is close to 0, though.
I might go for something like this instead:

    struct PlanetVertex
    {
        float3 Position;
        uint2 PackedNormalTemp;
        uint PackedHumiditySomethingelse;
    };


With a 3 channel normal vector with 16bit/channel
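
A rough sketch of how this second layout could be filled and read, using 16 bits per value. Which value goes into which half is not stated in the post, so the assignment below is an assumption, as are the helper names `PackHalf2`, `PackVertex` and `UnpackNormal`. The spare 16 bits are simply written as zero.

```hlsl
struct PlanetVertex
{
    float3 Position;
    uint2  PackedNormalTemp;            // x = (Normal.x, Normal.y), y = (Normal.z, Temperature)
    uint   PackedHumiditySomethingelse; // low half = Humidity, high half = spare
};

// Two floats stored as halves in one uint: a in the low 16 bits, b in the high 16 bits.
uint PackHalf2(float a, float b)
{
    return f32tof16(a) | (f32tof16(b) << 16);
}

PlanetVertex PackVertex(float3 position, float3 normal, float temperature, float humidity)
{
    PlanetVertex v;
    v.Position = position;
    v.PackedNormalTemp = uint2(PackHalf2(normal.x, normal.y),
                               PackHalf2(normal.z, temperature));
    v.PackedHumiditySomethingelse = PackHalf2(humidity, 0.0);
    return v;
}

// f16tof32 reads the half in the low 16 bits, so the high halves are shifted down first.
float3 UnpackNormal(PlanetVertex v)
{
    return float3(f16tof32(v.PackedNormalTemp.x),
                  f16tof32(v.PackedNormalTemp.x >> 16),
                  f16tof32(v.PackedNormalTemp.y));
}
```

This struct has a stride of 24 bytes (12 + 8 + 4), with one 16-bit slot still free for another value.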

Edited by Hyunkel, 06 June 2012 - 09:25 AM.

### #19MJP  Moderators

Posted 06 June 2012 - 10:54 PM

> Well... no. I have not.
> Do I have to consider this with structured buffers?

No, you don't. It's totally legal to access structures with a stride that's not a multiple of 16 bytes.

### #20pcmaster  Members

Posted 07 June 2012 - 01:13 AM

Hyunkel, yes, I'm familiar with DX11 compute shaders. You don't necessarily need to use StructuredBuffer UAV. You can use several UAVs as outputs of your compute shaders. So instead of a stream (array) of packed interleaved struct data, you might have streams (arrays) of individual struct members. Instead of 1 RWStructuredBuffer, you'd have 4 RWBuffers as targets of your compute shader. The main disadvantage I see is that you use 4 target slots instead of 1 (there should always be at least 8 supported, if I recall well). I believe you can have texture/buffer UAVs as well in cs_5_0 (unlike cs_4_1) but I've actually used RWStructuredBuffer just like you.
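
The split into four typed buffers might look roughly like this on the shader side. Treat it as a sketch under assumptions: the buffer names, register slots, thread group size and placeholder values are invented, and the formats named in the comments are chosen on the API side when each UAV is created. Whether a given format supports typed UAV stores should be checked on the target hardware, for example with `ID3D11Device::CheckFormatSupport`.

```hlsl
// Hypothetical layout: one typed UAV per attribute instead of one RWStructuredBuffer.
// Assumed formats: R32G32B32A32_FLOAT, R11G11B10_FLOAT, R8_UNORM, R8_UNORM.
RWBuffer<float4> Positions    : register(u0);
RWBuffer<float3> Normals      : register(u1);
RWBuffer<float>  Temperatures : register(u2);
RWBuffer<float>  Humidities   : register(u3);

[numthreads(64, 1, 1)]
void WriteVertexCS(uint3 id : SV_DispatchThreadID)
{
    uint i = id.x;

    // Placeholder values standing in for the real terrain generation.
    float3 position    = float3(0.0, 1.0, 0.0);
    float3 normal      = float3(0.0, 1.0, 0.0);
    float  temperature = 0.5;
    float  humidity    = 0.5;

    Positions[i] = float4(position, 1.0);
    // R11G11B10_FLOAT has no sign bit, so bias the normal from [-1, 1] to [0, 1].
    Normals[i] = normal * 0.5 + 0.5;
    // Converted to the UAV's format on store; values must already be in [0, 1] for UNORM.
    Temperatures[i] = temperature;
    Humidities[i]   = humidity;
}
```

Any shader that reads the normal stream back has to undo the bias with `n * 2.0 - 1.0`.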
