
Structured buffer float compression


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

20 replies to this topic

#1 Hyunkel   Members   -  Reputation: 370

Posted 05 June 2012 - 04:12 AM

I have a compute shader that generates procedural planetary terrain and stores vertices in a structured buffer with the following layout:
struct PlanetVertex
{
  float3 Position;
  float3 Normal;
  float Temperature;
  float Humidity;
};

That's 10 floats at 4 bytes per float, so 40 bytes per vertex.
A terrain node or patch contains 33x33 = 1089 vertices, which is 43560 bytes.
At the highest quality setting, the compute shader will output up to 5000 nodes,
so the buffer needs to be 5000 * 43560 bytes, which is
217800000 bytes or ~207 MB.

Due to the way I handle load balancing between rendering and terrain generation, I need to have this buffer in memory twice, so I use
~415 MB for vertex data alone.
This is okay I guess, since a planet is sort of the primary object, but I want to reduce this buffer size if possible.

For example, the normal vector doesn't need 32-bit precision per channel; 16 bits would be more than enough.
As for temperature and humidity, they could even fit in an 8-bit unorm, but I doubt that's available here.
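
As a rough budget check, the sizes above and the proposed reduced precisions can be written out as constants. This is only a sketch: the constant names are made up for illustration, and the "target" layout assumes a full-precision position, a 16-bit-per-channel normal, and 8 bits each for temperature and humidity.

```hlsl
// Current layout: 10 floats per vertex.
static const uint VerticesPerNode    = 33 * 33;                               // 1089
static const uint MaxNodes           = 5000;
static const uint BytesPerVertex     = 10 * 4;                                // 40
static const uint BufferBytes        = MaxNodes * VerticesPerNode * BytesPerVertex;       // 217800000

// Hypothetical target: float3 position + 3 x 16-bit normal + 2 x 8-bit scalars.
static const uint TargetVertexBytes  = 3 * 4 + 3 * 2 + 2 * 1;                 // 20
static const uint TargetBufferBytes  = MaxNodes * VerticesPerNode * TargetVertexBytes;    // 108900000
```

Under those assumptions the buffer would shrink to about half, roughly 104 MB per copy instead of ~207 MB.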

I found that there are f32tof16 and f16tof32 functions in hlsl, which I assume are what I need, but I cannot quite figure out how they're supposed to work:
http://msdn.microsoft.com/en-us/library/windows/desktop/ff471399(v=vs.85).aspx

It says here that f32tof16 returns a uint, but isn't that 32 bits as well?

Cheers,
Hyu

#2 Nik02   Crossbones+   -  Reputation: 2883

Posted 05 June 2012 - 04:40 AM

The f16 is stored in the 16 lowest bits of the uint.

Niko Suni


#3 Hyunkel   Members   -  Reputation: 370

Posted 05 June 2012 - 04:45 AM

Oh, alright, now I get it.
So basically I have to run f32tof16 on the two floats I want to store in a single uint (with 16-bit precision each), but I have to do the packing myself.
uint Float2ToF1616(in float2 f)
{
  uint packed;
  packed = asuint(f32tof16(f.x)) | (asuint(f32tof16(f.y)) << 16);
  return packed;
}
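
For reading the data back, a matching unpack function could look like the sketch below. The name F1616ToFloat2 is hypothetical; it relies on f16tof32 converting the half stored in the low 16 bits of its uint argument, so the second value only needs to be shifted down.

```hlsl
// Inverse of Float2ToF1616: splits one uint back into two floats.
float2 F1616ToFloat2(in uint packed)
{
    float x = f16tof32(packed & 0xFFFF); // low 16 bits hold f.x
    float y = f16tof32(packed >> 16);    // high 16 bits hold f.y
    return float2(x, y);
}
```

Since f32tof16 already returns a uint, the asuint calls in the packing function are harmless but not required.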

Thanks! :)

#4 Nik02   Crossbones+   -  Reputation: 2883

Posted 05 June 2012 - 04:45 AM

If you assume that the normal is a normalized vector, you can pack it to a 2-element vector:

nz = 1.0 - nx - ny

Niko Suni


#5 Nik02   Crossbones+   -  Reputation: 2883

Posted 05 June 2012 - 04:45 AM

Oh, alright, now I get it.
So basically I have to run f32tof16 on the two floats I want to store in a single uint (with 16-bit precision each), but I have to do the packing myself.

uint Float2ToF1616(in float2 f)
{
  uint packed;
  packed = asuint(f32tof16(f.x)) | (asuint(f32tof16(f.y)) << 16);
  return packed;
}

Thanks! :)


Yes

Niko Suni


#6 Nik02   Crossbones+   -  Reputation: 2883

Posted 05 June 2012 - 04:48 AM

The underlying reason for this is that modern hardware doesn't actually have 16-bit registers.

Niko Suni


#7 Hyunkel   Members   -  Reputation: 370

Posted 05 June 2012 - 05:31 AM

If you assume that the normal is a normalized vector, you can pack it to a 2-element vector:

nz = 1.0 - nx - ny


The only packing algorithms I know of are these:
http://aras-p.info/texts/CompactNormalStorage.html

and they are for view space normal vectors mostly.
I don't really see a quick way to do 2 channel packing, though it would be quite useful if I could.

The underlying reason for this is that modern hardware doesn't actually have 16-bit registers.


Yeah, I'm aware, which is why I thought that f32tof16() would need to take 2 floats as input and got confused.

#8 Nik02   Crossbones+   -  Reputation: 2883

Posted 05 June 2012 - 05:37 AM

Method #1 on that page is what I had in mind. You do need to bias x and y to the 0...1 range beforehand, which I forgot to mention.

Niko Suni


#9 Nik02   Crossbones+   -  Reputation: 2883

Posted 05 June 2012 - 05:44 AM

...and since you bias the x and y to positive range, you can use either's sign to encode the z direction.

Niko Suni


#10 Madhed   Crossbones+   -  Reputation: 3082

Posted 05 June 2012 - 05:49 AM

If you assume that the normal is a normalized vector, you can pack it to a 2-element vector:

nz = 1.0 - nx - ny


Did you mean nz = sqrt(1.0 - nx^2 - ny^2) ?

#11 Hyunkel   Members   -  Reputation: 370

Posted 05 June 2012 - 05:50 AM

I always assumed that with this method it would be troublesome to reconstruct the correct sign for z.
But yes, I see how you can just store the sign in the x or y channel since both are biased to a positive range.

Now I'm confused though. This seems excessively easy. Why the need for all the much more complicated methods then?

#12 Nik02   Crossbones+   -  Reputation: 2883

Posted 05 June 2012 - 05:51 AM


If you assume that the normal is a normalized vector, you can pack it to a 2-element vector:

nz = 1.0 - nx - ny


Did you mean nz = sqrt(1.0 - nx^2 - ny^2) ?


Yea, sorry for the confusion.
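
With the corrected formula, the two-channel scheme discussed in this thread (bias x and y to 0...1, carry the sign of z in the sign of one channel) might look roughly like the sketch below. The function names are hypothetical, and the saturate call is an added safeguard against rounding pushing the square root's argument slightly below zero.

```hlsl
// Encode a unit normal into two values; the sign of enc.x carries the sign of n.z.
float2 EncodeNormalXY(float3 n)
{
    float2 biased = n.xy * 0.5 + 0.5;                 // [-1, 1] -> [0, 1]
    biased.x = (n.z < 0.0) ? -biased.x : biased.x;    // negative x marks negative z
    return biased;
}

// Rebuild the unit normal from the two stored values.
float3 DecodeNormalXY(float2 enc)
{
    float zSign = (enc.x < 0.0) ? -1.0 : 1.0;
    float2 xy = float2(abs(enc.x), enc.y) * 2.0 - 1.0; // [0, 1] -> [-1, 1]
    float z = zSign * sqrt(saturate(1.0 - dot(xy, xy)));
    return float3(xy, z);
}
```

The one case where the sign cannot be stored is a biased x of exactly 0 (nx = -1), but a unit normal then has z = 0 anyway, so nothing is lost.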

Edited by Nik02, 05 June 2012 - 05:55 AM.

Niko Suni


#13 MJP   Moderators   -  Reputation: 11590

Posted 05 June 2012 - 11:36 AM

Now I'm confused though. This seems excessively easy. Why the need for all the much more complicated methods then?


Because other methods may be cheaper and/or give you better precision.

#14 Hyunkel   Members   -  Reputation: 370

Posted 05 June 2012 - 12:22 PM

I doubt they'll be much cheaper, considering the simplicity of storing the sign and reconstructing z.
However, I can see how other methods could provide more precision. In fact, I did notice precision issues using this very method.
Now that I think about it, the method also only works with signed types, since the sign of z has to be stored somewhere.

#15 Nik02   Crossbones+   -  Reputation: 2883

Posted 05 June 2012 - 11:19 PM

The optimal approach depends on the problem you're trying to solve. If lower precision is enough for your use case, using lower-precision calculation could be just fine. If best quality is needed, then you should use float3 to store the data at full fidelity and suffer the bandwidth/storage impact.

View-space normal storage is a different scenario from per-vertex normal storage. A vertex shader is typically invoked far less often than a pixel shader, so more complex calculations can be done there without hitting a bottleneck.

Niko Suni


#16 pcmaster   Members   -  Reputation: 681

Posted 06 June 2012 - 06:16 AM

Why not use DXGI_FORMAT_R11G11B10_FLOAT? There's plenty of nice formats. Or a single R32_UINT and pack your normal manually, no big deal. No need to ever use R32G32B32_FLOAT format for normals transfer! If you can have normals in screen-space, definitely go pack them according to one of the methods linked. I use #4 to my greatest pleasure in screen space (1 float3 to float2 (stored as R16G16_FLOAT)).

Regarding HLSL, you just write your float, float2, float3, whatever happily and depending on the bound target view (RTV, DSV, UAV?) format, the conversion happens automatically. There is no HLSL construct for "half", there is no need.

There is no sense in packing data between the shader stages (such as from vertex shader to hull shader). You can happily send e.g. R8_UNORM buffers to the input assembler (or bind them as SRVs) and your shaders see whatever type (such as float) automatically. The same goes for output.

I'd split your struct into separate streams of positions, normals, temperatures etc, with formats R32G32B32_FLOAT, R11G11B10_FLOAT, R8_UNORM, etc., for example.

Edited by pcmaster, 06 June 2012 - 06:26 AM.


#17 Ashaman73   Crossbones+   -  Reputation: 7864

Posted 06 June 2012 - 06:59 AM

Have you considered alignment of data ?

struct PlanetVertex
{
  float3 Position;
  float3 Normal;
  float Temperature;
  float Humidity;
};
That's 2x 4-component vectors. Considering that you want the data aligned, using 16-bit instead of 32-bit values could be your best bet.

#18 Hyunkel   Members   -  Reputation: 370

Posted 06 June 2012 - 09:21 AM

Why not use DXGI_FORMAT_R11G11B10_FLOAT? There's plenty of nice formats. Or a single R32_UINT and pack your normal manually, no big deal. No need to ever use R32G32B32_FLOAT format for normals transfer! If you can have normals in screen-space, definitely go pack them according to one of the methods linked. I use #4 to my greatest pleasure in screen space (1 float3 to float2 (stored as R16G16_FLOAT)).

Regarding HLSL, you just write your float, float2, float3, whatever happily and depending on the bound target view (RTV, DSV, UAV?) format, the conversion happens automatically. There is no HLSL construct for "half", there is no need.

There is no sense in packing data between the shader stages (such as from vertex shader to hull shader). You can happily send e.g. R8_UNORM buffers to the input assembler (or bind them as SRVs) and your shaders see whatever type (such as float) automatically. The same goes for output.

I'd split your struct into separate streams of positions, normals, temperatures etc, with formats R32G32B32_FLOAT, R11G11B10_FLOAT, R8_UNORM, etc., for example.


I think you misunderstand what I'm doing, though it is perfectly possible that I'm misunderstanding what you're suggesting.
The generated data comes from a compute shader, which is not part of the normal rendering pipeline.
It is stored in a structured buffer, which can only contain hlsl data types (afaik), and as you mentioned, half is not available.

If I was able to use formats such as R11G11B10_FLOAT there would be no issues.
For the view space normals in my geometry buffer I do in fact use an R16G16 format with packing method #4 and it works brilliantly.


Have you considered alignment of data ?

struct PlanetVertex
{
  float3 Position;
  float3 Normal;
  float Temperature;
  float Humidity;
};
That's 2x 4-component vectors. Considering that you want the data aligned, using 16-bit instead of 32-bit values could be your best bet.


Well... no. I have not.
Do I have to consider this with structured buffers?
Right now I have this:

struct PlanetVertex
{
  float3 Position;
  uint2 PackedNormalTempHumidity;
};

With PackedNormalTempHumidity being packed as
X = float2(Normal.x, Normal.y)
Y = float2(Temperature, Humidity)

There are some minor precision issues with the 2-channel normal vector when z is close to 0, though,
so I might go for something like this instead:

struct PlanetVertex
{
  float3 Position;
  uint2 PackedNormalTemp;
  uint PackedHumiditySomethingelse;
};

This would hold a 3-channel normal vector at 16 bits per channel.
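
One way this second layout could be filled in is sketched below. The bit assignments are an assumption read off the member names (the post does not spell them out), and PackHalf2, PackVertex, and UnpackNormal are hypothetical helpers.

```hlsl
struct PlanetVertex
{
    float3 Position;
    uint2  PackedNormalTemp;            // x: Normal.x | Normal.y << 16, y: Normal.z | Temperature << 16
    uint   PackedHumiditySomethingelse; // low 16 bits: Humidity, high 16 bits: spare
};

// Packs two floats as halfs into one uint (a in the low bits, b in the high bits).
uint PackHalf2(float a, float b)
{
    return f32tof16(a) | (f32tof16(b) << 16);
}

PlanetVertex PackVertex(float3 position, float3 normal, float temperature, float humidity)
{
    PlanetVertex v;
    v.Position = position;
    v.PackedNormalTemp = uint2(PackHalf2(normal.x, normal.y),
                               PackHalf2(normal.z, temperature));
    v.PackedHumiditySomethingelse = PackHalf2(humidity, 0.0); // spare half left at zero
    return v;
}

float3 UnpackNormal(PlanetVertex v)
{
    // f16tof32 reads the low 16 bits of its argument.
    return float3(f16tof32(v.PackedNormalTemp.x),
                  f16tof32(v.PackedNormalTemp.x >> 16),
                  f16tof32(v.PackedNormalTemp.y));
}
```

This struct is 24 bytes per vertex instead of 40, which for 5000 nodes of 33x33 vertices comes to 130680000 bytes, or roughly 125 MB per copy of the buffer.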

Edited by Hyunkel, 06 June 2012 - 09:25 AM.


#19 MJP   Moderators   -  Reputation: 11590

Posted 06 June 2012 - 10:54 PM

Well... no. I have not.
Do I have to consider this with structured buffers?


No, you don't. It's totally legal to access structures with a stride that's not a multiple of 16 bytes.

#20 pcmaster   Members   -  Reputation: 681

Posted 07 June 2012 - 01:13 AM

Hyunkel, yes, I'm familiar with DX11 compute shaders. You don't necessarily need to use a StructuredBuffer UAV. You can use several UAVs as outputs of your compute shader. So instead of a stream (array) of packed interleaved struct data, you might have streams (arrays) of individual struct members. Instead of 1 RWStructuredBuffer, you'd have 4 RWBuffers as targets of your compute shader. The main disadvantage I see is that you use 4 target slots instead of 1 (there should always be at least 8 supported, if I recall correctly). I believe you can have texture/buffer UAVs as well in cs_5_0 (unlike cs_4_1), but I've actually used RWStructuredBuffer just like you.
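
A minimal sketch of the split-stream idea follows. The buffer names, register slots, thread-group size, and the formats mentioned in the comments are all assumptions for illustration; the values written are placeholders standing in for the real terrain generation. Whether a given DXGI format can be written through a typed UAV depends on the hardware, so it should be verified with ID3D11Device::CheckFormatSupport before relying on it.

```hlsl
// One typed UAV per vertex attribute instead of a single RWStructuredBuffer.
// The view format chosen on the API side decides the stored precision.
RWBuffer<float4> Positions    : register(u0); // e.g. a 32-bit float view (assumed)
RWBuffer<float4> Normals      : register(u1); // e.g. a 16-bit float view (assumed)
RWBuffer<float>  Temperatures : register(u2); // e.g. an 8-bit unorm view (assumed)
RWBuffer<float>  Humidities   : register(u3); // e.g. an 8-bit unorm view (assumed)

[numthreads(64, 1, 1)]
void CSMain(uint3 dispatchId : SV_DispatchThreadID)
{
    uint i = dispatchId.x;

    // Placeholder values; the actual terrain generation would compute these.
    float3 position    = float3(0.0, 0.0, 0.0);
    float3 normal      = float3(0.0, 1.0, 0.0);
    float  temperature = 0.5;
    float  humidity    = 0.5;

    // The shader writes plain floats; conversion to the view format happens on store.
    Positions[i]    = float4(position, 1.0);
    Normals[i]      = float4(normal, 0.0);
    Temperatures[i] = temperature;
    Humidities[i]   = humidity;
}
```

Compared with manual f32tof16 packing, this moves the precision decision out of the shader and into the view formats, at the cost of occupying four UAV slots.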



