Hyunkel

Structured buffer float compression

20 posts in this topic

I have a compute shader that generates procedural planetary terrain and stores the vertices in a structured buffer with the following layout:
[CODE]
struct PlanetVertex
{
float3 Position;
float3 Normal;
float Temperature;
float Humidity;
};
[/CODE]

That's 8 floats with 4 bytes per float -> 32 bytes per vertex.
A terrain node or patch contains 33x33 vertices, which is 34848 bytes.
At the highest quality setting, the compute shader will output up to 5000 nodes,
so the buffer needs to be 5000 * 34848 bytes, which is
174240000 bytes or ~166 MB.

Due to the way I handle load balancing between rendering and terrain generation, I need to have this buffer in memory twice, so I use
~332 MB only for vertex data.
This is okay I guess, since a planet is sort of the primary object, but I want to reduce this buffer size if possible.

For example the normal vector: It doesn't need 32bit precision per channel, 16 would be more than enough.
As for temperature and humidity, they could even fit in an 8bit unorm, but I doubt that's available here.

I found that there are f32tof16 and f16tof32 functions in hlsl, which I assume are what I need, but I cannot quite figure out how they're supposed to work:
[url="http://msdn.microsoft.com/en-us/library/windows/desktop/ff471399(v=vs.85).aspx"]http://msdn.microsoft.com/en-us/library/windows/desktop/ff471399(v=vs.85).aspx[/url]

It says here that f32tof16 returns a uint, but isn't that 32 bits as well?

Cheers,
Hyu
Oh, alright, now I get it.
So basically I have to run f32tof16 on each of the two floats I want to store in a single uint (with 16-bit precision), and I have to do the packing myself.
[CODE]
uint Float2ToF1616(in float2 f)
{
uint packed;
packed = asuint(f32tof16(f.x)) | (asuint(f32tof16(f.y)) << 16);
return packed;
}
[/CODE]

Thanks! :)
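
For reading the values back, the inverse might look like the sketch below. The name F1616ToFloat2 is made up here to mirror the function above. f16tof32 converts the half stored in the low 16 bits of its uint argument, so the upper half is shifted down first.

[CODE]
// Hypothetical counterpart to Float2ToF1616 (the name is an assumption).
float2 F1616ToFloat2(in uint packed)
{
    float2 f;
    f.x = f16tof32(packed & 0xFFFF);          // half stored in the low 16 bits
    f.y = f16tof32((packed >> 16) & 0xFFFF);  // half stored in the high 16 bits
    return f;
}
[/CODE]

Values come back rounded to half precision (10 mantissa bits, roughly 3 decimal digits), so a round trip is not bit-exact.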
If you assume that the normal is a normalized vector, you can pack it to a 2-element vector:

n[sub]z[/sub] = 1.0 - n[sub]x[/sub] - n[sub]y[/sub]
[quote name='Hyunkel' timestamp='1338893120' post='4946379']
Oh, alright, now I get it.
So basically I have to run f32tof16 on each of the two floats I want to store in a single uint (with 16-bit precision), and I have to do the packing myself.
[CODE]
uint Float2ToF1616(in float2 f)
{
uint packed;
packed = asuint(f32tof16(f.x)) | (asuint(f32tof16(f.y)) << 16);
return packed;
}
[/CODE]

Thanks! :)
[/quote]

Yes
[quote name='Nik02' timestamp='1338893126' post='4946380']
If you assume that the normal is a normalized vector, you can pack it to a 2-element vector:

n[sub]z[/sub] = 1.0 - n[sub]x[/sub] - n[sub]y[/sub]
[/quote]

The only packing algorithms I know of are these:
[url="http://aras-p.info/texts/CompactNormalStorage.html"]http://aras-p.info/texts/CompactNormalStorage.html[/url]

and they are for view space normal vectors mostly.
I don't really see a quick way to do 2 channel packing, though it would be quite useful if I could.

[quote name='Nik02' timestamp='1338893291' post='4946384']
The underlying reason for this is that modern hardware doesn't actually have 16-bit registers.
[/quote]

Yeah, I'm aware, which is why I thought that f32tof16() would need to take 2 floats as input and got confused.
Method #1 on that page is what I had in mind. You do need to bias x and y to 0...1 beforehand, which I forgot to mention.
[quote name='Nik02' timestamp='1338893126' post='4946380']
If you assume that the normal is a normalized vector, you can pack it to a 2-element vector:

n[sub]z[/sub] = 1.0 - n[sub]x[/sub] - n[sub]y[/sub]
[/quote]

Did you mean n[sub]z[/sub] = sqrt(1.0 - n[sub]x[/sub][sup]2[/sup] - n[sub]y[/sub][sup]2[/sup]) ?
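
The square root follows from the unit-length constraint: x² + y² + z² = 1, so |z| = sqrt(1 - x² - y²). A minimal sketch of that reconstruction is below. The function names are hypothetical, x and y are biased to 0...1 for storage, and the sketch assumes z >= 0 because the sign of z is lost.

[CODE]
// Sketch of "store x and y, reconstruct z". Names are hypothetical.
// Assumes n is unit length and n.z >= 0 (the sign of z is not stored).
float2 EncodeNormalXY(float3 n)
{
    return n.xy * 0.5f + 0.5f;      // bias from [-1, 1] to [0, 1]
}

float3 DecodeNormalXY(float2 enc)
{
    float3 n;
    n.xy = enc * 2.0f - 1.0f;       // undo the bias
    // saturate guards against a slightly negative argument after quantization
    n.z = sqrt(saturate(1.0f - dot(n.xy, n.xy)));
    return n;
}
[/CODE]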
I always assumed that with this method it would be troublesome to reconstruct the correct sign for z.
But yes, I see how you can just store the sign in the x or y channel since both are biased to a positive range.

Now I'm confused though. This seems excessively easy. Why the need for all the much more complicated methods then?
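
One way the sign trick could look, assuming x and y are stored as 16-bit halves via f32tof16: after biasing, x is never negative, so the sign bit of its half (bit 15) is free to carry the sign of z. The names below are hypothetical.

[CODE]
// Hypothetical sketch: two halves in one uint, with the sign of z stored
// in the otherwise unused sign bit (bit 15) of the biased x component.
uint PackNormal(float3 n)
{
    float2 b = saturate(n.xy * 0.5f + 0.5f);    // biased to [0, 1], never negative
    uint hx = f32tof16(b.x) & 0x7FFF;           // keep the sign bit clear
    uint hy = f32tof16(b.y) & 0xFFFF;
    uint zSign = (n.z < 0.0f) ? 0x8000u : 0u;   // reuse x's sign bit for z
    return hx | zSign | (hy << 16);
}

float3 UnpackNormal(uint p)
{
    float3 n;
    n.x = f16tof32(p & 0x7FFF) * 2.0f - 1.0f;           // mask off the z flag
    n.y = f16tof32((p >> 16) & 0xFFFF) * 2.0f - 1.0f;
    float z = sqrt(saturate(1.0f - dot(n.xy, n.xy)));   // |z| from unit length
    n.z = ((p & 0x8000u) != 0) ? -z : z;
    return n;
}
[/CODE]

Small errors in x and y are amplified in the reconstructed z when z is near 0, since dz/dx = -x/z grows without bound there.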
[quote name='Madhed' timestamp='1338896962' post='4946400']
[quote name='Nik02' timestamp='1338893126' post='4946380']
If you assume that the normal is a normalized vector, you can pack it to a 2-element vector:

n[sub]z[/sub] = 1.0 - n[sub]x[/sub] - n[sub]y[/sub]
[/quote]

Did you mean n[sub]z[/sub] = sqrt(1.0 - n[sub]x[/sub][sup]2[/sup] - n[sub]y[/sub][sup]2[/sup]) ?
[/quote]

Yea, sorry for the confusion. Edited by Nik02
[quote name='Hyunkel' timestamp='1338897001' post='4946401']
Now I'm confused though. This seems excessively easy. Why the need for all the much more complicated methods then?
[/quote]

Because other methods may be cheaper and/or give you better precision.
I doubt they'll be much cheaper considering the simplicity of the store sign and reconstruct z method.
However I can see how other methods could provide more precision. In fact I did notice precision issues using this very method.
Now that I think about it, the method also only works on signed types in order to store the sign of z.
The optimal approach depends on the problem you're trying to solve. If lower precision is enough for your use case, using lower-precision calculation could be just fine. If best quality is needed, then you should use float3 to store the data at full fidelity and suffer the bandwidth/storage impact.

View-space normal storage is a different scenario from per-vertex normal storage. A vertex shader is typically invoked far less often than a pixel shader, so more complex calculations can be done there without hitting a bottleneck.
Why not use DXGI_FORMAT_R11G11B10_FLOAT? There's plenty of nice formats. Or a single R32_UINT and pack your normal manually, no big deal. No need to ever use R32G32B32_FLOAT format for normals transfer! If you can have normals in screen-space, definitely go pack them according to one of the methods linked. I use #4 to my greatest pleasure in screen space (1 float3 to float2 (stored as R16G16_FLOAT)).

Regarding HLSL, you just write your float, float2, float3, whatever happily and depending on the bound target view (RTV, DSV, UAV?) format, the conversion happens automatically. There is no HLSL construct for "half", there is no need.

There is no sense in packing data between the shader stages (such as from vertex shader to hull shader). You can happily send e.g. R8_UNORM buffers to the input assembler (or bind them as SRVs) and your shaders see whatever type (such as float) automatically. The same goes for output.

I'd split your struct into separate streams of positions, normals, temperatures etc, with formats R32G32B32_FLOAT, R11G11B10_FLOAT, R8_UNORM, etc., for example. Edited by pcmaster
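
On the reading side, split streams might be consumed as typed buffers, roughly as sketched below. The register slots, names, constant buffer and the view formats in the comments are all assumptions for illustration. The format belongs to the view created on the API side, not to the HLSL declaration, and support for a given format should be verified (for example with ID3D11Device::CheckFormatSupport). Note that R11G11B10_FLOAT has no sign bits, so a normal stored in it has to be biased to 0...1 first.

[CODE]
// Hypothetical typed-buffer SRVs. Loads are converted to float
// according to the format of the bound view.
Buffer<float3> Positions    : register(t0); // e.g. R32G32B32_FLOAT view (assumed)
Buffer<float3> Normals      : register(t1); // e.g. R11G11B10_FLOAT view, biased to [0, 1]
Buffer<float>  Temperatures : register(t2); // e.g. R8_UNORM view
Buffer<float>  Humidities   : register(t3); // e.g. R8_UNORM view

cbuffer PerFrame : register(b0)
{
    float4x4 ViewProjection; // assumed camera matrix
};

struct VSOutput
{
    float4 Position     : SV_Position;
    float3 Normal       : NORMAL;
    float2 TempHumidity : TEXCOORD0;
};

VSOutput VSMain(uint vertexId : SV_VertexID)
{
    VSOutput o;
    o.Position = mul(float4(Positions[vertexId], 1.0f), ViewProjection);
    o.Normal = Normals[vertexId] * 2.0f - 1.0f;   // undo the [0, 1] bias
    o.TempHumidity = float2(Temperatures[vertexId], Humidities[vertexId]);
    return o;
}
[/CODE]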
Have you considered the alignment of the data?

[CODE]
struct PlanetVertex
{
float3 Position;
float3 Normal;
float Temperature;
float Humidity;
};
[/CODE]
That's two 4-component vectors. If you want the data aligned, using 16-bit instead of 32-bit values could be your best bet.
[quote name='pcmaster' timestamp='1338984972' post='4946733']
Why not use DXGI_FORMAT_R11G11B10_FLOAT? There's plenty of nice formats. Or a single R32_UINT and pack your normal manually, no big deal. No need to ever use R32G32B32_FLOAT format for normals transfer! If you can have normals in screen-space, definitely go pack them according to one of the methods linked. I use #4 to my greatest pleasure in screen space (1 float3 to float2 (stored as R16G16_FLOAT)).

Regarding HLSL, you just write your float, float2, float3, whatever happily and depending on the bound target view (RTV, DSV, UAV?) format, the conversion happens automatically. There is no HLSL construct for "half", there is no need.

There is no sense in packing data between the shader stages (such as from vertex shader to hull shader). You can happily send e.g. R8_UNORM buffers to the input assembler (or bind them as SRVs) and your shaders see whatever type (such as float) automatically. The same goes for output.

I'd split your struct into separate streams of positions, normals, temperatures etc, with formats R32G32B32_FLOAT, R11G11B10_FLOAT, R8_UNORM, etc., for example.
[/quote]

I think you misunderstand what I'm doing, though it is perfectly possible that I'm misunderstanding what you're suggesting.
The generated data comes from a compute shader, which is not part of the normal rendering pipeline.
It is stored in a structured buffer, which can only contain hlsl data types (afaik), and as you mentioned, half is not available.

If I was able to use formats such as R11G11B10_FLOAT there would be no issues.
For the view space normals in my geometry buffer I do in fact use an R16G16 format with packing method #4 and it works brilliantly.


[quote name='Ashaman73' timestamp='1338987545' post='4946736']
Have you considered the alignment of the data?

[CODE]
struct PlanetVertex
{
float3 Position;
float3 Normal;
float Temperature;
float Humidity;
};
[/CODE]
That's two 4-component vectors. If you want the data aligned, using 16-bit instead of 32-bit values could be your best bet.
[/quote]

Well... no. I have not.
Do I have to consider this with structured buffers?
Right now I have this:

[CODE]
struct PlanetVertex
{
float3 Position;
uint2 PackedNormalTempHumidity;
};
[/CODE]

With PackedNormalTempHumidity being packed as
X = float2(Normal.x, Normal.y)
Y = float2(Temperature, Humidity)

There are some minor precision issues with the 2-channel normal vector when z is close to 0, though,
so I might go for something like this instead:

[CODE]
struct PlanetVertex
{
float3 Position;
uint2 PackedNormalTemp;
uint PackedHumiditySomethingelse;
};
[/CODE]

With a 3-channel normal vector at 16 bits per channel. Edited by Hyunkel
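
A possible pack/unpack pair for that 24-byte layout is sketched below. Which half of which uint holds which value is an assumption, as are the helper names PackHalf2, UnpackHalf2, PackVertex and UnpackVertex. The upper half of the last uint is simply left unused.

[CODE]
// Hypothetical pack/unpack for a 24-byte vertex (12 + 8 + 4 bytes).
struct PlanetVertex
{
    float3 Position;
    uint2  PackedNormalTemp;            // x: Normal.x | Normal.y, y: Normal.z | Temperature
    uint   PackedHumiditySomethingelse; // low half: Humidity, high half: unused here
};

uint PackHalf2(float a, float b)
{
    return (f32tof16(a) & 0xFFFF) | (f32tof16(b) << 16);
}

float2 UnpackHalf2(uint p)
{
    return float2(f16tof32(p & 0xFFFF), f16tof32((p >> 16) & 0xFFFF));
}

PlanetVertex PackVertex(float3 position, float3 normal, float temperature, float humidity)
{
    PlanetVertex v;
    v.Position = position;
    v.PackedNormalTemp.x = PackHalf2(normal.x, normal.y);
    v.PackedNormalTemp.y = PackHalf2(normal.z, temperature);
    v.PackedHumiditySomethingelse = PackHalf2(humidity, 0.0f);
    return v;
}

void UnpackVertex(PlanetVertex v, out float3 normal, out float temperature, out float humidity)
{
    float2 a = UnpackHalf2(v.PackedNormalTemp.x);
    float2 b = UnpackHalf2(v.PackedNormalTemp.y);
    normal = normalize(float3(a.x, a.y, b.x)); // renormalize after quantization
    temperature = b.y;
    humidity = UnpackHalf2(v.PackedHumiditySomethingelse).x;
}
[/CODE]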
[quote name='Hyunkel' timestamp='1338996087' post='4946776']
Well... no. I have not.
Do I have to consider this with structured buffers?
[/quote]

No, you don't. It's totally legal to access structures with a stride that's not a multiple of 16 bytes.
Hyunkel, yes, I'm familiar with DX11 compute shaders. You don't necessarily need to use StructuredBuffer UAV. You can use several UAVs as outputs of your compute shaders. So instead of a stream (array) of packed interleaved struct data, you might have streams (arrays) of individual struct members. Instead of 1 RWStructuredBuffer, you'd have 4 RWBuffers as targets of your compute shader. The main disadvantage I see is that you use 4 target slots instead of 1 (there should always be at least 8 supported, if I recall well). I believe you can have texture/buffer UAVs as well in cs_5_0 (unlike cs_4_1) but I've actually used RWStructuredBuffer just like you.
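
A rough sketch of what the split-stream output could look like follows. The buffer names, register slots, thread group size and the view formats named in the comments are assumptions, and GenerateVertex is only a placeholder for the real terrain generation. Whether a given format supports typed UAV writes should be checked on the target hardware (for example with ID3D11Device::CheckFormatSupport).

[CODE]
// Hypothetical split-stream outputs. Writes are converted to the format
// of the bound view; the formats in the comments are assumptions.
RWBuffer<float4> Positions    : register(u0); // e.g. R32G32B32A32_FLOAT
RWBuffer<float4> Normals      : register(u1); // e.g. R16G16B16A16_FLOAT
RWBuffer<float>  Temperatures : register(u2); // e.g. R8_UNORM
RWBuffer<float>  Humidities   : register(u3); // e.g. R8_UNORM

// Placeholder standing in for the real terrain generation.
void GenerateVertex(uint index, out float3 position, out float3 normal,
                    out float temperature, out float humidity)
{
    position = float3(index, 0.0f, 0.0f);
    normal = float3(0.0f, 1.0f, 0.0f);
    temperature = 0.5f;
    humidity = 0.5f;
}

[numthreads(64, 1, 1)]
void CSMain(uint3 id : SV_DispatchThreadID)
{
    float3 position, normal;
    float temperature, humidity;
    GenerateVertex(id.x, position, normal, temperature, humidity);

    Positions[id.x]    = float4(position, 1.0f);
    Normals[id.x]      = float4(normal, 0.0f);
    Temperatures[id.x] = saturate(temperature); // UNORM stores values in [0, 1]
    Humidities[id.x]   = saturate(humidity);
}
[/CODE]

With these assumed formats a vertex takes 16 + 8 + 1 + 1 = 26 bytes, and temperature and humidity have to be normalized to 0...1 before they are written.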
[quote name='MJP' timestamp='1339044850' post='4946949']
[quote name='Hyunkel' timestamp='1338996087' post='4946776']
Well... no. I have not.
Do I have to consider this with structured buffers?
[/quote]

No, you don't. It's totally legal to access structures with a stride that's not a multiple of 16 bytes.
[/quote]
Thanks.

[quote name='pcmaster' timestamp='1339053221' post='4946972']
Hyunkel, yes, I'm familiar with DX11 compute shaders. You don't necessarily need to use StructuredBuffer UAV. You can use several UAVs as outputs of your compute shaders. So instead of a stream (array) of packed interleaved struct data, you might have streams (arrays) of individual struct members. Instead of 1 RWStructuredBuffer, you'd have 4 RWBuffers as targets of your compute shader. The main disadvantage I see is that you use 4 target slots instead of 1 (there should always be at least 8 supported, if I recall well). I believe you can have texture/buffer UAVs as well in cs_5_0 (unlike cs_4_1) but I've actually used RWStructuredBuffer just like you.
[/quote]
Oh, you can use the standard storage formats with RWBuffers?
For some reason I never thought of that.
That does make things a lot easier indeed.
I'll have to see if that performs just as well, but I see no reason why it wouldn't, and the memory footprint should obviously be quite a bit lower.

Thanks for pointing this out! :)