B_old

HLSL pack two values into one component of a 4x16_UNORM target

12 posts in this topic

Hi, I have a DXGI_FORMAT_R16G16B16A16_UNORM render target (could switch to float if it helps) and want to encode two different values in the last component. One value will be in the range [0, 1] in the pixel shader and the other one typically something like [10, 128] or so. Is this possible? I tried this, with limited success:
float packSpecularCoefAndPow(float coef, float pow)
{
	return float((uint(coef * 255.0f)) << 16 | uint(pow));
}

float2 unpackSpecularCoefAndPow(float coefAndPow)
{
	uint tmp = uint(coefAndPow);
	return float2(float(tmp >> 16) / 255.0f, tmp & 0x0000ffff);
}

Is something like this possible?

OK, so I came up with this solution.

float packSpecularCoefAndPow(float coef, float pow)
{
	uint sc = uint(coef * 255.f);
	uint sp = uint(pow);

	return float(sc << 8 | sp) / 65536.f;
}

float2 unpackSpecularCoefAndPow(float coefAndPow)
{
	uint tmp = uint(coefAndPow * 65536.f);

	return float2(float(tmp >> 8) / 255.f, float(tmp & 0x000000ff));
}


The results are good. Does anybody see a faster way to do this?
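
One thing that may be worth checking in the version above: a 16-bit UNORM channel stores multiples of 1/65535, so scaling by 65536 and truncating on read can land one step low once the packed value exceeds 32768 (coef above roughly 0.5). A minimal sketch that scales by 65535 and rounds instead, assuming a Shader Model 4 target with blending disabled; the names `packCoefPowUnorm16`, `unpackCoefPowUnorm16`, and `specPower` are made up for illustration:

```hlsl
// Sketch only: assumes the value is written to a 16-bit UNORM channel with
// blending off and read back unfiltered.
float packCoefPowUnorm16(float coef, float specPower)
{
	uint sc = uint(saturate(coef) * 255.0f + 0.5f);        // coefficient, rounded to 8 bits
	uint sp = uint(clamp(specPower, 0.0f, 255.0f) + 0.5f); // exponent, rounded to 8 bits

	// A 16-bit UNORM channel stores k / 65535 for integer k in [0, 65535].
	return float((sc << 8) | sp) / 65535.0f;
}

float2 unpackCoefPowUnorm16(float stored)
{
	// Round to the nearest integer instead of truncating.
	uint k = uint(stored * 65535.0f + 0.5f);

	return float2(float(k >> 8) / 255.0f, float(k & 0xff));
}
```

Clamping the exponent to 255 keeps it from spilling into the coefficient's byte; that limit is a choice made for this sketch.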

Bit shifts are usually executed on so-called "transcendental ALUs", which are scarcer than the basic ALU units. If you can refactor them into floating-point multiplies, the hardware can execute the logic on the basic ALUs. This generally gives better overall arithmetic performance and frees the transcendental units for tasks that actually require them, reducing potential bottlenecks that can cause latency.

Also, runtime conversions between ints and floats are relatively expensive so it might be worthwhile to try to eliminate them as much as you can.

Your current code has some room for optimization in these areas, but it will provide correct results as-is.
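
To illustrate what replacing the shifts with floating-point arithmetic could look like, here is a sketch that stays in floats throughout. It assumes a 16-bit UNORM channel and 8 bits per value; the function names and `specPower` are hypothetical, and whether it is actually faster would need profiling on the target hardware:

```hlsl
// Sketch only: float-only packing of two 8-bit values into one 16-bit UNORM channel.
float packCoefPowFloatOnly(float coef, float specPower)
{
	float sc = floor(saturate(coef) * 255.0f + 0.5f);        // 0..255
	float sp = floor(clamp(specPower, 0.0f, 255.0f) + 0.5f); // 0..255

	// sc * 256 plays the role of (sc << 8); the sum is an integer in [0, 65535].
	return (sc * 256.0f + sp) / 65535.0f;
}

float2 unpackCoefPowFloatOnly(float stored)
{
	float k  = floor(stored * 65535.0f + 0.5f); // recover the integer that was stored
	float sc = floor(k / 256.0f);               // plays the role of (k >> 8)
	float sp = k - sc * 256.0f;                 // plays the role of (k & 0xff)

	return float2(sc / 255.0f, sp);
}
```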

Hi Nik02, thanks for the answer.
The best I can come up with is this:

float packSpecularCoefAndPow(float coef, float pow)
{
	return float(uint(coef * 255.f) * 256 | uint(pow));
	//return float(uint(coef * 65280.f)| uint(pow)); doesn't work so good...
}

float2 unpackSpecularCoefAndPow(float coefAndPow)
{
	return float2(coefAndPow / 65280.f, float(uint(coefAndPow) & 0x000000ff));
}

It is slightly faster than my first method. Another thing I noticed is that pow should be somewhere between 32 and 196 or so for both versions, but that is OK as I don't need a broader range.
I still have one integer-div and some conversions though...

[Edited by - B_old on May 10, 2009 11:56:47 AM]

I don't see an integer division in that code. Integer division is a transcendental operation, so it is best to avoid it if you don't need it.

As it happens, float division is also transcendental, but in your case the compiler will convert the division to a multiply (which is cheaper) if optimizations are on, because the divisor is a constant value.
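
Written out by hand, the constant-divisor case might look like the sketch below. `INV_255` and `byteToUnit` are illustrative names, and the two forms can differ in the last bit of the result:

```hlsl
// Reciprocal of the constant divisor, folded at compile time.
static const float INV_255 = 1.0f / 255.0f;

float byteToUnit(float byteValue)
{
	// Same intent as byteValue / 255.0f, expressed as a multiply.
	return byteValue * INV_255;
}
```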

With regard to conversions, it is best to keep the data in the same data type across the entire pipeline, if at all possible. If not, the conversion ops are always available but will reduce performance on both CPU and GPU.

If you want more background info on these optimizations, I recommend reading the Radeon programming guide (AMD/ATI) as well as the NVIDIA GPU programming guide, both available for free at their respective developer sites. While the ALU architecture is slightly different between these platforms, the general concepts are mostly the same because most of their internal algorithms are the same.

You are right, I meant an integer mul...

I now have this version that relies only on floating-point arithmetic. I hope.

float packSpecularCoefAndPow(float coef, float pow)
{
	return (pow + clamp(coef, 0.f, 0.999f)) * 10.f; // * 10.f, because I would lose the coef sometimes. pow = 128 and coef = 0.1 were bad for example
}

float2 unpackSpecularCoefAndPow(float coefAndPow)
{
	coefAndPow *= 0.1f;

	float coef = frac(coefAndPow);
	float pow = coefAndPow - coef;

	return float2(coef, pow);
}

I can't really say that it is faster though.
At least the code is more readable and I should be able to use lower values for pow without a problem.
Any more comments on this one, Nik02? :) Thanks for the help!

[Edited by - B_old on May 10, 2009 2:09:50 PM]

I don't remember, off the top of my head, whether or not "frac" was one of the expensive instructions.

However:

Your current packing code has a signal leak between pow and coef. That said, if it works with your input values, go ahead and use it. I would use integer maths so as to precisely combine the bits of the values.
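
One possible reading of the leak, which is an assumption and not something the post spells out: if pow carries a fractional part, that fraction ends up in the slot reserved for coef. A sketch that floors the exponent first, with hypothetical names, assuming a channel with enough precision to hold both parts (a 16-bit float has only 10 explicit mantissa bits, so near 128 its step size is 0.125):

```hlsl
// Sketch only: integer part carries the exponent, fractional part carries the coefficient.
float packFloorSum(float coef, float specPower)
{
	// floor() keeps any fractional part of the exponent out of the coefficient's slot.
	return floor(specPower) + clamp(coef, 0.0f, 0.999f);
}

float2 unpackFloorSum(float stored)
{
	float coef = frac(stored);

	return float2(coef, stored - coef);
}
```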

When I hinted about avoiding conversions, I had in mind an implementation that would take in integers during pack, and float2 during unpack or vice versa, depending on your needs.

It is worth considering that sometimes float-int conversions really are the best tool for a problem. Also, ints are easier (and more precise) to handle when you want to cram some raw bits together. BUT, if you can use float arithmetic to arrive at the same conclusion, it will usually suit the current GPUs better.

Performance-wise, I think you have relatively tight code now. To suggest more optimizations, I would need to do a more complete analysis of the actual usage of the functions.

Also, an important fact is that with real games, the CPU->GPU calls are often the bottleneck in modern machines. Thus, if your GPU is not the limiting factor to begin with, it is not worth it to spend your time to micro-optimize the shaders - it is enough to make them work fast enough so as to not limit the whole system performance. Usually that time is more wisely spent on efficient scene management code that minimizes device state changes and draw calls. This is the reason why you should do your profiling and performance-tuning in as close to real-world conditions as possible.

[Edited by - Nik02 on May 10, 2009 3:37:07 PM]

Don't forget that with D3D10 and up, it is possible to interpret data in groovy ways. The shader intrinsics starting with "as" can be used to convert the representation of data from one data type to another. This means that you could carry integer data in your buffer's (texture's) channels that are typed as float. If you can wrap your head around this concept, I think it would be very appropriate for your scenario, and it would avoid unnecessary conversions effectively.
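
As a rough illustration of the `as*` intrinsics, the sketch below builds the float's bit pattern directly. It assumes Shader Model 4 or later and a 32-bit float channel (for example DXGI_FORMAT_R32G32B32A32_FLOAT) written without blending; the function names and the bit layout are invented for this example. The payload goes into the mantissa of a value in [1, 2) so that the result is always an ordinary normalized float:

```hlsl
// Sketch only: two 8-bit values placed in the upper mantissa bits of a 32-bit float.
float packCoefPowBits(float coef, float specPower)
{
	uint sc = uint(saturate(coef) * 255.0f + 0.5f);
	uint sp = uint(clamp(specPower, 0.0f, 255.0f) + 0.5f);

	// 0x3F800000 is the bit pattern of 1.0f; the mantissa occupies bits 0..22.
	uint bits = 0x3F800000 | (sc << 15) | (sp << 7);

	return asfloat(bits); // reinterpret the bits, no numeric conversion
}

float2 unpackCoefPowBits(float stored)
{
	uint bits = asuint(stored); // reinterpret again, bits are unchanged
	uint sc = (bits >> 15) & 0xff;
	uint sp = (bits >> 7) & 0xff;

	return float2(float(sc) / 255.0f, float(sp));
}
```

This does not carry over unchanged to a 16-bit float channel, because the conversion to half keeps only the top 10 mantissa bits.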

I suppose there is nothing I can do about the signal leak between those values if I stick to float arithmetic?

I took a look at the as*-functions and I don't immediately see an obvious benefit. It won't change my bit-pattern, so I can't say all my coef-parts are in the first x bits and all the other parts are in the last x bits, or something like that. Maybe I just haven't thought about this for long enough though.

While I have your attention I'd like to ask another question: Is there a best practice to distribute a 32bit float to two 16bit components? I am having the problem that 16bit depth in my deferred shading is not enough when I also use shadowmaps. I'm not really sure yet, whether I should bother, because in order to free up another 16bits I would have to start packing albedo-colors for example. Somehow it starts getting messy then. Also, I don't know when all the packing/unpacking will start to become as much of a performance penalty as the higher bandwidth of an extra rendertarget would be.

Quote:
Original post by B_old
I suppose there is nothing I can do about the signal leak between those values if I stick to float arithmetic?


It is difficult to emulate bit-level operations on floats, so you'd need considerably more complex code. It is not impossible, but may be impractical.

Quote:

I can't say all my coef-parts are in the first x bits and all the other parts are in the last x bits


Oh but you can... ;)

Note that you can both output and input different representations of the values, regardless of where they come from or go to.

Quote:

While I have your attention I'd like to ask another question: Is there a best practice to distribute a 32bit float to two 16bit components? I am having the problem that 16bit depth in my deferred shading is not enough when I also use shadowmaps. I'm not really sure yet, whether I should bother, because in order to free up another 16bits I would have to start packing albedo-colors for example. Somehow it starts getting messy then. Also, I don't know when all the packing/unpacking will start to become as much of a performance penalty as the higher bandwidth of an extra rendertarget would be.


I would do all necessary packing in integer space if at all possible. I don't remember if there was a general best practice for this scenario.
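
A sketch of what integer-space packing could look like for that question, splitting the 32 bits of a float across two 16-bit UNORM channels. It assumes Shader Model 4 or later, no blending, and unfiltered reads; `splitFloatToUnorm16` and `mergeUnorm16ToFloat` are hypothetical names:

```hlsl
// Sketch only: store the raw bits of a 32-bit float in two 16-bit UNORM channels.
float2 splitFloatToUnorm16(float value)
{
	uint bits = asuint(value);          // raw bit pattern of the float
	float hi = float(bits >> 16);       // upper 16 bits as an integer 0..65535
	float lo = float(bits & 0xffff);    // lower 16 bits as an integer 0..65535

	// A 16-bit UNORM channel stores k / 65535 for integer k.
	return float2(hi, lo) / 65535.0f;
}

float mergeUnorm16ToFloat(float2 stored)
{
	// Round back to the integers that were written.
	uint hi = uint(stored.x * 65535.0f + 0.5f);
	uint lo = uint(stored.y * 65535.0f + 0.5f);

	return asfloat((hi << 16) | lo);
}
```

Because the two channels hold raw bits, blending or filtering either channel would corrupt the reconstructed value.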

While modern cards have a lot of processing power, memory bandwidth hasn't evolved at the same rate. Therefore, you can write quite complex shader logic before becoming ALU-bound.

The performance will ultimately depend on what else you're doing with the hardware, so it is impossible to say what the best approach is going to be in your particular application.

My ultimate recommendation is not to over-optimize first; just make it work and also write your other scene code so you actually draw all the stuff (or dummies of comparable complexity) that you would draw in the final version. Then, run the stuff on PIX and begin to see where the actual bottlenecks are. Now, you are in the position to make informed decisions as to where to actually optimize your code. Rinse and repeat :)

Quote:
Original post by Nik02
Quote:
Original post by B_old

I can't say all my coef-parts are in the first x bits and all the other parts are in the last x bits



Oh but you can... ;)

Note that you can both output and input different representations of the values, regardless of where they come from or go to.

The way I understood it, as*() won't change the way the bits are laid out. Isn't it problematic when the shader variable is 32bit but is gonna be output to 16bit?
I guess I have to see a practical use of as*() in order to get some inspiration first.

The very point here is that the as* functions won't change how the bits are laid out. Remember, even though floats have more complex representation than ints, they are still constructed entirely from bits with well-defined algorithms.

Ask yourself, what stops you from constructing floats and halfs from bits yourself? And following that, what stops you from writing them out in any format, provided adequate space is available in the destination? When you can answer these, the solution may well become obvious [smile]

I guess you are right. You gave me something to think about.

Regarding my original problem I changed tactics and rearranged my rendertargets, so right now there is no need to pack any values.
Thanks for the help!