two 16 bits into one 32 bit

Started by Dragon_Strike
10 comments, last by shotgunnutter 16 years, 5 months ago
I'm trying to combine two 16-bit floats into one 32-bit float... but it's not going too well. I've tried searching the forum for a solution, but I seem to be using the wrong keywords or something. Any help?
Assuming C++ where sizeof(int)==4 and sizeof(short)==2:

float f1, f2; // Your two 16 bit floats
float f3;     // Your 32-bit float
unsigned int s1 = *(unsigned short*)&f1;
unsigned int s2 = *(unsigned short*)&f2;
*(unsigned int*)&f3 = (s1<<16) | s2;


If you want to get them back out, or do any sort of arithmetic on them, or you're not using C++, you'll need to give more information.
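(For just getting the raw bit patterns back out, the reverse would be something like this; only a sketch under the same assumptions:)

float f3;                                             // the packed 32-bit float from above
unsigned int bits = *(unsigned int*)&f3;              // reinterpret its bit pattern as an integer
unsigned short h1 = (unsigned short)(bits >> 16);     // upper 16 bits = first value's bit pattern
unsigned short h2 = (unsigned short)(bits & 0xFFFF);  // lower 16 bits = second value's bit pattern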
Yeah, I'm sorry about that... I'm doing it in HLSL.

EDIT: It would be great if you could show me the HLSL code. If not, then the math behind it would be helpful too.
Well, you can't just shift floats around like that. The resulting 32-bit number will have no real meaning; it will just be some arbitrary float.

IEEE 32 bit floating point format:

s | eeeeeeee | mmmmmmmmmmmmmmmmmmmmmmm

1 sign bit, an 8-bit exponent stored with a bias of 127, and a 23-bit mantissa (significand) with an implied leading 1, so really 24 bits.

16 bit floating point format:

s | eeeee | mmmmmmmmmmm

1 sign bit, a 5-bit exponent stored with a bias of 15, and a 10-bit mantissa (significand) with an implied leading 1, so really 11 bits.


So basically, you can't just put two 16-bit floats next to each other and expect the result to have any meaning as a float.
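For illustration, here's a small C++ sketch (not from the original post) that pulls those three fields out of a 32-bit float; the test value -6.25f is arbitrary:

#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
    float f = -6.25f;
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);       // copy the float's bit pattern safely

    uint32_t sign     = bits >> 31;            // 1 sign bit
    uint32_t exponent = (bits >> 23) & 0xFF;   // 8 exponent bits, stored with a bias of 127
    uint32_t mantissa = bits & 0x7FFFFF;       // 23 mantissa bits, implied leading 1

    std::printf("sign=%u exponent=%d mantissa=0x%06X\n",
                sign, (int)exponent - 127, mantissa);
    return 0;
}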

Why do you need to do this anyway? What's the motivation?
Quote: Original post by Dragon_Strike
I'm trying to combine two 16-bit floats into one 32-bit float...


What do you mean by "combine"?

OK... what I want to do is pack two 16-bit values (0.0-1.0) into one 32-bit value (0.0-1.0) and then unpack them at a later stage... something like:

value1*(2^16-1) + value2 = value12 -> value1 = floor(value12/(2^16-1)) & value2 = fract(value12/(2^16-1))*(2^16-1)

which doesn't work... but you get the main idea...

What I want to do is pack a 16-bit heightmap and the morph values of each point into one texture.
Convert to and from fixed point:
unsigned int i32 = ((unsigned int)(v1 * 65535) << 16) | (unsigned int)(v2 * 65535);
float o1 = (i32 >> 16) / 65535.0,
      o2 = (i32 & 0xFFFF) / 65535.0;

I don't know specifics of HLSL, but something similar to the above code should do the trick.
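For example, wrapped up as pack/unpack helpers in C++ (just a sketch; note the packed result is a 32-bit integer, and v1/v2 are assumed to be in the 0.0-1.0 range):

#include <cstdint>
#include <cstdio>

uint32_t pack(float v1, float v2) {
    // quantize each value to 16 bits and put them side by side
    return ((uint32_t)(v1 * 65535.0f) << 16) | (uint32_t)(v2 * 65535.0f);
}

void unpack(uint32_t i32, float &o1, float &o2) {
    o1 = (i32 >> 16)    / 65535.0f;   // upper 16 bits
    o2 = (i32 & 0xFFFF) / 65535.0f;   // lower 16 bits
}

int main() {
    float a, b;
    unpack(pack(0.25f, 0.75f), a, b);
    std::printf("%f %f\n", a, b);     // roughly 0.25 0.75, each quantized to 16 bits
    return 0;
}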
Thanks, that helped me solve it.
Since you've stated the values are between 0 and 1 and the result is being stored in a 32-bit float, I think you want something more like this for your formula (written as you had it earlier):

value1*(2^12) + value2 = value12 -> value1 = floor(value12)/(2^12) & value2 = fract(value12)

The reason for using 2^12 instead of 2^16 is that 32-bit floats only have 23 significand bits, so if you used 2^16 you would only have 7 significant bits left for value2. Since 16-bit floats have 10 significand bits, using a multiplication of 2^12 will make sure you don't lose any accuracy.
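A sketch of that scheme in C++ (floor/frac behave the same way in HLSL); here value1 is snapped to a multiple of 1/4095 so the integer part of the packed number carries value1 and the fractional part carries value2. The exact scale factor is my choice for illustration, not from the post above:

#include <cmath>
#include <cstdio>

// Pack: integer part of the result carries a 12-bit quantized value1,
// fractional part carries value2. Both inputs assumed in [0, 1].
float pack(float value1, float value2) {
    float v1 = std::floor(value1 * 4095.0f);          // 0..4095, an exact integer
    float v2 = (value2 < 0.9999f) ? value2 : 0.9999f; // keep the fraction strictly below 1
    return v1 + v2;
}

void unpack(float packed, float &value1, float &value2) {
    float ipart = std::floor(packed);
    value2 = packed - ipart;        // frac(packed)
    value1 = ipart / 4095.0f;       // undo the 12-bit quantization
}

int main() {
    float a, b;
    unpack(pack(0.5f, 0.25f), a, b);
    std::printf("%f %f\n", a, b);   // roughly 0.5 0.25
    return 0;
}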

You would first have to be fairly sure of two things though:
1. The memory usage is a bottleneck, and
2. Reducing the memory usage by a factor of two is going to make a significant enough difference.
"In order to understand recursion, you must first understand recursion."
My website dedicated to sorting algorithms
What language are you using that has a 16 bit float? I thought they were usually 32 or 64 bits.

