# two 16 bits into one 32 bit



I'm trying to combine two 16-bit floats into one 32-bit float, but it's not going too well. I've tried searching the forum for a solution, but I seem to be using the wrong keywords. Any help?

---
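Assuming C++, with your two 16-bit floats stored as raw bit patterns in 16-bit integers:

```cpp
#include <cstdint>
#include <cstring>

// h1, h2: your two 16-bit floats (raw bit patterns). Returns the 32-bit float.
float combine(std::uint16_t h1, std::uint16_t h2)
{
    std::uint32_t bits = (std::uint32_t(h1) << 16) | h2; // h1 high, h2 low
    float f3;
    std::memcpy(&f3, &bits, sizeof f3); // copy the bits without pointer casts
    return f3;
}
```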

If you want to get them back out, do any sort of arithmetic, or you're not using C++, you'll need to give more information.
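For the "getting them back out" part, a minimal sketch under the same assumptions (raw 16-bit patterns, IEEE 754 32-bit `float`). `unpackHalves` is a hypothetical name; it simply reverses the shift and mask. One caveat: if the combined bits happen to form a NaN, passing the float around by value may not preserve the exact bits on every platform.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: split a 32-bit float back into the two 16-bit
// patterns that were packed into it (high half first).
void unpackHalves(float f, std::uint16_t& h1, std::uint16_t& h2)
{
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);             // read the raw bits without pointer casts
    h1 = static_cast<std::uint16_t>(bits >> 16);     // high 16 bits
    h2 = static_cast<std::uint16_t>(bits & 0xFFFFu); // low 16 bits
}
```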

---
Yeah, sorry about that. I'm doing it in HLSL.

EDIT: It would be great if you could show me the HLSL code. If not, the math behind it would also be helpful.

---
Well, you can't just shift floats like that. The resulting 32-bit pattern has no real meaning as a number; it just reads as some arbitrary float.

IEEE 754 32-bit (single-precision) floating-point format:

`s | eeeeeeee | mmmmmmmmmmmmmmmmmmmmmmm`

1 sign bit, an 8-bit exponent stored with a bias of 127, and a 23-bit mantissa (significand) with an implied leading 1 for normalized numbers, so effectively 24 bits.
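To make the layout concrete, here is a small sketch that pulls the three fields out of a `float`. It assumes `float` is IEEE 754 single precision (checked with a `static_assert`); `Float32Fields` and `splitFloat32` are hypothetical names.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<float>::is_iec559,
              "assumes float is IEEE 754 single precision");

// The three bit fields of a 32-bit float (hypothetical helper type).
struct Float32Fields {
    std::uint32_t sign;     // 1 bit
    std::uint32_t exponent; // 8 bits, biased by 127
    std::uint32_t mantissa; // 23 stored bits; the leading 1 is implicit
};

Float32Fields splitFloat32(float f)
{
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits); // raw bits, no pointer casts
    return { bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu };
}
```

For a normalized number the value is (-1)^sign × (1 + mantissa/2^23) × 2^(exponent - 127). For example, -2.5 is -1.25 × 2^1, so `splitFloat32(-2.5f)` gives sign 1, exponent 128 and mantissa `0x200000` (0.25 × 2^23).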

IEEE 754 16-bit (half-precision) floating-point format:

`s | eeeee | mmmmmmmmmm`

1 sign bit, a 5-bit exponent stored with a bias of 15, and a 10-bit mantissa (significand) with an implied leading 1 for normalized numbers, so effectively 11 bits.
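A sketch of decoding a half-precision bit pattern by hand, following the field layout above (bias 15, implicit leading 1 for normal numbers, no implicit 1 for subnormals). `halfToFloat` is a hypothetical name, and NaN payloads are not preserved.

```cpp
#include <cmath>
#include <cstdint>

// Decode an IEEE 754 half-precision bit pattern into a float:
// 1 sign bit, 5 exponent bits (bias 15), 10 mantissa bits.
float halfToFloat(std::uint16_t h)
{
    int sign     = (h >> 15) & 0x1;
    int exponent = (h >> 10) & 0x1F;
    int mantissa = h & 0x3FF;

    float value;
    if (exponent == 0) {
        // Zero or subnormal: no implicit leading 1, fixed scale 2^-24.
        value = std::ldexp(static_cast<float>(mantissa), -24);
    } else if (exponent == 31) {
        // All-ones exponent: infinity or NaN (payload ignored).
        value = mantissa != 0 ? NAN : INFINITY;
    } else {
        // Normal: (1024 + mantissa) * 2^(exponent - 15 - 10).
        value = std::ldexp(static_cast<float>(mantissa | 0x400), exponent - 25);
    }
    return sign ? -value : value;
}
```

For example, `0x3C00` decodes to 1.0, `0x3800` to 0.5, and `0x7BFF` to 65504, the largest finite half-precision value.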

So basically, you can't just put two 16-bit floats next to each other and expect the result to mean anything as a 32-bit float.
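A worked example: in half precision, 1.0 is `0x3C00` and 0.5 is `0x3800`. Putting them side by side gives the 32-bit pattern `0x3C003800`, which reads as a single-precision float like this:

$$
\texttt{0x3C003800} \;\to\; s = 0,\quad e = 01111000_2 = 120,\quad m = \texttt{0x003800} = 14336
$$

$$
(-1)^0 \times \left(1 + \frac{14336}{2^{23}}\right) \times 2^{120-127} = \frac{4103}{2^{19}} \approx 0.0078259
$$

The result is neither 1.0 nor 0.5, nor anything obviously derived from them.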

Why do you need to do this anyway? What is the motivation?

---
> Original post by Dragon_Strike:
> im trying to combine two 16 bit floats into one 32 bit float...

What do you mean by "combine"?

---
OK, what I want to do is pack two 16-bit values (each in the range 0.0 to 1.0) into one 32-bit value (also 0.0 to 1.0) and then unpack them at a later stage. Something like:

`value12 = value1*(2^16-1) + value2`, then unpack with `value1 = floor(value12/(2^16-1))` and `value2 = fract(value12/(2^16-1))*(2^16-1)`

That doesn't work, but you get the main idea.
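Writing out the unpacking step shows why it fails. Dividing by $2^{16}-1$ undoes the scaling of value1 but also shrinks value2:

$$
\frac{\text{value12}}{2^{16}-1} = \text{value1} + \frac{\text{value2}}{2^{16}-1}
$$

Because value1 is between 0 and 1, the floor of this can only ever be 0 or 1, so value1 cannot come back out. For example, value1 = 0.5 and value2 = 0.25 give value12 = 32767.75; dividing by 65535 gives about 0.500004, whose floor is 0, and the fract step returns 32767.75 instead of 0.25.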

What I want to do is pack a 16-bit heightmap and the morph value of each point into one texture.

---
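Convert to and from fixed point:

```cpp
#include <cstdint>

// Pack: scale each value in [0, 1] to 16 bits, rounding to nearest.
std::uint32_t i32 = (std::uint32_t(v1 * 65535.0f + 0.5f) << 16) | std::uint32_t(v2 * 65535.0f + 0.5f);
// Unpack: split the halves and scale back to [0, 1].
float o1 = (i32 >> 16) / 65535.0f,
      o2 = (i32 & 0xFFFF) / 65535.0f;
```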

I don't know the specifics of HLSL, but something similar to the code above should do the trick.
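Since the goal is a float texture, here is a sketch that wraps the fixed-point conversion into functions and copies the 32-bit word into a `float` bit-for-bit. I believe HLSL's `asfloat`/`asuint` intrinsics do the same bit copy on the shader side. `packUnorm16x2` and `unpackUnorm16x2` are hypothetical names. A caveat worth checking: read as a float, many of these words are not ordinary numbers. For example, value1 between roughly 0.498 and 0.5 puts all ones in the exponent field (infinity or NaN), and other ranges give denormals. Texture filtering, format conversion, or denormal flushing may not preserve such bits, so an integer texture format may be the safer choice.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: pack two values in [0, 1] as 16-bit fixed point
// into one 32-bit word, then store that word's bits in a float.
float packUnorm16x2(float v1, float v2)
{
    assert(v1 >= 0.0f && v1 <= 1.0f && v2 >= 0.0f && v2 <= 1.0f);
    std::uint32_t a = static_cast<std::uint32_t>(std::lround(v1 * 65535.0f));
    std::uint32_t b = static_cast<std::uint32_t>(std::lround(v2 * 65535.0f));
    std::uint32_t bits = (a << 16) | b;
    float f;
    std::memcpy(&f, &bits, sizeof f); // bit copy (the role asfloat plays in HLSL)
    return f;
}

// Hypothetical helper: recover the two values from the float's bits.
void unpackUnorm16x2(float f, float& v1, float& v2)
{
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits); // bit copy (the role asuint plays in HLSL)
    v1 = (bits >> 16) / 65535.0f;
    v2 = (bits & 0xFFFFu) / 65535.0f;
}
```

Each value comes back within one 16-bit step (1/65535) of what went in.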

---
Thanks, that helped me solve it.

---
Since you've stated that the values are between 0 and 1 and that the result is stored in a 32-bit float, I think you want something more like this for your formula (written the way you had it earlier):

`value12 = floor(value1*2^12) + value2`, then unpack with `value1 = floor(value12)/2^12` and `value2 = fract(value12)` (this requires value2 < 1)

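The reason for using 2^12 instead of 2^16 is that a 32-bit float stores only 23 significand bits (24 counting the implicit leading 1). With 2^16, the integer part taken up by value1 would leave only about 7 or 8 bits for value2. Since 16-bit floats have a 10-bit significand, scaling by 2^12 leaves about 11 bits for the fractional part, enough to keep value2 at roughly half-float precision. The trade-off is that value1 is limited to 12 bits.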
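A C++ sketch of the 2^12 scheme, useful for checking the round trip on the CPU before writing the shader (HLSL spells the formula's `fract` as `frac`). `packFloat12` and `unpackFloat12` are hypothetical names. This sketch quantizes value1 with `floor` before adding value2, and clamps value2 to just below 1 (an assumption here), so the fractional part holds only value2 and can never carry into the integer part.

```cpp
#include <cmath>

constexpr float kScale = 4096.0f; // 2^12

// value1 keeps 12 bits in the integer part; value2 sits in the fraction.
float packFloat12(float value1, float value2)
{
    float hi = std::floor(value1 * kScale);               // quantize value1 to a whole number
    float lo = std::fmin(value2, 1.0f - 1.0f / 2048.0f);  // keep value2 below 1 (assumed clamp)
    return hi + lo;
}

void unpackFloat12(float value12, float& value1, float& value2)
{
    float hi = std::floor(value12);
    value1 = hi / kScale;
    value2 = value12 - hi; // same as fract(value12)
}
```

With value1 = 0.5 and value2 = 0.25 the round trip is exact (the packed value is 2048.25). In general value1 comes back within 1/4096 and value2 within about 1/2048.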

You would first have to be fairly sure of two things though:
1. The memory usage is a bottleneck, and
2. Reducing the memory usage by a factor of two is going to make a significant enough difference.

---
What language are you using that has a 16 bit float? I thought they were usually 32 or 64 bits.
