two 16 bits into one 32 bit

Started by Dragon_Strike
10 comments, last by shotgunnutter 16 years, 5 months ago
I'm trying to combine two 16-bit floats into one 32-bit float... but it's not going too well. I've tried searching the forum for a solution, but I seem to be using the wrong keywords or something. Any help?
Assuming C++ where sizeof(int)==4 and sizeof(short)==2:

float f1, f2; // Your two 16 bit floats
float f3;     // Your 32-bit float
unsigned int s1 = *(unsigned short*)&f1;
unsigned int s2 = *(unsigned short*)&f2;
*(unsigned int*)&f3 = (s1<<16) | s2;


If you want to get them back out, or do any sort of arithmetic on them, or you're not using C++, you'll need to give more information.
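(For just getting the raw bit patterns back out, the reverse would be something like this; only a sketch under the same assumptions:)

float f3;                                             // the packed 32-bit float from above
unsigned int bits = *(unsigned int*)&f3;              // reinterpret its bit pattern as an integer
unsigned short h1 = (unsigned short)(bits >> 16);     // upper 16 bits = first value's bit pattern
unsigned short h2 = (unsigned short)(bits & 0xFFFF);  // lower 16 bits = second value's bit pattern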
Yeah, I'm sorry about that... I'm doing it in HLSL.

EDIT: It would be great if you could show me the HLSL code. If not, then the math behind it would be helpful too.
Well, you can't just shift floats around like that. The resulting 32-bit number will have no real meaning; it will just be some arbitrary float.

IEEE 32 bit floating point format:

s | eeeeeeee | mmmmmmmmmmmmmmmmmmmmmmm

1 sign bit, an 8-bit exponent stored with a bias of 127, and a 23-bit mantissa (significand) with an implied leading 1, so really 24 bits.

16 bit floating point format:

s | eeeee | mmmmmmmmmmm

1 sign bit, a 5-bit exponent stored with a bias of 15, and a 10-bit mantissa (significand) with an implied leading 1, so really 11 bits.


So basically, you can't just put two 16-bit floats next to each other and expect the result to have any meaning as a float.
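For illustration, here's a small C++ sketch (not from the original post) that pulls those three fields out of a 32-bit float; the test value -6.25f is arbitrary:

#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
    float f = -6.25f;
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);       // copy the float's bit pattern safely

    uint32_t sign     = bits >> 31;            // 1 sign bit
    uint32_t exponent = (bits >> 23) & 0xFF;   // 8 exponent bits, stored with a bias of 127
    uint32_t mantissa = bits & 0x7FFFFF;       // 23 mantissa bits, implied leading 1

    std::printf("sign=%u exponent=%d mantissa=0x%06X\n",
                sign, (int)exponent - 127, mantissa);
    return 0;
}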

Why do you need to do this anyway? What's the motivation?
Quote: Original post by Dragon_Strike
I'm trying to combine two 16-bit floats into one 32-bit float...


What do you mean by "combine"?

OK... what I want to do is pack two 16-bit values (0.0-1.0) into one 32-bit value (0.0-1.0) and then unpack them at a later stage... something like:

value1*(2^16-1) + value2 = value12 -> value1 = floor(value12/(2^16-1)) & value2 = fract(value12/(2^16-1))*(2^16-1)

which doesn't work... but you get the main idea...

What I want to do is pack a 16-bit heightmap and the morph values of each point into one texture.
Convert to and from fixed point:
unsigned int i32 = ((unsigned int)(v1 * 65535) << 16) | (unsigned int)(v2 * 65535);
float o1 = (i32 >> 16) / 65535.0,
      o2 = (i32 & 0xFFFF) / 65535.0;

I don't know specifics of HLSL, but something similar to the above code should do the trick.
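For example, wrapped up as pack/unpack helpers in C++ (just a sketch; note the packed result is a 32-bit integer, and v1/v2 are assumed to be in the 0.0-1.0 range):

#include <cstdint>
#include <cstdio>

uint32_t pack(float v1, float v2) {
    // quantize each value to 16 bits and put them side by side
    return ((uint32_t)(v1 * 65535.0f) << 16) | (uint32_t)(v2 * 65535.0f);
}

void unpack(uint32_t i32, float &o1, float &o2) {
    o1 = (i32 >> 16)    / 65535.0f;   // upper 16 bits
    o2 = (i32 & 0xFFFF) / 65535.0f;   // lower 16 bits
}

int main() {
    float a, b;
    unpack(pack(0.25f, 0.75f), a, b);
    std::printf("%f %f\n", a, b);     // roughly 0.25 0.75, each quantized to 16 bits
    return 0;
}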
Thanks, that helped me solve it.
Since you've stated the values are between 0 and 1 and the result is being stored in a 32-bit float, I think you want something more like this for your formula (written as you had it earlier):

value1*(2^12) + value2 = value12 -> value1 = floor(value12)/(2^12) & value2 = fract(value12)

The reason for using 2^12 instead of 2^16 is that 32-bit floats only have 23 significand bits, so if you used 2^16 you would only have 7 significant bits left for value2. Since 16-bit floats have 10 significand bits, using a multiplication of 2^12 will make sure you don't lose any accuracy.
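A sketch of that scheme in C++ (floor/frac behave the same way in HLSL); here value1 is snapped to a multiple of 1/4095 so the integer part of the packed number carries value1 and the fractional part carries value2. The exact scale factor is my choice for illustration, not from the post above:

#include <cmath>
#include <cstdio>

// Pack: integer part of the result carries a 12-bit quantized value1,
// fractional part carries value2. Both inputs assumed in [0, 1].
float pack(float value1, float value2) {
    float v1 = std::floor(value1 * 4095.0f);          // 0..4095, an exact integer
    float v2 = (value2 < 0.9999f) ? value2 : 0.9999f; // keep the fraction strictly below 1
    return v1 + v2;
}

void unpack(float packed, float &value1, float &value2) {
    float ipart = std::floor(packed);
    value2 = packed - ipart;        // frac(packed)
    value1 = ipart / 4095.0f;       // undo the 12-bit quantization
}

int main() {
    float a, b;
    unpack(pack(0.5f, 0.25f), a, b);
    std::printf("%f %f\n", a, b);   // roughly 0.5 0.25
    return 0;
}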

You would first have to be fairly sure of two things though:
1. The memory usage is a bottleneck, and
2. Reducing the memory usage by a factor of two is going to make a significant enough difference.
"In order to understand recursion, you must first understand recursion."
My website dedicated to sorting algorithms
What language are you using that has a 16 bit float? I thought they were usually 32 or 64 bits.

