First, the code. How it works is explained in the comments:

[source lang="plain"]// we want to store a float in two half floats using some bit hackery
// first we interpret as integers so we can work with the bits
uint bits = floatBitsToUint( normalZDepth );
uvec2 parts = uvec2( bits >> 16,            // the higher 16 bits
                     bits & 0x0000ffffu );  // the lower 16 bits
// each component's lower 16 bits now contain the higher and lower 16 bits from the original value
// we want these bits to remain the same when put in 16-bit floats.
// we do this by putting them into normal 32-bit floats such that when these
// 32-bit floats are converted into 16-bit floats, the important bits will be all that remain
// 32-bit float: [ 1 (sign) | 8 (exponent) | 23 (mantissa) ]
// 16-bit float: [ 1 (sign) | 5 (exponent) | 10 (mantissa) ]
// when converting float to half:
// bit 31 (sign) moves to bit 15
// bits 23-30 (exponent) will be truncated such that bits 23-27 move to bits 10-14
// bits 0-22 (mantissa) will be truncated such that bits 13-22 move to bits 0-9
// therefore we construct the following integer to be cast back to float:
// position: [31] [30-28] [27-23] [22-13] [12-0 ]
// bits:     [15] ...0... [14-10] [ 9-0 ] ...0...
// combining the contiguous portion of the exponent and mantissa we get:
// position: [31] [30-28] [27-13] [12-0 ]
// bits:     [15] ...0... [14-0 ] ...0...
// so the final result is that we shift bit 15 by 16 over to bit 31, and bits 0-14 by 13 over to bits 13-27
uvec2 floatBits = ((parts & 0x8000u) << 16) | ((parts & 0x7FFFu) << 13);
// now just interpret as float - ready to be stored as half floats
vec2 halfBits = uintBitsToFloat( floatBits );[/source]

1) I'm only guessing that this is how float-to-half conversion works - by truncating the upper bits of the exponent and the lower bits of the mantissa.

**Though actually, now that I think about it, since the exponent is biased, this isn't quite right (but that should be easy to fix).** Could someone explain how this conversion actually occurs? Could rounding issues arise? (The remaining mantissa bits are all 0.) Does IEEE 754 specify this conversion, and if so, is GLSL 3.30 required to follow it?
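For concreteness, here is a sketch of how I *understand* the conversion to work - the function name and all the details below are my own guess, assuming the IEEE 754 rules (bias 127 rebased to bias 15, round-to-nearest with ties to even), not something I've confirmed GLSL does:

```c
#include <assert.h>
#include <stdint.h>

/* my sketch of an IEEE 754 binary32 -> binary16 conversion with
   round-to-nearest-even; I'm assuming this matches what the GPU does */
static uint16_t float_bits_to_half_bits(uint32_t f)
{
    uint16_t sign = (uint16_t)((f >> 16) & 0x8000u);
    uint32_t expf = (f >> 23) & 0xFFu;        /* exponent field, biased by 127 */
    uint32_t man  = f & 0x007FFFFFu;
    int32_t  e    = (int32_t)expf - 127 + 15; /* rebase: bias 127 -> bias 15 */

    if (expf == 0xFFu)                        /* Inf or NaN */
        return (uint16_t)(sign | 0x7C00u | (man ? (0x0200u | (man >> 13)) : 0u));
    if (e >= 0x1F)                            /* too large: overflow to Inf */
        return (uint16_t)(sign | 0x7C00u);
    if (e <= 0) {                             /* half subnormal or zero */
        if (e < -10) return sign;             /* too small even for subnormals */
        man |= 0x00800000u;                   /* make the implicit leading 1 explicit */
        uint32_t shift    = (uint32_t)(14 - e);
        uint16_t h        = (uint16_t)(man >> shift);
        uint32_t rem      = man & ((1u << shift) - 1u);
        uint32_t half_ulp = 1u << (shift - 1);
        if (rem > half_ulp || (rem == half_ulp && (h & 1u)))
            h++;                              /* round to nearest, ties to even */
        return (uint16_t)(sign | h);
    }
    /* normal case: keep the top 10 mantissa bits and round the rest */
    uint16_t h   = (uint16_t)(((uint32_t)e << 10) | (man >> 13));
    uint32_t rem = man & 0x1FFFu;
    if (rem > 0x1000u || (rem == 0x1000u && (h & 1u)))
        h++;                                  /* a carry can bump the exponent; that's correct */
    return (uint16_t)(sign | h);
}
```

If this is the real behavior, then the pure-truncation picture in my comments breaks down wherever the rounding increments the result or the rebased exponent over/underflows.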

2) Suppose one of the integers ended up holding the value 0111110000000000. Then the resulting (32-bit) intermediate float would be:

0 00011111 00000000000000000000000

which is a "valid" float (not NaN, etc.). But when converting to a 16-bit float, it would become 0 11111 0000000000, which is NaN.

**(Again, I realize this is slightly wrong since I forgot to take the exponent bias into account, but you could easily construct an equivalent example with the bias.)** This is actually the behavior I desire, since it preserves the correct bits - and I would want other "harmless" bit patterns in the 32-bit float to convert to potentially "bad" values in the 16-bit float: underflow, overflow, infinity, denormalization, and so on. However, I suspect this is not the case, since it would make little sense for a "valid" 32-bit float to become an unrelated error just because its bits happened to line up that way. So I suppose this reduces to the first question: how exactly is the float-to-half conversion performed?
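To make the bias issue concrete (this one-liner is just my own illustration, not anything from GLSL): binary32 stores the exponent biased by 127 and binary16 biased by 15, so a bias-aware conversion rebases the exponent field by 127 - 15 = 112 instead of truncating its top three bits.

```c
#include <assert.h>

/* binary32 stores exponent+127, binary16 stores exponent+15, so a
   correct conversion subtracts 127-15 = 112 from the exponent field
   rather than truncating its top three bits (my illustration) */
static int rebased_half_exponent_field(int float_exp_field)
{
    return float_exp_field - 112;
}
```

So a float exponent field of 00011111 (31, i.e. a value near 2^-96) rebases to 31 - 112 = -81 and underflows to zero; the patterns that would really become Inf/NaN are those with a float exponent field of 143 (values of 2^16 and up), if my understanding is right.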

Alternatively, is there some way of telling GLSL to stick a particular set of bits into a half float output, rather than specifying a full float and hoping GLSL converts to half the way I want it to?

Or, worded differently: is there a way to guarantee a one-to-one mapping between half-float bits and 16 bits of an integer?
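As far as I can tell, the widening direction (half to float) is exact, so if I could ever read the raw half bits back, my 16 bits would be fully recoverable. Here is my sketch of that exact widening (function name and details are mine) - with the caveat that actual hardware may canonicalize NaN payloads, which is exactly the kind of bit loss I'm worried about. (I believe newer GLSL versions also expose packHalf2x16/unpackHalf2x16 via ARB_shading_language_packing, but I'm targeting 3.30.)

```c
#include <assert.h>
#include <stdint.h>

/* my sketch of the exact binary16 -> binary32 widening; every 16-bit
   pattern maps to a distinct 32-bit pattern, so no information is lost
   in this direction (assuming no NaN canonicalization by hardware) */
static uint32_t half_bits_to_float_bits(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    int      e    = (h >> 10) & 0x1F;   /* exponent field, biased by 15 */
    uint32_t man  = h & 0x03FFu;

    if (e == 0x1F)                      /* Inf or NaN (payload preserved) */
        return sign | 0x7F800000u | (man << 13);
    if (e == 0) {
        if (man == 0) return sign;      /* signed zero */
        while (!(man & 0x0400u)) {      /* normalize the subnormal */
            man <<= 1;
            e--;
        }
        man &= 0x03FFu;
        /* after shifting, the true half exponent field is e+1 */
        return sign | ((uint32_t)(e + 1 + 112) << 23) | (man << 13);
    }
    return sign | ((uint32_t)(e + 112) << 23) | (man << 13);
}
```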