# Encoding 16 and 32 bit floating point value into RGBA byte texture


nullsquared    126
So, basically, I calculate a single value in one of my shaders that I want to store in a usual RGBA texture, each channel being 8 bits. Now, I have two options - store a 16 bit value in two channels of the texture, or a 32-bit value using all of the texture's channels. My question, however, is, how do I "encode" this 16 bit value into two 8-bit channels? How would I "decode" it for use later on (from the same texture)? How would I do the same 2 things, but with the 32 bit value? Is it even possible to "distribute" like this? Many thanks in advance! By the way, yes, I know about floating point textures. But I'd rather not make use of them (even though they work, my previous implementation used them), there are some things I want to try with the byte texture.

##### Share on other sites
In a 16-bit bitmap, the bits are stored in 5-6-5 format: 5 bits for red, 6 bits for green, and 5 bits for blue. You'll have to align these bits using the '<<' and '>>' operators and a macro so that they can be stored in an array in the pixel format you want to write. There is another very recent post by me on this forum called 'Accessing the pixel data in a BMP'. It has a reply by Big Sassy; check out the link he posted for further details.

As for the 32-bit bitmap, I'm not sure about it either.

##### Share on other sites
corysama    342
In shaderland, each channel of an 8888 texture represents a number between 0.0 and 1.0 inclusive using an 8-bit integer. The conversion from float to int that happens when a texture is written to is basically int(floatValue*255.0). The conversion from int to float that happens when a texture is read is float(intValue)/255.0.
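As a rough illustration of the two conversions described above, here is a minimal Python sketch for a single channel. The names `store` and `load` are just illustrative, and the truncating `int()` follows the description in this post; real hardware may round to nearest instead, which is not modeled here.

```python
def store(channel):
    """Float in [0.0, 1.0] -> 8-bit integer (truncating, as described above)."""
    return int(channel * 255.0)

def load(byte):
    """8-bit integer -> float in [0.0, 1.0]."""
    return byte / 255.0

# A single channel only resolves steps of 1/255, so most values come back
# slightly smaller than they went in.
for value in (0.0, 0.25, 0.5, 1.0):
    restored = load(store(value))
    print(value, store(value), restored, value - restored)
```

The round-trip error of one channel stays below 1/255, which is why spreading a value over several channels is worthwhile.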

To store a value between 0.0 and 1.0 using 4 8-bit channels to achieve 32-bit precision you need to simulate fixed point operations using floating point math. Use multiplies as shifts, frac() to mask off the >1.0 bits and the assignment to an 8-bit output to mask off the <1.0/255 bits.
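For comparison, this is roughly what the real integer version of that idea looks like: splitting a 32-bit integer into four bytes with shifts and masks. It is only an analogy sketch with made-up helper names (`split_bytes`, `join_bytes`); in the shader the multiplies play the role of the shifts and frac() plays the role of the mask.

```python
def split_bytes(n):
    """Split a 32-bit unsigned integer into four bytes, most significant first."""
    return [(n >> shift) & 0xFF for shift in (24, 16, 8, 0)]

def join_bytes(parts):
    """Rebuild the 32-bit integer from its four bytes."""
    n = 0
    for byte in parts:
        n = (n << 8) | byte
    return n

value = 0x12345678
parts = split_bytes(value)
print(parts)                       # [18, 52, 86, 120]
print(join_bytes(parts) == value)  # True
```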

The obvious way to do this is shown here.
WARNING: This doesn't work (explained later).

FloatToInt()
out.r = frac(floatValue*1);
out.g = frac(floatValue*255);
out.b = frac(floatValue*255*255);
out.a = frac(floatValue*255*255*255);

IntToFloat()
in = intValue.r/(1)
+intValue.g/(255)
+intValue.b/(255*255)
+intValue.a/(255*255*255);

Obviously, FloatToInt() can be optimized to frac(floatValue*vectorConstant) and IntToFloat() can be optimized to dot(intValue, vectorConstant2).

Unfortunately, we don't want to store 0.0 to 1.0 inclusive in each of the channels. If the channels include both extremes then the extreme values of each channel would overlap because they are equivalent. That means the above math would record 1.0/255 as (1, 255, 0, 0) which is double the correct value.
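One easy way to watch the naive version break at an extreme is to push 1.0 through it. The sketch below is a Python stand-in for the shader code, written in the vector-constant form mentioned above; `SCALE`, `WEIGHTS`, `naive_encode` and `naive_decode` are names I made up. It is not the 1.0/255 case from the paragraph above, just the simplest input I could check by hand: frac() throws away the whole value, so 1.0 comes back as 0.0.

```python
SCALE = (1.0, 255.0, 255.0 ** 2, 255.0 ** 3)   # plays the role of vectorConstant
WEIGHTS = tuple(1.0 / s for s in SCALE)        # plays the role of vectorConstant2

def frac(f):
    return f - int(f)

def store(v):
    """Simulate writing four float channels to an 8888 texture."""
    return tuple(int(c * 255) for c in v)

def load(v):
    """Simulate reading four 8-bit channels back as floats."""
    return tuple(c / 255.0 for c in v)

def naive_encode(f):
    return tuple(frac(f * s) for s in SCALE)

def naive_decode(v):
    # Equivalent to dot(v, WEIGHTS) in a shader.
    return sum(c * w for c, w in zip(v, WEIGHTS))

stored = store(naive_encode(1.0))
print(stored)                      # (0, 0, 0, 0)
print(naive_decode(load(stored)))  # 0.0
```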

Instead we want to store 0.0 to 255.0/256 in each channel. That is the range of values represented by an 8-bit fixed point value. To convert a floating point [0.0,1.0] to a fixed point-ish [0.0,255.0/256] we multiply by 255.0/256.

FloatToInt()
const float toFixed = 255.0/256;
out.r = frac(floatValue*toFixed*1);
out.g = frac(floatValue*toFixed*255);
out.b = frac(floatValue*toFixed*255*255);
out.a = frac(floatValue*toFixed*255*255*255);

IntToFloat()
const float fromFixed = 256.0/255;
in = intValue.r*fromFixed/(1)
+intValue.g*fromFixed/(255)
+intValue.b*fromFixed/(255*255)
+intValue.a*fromFixed/(255*255*255);

Here's the bit of python I wrote to make sure I'm not full of shit.

def load(v):
    r,g,b,a = v
    return r/255.0, g/255.0, b/255.0, a/255.0

def store(v):
    r,g,b,a = v
    return int(r*255), int(g*255), int(b*255), int(a*255)

def frac(f):
    return f - int(f)

def floatToFixed(f):
    toFixed = 255.0/256
    return frac(f*toFixed*1), frac(f*toFixed*255), frac(f*toFixed*255*255), frac(f*toFixed*255*255*255)

def fixedToFloat(v):
    r,g,b,a = v
    fromFixed = 256.0/255
    return r*fromFixed/1 + g*fromFixed/(255) + b*fromFixed/(255*255) + a*fromFixed/(255*255*255)

print fixedToFloat(load(store(floatToFixed(1.0))))
print fixedToFloat(load(store(floatToFixed(0.0))))
print fixedToFloat(load(store(floatToFixed(0.5))))
print fixedToFloat(load(store(floatToFixed(1.0/3))))

result:
0.999999999763
0.0
0.499999999882
0.333333333254

##### Share on other sites
nullsquared    126
Wow, thank you very much! Also, thanks for explaining it shader-wise, because I am indeed doing this in a shader [smile].

Some questions:
- for example, what if the value was over 1? Would this still work, or no?
- how would I store the same value, but in 16 bits (two channels)?

##### Share on other sites
corysama    342
If you want to store a value over 1, you need to know what your range is. Before converting with floatToFixed, you will need to divide by the range to map [0,range] down to [0,1]. Then when converting back with fixedToFloat you multiply the [0,1] value by the maximum range.
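A minimal Python sketch of that range mapping might look like the following. `encode_ranged`, `decode_ranged` and the `max_value` parameter are hypothetical names, and the encode step folds the simulated 8-bit texture write into one function, so treat it as an illustration under those assumptions rather than shader-ready code.

```python
TO_FIXED = 255.0 / 256
FROM_FIXED = 256.0 / 255
SCALE = (1.0, 255.0, 255.0 ** 2, 255.0 ** 3)

def frac(f):
    return f - int(f)

def encode_ranged(value, max_value):
    """Map [0, max_value] down to [0, 1], then split across four 8-bit channels."""
    normalized = value / max_value
    return tuple(int(frac(normalized * TO_FIXED * s) * 255) for s in SCALE)

def decode_ranged(rgba, max_value):
    """Rebuild the [0, 1] value from the four bytes, then scale back up."""
    normalized = sum((c / 255.0) * FROM_FIXED / s for c, s in zip(rgba, SCALE))
    return normalized * max_value

rgba = encode_ranged(37.5, 100.0)
print(rgba, decode_ranged(rgba, 100.0))
```

Both sides have to agree on the same `max_value`, and anything above it will wrap around in frac() instead of clamping.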

If you only want 16 bits of precision then just do the r and g portions of the code. The algorithm is incremental over any number of channels.
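In Python terms, the two-channel version could be sketched like this. `encode16` and `decode16` are made-up names, and the texture write is again simulated with a truncating `int()`. With only r and g, the round-trip error should be on the order of 1/(255*255).

```python
TO_FIXED = 255.0 / 256
FROM_FIXED = 256.0 / 255

def frac(f):
    return f - int(f)

def encode16(f):
    """Float in [0, 1] -> (r, g) bytes, using only the first two channels."""
    r = frac(f * TO_FIXED * 1)
    g = frac(f * TO_FIXED * 255)
    return int(r * 255), int(g * 255)

def decode16(rg):
    """(r, g) bytes -> float in [0, 1]."""
    r, g = (c / 255.0 for c in rg)
    return r * FROM_FIXED / 1 + g * FROM_FIXED / 255

for value in (0.0, 0.25, 0.5, 1.0):
    rg = encode16(value)
    print(value, rg, decode16(rg))
```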

##### Share on other sites
nullsquared    126
Quote:
 Original post by corysama
 If you want to store a value over 1, you need to know what your range is. Before converting floatToFixed, you will need to divide by the range to map [0,range] down to [0,1]. Then when converting back with fixedToFloat you multiply the [0,1] value by the maximum range.

Aha, I see. I use a normalized value, but was just wondering what would happen [grin]. I tested it with a small C++ demo, too [lol].
Quote:
 If you only want 16 bits of precision then just do the r and g portions of the code. The algorithm is incremental over any number of channels.

Wow, that easy? This is really neat!

Again, thanks a bunch, I really appreciate it [smile].