
# small (short) float?


I'm trying to minimize memory usage as much as possible in my program, and I realized a lot of my objects' 'float' members are only being used to cover a 0.0 - 1.0 range, rather than the insane whole-number range floats can cover.

So I was wondering if some people could help me with the code to create a "small float" datatype. Kind of like how you can add "unsigned" to "int", it would be nice to have a "small" keyword. Or maybe just reuse "short", so a "short float" datatype would be a "0 to 1" decimal value? I was hoping to do this as painlessly as possible, although I assume it will be pretty advanced.

Finally, before anyone helps me with this: is it even worth it? Will the size still have to be the same in order to handle PRECISION within the 0.0 - 1.0 range? Perhaps it could also be limited to 4 or 5 decimal places of precision? Maybe something like this already exists?

In all honesty, I don't care much about the logistics behind it; I'd be perfectly content with just copying and pasting some code in. Anyway, thanks for any help you can provide.

##### Share on other sites
I don't think there's any floating-point data type smaller than float on x86 processors. Speed-wise, I don't think it's worth making a 16-bit float by hand either, since all the manual bit shifting you'd have to do would be far slower than using the 32-bit version the processor supports natively. It is a nice thought, but I think you should abandon it (unless speed is not an issue and size is everything).

##### Share on other sites
A lot of graphics chips (like the GameCube's) support 16-bit packed number formats, where you specify how many bits represent the integer part and how many bits represent the fractional part. You obviously lose a lot of precision, and you do not want to try to do any math on these numbers, but for static data that is just sent to the graphics chip (like texture UVs, normals, and such), this can save a lot of memory (and when you are limited to 24 MB, you need to save everywhere you can :-).
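As a rough sketch of the packed format described above, the helpers below store a non-negative value in 16 bits with a chosen number of fractional bits. The names `PackFixed16` and `UnpackFixed16` are made up for illustration and do not correspond to any console or graphics API.

```cpp
#include <cstdint>

// Hypothetical helper: pack a non-negative float into an unsigned 16-bit
// fixed-point value with FracBits fractional bits (16 - FracBits integer bits).
template <int FracBits>
std::uint16_t PackFixed16(float value)
{
    static_assert(FracBits >= 0 && FracBits <= 16, "FracBits must be 0..16");
    const float scale = static_cast<float>(1u << FracBits);
    float scaled = value * scale + 0.5f;          // round to nearest step
    if (scaled < 0.0f)     scaled = 0.0f;         // clamp values below the range
    if (scaled > 65535.0f) scaled = 65535.0f;     // clamp values above the range
    return static_cast<std::uint16_t>(scaled);
}

// Hypothetical helper: recover the float from the packed value.
template <int FracBits>
float UnpackFixed16(std::uint16_t packed)
{
    return static_cast<float>(packed) / static_cast<float>(1u << FracBits);
}
```

With 12 fractional bits, for example, `PackFixed16<12>(1.5f)` stores the value in steps of 1/4096 and leaves 4 bits for the integer part, which would suit texture coordinates that tile a few times.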

[edited by - chiuyan on July 5, 2003 7:13:01 PM]

##### Share on other sites
If you can live with the overhead of dividing each time you need to use the value, you can store it in an unsigned char and divide by 255 when you use it.

    class CTinyFloat
    {
    private:
        unsigned char p_value;   // 0..255 represents 0.0..1.0

    public:
        void SetValue(const float sValue)
        {
            p_value = (unsigned char)(sValue * 255.0f);
        }

        float GetValue(void) const
        {
            return (float)p_value / 255.0f;
        }
    };

You'll have to check for a valid range in SetValue, since a negative value or one greater than 1.0f will not fit in the unsigned char...

##### Share on other sites
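Use an unsigned short or unsigned char. To convert from [0,1], do:

    type quantizedfloat = (type)(x * ((1 << (sizeof(type) << 3)) - 1));

To convert back:

    float regularfloat = (float)quantizedfloat / (float)((1 << (sizeof(type) << 3)) - 1);

The scale is 2^bits - 1 (255 for a char, 65535 for a short) rather than 2^bits, so that x = 1.0 maps to the largest value the type can hold instead of overflowing it.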

To make it a little faster make sure you do the float->int, int->float conversion yourself to fit your needs. As mentioned, it's a little slower than just using floats and isn't really worth it unless you want to get a massive amount of floats down to a smaller size.

The most useful place for this is in sending vertices to the video card. You can compress them like this, send them to the card faster, then decompress with a vertex shader. There are a few articles in the reference section on vertex quantization.

Note that the above will lose some precision, with char losing more than short. Don't waste your time using this on data that the CPU manipulates, unless you're working on a platform where you need fixed point or something. Look into fixed point for a general solution to this problem, which works for values outside of [0,1].
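To put numbers on the precision loss, the sketch below round-trips evenly spaced values in [0, 1] through a quantized type and records the worst error. The function names are invented for this example, and it rounds to nearest, so the worst case should be close to half a step: about 1/510 for an unsigned char and about 1/131070 for an unsigned short.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: quantize x in [0, 1] to the full range of unsigned type T.
template <typename T>
T Quantize01(float x)
{
    const float maxCode = static_cast<float>(std::numeric_limits<T>::max());
    return static_cast<T>(x * maxCode + 0.5f);    // round to nearest code
}

// Hypothetical helper: map a code back to [0, 1].
template <typename T>
float Dequantize01(T q)
{
    return static_cast<float>(q) / static_cast<float>(std::numeric_limits<T>::max());
}

// Largest round-trip error seen over samples + 1 evenly spaced inputs.
template <typename T>
float MaxRoundTripError(int samples)
{
    float worst = 0.0f;
    for (int i = 0; i <= samples; ++i)
    {
        const float x = static_cast<float>(i) / static_cast<float>(samples);
        const float err = std::fabs(Dequantize01<T>(Quantize01<T>(x)) - x);
        if (err > worst) worst = err;
    }
    return worst;
}
```

Calling `MaxRoundTripError<unsigned char>(100000)` and `MaxRoundTripError<unsigned short>(100000)` side by side shows that 8 bits give a bit over two decimal places, while 16 bits give between four and five.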

------------
- outRider -

[edited by - outRider on July 5, 2003 7:05:52 PM]

##### Share on other sites
16-bit fixed point is my suggestion.

##### Share on other sites
Yeah, and since the number only needs to be 0 -> 1, you only need a 1.1.14 format (1 bit for the sign, 1 bit for the integer part, and 14 bits for the fraction).

This is based on the fixed-point code from my virtual machine testing.

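    struct Fixed16_S // 1.1.14: 1 sign bit, 1 integer bit, 14 fraction bits
    {
        short val;

        float FloatVal(void) const
        {
            return val / 16384.0f;
        }
        Fixed16_S& operator+=(const Fixed16_S &f)
        {
            val += f.val;
            return *this;
        }
        Fixed16_S& operator-=(const Fixed16_S &f)
        {
            val -= f.val;
            return *this;
        }
        Fixed16_S& operator*=(const Fixed16_S &f)
        {
            // Multiply in 32 bits, then drop the extra 14 fraction bits.
            val = (short)(((int)val * f.val) >> 14);
            return *this;
        }
        Fixed16_S& operator/=(const Fixed16_S &f)
        {
            // Pre-scale the numerator so the quotient keeps 14 fraction bits.
            val = (short)(((int)val * 16384) / f.val);
            return *this;
        }
        Fixed16_S& operator=(const Fixed16_S &f)
        {
            val = f.val;
            return *this;
        }
        Fixed16_S& operator=(const short &v)
        {
            val = (short)(v * 16384);  // Set our value to a short! Only -2..1 fits.
            return *this;
        }
        Fixed16_S& operator=(const float &v)
        {
            val = (short)(v * 16384.0f);
            return *this;
        }
    };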

You can now use:
    Fixed16_S Test1, Test2;
    Test1 = 0.5f;
    Test2 = 0.5f;
    Test1 *= Test2;
    printf("%f", Test1.FloatVal()); // Should print out 0.25

Hope this gives you some idea on how fixed point works. This gives pretty good precision and only uses 16-bits, and also preserves the sign properly. This was originally a 32-bit fixed point struct that I just converted to 16, so typographical errors may have popped up. Also, this can easily be changed into an 8-bit format at the loss of some precision.
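For comparison, an 8-bit version might look something like the sketch below. `Fixed8_S` and its 1.1.6 layout (1 sign bit, 1 integer bit, 6 fraction bits) are an assumption made for illustration, not part of the original struct; the step size grows from 1/16384 to 1/64.

```cpp
// Hypothetical 8-bit counterpart of Fixed16_S in 1.1.6 format.
struct Fixed8_S
{
    signed char val;   // stored value is the real value times 64

    static Fixed8_S FromFloat(float v)
    {
        Fixed8_S r;
        r.val = static_cast<signed char>(v * 64.0f);   // truncates toward zero
        return r;
    }

    float FloatVal() const
    {
        return val / 64.0f;
    }

    Fixed8_S& operator*=(const Fixed8_S& f)
    {
        // Multiply in int, then drop the extra 6 fraction bits.
        val = static_cast<signed char>((static_cast<int>(val) * f.val) >> 6);
        return *this;
    }
};
```

Multiplying two values built with `Fixed8_S::FromFloat(0.5f)` gives 0.25, but anything finer than 1/64 (about 0.016) is lost.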

##### Share on other sites
Yeh, I agree with the fixed point suggestion.

##### Share on other sites
Hmm... Thanks a bunch to all that helped out.

The floats I am concerned about are all the floats for my particles. There will be an undetermined number of them, but definitely a lot of them, all with multiple float members.

The CPU will be working with these a lot, so from what's been suggested it sounds like it won't be a good tradeoff... Oh well, at least now I know and am not nagged by "but what if I could?", heh. Thanks a lot, guys.
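For a sense of the memory side of that tradeoff, here is a sketch comparing two particle layouts. The member names are invented for illustration, since the thread does not say what the particle actually contains.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical particle with six 0..1 values stored as floats.
struct ParticleFloat
{
    float r, g, b, a;   // color
    float life;         // remaining life, 1.0 = just spawned
    float fade;         // fade rate
};

// The same six values stored as 16-bit quantized integers.
struct ParticlePacked
{
    std::uint16_t r, g, b, a;
    std::uint16_t life;
    std::uint16_t fade;
};

// Bytes needed for a pool of particles of type T.
template <typename T>
std::size_t PoolBytes(std::size_t count)
{
    return count * sizeof(T);
}
```

On a typical platform with 4-byte floats this is 24 bytes versus 12 bytes per particle, so the saving is a factor of two at best, paid for with a conversion on every CPU read and write.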

##### Share on other sites
quote:
The CPU will be working with these a lot.

In this case I don't think you should use 16-bit values, because 32-bit Intel/AMD chips are designed to access dword-aligned memory (i.e. addresses that are multiples of 4 bytes) more quickly than memory that is not dword aligned.
