C++'s floats not very good?

Original post by Nauraushaun

I've found that C++'s float values seem to alter themselves. For example, I set a float variable to 0.795, and I check it while the program is running to find that it's now 0.795000000129748. Is there some reason why this is happening? Does it always happen?
It's very annoying, and it makes equality impossible. Instead of:

if(floatVar == 0.795)


You have to use the infinitely more complex:

if((floatVar >= 0.795) && (floatVar < 0.910))

Where 0.910 stands for some other value.
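As a quick illustration of what's going on (a minimal sketch; the exact digits printed depend on your compiler and standard library), printing the float with far more precision than it can actually hold shows that the stored value is not exactly 0.795:

#include <iostream>
#include <iomanip>

int main()
{
    float floatVar = 0.795f;

    // Print more digits than a float meaningfully holds
    std::cout << std::setprecision(20) << floatVar << std::endl;     // e.g. 0.79500001668930053711

    // The literal 0.795 is a double, so this compares two different approximations
    std::cout << std::boolalpha << (floatVar == 0.795) << std::endl; // false
    return 0;
}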

Not all real values can be accurately represented in a fixed-space binary encoding on a computer -- thus, all floats have these inherent problems. You should almost never use pure equality to compare floats, instead preferring epsilon-based range checks. Read this.

I'm sure the links posted already provide all the gory details, but to summarize: what you are seeing is not a bug -- it's a feature of floating point numbers.

Take a 32-bit integer -- it can hold a number between around -2 billion and +2 billion signed, or 0 to around 4 billion unsigned. Either way, a 32-bit integer gives you 2^32 "increments", and they are evenly distributed across its range.

With floating point, the format was designed to solve the problem of representing both *really* small numbers and *really* big numbers in one format. A 32-bit floating point value can represent a number as large as ~3.4028234 × 10^38 -- that's about 29 orders of magnitude larger than the largest value a 32-bit integer can hold. At the same time, it can represent a number as small as 1.17549435 × 10^-38 -- to translate to decimal, that's:

0.0000000000000000000000000000000000000117549435

If you encode that number using a fixed-point format, you'd need 128 bits and you'd have a range of 0 to just shy of 4. This is why floating point is cool, if a bit strange.

How does floating point get so much range out of 32 bits? Well, it gives up on having a linear distribution of "increments". The key observation behind floating point is that really big numbers don't need to be nearly as accurate to be "in the ballpark", and conversely, really small numbers have to be very, very close to their real value to be in the same (relative) ballpark. So that's what the format was designed around -- the difference between the smallest representable number and the next smallest is super small, but at the other end of the scale, the difference between the largest possible number and the next largest is comparatively massive. I'm not sure of the exact difference, but I'd wager it's tens of orders of magnitude larger than the difference between the smallest representable numbers.
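To put rough numbers on that claim (a minimal sketch using C++11's std::nextafter; the values in the comments are for IEEE-754 single precision):

#include <cmath>
#include <cfloat>
#include <cstdio>

int main()
{
    // Gap between the largest finite float and its next-smaller neighbour: about 2.0e31
    float gapAtTop = FLT_MAX - std::nextafter(FLT_MAX, 0.0f);

    // Gap between the smallest normal float and its next-larger neighbour: about 1.4e-45
    float gapAtBottom = std::nextafter(FLT_MIN, 1.0f) - FLT_MIN;

    // Roughly 76 orders of magnitude apart, so "tens of orders of magnitude" is right
    std::printf("gap near FLT_MAX: %g\ngap near FLT_MIN: %g\n", gapAtTop, gapAtBottom);
    return 0;
}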

In scientific applications, things that are small tend to be *really small* -- say, the distance between atoms in some material; things that are large tend to be *really large* -- say, the distance between solar systems. So if you're planning a deep space mission and your calculations are 1,000 km off, the error is basically nothing, but if you're mapping the structure of steel or something, and your calculations are off by even one one-thousandth of a micrometer, then you're not even close.

Also, if your calculations mix numbers from opposite ends of this spectrum, you'll shed accuracy like mad, so do your best to structure your calculations such that the numbers being operated on are of a similar magnitude where possible.

Basically, you just have to deal with it. The "fuzziness" allowed in a floating point equality test is called an "epsilon" value. For single-precision floats, an epsilon value of 0.000001 is, I believe, fairly standard.
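In code, that fuzzy comparison typically looks something like this (a sketch; the 0.000001f tolerance is the conventional value mentioned above, and a fixed absolute tolerance like this only really makes sense when the values involved are around 1.0 in magnitude):

#include <cmath>

// Sketch: instead of (floatVar == 0.795f), test whether the difference is tiny
bool closeEnoughToTarget(float floatVar)
{
    return std::fabs(floatVar - 0.795f) < 0.000001f;
}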

Quote:

if(floatVar == 0.795)
You have to use the infinitely more complex:
if((floatVar >= 0.795f) && (floatVar < 0.910))

Faster, and less arbitrary:
float temp(floatVar - 0.795f);
if (temp * temp < std::numeric_limits<float>::min()) ...
Depending on the mathematical operations used to reach the value of the variables, you might need to compare to numeric_limits<float>::epsilon() or a multiple thereof instead.
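For example, a relative comparison along those lines might look like the following sketch (not Prune's exact code; the hypothetical almostEqualRelative scales the tolerance by the magnitude of the larger operand):

#include <algorithm>
#include <cmath>
#include <limits>

// Sketch: treat a and b as equal if they differ by at most a few machine
// epsilons, scaled to the magnitude of the larger of the two operands.
inline bool almostEqualRelative(float a, float b, float epsilonScale = 4.0f)
{
    float tolerance = std::numeric_limits<float>::epsilon() * epsilonScale
                    * std::max(std::fabs(a), std::fabs(b));
    return std::fabs(a - b) <= tolerance;
}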

I would very strongly recommend you read the first chapter of Modern Mathematical Methods for Physicists and Engineers

I swear to gawd that a computer architecture course should be an absolute requirement before any programming course in an accredited CS program.

Quote:
Original post by Prune
I swear to gawd that a computer architecture course should be an absolute requirement before any programming course in an accredited CS program.

In my experience of taking a computer architecture course, floats aren't covered. It's more about memory, caches, pages, and pipelines. Floating point representation was covered in the assembly class. :\ Just goes to show that not every class covers the same stuff at every university.

Quote:
Original post by Sirisian
Quote:
Original post by Prune
I swear to gawd that a computer architecture course should be an absolute requirement before any programming course in an accredited CS program.

In my experience of taking a computer architecture course floats aren't covered. It's more about memory, caches, pages, and pipelines. Floating point representation was covered in the assembly class. :\ Just goes to show that not every class covers the same stuff at every university.


For me, yes: I'm just finishing up computer architecture and we covered floating point numbers, from their internal representation to algorithms for truncation and so on.

Quote:
Original post by Sirisian
Quote:
Original post by Prune
I swear to gawd that a computer architecture course should be an absolute requirement before any programming course in an accredited CS program.

In my experience of taking a computer architecture course floats aren't covered.


Bizarrely, at my university, in my program, I found out that this was covered (at least to some extent) in the "basic" stream of the first-year CSC course, but not in the "advanced" stream (which I was in). o_O Of course, it was more in the form of random facts that they had to memorize for the exam, and relatively little to do with actually writing software; I don't think any of the assignments for either version of the course really required floating-point calculations in any meaningful way... my memory of this is really foggy, though.

To add to the collection of useful references on the topic...

I also would highly recommend Chapter 11, "Numerical Robustness" of Christer Ericson's Real-Time Collision Detection for a brief, but very useful and practical look at the topic. The chapter is only 40 pages and is still very effective (which is impressively concise given the potential size of the topic).

(Here's a link to the book on amazon: http://www.amazon.com/Real-Time-Collision-Detection-Interactive-Technology/dp/1558607323)

Quote:
Original post by Ravyne
[...] I'm not sure of the exact difference, but I'd wager it's tens of orders of magnitude larger than the difference between the smallest representable numbers. [...]


If you want a showpiece for that, assign 20,000,000 (20 million) to a (32-bit) floating point variable.

The next larger number the floating point variable can assume is 20,000,002 - that's a gap of 2 whole units, not 2 somewhere in the decimal places.

This "smallest possible increment" has been dubbed "Unit in the Last Place", or "ULP" for short (Wikipedia has an article on it). So while 20,000,000 + 1 ULP = 20,000,002 but 1.0 + 1 ULP = 1.00000012. There's some code floating around which can tell you how many ULPs two floating point numbers are apart, which is a good way to compare floating points without using fixed epsilon values (no idea how it compares to Prune's method or if its results would be identical give the right epsilon value).

Quote:
Original post by Prune
Quote:

if(floatVar == 0.795)
You have to use the infinitely more complex:
if((floatVar >= 0.795f) && (floatVar < 0.910))

Faster, and less arbitrary:
float temp(floatVar - 0.795f);
if (temp * temp < std::numeric_limits<float>::min())


A multiplication (with dependency chain) is not faster than a float compare.

Quote:
I swear to gawd that a computer architecture course should be an absolute requirement before any programming course in an accredited CS program


Quite.

Using your method....


float a = 1.000000001e28f;
float b = 1.000000000e28f;
float temp = a - b;
if( (temp * temp) < 1.175494351e-38f /* aka FLT_MIN */ )
{
    /* How often will you get here? */
}



The == operator does the same thing as that code, but cheaper...

Quote:
Depending on the mathematical operations used to reach the value of the variables, you might need to compare to numeric_limits<float>::epsilon() or a multiple thereof instead.


Either an *extremely* large multiple, or find a better solution.... (of which there are many).

Oh woah, okay.
I had a subject at university about internal computer systems and all that, but floating point wasn't covered in any depth. Nor was it covered in any of my programming subjects, which is annoying.
Would I be better off using something like short? Or are they performed in a similar manner, just with fewer bits?

Quote:
Original post by Nauraushaun
Would I be better off using something like short? Or are they performed in a similar manner, just with fewer bits?


short is an integer type. You can represent all integers exactly (up to the 2^bits limit of the type) with any integer type.

If you want real-type numbers x.xxxxx or whatever, you need to use float or double and do epsilon comparisons. It's really not a big deal; the only real sacrifice you have to make is writing those comparisons as if ( x >= number-epsilon && x <= number+epsilon ) instead of if ( x == number ).

-me

Quote:
Original post by Nauraushaun
Would I be better off using something like short? Or are they performed in a similar manner, just with fewer bits?
Better off in what way? You should use the data type that is most appropriate for what you want to use it for.

If you need fractional data, your options are float/double, an arbitrary-precision arithmetic library, or fixed-point arithmetic. If you don't need fractional data, use one of the existing built-in integer types, of which short is a possible choice.
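For completeness, here is a tiny fixed-point sketch (entirely hypothetical; it stores values as an integer count of thousandths, so equality is exact, at the cost of a fixed range and resolution):

#include <cstdint>

// Sketch: a value stored as integer thousandths. 0.795 becomes exactly 795,
// so == behaves the way the original poster expects.
struct Fixed3
{
    std::int32_t thousandths;
};

inline Fixed3 makeFixed3(double value)
{
    // round to the nearest thousandth
    return Fixed3{ static_cast<std::int32_t>(value * 1000.0 + (value < 0.0 ? -0.5 : 0.5)) };
}

inline bool operator==(Fixed3 a, Fixed3 b)
{
    return a.thousandths == b.thousandths;
}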

You can also do a branching compare, though it is likely to be slower than what has already been proposed.


inline bool compareFloats (float a, float b)
{
    // abs here is the float overload (std::abs from <cmath>); EPSILON is a caller-chosen tolerance such as 1e-6f
    return (std::abs(a - b) < EPSILON);
}
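For example, applied to the original poster's value (this assumes EPSILON is defined somewhere as a small tolerance such as 1e-6f):

void example()
{
    float floatVar = 0.795f;
    if (compareFloats(floatVar, 0.795f))
    {
        // treated as equal, even though (floatVar == 0.795) may be false
    }
}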

Quote:
Original post by Ameise
You can also do a branching compare though it is likely to be slower than what has already been proposed.


inline bool compareFloats (float a, float b)
{
return (abs(a - b) < EPSILON);
}


Erm, what's been proposed is branching comparisons, as far as I can tell. What suggestions are you referring to that don't involve a branch?


(BTW this is a trick question. All comparisons in this sense (i.e. executing code depending on a value) are branches [wink])

Quote:
Original post by ApochPiQ
Quote:
Original post by Ameise
You can also do a branching compare though it is likely to be slower than what has already been proposed.


inline bool compareFloats (float a, float b)
{
return (abs(a - b) < EPSILON);
}


Erm, what's been proposed is branching comparisons, as far as I can tell. What suggestions are you referring to that don't involve a branch?


(BTW this is a trick question. All comparisons in this sense (i.e. executing code depending on a value) are branches [wink])


Apologies, I wrote this late at night whilst getting out of work. :D

They differ in the mechanism used to actually compare the two floats - everyone else's methods rely on a range test between two floats, while mine relies on the absolute value of the difference. Depending on how abs is implemented, I would imagine that mine involves two branches.

Quote:
They differ in the mechanism used to actually compare the two floats - everyone else's methods rely on a range test between two floats, while mine relies on the absolute value of the difference. Depending on how abs is implemented, I would imagine that mine involves two branches.

I would guess abs is implemented using a bitwise operation; I think that would make your solution actually faster.

Quote:
Original post by Mussi
Quote:
They differ in the mechanism used to actually compare the two floats - everyone else's methods rely on a range test between two floats, while mine relies on the absolute value of the difference. Depending on how abs is implemented, I would imagine that mine involves two branches.

I would guess abs is implemented using a bitwise operation; I think that would make your solution actually faster.


I was thinking that too, that it may use bit twiddling -- however, in my own benchmarks (on modern CPUs), I found that using the branching abs


inline float abs (float val)
{
    return ((val < 0) ? -val : val);
}


was often faster than the twiddled abs


inline float abs (float val)
{
    int32_t cast = *(int32_t *)&val; // reinterpret the float's bits as an integer
    cast &= 0x7FFFFFFF;              // clear the sign bit
    return *(float *)&cast;          // reinterpret the bits back as a float
}
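As an aside, casting through pointers like that runs afoul of C++'s strict aliasing rules; a sketch of the same trick written with std::memcpy (which compilers typically optimize down to the same bit operations) could look like:

#include <cstdint>
#include <cstring>

inline float absBits(float val)
{
    std::uint32_t bits;
    std::memcpy(&bits, &val, sizeof bits);  // copy the float's bit pattern out
    bits &= 0x7FFFFFFFu;                    // clear the sign bit
    std::memcpy(&val, &bits, sizeof val);   // copy the bits back into a float
    return val;
}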

... and this is why you shouldn't try to apply intuition to optimization. "Ooooh, I can save a branch." Except that going between float <-> int (to do that bit twiddling) is MUCH slower than the branch. On many architectures, that requires storing the value to RAM, then re-reading it back into an integer register, which also triggers what's called a Load-Hit-Store (LHS). Then you get to spend ~100 cycles doing nothing while the caches figure out what you just did. Short version: don't worry about micro-optimizing that until you know it's a problem, and even then, test any change you're trying (in real-world usage, not with artificial benchmarks).

Quote:
Original post by osmanb
... and this is why you shouldn't try to apply intuition to optimization. "Ooooh, I can save a branch." Except that going between float <-> int (to do that bit twiddling) is MUCH slower than the branch. On many architectures, that requires storing the value to RAM, then re-reading it back into an integer register, which also triggers what's called a Load-Hit-Store (LHS). Then you get to spend ~100 cycles doing nothing while the caches figure out what you just did. Short version: don't worry about micro-optimizing that until you know it's a problem, and even then, test any change you're trying (in real-world usage, not with artificial benchmarks).


Yes, which is why I already stated in here that the twiddled version is slower (although I ran the benchmarks a rather long time ago); adding correction onto something that needs no correction doesn't solve anything.

Quote:
Original post by Ameise
I was thinking that too, that it may use twiddling -- however in my own benchmarks (on modern CPUs), I found that using the branching abs [...] was often faster than the twiddled abs
Isn't fabs implemented as an FPU instruction? In that case the compiler can probably convert your abs function to a single FPU instruction instead of having to do a bunch of moves between floating point and general purpose registers.
