C++ floating point error

Started by
10 comments, last by romer 15 years, 4 months ago
I'm currently working on a sweep test demo. I have a 20 x 20 box bouncing from left to right, moving at 100 pixels per frame. After about 20 frames the program gets stuck in an endless loop. Using a logger, I tracked the problem down to this function:

void Box::move( float time )
{
    //Before

    xPos += xVel * time;
    yPos += yVel * time;
    
    //After
}

At the //Before line, my logger reports:

xPos: 40 yPos: 0 xVel: -100 yVel: 0 time: 0.4

At the //After line, it reports:

xPos: -5.96046e-007 yPos: 0 xVel: -100 yVel: 0 time: 0.4

Every time I test it, it freezes in exactly this situation. So how can 40 + ( -100 * 0.4 ) equal -5.96046e-007?

Learn to make games with my SDL 2 Tutorials

Floating point precision.

Your "40" might not be exactly 40, since the logger rounds what it prints (the next float above 40 is 40.0000038, which would still print as "40"), or the 0.4 could be an approximation that's not exactly 0.4.

Never compare floating-point numbers for exact equality. Instead, check whether the value falls within a small range around the target, to allow for the approximation. A constant very close to zero, usually called 'epsilon', is commonly defined for this purpose.
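In C++, a common source for that constant is std::numeric_limits<float>::epsilon() from <limits>. One caveat: it is the gap between 1.0f and the next representable float, so it only makes sense as a tolerance for values near 1. The sketch below uses a hypothetical helper, gapAbove, built on std::nextafter, to measure the gap at any value; near 40 (the position in the question) the gap is roughly 32 times larger than epsilon.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: distance from x to the next representable float
// above it, i.e. the spacing of floats near x. Assumes x is finite.
float gapAbove(float x) {
    return std::nextafter(x, std::numeric_limits<float>::infinity()) - x;
}
```

Here gapAbove(1.0f) equals std::numeric_limits<float>::epsilon() (about 1.19e-7), while gapAbove(40.0f) is 2^-18 (about 3.8e-6).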
-5.96046e-007 is -0.000000596046, which is essentially zero. Floating-point numbers can't represent every value precisely, so the result is slightly off.
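One way to check this is to redo the question's arithmetic at higher precision. The sketch below (function name hypothetical) assumes the expression was evaluated with more precision than float; that plausibly happened here (older 32-bit x86 compilers often kept intermediates in 80-bit x87 registers), but it is an assumption the thread doesn't confirm. Under that assumption, the tiny error stored in 0.4f is multiplied by 100 and survives the subtraction, which reproduces the logged value exactly.

```cpp
// Hypothetical reproduction of xPos += xVel * time with xPos = 40,
// xVel = -100, time = 0.4f, carried out in double instead of float.
double moveInDouble() {
    const float time = 0.4f;       // stored as 0.4000000059604644775390625
    const double xPos = 40.0;
    const double xVel = -100.0;
    // Converting the float to double is exact, and so are this product and
    // sum, so the error in 0.4f (about 5.96e-9) shows up multiplied by 100.
    return xPos + xVel * static_cast<double>(time);
}
```

The result is -5.9604644775390625e-07, which prints as -5.96046e-07 with default formatting. In plain single-precision arithmetic without a fused multiply-add, -100.0f * 0.4f instead rounds to exactly -40.0f and the sum is 0.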
Mike Popoloski | Journal | SlimDX
You said it yourself: floating-point error. Mathematically 40 + ( -100 * 0.4 ) = 0, but the computed result is -0.000000596046 (printed as -5.96046e-007), which is very close to 0. You can't represent every real number exactly with a float, double, or even a long double; that would require an infinite amount of memory. Instead, the computer approximates real numbers. Here is a great article on floating-point arithmetic for computers which will explain your problem perfectly.

[edit]

ninja'd += 2
[ I was ninja'd 71 times before I stopped counting a long time ago ] [ f.k.a. MikeTacular ] [ My Blog ] [ SWFer: Gaplessly looped MP3s in your Flash games ]
Quote:Original post by Nypyren
Floating point precision.

Your "40" could be "40.000000000000000000175138778235" etc, or the 0.4 could be an approximation that's not exactly 0.4.


I'd assumed cout wouldn't do any rounding of floating-point values.

Are there any good articles on good floating-point practices?



I'd say your logger isn't showing your values with enough decimal places. When xPos goes negative, set it to 0.0f (or a small positive value) so the box never ends up behind the wall.

Then do something like:

xVel = -xVel;

or

xVel = -xVel * 0.99f;

to apply some damping.
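Putting those two suggestions together, a minimal sketch of the bounce might look like this. The Box is a hypothetical, x-axis-only stand-in for the one in the question, and the wall positions (a 640-pixel-wide screen with the 20-pixel box) are assumptions. The checks use <= and >= together with clamping rather than testing for exact equality, so a result like -5.96046e-007 still counts as hitting the wall.

```cpp
// Hypothetical, x-axis-only version of the Box from the question.
struct Box {
    float xPos;
    float xVel;

    // Assumed walls: left edge at 0, right edge at 640 - 20 for a
    // 20-pixel box on a 640-pixel-wide screen.
    static constexpr float kLeftWall = 0.0f;
    static constexpr float kRightWall = 620.0f;
    static constexpr float kDamping = 0.99f;   // fraction of speed kept per bounce

    void move(float time) {
        xPos += xVel * time;
        if (xPos <= kLeftWall) {
            xPos = kLeftWall;            // clamp so the box never sits behind the wall
            xVel = -xVel * kDamping;     // reverse direction, losing a little speed
        } else if (xPos >= kRightWall) {
            xPos = kRightWall;
            xVel = -xVel * kDamping;
        }
    }
};
```

Clamping also means a rounding error can never carry the box past a wall, so any later code that inspects the position sees a sane value.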
Quote:Original post by Lazy Foo
I'd figured cout wouldn't do any rounding of floating points.

Is there any good articles regarding good floating point practices?


If you call cout.setf(ios::fixed), cout prints numbers with a fixed number of decimal places, and cout.precision(numberOfDecimals) sets how many. Example:

cout.setf(ios::fixed);
cout.precision(25);
cout << 42.0;
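The same settings work with a std::ostringstream, which makes it easy to look at the digits that default formatting hides. The helper name withDecimals is hypothetical; std::fixed and std::setprecision are the standard manipulator equivalents of setf(ios::fixed) and precision().

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: format a value with a fixed number of decimal
// places, as cout.setf(ios::fixed) plus cout.precision(decimals) would.
std::string withDecimals(double value, int decimals) {
    std::ostringstream out;
    out << std::fixed << std::setprecision(decimals) << value;
    return out.str();
}
```

For example, withDecimals(0.4f, 10) gives "0.4000000060", while withDecimals(0.4, 10) gives "0.4000000000", because the double closest to 0.4 is much closer than the float.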

As for your request for articles, see my original post.
Quote:Original post by Lazy Foo
Quote:Original post by Nypyren
Floating point precision.

Your "40" could be "40.000000000000000000175138778235" etc, or the 0.4 could be an approximation that's not exactly 0.4.


I'd figured cout wouldn't do any rounding of floating points.

Is there any good articles regarding good floating point practices?


Some fractions never terminate when written out. For instance, 2/3 in decimal is 0.6666666..., with the 6s repeating forever.

I checked the number 0.4 as a float, and its bit pattern is 0x3ECCCCCD in hexadecimal. The repeating Cs followed by a D are like 2/3 becoming 0.6666666... (repeating) in decimal, or 0.66667 when rounded to 5 places: the pattern would repeat forever, and the last digit is rounded up.
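To reproduce that check, you can copy a float's bytes into a 32-bit integer. This sketch assumes float is a 32-bit IEEE 754 type (true on mainstream platforms); it uses std::memcpy because reading a float through an integer pointer breaks C++'s aliasing rules. The function name floatBits is hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: return the raw IEEE 754 bit pattern of a float.
std::uint32_t floatBits(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);   // well-defined type punning
    return bits;
}
```

floatBits(0.4f) is 0x3ECCCCCD. Its low 23 bits (the stored mantissa) are 0x4CCCCD; adding the implicit leading 1 gives 0xCCCCCD.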

A quick way to test whether a decimal fraction is exactly representable as a float is to multiply the value by 2 repeatedly until it no longer has a fractional part. If that hasn't happened by the time the number reaches 2^24 (a float's 23 stored mantissa bits plus the implicit leading bit), the value can only be approximated. Interestingly, if you round the result just before it would pass that limit, you get exactly the mantissa (significand) of the IEEE representation.
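Here is a sketch of that test for a fraction num/den, written with integer arithmetic so the test itself introduces no rounding. The names are hypothetical; it assumes a value strictly between 0 and 1 and ignores the float exponent range. For 0.4 (4/10) it stops after 25 doublings at 13421773, which is 0xCCCCCD, the significand found in 0x3ECCCCCD; in other words 0.4f is exactly 13421773 / 2^25.

```cpp
#include <cstdint>

// Outcome of the repeated-doubling test for a fraction num/den.
struct DoublingResult {
    bool exact;                 // true if num/den is exactly representable as a float
    std::uint64_t significand;  // integer reached (rounded to nearest if not exact)
    int doublings;              // how many times the value was multiplied by 2
};

// Sketch of the doubling test in exact integer arithmetic.
// Assumes 0 < num < den and den < 2^32 so intermediates fit in 64 bits;
// ignores subnormals and a carry when rounding reaches 2^24.
DoublingResult doublingTest(std::uint64_t num, std::uint64_t den) {
    const std::uint64_t limit = std::uint64_t{1} << 24;  // 23 stored bits + implicit 1
    int doublings = 0;
    while (num % den != 0) {                 // value still has a fractional part
        if ((2 * num) / den >= limit) {      // one more doubling would reach 2^24
            // Round to nearest, ties to even, as IEEE 754 does by default.
            std::uint64_t whole = num / den;
            std::uint64_t twiceRem = 2 * (num % den);
            if (twiceRem > den || (twiceRem == den && (whole & 1) != 0)) {
                ++whole;
            }
            return {false, whole, doublings};
        }
        num *= 2;
        ++doublings;
    }
    return {true, num / den, doublings};
}
```

By contrast, doublingTest(1, 2) reports 0.5 as exact after a single doubling.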
0.4 doesn't have an exact representation in IEEE floating point, so there will be some inherent error in practically any floating-point calculation that uses it. If you're getting an endless loop because of this error, look at how your position variables are used elsewhere in your code. Without seeing your code, I'd bet you have a conditional that does an exact comparison between one or both of those variables and some fixed constant. In general, avoid comparing floating-point numbers with a plain ==, as in this case:

// generally bad
if( some_float == 0.0f )
{
    // do stuff
}
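A small illustration of why that comparison is risky (the function is hypothetical): adding 0.1f ten times in single precision gives about 1.00000012, not 1.0f, so a loop such as for (float x = 0.0f; x != 1.0f; x += 0.1f) steps right past 1.0f and never terminates. This assumes ordinary IEEE single-precision arithmetic with round-to-nearest and no extra intermediate precision.

```cpp
// Hypothetical helper: add 0.1f to a float accumulator n times.
float sumOfTenths(int n) {
    float sum = 0.0f;
    for (int i = 0; i < n; ++i) {
        sum += 0.1f;   // each addition rounds to the nearest float
    }
    return sum;
}
```

sumOfTenths(10) == 1.0f is false, yet the difference from 1.0f is far below any tolerance you would pick for positions in pixels.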


Instead, you should check to see if your computed value is within some error threshold of the desired value. A quick and dirty method is computing the absolute error like so:

float       error     = fabsf( some_float - 0.0f );
const float MAX_ERROR = 1e-6;
if( error < MAX_ERROR )
{
    // some_float is essentially equal to 0, within machine precision
}


The above isn't ideal, though. As the magnitude of your values grows, the gap between two consecutive representable floating-point numbers grows too, so the absolute error you should expect grows along with it. You can cope with that by computing relative errors instead.
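A minimal sketch of a relative comparison, where the allowed difference scales with the larger magnitude of the two inputs. It keeps a small absolute tolerance for values near zero, where a purely relative test fails (nothing is relatively close to 0.0f). The name nearlyEqual and the default tolerances are illustrative assumptions.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper: true if a and b differ by at most absTol, or by at
// most relTol times the larger of their magnitudes.
bool nearlyEqual(float a, float b, float relTol = 1e-5f, float absTol = 1e-6f) {
    float diff = std::fabs(a - b);
    if (diff <= absTol) {
        return true;                       // handles comparisons against zero
    }
    float largest = std::max(std::fabs(a), std::fabs(b));
    return diff <= largest * relTol;       // tolerance grows with magnitude
}
```

For example, 1000000.0f and 1000000.0625f are adjacent floats and compare as nearly equal, even though their difference (0.0625) would fail a fixed 1e-6 absolute test.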

Also, the right value for MAX_ERROR depends on several factors, such as whether you're using single- or double-precision floating point and what range of numbers you're dealing with. I've generally used 1e-6 for floats and 1e-12 or 1e-13 for doubles with good results. Sometimes I have to tweak that for the application, but that usually involves more analysis of the inputs and calculations to justify it. There are surely better ways of picking error thresholds; that's just how I've done it.
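If you work with both float and double, one way to keep those thresholds in one place is a small traits template. This is a hypothetical sketch using the values mentioned above (1e-6 for float, 1e-12 for double); treat them as starting points rather than universal constants.

```cpp
#include <cmath>

// Hypothetical traits: default absolute tolerance per floating-point type.
template <typename T> struct DefaultTolerance;
template <> struct DefaultTolerance<float>  { static constexpr float  value = 1e-6f; };
template <> struct DefaultTolerance<double> { static constexpr double value = 1e-12; };

// Absolute-error comparison whose default tolerance depends on T.
template <typename T>
bool withinTolerance(T a, T b, T tol = DefaultTolerance<T>::value) {
    return std::fabs(a - b) < tol;
}
```

With this, withinTolerance(0.1 + 0.2, 0.3) is true for doubles, while withinTolerance(1e-7, 0.0) is false because the double threshold is much tighter than the float one.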

I've used this site before when coming up with 'better' comparisons, but as the site says, any sort of comparison between floating-point numbers is always hairy.

Hope that helps.
Quote:Original post by romer
Instead, you should check to see if your computed value is within some error threshold of the desired value. A quick and dirty method is computing the absolute error like so:


While your logic is correct, you shouldn't use fabsf, since it isn't a standard C++ function. In C++ you should use fabs from <cmath> instead, which is overloaded for both float and double.

