C++ floating point error

10 comments, last by romer 15 years, 5 months ago
Quote:Original post by Nypyren
Your "40" could be "40.000000000000000000175138778235" etc

No. An integer stored in a float is still an integer. A small value such as 40 is stored exactly. A large one may be rounded to a different integer (for example, 123456789F is really 123456792F), but it never picks up a fractional part.
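To make that concrete, here is a minimal sketch, assuming an IEEE 754 single-precision float and the default round-to-nearest mode. The helper names nearest_float_as_integer and is_whole are made up for illustration.

```cpp
#include <cmath>

// Converts an integer to float and back, revealing which integer the float really holds.
// Assumes IEEE 754 single precision with round-to-nearest.
long long nearest_float_as_integer(long long n) {
    float f = static_cast<float>(n);     // rounds to the nearest representable float
    return static_cast<long long>(f);    // the stored value, read back as an integer
}

// True when the float has no fractional part.
bool is_whole(float f) {
    return std::floor(f) == f;
}
```

Under those assumptions, nearest_float_as_integer(123456789) gives 123456792, nearest_float_as_integer(40) gives 40, and is_whole is true for both stored values.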

Quote:Original post by Nypyren
or the 0.4 could be an approximation that's not exactly 0.4.

That's correct. Stored as a float, the literal 0.4 actually represents the value 0.4000000059604644775390625.
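If you want to see the stored value yourself, something like the following sketch should do. The function name exact_decimal is hypothetical, and the output depends on the C library printing exact digits (glibc does; others may round sooner).

```cpp
#include <cstdio>
#include <string>

// Formats the value stored in a float with 25 decimal places.
// The float is widened to double first, which does not change its value.
std::string exact_decimal(float f) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.25f", static_cast<double>(f));
    return buf;
}
```

With glibc, exact_decimal(0.4f) produces the digits quoted above. Note that 0.4f and the double literal 0.4 are different values, so comparing them with == yields false.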

Quote:Original post by Nypyren
A quick way to test whether a decimal fraction is representable in a float is to multiply the value by 2 repeatedly until it no longer has a fractional part. If this does not occur by the time your number reaches 2^(number of mantissa bits), then your value will only be an approximation.

Good observation. You can stop right away if the last decimal digit isn't a 5, because doubling any other nonzero digit puts that position into an infinite cycle:

1 -> 2 -> 4 -> 8 -> 6 -> 2
2 -> 4 -> 8 -> 6 -> 2
3 -> 6 -> 2 -> 4 -> 8 -> 6
4 -> 8 -> 6 -> 2 -> 4
5 -> 0 (and thus the digit disappears)
6 -> 2 -> 4 -> 8 -> 6
7 -> 4 -> 8 -> 6 -> 2 -> 4
8 -> 6 -> 2 -> 4 -> 8
9 -> 8 -> 6 -> 2 -> 4 -> 8
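Both the doubling test and the last-digit shortcut can be sketched with plain integer arithmetic, which sidesteps the rounding problem being tested. This is only an illustration: the decimal fraction is passed as numerator/denominator (0.375 becomes 375/1000), the function names are made up, and the default limit of 24 doublings is an assumption based on a float's 24 significant bits. The real limit depends on the magnitude of the number.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Returns how many doublings it takes for numerator/denominator to lose its
// fractional part, or -1 if that does not happen within max_doublings.
int doublings_until_whole(std::uint64_t numerator, std::uint64_t denominator,
                          int max_doublings = 24) {
    assert(denominator > 0);
    std::uint64_t frac = numerator % denominator;  // keep only the fractional part
    for (int n = 0; n <= max_doublings; ++n) {
        if (frac == 0) return n;
        frac = (frac * 2) % denominator;           // double, then drop the integer part
    }
    return -1;
}

// Follows one decimal digit (0..9) under repeated doubling until a digit
// repeats or the digit becomes 0.
std::vector<int> last_digit_chain(int digit) {
    assert(digit >= 0 && digit <= 9);
    std::vector<int> chain{digit};
    bool seen[10] = {false};
    seen[digit] = true;
    while (true) {
        digit = (digit * 2) % 10;
        chain.push_back(digit);
        if (digit == 0 || seen[digit]) break;
        seen[digit] = true;
    }
    return chain;
}
```

Here doublings_until_whole(375, 1000) returns 3 (0.375 -> 0.75 -> 1.5 -> 3), while doublings_until_whole(4, 10) returns -1 because the fractional part cycles forever. last_digit_chain(7) reproduces the row 7 -> 4 -> 8 -> 6 -> 2 -> 4 from the list.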

Interestingly, each of these cycles has length 4, and that is reflected in a repeating 4-bit pattern (1001) in the mantissa. For example, 0.4 stored as a float has the mantissa
1.10011001100110011001101

The last bit breaks the pattern because of the rounding already mentioned. Since there is no 24th bit after the binary point, the relative error introduced by rounding is at most about 2^-24, or roughly 6 * 10^-8. That's why we say a float has 7 to 8 significant decimal digits: the digits after that just reflect the rounding error:
0.4000000059604644775390625
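The error bound can be checked with a small sketch, assuming IEEE 754 floats with round-to-nearest and a nonzero input. The function names are hypothetical. Note that std::numeric_limits<float>::epsilon() is 2^-23 (the gap between 1 and the next float), so the worst-case rounding error is half of that.

```cpp
#include <cmath>
#include <limits>

// Relative error introduced by storing the double x in a float.
// Assumes x is nonzero and within the normal float range.
double float_relative_error(double x) {
    double stored = static_cast<double>(static_cast<float>(x));
    return std::fabs(stored - x) / std::fabs(x);
}

// Worst-case relative rounding error under round-to-nearest: epsilon / 2 = 2^-24.
double float_unit_roundoff() {
    return std::numeric_limits<float>::epsilon() / 2.0;
}
```

For 0.4 the measured error is nonzero but stays below float_unit_roundoff(), and for a value like 0.5, which a float holds exactly, it is zero.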

[Edited by - DevFred on November 23, 2008 5:25:55 AM]
Quote:Original post by Perost
Quote:Original post by romer
Instead, you should check to see if your computed value is within some error threshold of the desired value. A quick and dirty method is computing the absolute error like so:

*** Source Snippet Removed ***

While your logic is correct, you shouldn't use fabsf, since it isn't a standard C++ function. In C++ you should instead use fabs from cmath, which is overloaded for both float and double.


Fair enough, given that the OP was asking about C++. Where I work we actually code strictly in C, and fabsf() *is* a standard C99 function. I don't know about most people, but for me, keeping straight what's standard in C++ versus C99 (and sometimes C89, for that matter) gets a bit muddy, especially when it comes to the standard math functions.
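For reference, an absolute-error comparison along the lines discussed above might look like this in C++. This is not the snippet that was removed from the thread, just a sketch using std::fabs from cmath. The name nearly_equal and the default tolerance of 1e-6f are assumptions, and the tolerance should be picked to suit the magnitude of the values being compared.

```cpp
#include <cmath>

// Returns true when value lies within an absolute tolerance of target.
// std::fabs from <cmath> has a float overload, so no fabsf is needed.
bool nearly_equal(float value, float target, float tolerance = 1e-6f) {
    return std::fabs(value - target) <= tolerance;
}
```

So instead of writing x == 0.4f, you would write nearly_equal(x, 0.4f). An absolute threshold like this works for values near 1 but is too strict for very large values and too loose for very small ones.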

