Tom Backton

Floating-Point Precision


It's just a little question related to the round-off errors that result from multiplication and division of floating-point variables. I tried a simple calculation (with a scientific calculator, not with a computer, but I don't think that matters, right?) and the result of b*c/d wasn't the same as b/d*c; the last digit differed by 1. For a=b*c/d, what I usually see in code examples is a=b*c/d; and not a=b/d*c;. In longer calculations too, division is usually the last operation to take place (for example, a=b*c*d/(e*f);). Is there a difference in the round-off error between multiplication and division? Is multiplication done first so that the effect of the error on the next calculation is smaller? In my C++ code, should I prefer placing the division after the multiplication, or is the average error mathematically exactly the same in all cases?
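For reference, a minimal C++ sketch of the same experiment. The values are arbitrary (not from the calculator test), and whether the two orderings actually differ in the last bit depends on the particular inputs:

```cpp
#include <cstdio>

int main() {
    // Arbitrary values chosen only for illustration.
    double b = 1.0 / 3.0;
    double c = 7.0;
    double d = 11.0;

    double x = b * c / d;   // multiply first, then divide
    double y = b / d * c;   // divide first, then multiply

    // Print enough digits to expose a last-bit difference, if there is one.
    std::printf("b*c/d = %.17g\n", x);
    std::printf("b/d*c = %.17g\n", y);
    std::printf("equal: %s\n", x == y ? "yes" : "no");
    return 0;
}
```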

The ins and outs of maintaining floating point accuracy can actually get rather intricate. A full answer to your query would probably take many pages, unfortunately.

I rather liked the chapter in Christer Ericson's book "Real-Time Collision Detection" on the subject.

As to whether or not multiplication or division occurs first, that is simply down to operator precedence. In C++, multiplication has higher precedence than division.

You'll probably want to look into the common standard for floating-point math if you want to get a real grasp on some of the problems you're having (i.e. that floating-point math is NOT associative or distributive):

IEEE-754

I think the standard reasons to do divides last in code are:

- It works well for integer maths (unless it overflows). For example, (3*4)/4 gives 3, whereas (3/4)*4 gives 0.

- Divides are slow, so doing them last can mean you need fewer of them (e.g. (a/c) + (b/c) gives a similar answer to (a+b)/c for floats, but uses two divides instead of one). The slowness is also potentially easier for the compiler / processor to hide when the divide comes last. (A short sketch of both points follows below.)
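A small illustration of both points, with arbitrary values chosen only for the demo:

```cpp
#include <cstdio>

int main() {
    // Integer maths: dividing last keeps the exact result,
    // dividing first truncates too early.
    std::printf("(3*4)/4 = %d\n", (3 * 4) / 4);  // prints 3
    std::printf("(3/4)*4 = %d\n", (3 / 4) * 4);  // prints 0

    // Floats: hoisting the divide does one division instead of two;
    // the two results are typically equal or differ only in the last bits.
    float a = 1.0f, b = 2.0f, c = 3.0f;
    std::printf("a/c + b/c = %.9g\n", a / c + b / c);
    std::printf("(a+b)/c   = %.9g\n", (a + b) / c);
    return 0;
}
```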

Unless you're dealing with code specifically designed to counteract accumulation error, the order of the division is a matter of personal preference. Personally I think a*b/c is clearer than a/c*b. If you wrote the latter on paper, it might not be obvious whether the 'b' was part of the denominator or not.

First of all, I'm aware of how integer division works, but I'm asking about floating-point division.

About operator precedence: actually, in C++ multiplication and division have the same precedence (like in math, so it makes sense). http://cplusplus.com/doc/tutorial/operators/

I'm aware of the fact that dividing is slow, but I'm asking about cases in which the order doesn't change the number of divisions and multiplications, so the only question is whether the order affects precision.

"Unless you're dealing with code specifically designed to counteract accumulation error" - I am. In physics simulations it's better to have as much precision as possible without losing speed. So for floating-point numbers a, b and c, is a*b/c more precise than a/c*b? It's important because if there's no difference I'll prefer the second option (a*b could overflow, so I'd need to use double instead of float - a "waste" of memory), otherwise I'll prefer the first one when precision is important.

Quote:
Original post by Tom Backton

I am. In physics simulations it's better to have as much precision as possible without losing speed.


Don't physicists use Python these days?

Otherwise, it depends on the absolute size of the numbers involved.

You will need to evaluate your algorithms, and analyze them on an individual basis to minimize the error.

But this is a moot point - if accuracy is more important, there are arbitrary-precision and rational-arithmetic libraries that eliminate the error altogether. Performance is a moot point in that case - just throw more hardware at it.
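The post doesn't name a specific library; as one hedged example (an assumption on my part, not something the poster specified), Boost.Multiprecision's cpp_rational type does the arithmetic exactly, so ordering cannot change the result:

```cpp
#include <iostream>
#include <boost/multiprecision/cpp_int.hpp>  // provides cpp_rational

int main() {
    using boost::multiprecision::cpp_rational;

    // Exact rational arithmetic: no rounding at all, so ordering cannot matter.
    cpp_rational b = 1, c = 7, d = 11;
    b /= 3;  // b is now exactly 1/3

    cpp_rational x = b * c / d;
    cpp_rational y = b / d * c;

    std::cout << "b*c/d = " << x << "\n";                          // 7/33
    std::cout << "b/d*c = " << y << "\n";                          // 7/33
    std::cout << "equal: " << std::boolalpha << (x == y) << "\n";  // true
    return 0;
}
```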

Quote:
Original post by Antheus
Quote:
Original post by Tom Backton

I am. In physics simulations it's better to have as much precision as possible without losing speed.


Don't physicists use Python these days?

Otherwise, it depends on the absolute size of the numbers involved.

You will need to evaluate your algorithms, and analyze them on an individual basis to minimize the error.

But this is a moot point - if accuracy is more important, there are arbitrary-precision and rational-arithmetic libraries that eliminate the error altogether. Performance is a moot point in that case - just throw more hardware at it.


I don't know what sort of work Tom is doing, but I have some experience with molecular dynamics simulations in metals while most of my research group did phase field work. I'd say C is the most common language I saw, often using libraries written in Fortran. Python or similar were mostly used for post processing.

I don't know if it helps, but I didn't know anyone concerned with these sorts of optimizations, whether for speed or precision.

For speed, it was all about algorithms: can you get O(N)? Is there an algorithm that converges in fewer steps? If it's still too slow, try to parallelize it (i.e. throw more hardware at it). If it's still too slow, it's simply not possible with the current state of math/science/technology. Nobody cares if you can shave a few hours off simulations that last days. If a*b/c is faster than a/c*b, let the compiler worry about that.

For precision, you just used doubles and called it "good enough" (nobody used floats, if doubles were too big you got more RAM). The reason is that error due to double precision is usually much less than other sources. Like any other optimization, look for your bottleneck. In my case, thermal fluctuations were orders of magnitude larger than errors in the potential table which, in turn, were orders of magnitude larger than errors due to double precision. A potential table may be good to, say, 1e-6 eV while fluctuations at room temperature are on the order of 1e-2 eV. The machine epsilon for doubles is about 1e-16. Only once did floating point precision become an issue for me (I had a difference between large exponentials).
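A hedged illustration of the scales mentioned above, plus the one failure mode noted (the difference of large, nearly equal exponentials); the values here are arbitrary:

```cpp
#include <cmath>
#include <cstdio>
#include <limits>

int main() {
    // Machine epsilon for double is about 2.2e-16.
    std::printf("double epsilon: %.3g\n", std::numeric_limits<double>::epsilon());

    // Difference of two large, nearly equal exponentials: the direct
    // subtraction cancels away most of the significant digits.
    double a = 30.0, b = 30.0 + 1e-12;
    double direct = std::exp(b) - std::exp(a);
    double stable = std::exp(a) * std::expm1(b - a);  // exp(b)-exp(a) = exp(a)*(exp(b-a)-1)

    std::printf("direct: %.17g\n", direct);
    std::printf("stable: %.17g\n", stable);
    return 0;
}
```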

Like I said, I don't know what sort of work Tom's doing, but my suggestion would be to write code that looks as close as possible to how you'd write the formula on paper. It's very frustrating to wait a week and get a useless result because you typed the formula incorrectly. Use -ffast-math for gcc (similar for other compilers) to allow the compiler to reorder your calculations.

Since it's not been linked in this thread, I should probably point out this document, which covers in detail what precision you can expect from floating-point calculations. For calculations that only involve multiplies and divides, precision is high regardless of ordering, so I'd suggest ordering them to avoid overflow and underflow. It's adds and subtracts that you have to be more careful with to maintain high precision.

The precision claims in there are only accurate if you don't tell the compiler to mess with your floating point code though (i.e. no -ffast-math or equivalent).
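A small sketch of that contrast, with arbitrary values: reordering a pure multiply/divide chain moves the result by at most a few ulps, while a subtraction between nearly equal numbers can discard most of the significant digits:

```cpp
#include <cmath>
#include <cstdio>

int main() {
    // Multiply/divide only: the two orderings agree to within a few ulps.
    double a = 1.2345678901234567, b = 9.876543210987654, c = 3.3333333333333333;
    double m1 = a * b / c;
    double m2 = a / c * b;
    std::printf("relative difference: %.3g\n", std::fabs(m1 - m2) / std::fabs(m1));

    // Add/subtract: cancellation between nearly equal values loses digits.
    double x = 1.0e16;
    double s = (x + 3.14159) - x;   // the exact answer would be 3.14159
    std::printf("(1e16 + 3.14159) - 1e16 = %.17g\n", s);
    return 0;
}
```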

Quote:
I am. In physics simulations it's better to have as much precision as possible without losing speed. So for floating-point numbers a, b and c, is a*b/c more precise than a/c*b? It's important because if there's no difference I'll prefer the second option (a*b could overflow, so I'd need to use double instead of float - a "waste" of memory), otherwise I'll prefer the first one when precision is important.


Read this:
http://www.ddj.com/cpp/184403224
And this, for a very effective way to counteract accumulation error:
http://en.wikipedia.org/wiki/Kahan_summation_algorithm
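For reference, a minimal sketch of Kahan (compensated) summation as described at that link, illustrative rather than production code:

```cpp
#include <cstdio>
#include <vector>

// Kahan (compensated) summation: the compensation term recovers the
// low-order bits that a plain running sum would discard on each addition.
// Note: don't compile this with -ffast-math, which is allowed to optimize
// the compensation away.
double kahan_sum(const std::vector<double>& values) {
    double sum = 0.0;
    double comp = 0.0;            // running compensation
    for (double v : values) {
        double y = v - comp;      // corrected addend
        double t = sum + y;       // low-order bits of y are lost here...
        comp = (t - sum) - y;     // ...and captured here for the next step
        sum = t;
    }
    return sum;
}

int main() {
    std::vector<double> values(1000000, 0.1);  // one million copies of 0.1

    double naive = 0.0;
    for (double v : values) naive += v;

    std::printf("naive: %.10f\n", naive);             // drifts away from 100000
    std::printf("kahan: %.10f\n", kahan_sum(values)); // essentially exact
    return 0;
}
```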

I don't think any particular placement of the division sign can guarantee better precision for all possible values. Ordering to avoid overflow makes sense, though, and that depends on the values of a, b and c. If 'c' is close to zero, then the division produces a larger number, so the second option could overflow too.
However, you'll probably find that if you're using floats, neither expression overflows, as the compiler may secretly carry out the intermediate calculations in double (or extended) precision.

I don't think any arrangement is "better" from a mathematical standpoint. If you think you can reduce the chance of overflow, then arrange accordingly, but the optimal ordering depends entirely on the values of the variables.

Case in point: in an application I'm working on, one programmer had rewritten an expression as (a * b * c) / (d * e * f) in an attempt to increase accuracy. The problem was that in some cases all the numbers involved were very small, and both the numerator and denominator became denormalized, which resulted in NaNs or just rubbish.

However, the numbers came in pairs of roughly equal magnitude, so we rewrote the expression as (a / d) * (b / e) * (c / f). Even though each pair might be very small, each division results in a value of roughly 1, and so the final expression always behaves nicely.
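A minimal reconstruction of that failure mode with arbitrary tiny floats (not the application's real data), assuming float arithmetic is performed in single precision:

```cpp
#include <cmath>
#include <cstdio>

int main() {
    // Arbitrary tiny floats; each pair a/d, b/e, c/f is of roughly equal magnitude.
    float a = 1.0e-20f, d = 2.0e-20f;
    float b = 3.0e-15f, e = 4.0e-15f;
    float c = 5.0e-18f, f = 6.0e-18f;

    // Grouped form: numerator and denominator both underflow to 0, giving NaN.
    float grouped = (a * b * c) / (d * e * f);

    // Paired form: each ratio is close to 1, so everything stays well scaled.
    float paired = (a / d) * (b / e) * (c / f);

    std::printf("grouped: %g (nan? %d)\n", grouped, (int)std::isnan(grouped));
    std::printf("paired:  %g\n", paired);
    return 0;
}
```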

