Floating-Point Precision

Started by
11 comments, last by Tom Backton 14 years, 3 months ago
It's just a little question related to the round-off errors that result from multiplication and division of floating-point variables. I tried a simple calculation (with a scientific calculator, not with a computer, but I don't think it matters. Right?) and the result of b*c/d wasn't the same as b/d*c. The last digit was different by 1. For a=b*c/d, what I usually see in code examples is a=b*c/d; and not a=b/d*c;. In longer calculations too, division is usually the last operation to take place (for example, a=b*c*d/(e*f);). Is there a difference in the round-off error between multiplication and division? Is multiplication done first so that the effect of the error on the next calculation is smaller? In my C++ code, should I prefer placing the division after the multiplication, or is the average error mathematically exactly the same in all cases?
Are all of your data types the same? Are you aware of integer division?

If there is any difference in rounding, it would be platform-specific and possibly optimization-specific.
The ins and outs of maintaining floating point accuracy can actually get rather intricate. A full answer to your query would probably take many pages, unfortunately.

I rather liked the chapter in Christer Ericson's book "Real-Time Collision Detection" on the subject.

As to whether or not multiplication or division occurs first, that is simply down to operator precedence. In C++, multiplication has higher precedence than division.
You'll probably want to look into the common standard for floating point math if you want to get a real grasp on some of the problems you're having (i.e. that floating point math IS NOT associative or distributive):

IEEE-754
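
To make that concrete, here is a minimal C++ sketch showing that the grouping of additions can change a result. The function names are made up for illustration, and it assumes IEEE-754 doubles evaluated in double precision (the usual case on x86-64).

```cpp
#include <cstdio>

// Sum three doubles left to right: (a + b) + c.
double sum_left(double a, double b, double c) {
    double ab = a + b;  // first rounding happens here
    return ab + c;      // second rounding happens here
}

// Sum the same three doubles right to left: a + (b + c).
double sum_right(double a, double b, double c) {
    double bc = b + c;
    return a + bc;
}

// Print both groupings with 17 significant digits, enough to
// expose a difference in the last place.
void print_groupings(double a, double b, double c) {
    std::printf("(a+b)+c = %.17g\n", sum_left(a, b, c));
    std::printf("a+(b+c) = %.17g\n", sum_right(a, b, c));
}
```

Calling print_groupings(0.1, 0.2, 0.3) should show two values that differ in the last place, while small integers such as 1.0, 2.0, 3.0 give identical sums because every intermediate result is exactly representable.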
I think the standard reasons to do divides last in code are:

- It works well for integer maths (unless it overflows). For example (3*4)/4 is not equal to (3/4)*4.

- Divides are slow, therefore doing them last can mean you need to do fewer of them (e.g. ((a/c) + (b/c)) will get a similar answer to ((a+b)/c) for floats). Also the slowness is potentially more easily hidden by the compiler / processor if done last.
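
Both points can be sketched in a few lines of C++. The helper names below are hypothetical, chosen only for this illustration.

```cpp
// Integer division truncates toward zero, so dividing first can
// throw away information that multiplying first would keep.
int multiply_then_divide(int a, int b, int c) { return (a * b) / c; }
int divide_then_multiply(int a, int b, int c) { return (a / c) * b; }

// Two divides versus one. For floating point the two forms give
// close results, but they are not guaranteed to be bit-identical.
double two_divides(double a, double b, double c) { return a / c + b / c; }
double one_divide(double a, double b, double c) { return (a + b) / c; }
```

With a=3, b=4, c=4 the integer versions return 3 and 0 respectively, which is the (3*4)/4 versus (3/4)*4 case from the list above.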
Unless you're dealing with code specifically designed to counteract accumulation error, the order of the division is a matter of personal preference. Personally I think a*b/c is more clear than a/c*b. If you wrote the latter on paper, it mightn't be apparent if the 'b' was part of the denominator or not.
First of all, I'm aware of how integer division works, but I'm asking about floating-point division.

About operator precedence: actually, in C++ multiplication and division have the same precedence (like in math, so it makes sense). http://cplusplus.com/doc/tutorial/operators/

I'm aware that dividing is slow, but I'm asking about cases in which the order doesn't change the number of divisions and multiplications, so the only question is whether the order affects precision.

"Unless you're dealing with code specifically designed to counteract accumulation error" - I am. In physics simulations it's better to have as much precision as possible without losing speed. So for floating-point numbers a, b and c, is a*b/c more precise than a/c*b? It's important because if there's no difference I'll prefer the second option (a*b could overflow, so I'd need to use double instead of float, a "waste" of memory); otherwise I'll prefer the first one when precision is important.
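
The overflow concern can be illustrated with a small C++ sketch. The function names are hypothetical, and it assumes IEEE-754 single precision with intermediates actually rounded to float (storing them in named float variables is meant to encourage that).

```cpp
#include <cmath>

// a*b/c: the intermediate product may exceed the float range and
// become +infinity even when the final quotient would fit.
float multiply_first(float a, float b, float c) {
    float product = a * b;
    return product / c;
}

// a/c*b: the intermediate ratio stays small when a and c are of
// similar size, but it can underflow when c is much larger than a.
float divide_first(float a, float b, float c) {
    float ratio = a / c;
    return ratio * b;
}

// True if a result has overflowed to +/- infinity.
bool overflowed(float x) { return std::isinf(x); }
```

With a = b = c = 1e30f the first form overflows (1e60 does not fit in a float), while the second returns 1e30f. Neither order is safe in general; which one avoids trouble depends on the magnitudes involved.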
Quote:Original post by Tom Backton

I am. In physics simulations it's better to have as much precision as possible without losing speed.


Don't physicists use Python these days?

Otherwise, it depends on the absolute size of the numbers involved.

You will need to evaluate your algorithms and analyze them on an individual basis to minimize the error.

But this is a moot point: if accuracy is more important, there are arbitrary-precision and rational-number libraries which eliminate the error altogether. Performance is a moot point in that case, just throw more hardware at it.
Quote:Original post by Antheus
Quote:Original post by Tom Backton

I am. In physics simulations it's better to have as much precision as possible without losing speed.


Don't physicists use Python these days?

Otherwise, it depends on the absolute size of the numbers involved.

You will need to evaluate your algorithms and analyze them on an individual basis to minimize the error.

But this is a moot point: if accuracy is more important, there are arbitrary-precision and rational-number libraries which eliminate the error altogether. Performance is a moot point in that case, just throw more hardware at it.


I don't know what sort of work Tom is doing, but I have some experience with molecular dynamics simulations in metals while most of my research group did phase field work. I'd say C is the most common language I saw, often using libraries written in Fortran. Python or similar were mostly used for post processing.

I don't know if it helps, but I didn't know anyone concerned with these sorts of optimizations, whether for speed or precision.

For speed, it was all about algorithms: can you get O(N)? Is there an algorithm that converges in fewer steps? If it's still too slow, try to parallelize it (i.e. throw more hardware at it). If it's still too slow, it's simply not possible with the current state of math/science/technology. Nobody cares if you can shave a few hours off simulations that last days. If a*b/c is faster than a/c*b, let the compiler worry about that.

For precision, you just used doubles and called it "good enough" (nobody used floats, if doubles were too big you got more RAM). The reason is that error due to double precision is usually much less than other sources. Like any other optimization, look for your bottleneck. In my case, thermal fluctuations were orders of magnitude larger than errors in the potential table which, in turn, were orders of magnitude larger than errors due to double precision. A potential table may be good to, say, 1e-6 eV while fluctuations at room temperature are on the order of 1e-2 eV. The machine epsilon for doubles is about 1e-16. Only once did floating point precision become an issue for me (I had a difference between large exponentials).
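
For reference, here is a rough C++ sketch of the two things mentioned above: querying the machine epsilon, and a difference of large exponentials. The post does not say how that case was actually handled; factoring out exp(x) and using std::expm1 is just one common way to avoid the cancellation, and the function names are invented for this example.

```cpp
#include <cmath>
#include <limits>

// Machine epsilon for double: the gap between 1.0 and the next
// representable value (roughly 2.2e-16 for IEEE-754 doubles).
double double_epsilon() {
    return std::numeric_limits<double>::epsilon();
}

// Direct form: subtracts two large, nearly equal numbers, so most
// of the significant digits cancel.
double exp_difference_direct(double x, double d) {
    return std::exp(x + d) - std::exp(x);
}

// Factored form: exp(x+d) - exp(x) == exp(x) * (exp(d) - 1).
// std::expm1 computes exp(d) - 1 accurately for small d.
double exp_difference_factored(double x, double d) {
    return std::exp(x) * std::expm1(d);
}
```

For something like x = 40 and d = 1e-10 the two forms should agree only to a handful of digits, with the factored form being the more trustworthy one.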

Like I said, I don't know what sort of work Tom's doing, but my suggestion would be to write code that looks as close as possible to how you'd write the formula on paper. It's very frustrating to wait a week and get a useless result because you typed the formula incorrectly. Use -ffast-math for gcc (similar for other compilers) to allow the compiler to reorder your calculations.
Since it's not been linked in this thread I should probably point out this document, which covers in detail what precision you can expect from floating point calculations. For calculations that only involve multiplies and divides precision is high regardless of ordering, so I'd suggest ordering them to avoid overflow and underflow. It's adds and subtracts that you have to be more careful with to maintain high precision.

The precision claims in there are only accurate if you don't tell the compiler to mess with your floating point code though (i.e. no -ffast-math or equivalent).
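
A minimal C++ sketch of that contrast, assuming IEEE-754 doubles, no overflow or underflow, and no -ffast-math. The function names are hypothetical.

```cpp
#include <cmath>
#include <limits>

// Relative difference between x and a nonzero reference y.
double relative_difference(double x, double y) {
    return std::fabs(x - y) / std::fabs(y);
}

// Compare a*b/c against a/c*b. Each form rounds twice, so the two
// results should agree to within a few units of machine epsilon.
double ordering_gap(double a, double b, double c) {
    double multiply_first = a * b / c;
    double divide_first = a / c * b;
    return relative_difference(multiply_first, divide_first);
}

// Try to recover a small x via (1 + x) - 1. The addition rounds x
// to the spacing of doubles near 1.0, and the subtraction then
// exposes that rounding as a large relative error.
double cancellation_error(double x) {
    double sum = 1.0 + x;
    double recovered = sum - 1.0;
    return relative_difference(recovered, x);
}
```

For ordinary inputs ordering_gap stays around 1e-16, whereas cancellation_error(1e-15) comes out at roughly ten percent, which is why the adds and subtracts deserve most of the attention.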

This topic is closed to new replies.
