# Floating point Operations



Hi. What I am trying to do (as part of something I'm working on) is determine the accuracy of floating-point calculations. That is, I want to be able to know whether a+b, a*b, etc. give an exact answer. I THINK I have managed to do it for addition and multiplication, but I have no idea about division. Does anyone know how? Or maybe it's not possible? EDIT: Maybe it's possible to say that (a/b)*b will be equal to a only if a/b gives an exact answer?
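A quick check of the idea in the EDIT, as a minimal sketch assuming IEEE 754 doubles with the default round-to-nearest mode (which is what Python's `float` uses on common platforms). It uses `fractions.Fraction` to get the exact rational value a float stores. The round trip can succeed even when the division was rounded, so `(a/b)*b == a` on its own does not prove that `a/b` was exact:

```python
from fractions import Fraction

a, b = 1.0, 3.0
q = a / b

# Fraction(x) converts a float to the exact rational number it stores,
# so this compares the rounded quotient with the true quotient.
division_is_exact = Fraction(q) == Fraction(a) / Fraction(b)

# The round-trip test proposed in the EDIT.
round_trip_holds = (q * b == a)

print(division_is_exact)  # False: 1/3 has no finite binary expansion
print(round_trip_holds)   # True: the second rounding happens to land on 1.0
```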

##### Share on other sites
"...I want to be able to know if a+b gives an exact answer,..."
Short answer, it doesn't. There is nothing exact about floats or doubles when you consider their decimal places in binary.

##### Share on other sites
Maybe I missed the point, but how do you mean?
"...determine the accuracy of floating-point calculations...if a+b gives an exact answer..."

##### Share on other sites
The IEEE standard for floating-point arithmetic might help you. Floating-point numbers aren't mathematical reals, so you can't assume they will obey mathematical laws. For some values, each of the following can hold:

(x + y) + z != x + (y + z)
(x * y) * z != x * (y * z)
x * (y + z) != (x * y) + (x * z)

Some numbers cannot be represented. NaNs (Not a Number) also have this interesting property:

x != x

So don't rely on their accuracy. If you need accurate numbers, use a library that gives you arbitrary precision real numbers.
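A small illustration of two of the properties above, assuming IEEE 754 doubles (Python's `float`). The values 0.1, 0.2 and 0.3 are just one convenient triple for which the grouping of an addition changes the result; many other triples associate fine:

```python
import math

x, y, z = 0.1, 0.2, 0.3

left = (x + y) + z
right = x + (y + z)

print(left == right)  # False
print(left, right)    # 0.6000000000000001 0.6

# NaN compares unequal to everything, including itself.
nan = math.nan
print(nan != nan)     # True
```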

##### Share on other sites
Ok, I'm curious, how did you do it for addition/multiplication?

The only way I can think of to test this would be to work through each digit, and perform the operations manually, storing the result into a fixed-point format that can handle the precision. Then check if the result matches manually, comparing each digit.
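One way to sketch that "do it again in a format that can hold the full precision" idea without working digit by digit is to redo the operation in exact rational arithmetic and compare. The helper name `is_exact` is made up for this example, and it assumes finite inputs and a nonzero divisor:

```python
import math
import operator
from fractions import Fraction

def is_exact(op, a, b):
    """True if the float result of op(a, b) equals the exact rational result.

    op is one of operator.add, operator.sub, operator.mul, operator.truediv.
    """
    result = op(a, b)
    if not math.isfinite(result):
        return False  # overflow to infinity (or NaN) is never exact
    # Fraction(x) is the exact value stored in the float x.
    return Fraction(result) == op(Fraction(a), Fraction(b))

print(is_exact(operator.add, 0.5, 0.25))     # True
print(is_exact(operator.add, 0.1, 0.2))      # False
print(is_exact(operator.mul, 3.0, 5.0))      # True
print(is_exact(operator.truediv, 4.0, 2.0))  # True
print(is_exact(operator.truediv, 1.0, 3.0))  # False
```

This is slow compared with hardware arithmetic, but it covers division with the same code as addition and multiplication.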

Needless to say, if you go that route, it'd be a lot easier to just avoid floats entirely.

##### Share on other sites
Addition is exact if no 1 bit is shifted out of the right end of the mantissa.

For example, that holds for all integers of fewer than 23 bits: 1.0f + 2424.0f == 2425.0f
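To see where that stops working, here is a rough emulation of single-precision addition. Python has no native 32-bit float, so the sketch rounds through `struct` with the `"f"` format; the helper names `to_f32` and `add_f32` are invented for the example. A float carries 24 significant bits (23 stored plus an implicit leading 1), so the first integer sum that cannot be exact is 2**24 + 1:

```python
import struct

def to_f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def add_f32(a, b):
    """Single-precision sum, emulated by rounding the double sum to float32."""
    return to_f32(to_f32(a) + to_f32(b))

print(add_f32(1.0, 2424.0) == 2425.0)          # True
print(add_f32(16777216.0, 1.0) == 16777217.0)  # False: 2**24 + 1 needs 25 bits
print(add_f32(16777216.0, 1.0))                # 16777216.0
```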

##### Share on other sites
Quote:
Original post by Anonymous Poster
"...I want to be able to know if a+b gives an exact answer,..."
Short answer, it doesn't. There is nothing exact about floats or doubles when you consider their decimal places in binary.

Floats and doubles are perfectly exact; it's the operations on them that are inexact.

I'm aware that lots of numbers are unrepresentable. But I still don't know how to do division. I think I did addition much as suggested. Try not to laugh, but I THINK multiplication is exact if the resulting number of bits is less than 52 (double), and the number of resulting bits = bits in first * bits in second?
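On the bit-counting guess, two details are worth checking: the significant bits of a product add rather than multiply (an m-bit value times an n-bit value needs at most m + n bits), and a double holds 53 significant bits (52 stored plus an implicit leading 1). A minimal sketch of that test follows, with the made-up helper names `significant_bits` and `product_surely_exact`. It assumes finite inputs and ignores overflow and underflow, and it is only a sufficient condition, since some products that fail it are still exact:

```python
from fractions import Fraction

def significant_bits(x):
    """Number of bits from the leading 1 to the trailing 1 of a finite float."""
    n, _ = abs(x).as_integer_ratio()
    if n == 0:
        return 0
    n >>= (n & -n).bit_length() - 1  # drop trailing zero bits
    return n.bit_length()

def product_surely_exact(a, b, precision=53):
    """Sufficient (not necessary) test that a * b needs no rounding."""
    return significant_bits(a) + significant_bits(b) <= precision

a = float(2**27 + 1)                  # 28 significant bits
print(significant_bits(3.0))          # 2
print(product_surely_exact(3.0, 5.0)) # True
print(product_surely_exact(a, a))     # False: 28 + 28 > 53
print(Fraction(a * a) == Fraction(a) * Fraction(a))  # False: really rounded
```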

##### Share on other sites
Can I ask why you want to do this? And what exactly are you trying to do?

##### Share on other sites
Good question. I am attempting to write a very basic computer algebra system (and after a long time am finally getting somewhere!). Basically I don't want to lose any information. That means that sqrt(2) stays sqrt(2), but sqrt(4) = 2, and 1/3 stays 1/3, but 4/2 = 2. I know that doing this with roots/powers will be impossible, but division is the last of the basic operations I have yet to work out.
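For comparison, here is how those four examples could behave if the values were held as exact rationals instead of floats. This is only a sketch using Python's standard `fractions` module, not a description of any particular algebra system, and `exact_sqrt` is a hypothetical helper that returns `None` where a symbolic `sqrt(...)` node would have to be kept:

```python
from fractions import Fraction
from math import isqrt

def exact_sqrt(q):
    """Return the exact rational square root of q, or None if there is none."""
    q = Fraction(q)
    if q < 0:
        return None
    # A reduced fraction has a rational root only if both parts are squares.
    rn, rd = isqrt(q.numerator), isqrt(q.denominator)
    if rn * rn == q.numerator and rd * rd == q.denominator:
        return Fraction(rn, rd)
    return None

print(Fraction(1, 3))   # 1/3 (stays a fraction)
print(Fraction(4, 2))   # 2   (reduced automatically)
print(exact_sqrt(4))    # 2
print(exact_sqrt(2))    # None (keep it symbolic as sqrt(2))
```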

##### Share on other sites
Then I really recommend you use a real number library and stop using hardware precision floats. Prefer correct arithmetic to fast arithmetic.

An excellent example of something like what you are trying to do is Frink. Try out the web applet. Precision is kept throughout calculations.
