# Floating precision


## Recommended Posts

Hey, a really simple question. How does a float variable mostly lose precision? Is it because the digits are too far from the decimal point? For example, would I lose more precision multiplying numbers such as 0.000001782 than I would multiplying numbers like 1.782? If so, and one had the need to perform a large number of calculations on a value at start-up, would it help to multiply a value which is known to be small by some factor before performing the calculations?

For example, my animations use a motion vector which is divided by the number of milliseconds between keyframes. When you divide something by milliseconds, it's obviously going to be pretty small. Then add the fact that I blend 3 or 4 motion vectors together, factoring them out, adding them together, and multiplying them all by time. It sounds pretty risky, no?

I know I sound like a moron, but I've never completely understood how floating precision works, or why it loses so much data. Why is it that DirectX and graphics cards use floats for their matrices and other math gadgets? Wouldn't the use of doubles allow more accuracy? Come on, what's an extra 32 bits per float [wink]

Well, thanks for any advice.

##### Share on other sites
IEEE 754 floating point numbers are represented using mantissa + exponent. As their very name indicates, the position of the decimal (or binary) point doesn't matter (it floats). The limitation lies in how many (binary) digits you can have past the first non-zero digit.

So, for example, assuming FLT_EPSILON = 1.19209290e-7F
     10000000.0f + 2.0f =  10000002.0f
    100000000.0f + 2.0f = 100000000.0f

Because, in the second case, the 2 is too far away to the right of the 1 to register. And since they are floating point, the situation would be the same, with, say
     10000.0f + 0.002f =  10000.002f
    100000.0f + 0.002f = 100000.0f

And yes, you do risk numerical drift.
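A minimal C++ sketch for checking the two examples above on your own machine. The helper names `spacing_above` and `is_absorbed` are made up for illustration, and the sketch assumes `float` is an IEEE 754 single evaluated at single precision (the usual case on x86-64).

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Gap between x and the next representable float above it.
// (Hypothetical helper name, not from any library.)
float spacing_above(float x) {
    assert(std::isfinite(x));
    return std::nextafter(x, std::numeric_limits<float>::infinity()) - x;
}

// True when adding `small` to `big` leaves `big` unchanged,
// i.e. `small` is too far to the right of the leading digit to register.
bool is_absorbed(float big, float small) {
    const float sum = big + small;  // rounded to the nearest float
    return sum == big;
}
```

Calling `spacing_above(100000000.0f)` shows how coarse the grid of representable values has become at that magnitude, which is why `is_absorbed(100000000.0f, 2.0f)` holds while `is_absorbed(10000000.0f, 2.0f)` does not.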

##### Share on other sites
How much accuracy floating point numbers lose grows with how many operations you perform on them, so forget your pre-multiplying ideas; for the most part they can only make it worse.

Do you know how to represent real numbers in binary?
e.g. 9.375 = 1001.011b (8 + 1 + 0.25 + 0.125)
Most of the loss comes from things like multiplying by one-third, which can't be represented exactly in binary (or even in decimal, for that matter).
Multiplying by 3.0f, which can be represented exactly, is mostly not going to lose you any accuracy.
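To see the difference between values that are and aren't exactly representable, here is a rough C++ sketch that repeats one addition many times. The function names are made up for illustration, and it assumes IEEE 754 `float` and `double` with default (non fast-math) compiler settings.

```cpp
#include <cassert>
#include <cmath>

// Add `step` to a float running total `count` times.
float sum_in_float(float step, int count) {
    assert(count >= 0);
    float total = 0.0f;
    for (int i = 0; i < count; ++i) {
        total += step;  // each addition rounds to the nearest float
    }
    return total;
}

// Same loop, but the running total is kept in a double.
double sum_in_double(float step, int count) {
    assert(count >= 0);
    double total = 0.0;
    for (int i = 0; i < count; ++i) {
        total += step;
    }
    return total;
}
```

With a step of 0.25f (exact in binary) a million additions in float still land exactly on 250000. With 0.1f (not exact in binary) the float total drifts visibly away from 100000, while the double total stays very close to it.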

Operating on two numbers which are close in magnitude is one other thing which will help keep accuracy (you won't lose so many bits of the mantissa). Don't bend over backwards to do this, though.

Many games use floats because the loss in accuracy is an acceptable trade-off. I have never found the accuracy difference a problem myself. Using less memory can take priority.

##### Share on other sites
Subtracting two nearly equal numbers to get a very small difference often leaves you with just a few bits' worth of real precision. It's one of the most common sources of precision loss. For example, calculate the distance between the Earth and the Moon when the Sun is the center of the coordinate system: distance = ||earthPos-moonPos||. You just subtracted two almost-equal, large numbers to generate a much smaller number, losing you a LOT of potential precision.
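A one-dimensional C++ sketch of that Earth and Moon case. The constants are round illustrative values in meters (roughly 1.496e11 m from Sun to Earth and 3.844e8 m from Earth to Moon), not real ephemeris data, and the function names are invented for this example. It assumes IEEE 754 `float` and `double`.

```cpp
#include <cassert>
#include <cmath>

// Illustrative positions along one axis, in meters, with the Sun at 0.
constexpr double kEarthFromSun = 1.496e11;
constexpr double kMoonFromSun = kEarthFromSun + 3.844e8;

// Round both positions to float first, then subtract.
float difference_in_float(double a, double b) {
    assert(std::isfinite(a) && std::isfinite(b));
    const float fa = static_cast<float>(a);  // rounding error enters here
    const float fb = static_cast<float>(b);
    return fa - fb;  // the subtraction exposes that rounding error
}

// Subtract at double precision for comparison.
double difference_in_double(double a, double b) {
    assert(std::isfinite(a) && std::isfinite(b));
    return a - b;
}
```

At a magnitude of about 1.5e11 adjacent floats are 16384 apart, so each position is already off by up to several kilometers before the subtraction happens. The double version recovers the 3.844e8 separation exactly for these inputs.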

As a rule of thumb, there are about 3.3 bits (log2 of 10) per decimal digit of precision. A float has a 24-bit significand (23 mantissa bits + leading implicit 1), which is worth a bit over 7 decimal digits, but only 6 digits are guaranteed to survive in every case (that's why FLT_DIG is 6). Thus, for 7-digit numbers to be reliably accurate, you'd need a double (53 bits of precision, about 15 to 16 digits).
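The standard library reports these limits directly, so you don't have to rely on a rule of thumb. A small C++ sketch, assuming IEEE 754 `float` and `double` (the constant and function names here are my own):

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Significand width in bits, including the implicit leading 1.
constexpr int kFloatBits = std::numeric_limits<float>::digits;
constexpr int kDoubleBits = std::numeric_limits<double>::digits;

// Decimal digits guaranteed to survive a decimal -> binary -> decimal round trip.
constexpr int kFloatSafeDigits = std::numeric_limits<float>::digits10;
constexpr int kDoubleSafeDigits = std::numeric_limits<double>::digits10;

// Rough decimal-digit equivalent of `bits` binary digits: bits * log10(2).
double bits_to_decimal_digits(int bits) {
    assert(bits > 0);
    return bits * std::log10(2.0);
}
```

On a typical platform `kFloatBits` is 24 and `kFloatSafeDigits` is 6, while `bits_to_decimal_digits(24)` comes out slightly above 7. The gap between the two is why a 7-digit value may or may not survive in a float.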

##### Share on other sites
Thanks very much for the guidance.

Unfortunately, I can't imagine how I could achieve the same effect as dividing by time between frames, then multiplying by the time passed. I keep no records of the last time I animated something, only having the amount of time that has passed since I last animated it.

I guess I should probably at least cast to double to do the time-relative division calculations on startup.

Anyone have a clue how much penalty would be inflicted by performing a lot of math between floats and doubles at run time? For example, as in me using doubles for my math, but then handing the final calculations to DirectX matrix floats? Would this be pointless? Or just not worth it?
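One way to picture the "doubles for the math, floats at the end" idea is the C++ sketch below. The `Motion` struct, its fields, and `blended_offset` are hypothetical stand-ins for the animation data described earlier in the thread, not anything from DirectX. It says nothing about the run-time cost, which you would have to measure.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical description of one blended motion component.
struct Motion {
    float delta;        // displacement between two keyframes
    float keyframe_ms;  // time between those keyframes, in milliseconds
    float weight;       // blend weight
};

// Accumulate the blend in double and round to float exactly once,
// when handing the result to float-based code.
float blended_offset(const std::vector<Motion>& motions, float elapsed_ms) {
    double total = 0.0;
    for (const Motion& m : motions) {
        assert(m.keyframe_ms > 0.0f);
        // The first operand is a double, so the whole product is evaluated in double.
        total += static_cast<double>(m.weight) * m.delta / m.keyframe_ms * elapsed_ms;
    }
    return static_cast<float>(total);
}
```

The inputs and the output stay float, so nothing changes for the code that consumes the result; only the intermediate rounding moves from float to double.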

Also, if my objects' world locations are represented with floats, does this mean that the farther they get from the world origin, the choppier their movement will be? At the 100000000.0 magnitude, a value of 1.0 (1 inch in my world) being ignored is a biggie. Actually, that would most likely cause them to stop moving completely. Hmmm. Guess that would rule out massive gigantic worlds, eh?
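That hunch is easy to test with a small C++ sketch. `advance` is a made-up helper that moves a float position by a fixed step once per frame, and it assumes IEEE 754 single precision.

```cpp
#include <cassert>

// Move `position` by `step` once per frame, entirely in float.
float advance(float position, float step, int frames) {
    assert(frames >= 0);
    for (int i = 0; i < frames; ++i) {
        position += step;  // rounded to the nearest float every frame
    }
    return position;
}
```

Starting at 1000.0f, a thousand steps of 1.0f move the object as expected. Starting at 100000000.0f, the same steps leave it exactly where it was, because adjacent floats are 8 apart at that magnitude and each 1.0f step is rounded away.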

Thanks again [smile]