Archived

This topic is now archived and is closed to further replies.

Spanky

Floating point problems

Hey guys, I'm currently using floating point variables with very unusable results. The following code

    float delta = 0.1f * 1.0f;
    float origValue = 200.0f;
    float newValue = origValue + delta;

gives newValue a value of 200.10001 or something (not sure how many 0's and don't want to count) when rounded. If it doesn't round it, I get 200.1000000055361. The last digits (55361) change very slightly each time I add delta to it. I need this not to happen because I am updating a text control as the user slides the mouse and want it to go in 0.1 increments, but this looks horrible. Any ideas?

Shawn

1) Floating point isn't "exact": most decimal fractions, 0.1 included, have no finite binary representation and get rounded to the nearest representable value. The result of a floating point computation can be even less "exact", because before two numbers are added their binary points need to be aligned (i.e. the exponents need to be made equal) and the result is rounded again.

2) The x87 FPU works at up to 80-bit extended precision internally. When you load a float of say 0.1f, only 32 bits of information go in. Widening it doesn't add precision, so any extra digits simply expose the rounding error already in the 32-bit value.

3) Display of "float" values in the debugger, with printf or any other formatted routine (any %f for example) is done as doubles. A float only carries about 7 significant decimal digits, so whatever the routine displays beyond that is just the rounding error of the 32-bit value. You're displaying too many digits! Control the number of digits displayed with the precision part of the format specifier (%.4f displays 4 digits after the decimal point, for example).
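To make point 3 concrete, here is a minimal sketch, assuming a standard C++ toolchain. The helper name `format_fixed` is made up for illustration and is not from the thread. It formats the same float with a caller-chosen number of decimals, which suggests the "extra" digits depend on how many you ask for rather than on the addition going wrong.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: format a float with `digits` places after the point.
// The value is widened to double for the variadic call, exactly as printf
// would do on its own; widening adds no information.
std::string format_fixed(float value, int digits) {
    char buf[64];
    int written = std::snprintf(buf, sizeof buf, "%.*f",
                                digits, static_cast<double>(value));
    assert(written >= 0 && written < static_cast<int>(sizeof buf));
    return std::string(buf);
}
```

Asking for one decimal, as in `format_fixed(200.1f, 1)`, gives a clean "200.1" for the text control, while five decimals brings back the 200.10001 from the original post.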

quote:
Original post by Miserable
You might find this thread educational; S1CA's post especially. In brief, however: Floating point math is never entirely precise, so don't expect it to be.

... Lots of these threads popping up these last few days.


beaten by my own post - thanks!

I recommend that you store values that are not allowed to drift as integers, for example times in milliseconds instead of seconds. Or use another fixed-point representation that can always represent your results without rounding, or with acceptable rounding.
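A rough sketch of that idea for the slider case, assuming the value is kept as a whole number of tenths (so 200.0 is stored as 2000). The names `tenths_to_string` and `add_steps` are invented for this example.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>

// Assumption: the control's value is stored as a count of tenths,
// so one 0.1 step is simply +1 and nothing is ever rounded.
long add_steps(long tenths, long steps) {
    return tenths + steps;
}

// Turn the stored tenths back into text for display, e.g. 2001 -> "200.1".
std::string tenths_to_string(long tenths) {
    const char* sign = tenths < 0 ? "-" : "";
    long magnitude = std::labs(tenths);
    char buf[32];
    int written = std::snprintf(buf, sizeof buf, "%s%ld.%ld",
                                sign, magnitude / 10, magnitude % 10);
    assert(written > 0 && written < static_cast<int>(sizeof buf));
    return std::string(buf);
}
```

Only the display step deals with the decimal point; if a float is needed elsewhere, it can be derived from the integer at the last moment.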

Hey,

Thanks for the quick replies.

I am not doing anything at all very odd. I mean, I should be able to multiply 0.1 * 1.0 and get a normal result, shouldn't I? And adding that to 200.0 shouldn't be that big of a deal, should it?

I mean, I am explicitly declaring these just as 0.1f and 1.0. I would figure it could handle something that simple. It seems like more work to do something so simple.

Shawn

have you read the above posts? looks like you haven't..

inform yourself about floating point..

one piece of information for you: 0.1 is not exactly representable in binary floating point. this is the main reason it gets imprecise so fast.

use 1/16 as steps, and you won't encounter such problems.

or better, just use fixed point.
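A small sketch of why a step of 1/16 behaves better than 0.1, assuming IEEE 754 single and double precision. Both function names are hypothetical and only serve this illustration.

```cpp
#include <cassert>

// True when x survives a round trip through a 32-bit float unchanged,
// i.e. when x is exactly representable as a float.
bool is_exact_in_float(double x) {
    return static_cast<double>(static_cast<float>(x)) == x;
}

// Add `step` to zero `count` times, the way a slider handler might.
float accumulate(float step, int count) {
    assert(count >= 0);
    float total = 0.0f;
    for (int i = 0; i < count; ++i) {
        total += step;
    }
    return total;
}
```

1/16 is a power of two, so `is_exact_in_float(0.0625)` holds and sixteen steps of it add up to exactly 1.0f. 0.1 fails the same check, so every addition of it starts from an already rounded value.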




If that's not the help you're after then you're going to have to explain the problem better than what you have. - joanusdmentia

davepermen.net

You are getting normal results. If you use floats or doubles you have to live with instability and drift. The question is whether your values need to be fully stable and accurate all the time; more often than not, they don't. And if they do, you need fixed point, as I suggested in my previous post.

BTW to learn more and get the complete answer, check out the classic "What Every Computer Scientist Should Know About Floating-Point Arithmetic":

http://docs.sun.com/source/806-3568/ncg_goldberg.html

It explains the issues with IEEE 754 floating point in full detail: why it isn't "exact", what happens with arithmetic operations, rounding errors etc.

I'd definitely recommend reading that at least once, even if you thought you knew how floating point works!

Basically we're talking about why you can't fit all the values in an infinite (or even -FLT_MAX to FLT_MAX) range into 32 binary digits... (quarts and pint pots)
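One way to see the "quarts and pint pots" point is to measure the gap between neighbouring floats. This is a minimal sketch assuming IEEE 754 single precision; `gap_above` is a name made up for the example.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Distance from x to the next representable float above it.
float gap_above(float x) {
    assert(std::isfinite(x));
    return std::nextafter(x, std::numeric_limits<float>::infinity()) - x;
}
```

Near 200 the gap works out to 2^-16, roughly 0.000015, so 200.1 falls between two representable floats and has to be rounded to one of them.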

Hey,

Thanks for the link. I'll give that a quick read in a few minutes (haha, that thing is pretty long. I think it will take me a bit to get through it).

I guess I will have to find another solution to what I want to do.

All I wanted to do was increment 200 in increments of 0.1 each time... didn't think this would be terribly hard to do. Damn these infernal contraptions.

Shawn

quote:
Original post by Spanky
Hey,

Thanks for the link. I'll give that a quick read in a few minutes (haha, that thing is pretty long. I think it will take me a bit to get through it).

I guess I will have to find another solution to what I want to do.

All I wanted to do was increment 200 in increments of 0.1 each time... didn't think this would be terribly hard to do. Damn these infernal contraptions.

Shawn


Well then why are you multiplying 0.1 by 1.0 first and then adding it to origValue?
If you just put delta + origValue that would solve half your problem.
I think even:
delta = 0.1f * 1 would be more accurate than 0.1f * 1.0f
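For what it's worth, a quick check suggests the multiply is probably not where the error comes from. This is a sketch assuming IEEE 754 floats, with an illustrative function name: multiplying by one, written as 1 or as 1.0f, should hand back the same float it was given.

```cpp
#include <cassert>

// Multiply by a float one and by an int one and compare with the input.
// The int is converted to float before the multiply, so both expressions
// perform the same operation. Usage: assert(unchanged_by_one(0.1f));
bool unchanged_by_one(float x) {
    float by_float_one = x * 1.0f;
    float by_int_one = x * 1;
    return by_float_one == x && by_int_one == x;
}
```

Under that assumption the inexactness is already present in the literal 0.1f itself, before any arithmetic happens.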

Well, I wasn't technically accurate in what I said. I do want to increment by 0.1 each time but I want to specify how many times (pixels the mouse has moved). I want to try using a loop instead and see if that would do anything to improve the accuracy.
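A sketch of the two options, assuming one pixel of mouse movement equals one 0.1 step and that the base value is held in tenths. Both function names are invented here. The loop rounds after every addition, while deriving the value from the integer step count rounds only once.

```cpp
#include <cassert>

// Assumption: base value in tenths (200.0 -> 2000), one pixel = one step.
// One integer add and one division, so a single rounding however far
// the mouse has moved.
float value_from_pixels(int base_tenths, int pixels_moved) {
    return static_cast<float>(base_tenths + pixels_moved) / 10.0f;
}

// For comparison: the loop version rounds after every addition,
// so its error can grow with the number of steps.
float value_from_loop(float base, int pixels_moved) {
    assert(pixels_moved >= 0);
    float value = base;
    for (int i = 0; i < pixels_moved; ++i) {
        value += 0.1f;
    }
    return value;
}
```

The result of `value_from_pixels` is still a float and still not exactly 200.1, but it is the closest float to it and does not drift as the count grows.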

Thanks
Shawn

quote:
Original post by Spanky
Well, I wasn't technically accurate in what I said. I do want to increment by 0.1 each time but I want to specify how many times (pixels the mouse has moved). I want to try using a loop instead and see if that would do anything to improve the accuracy.

Thanks
Shawn


Since all the variables in your problem are essentially measured as integer values (number of pixels travelled, mouse input value, milliseconds of time), the most accurate thing to do in that case is simply to move everything to integers.

That should only really require a few more integer multiplies to account for things such as converting "pixels moved per millisecond" to "pixels moved per second".

The point of the replies in this thread is more that floating point has its place when you need a much greater range than, say, 2^32, but it can't represent *every* value in that range. If you need every value in a range, use fixed point or similar.
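As a rough illustration of the unit conversion mentioned above, here is an integer-only sketch. The function name and the choice of `long` are assumptions made for the example.

```cpp
#include <cassert>

// Convert "pixels moved over elapsed_ms milliseconds" into pixels per
// second using only integer arithmetic. Multiplying before dividing keeps
// as much precision as possible; the result is truncated toward zero.
long pixels_per_second(long pixels_moved, long elapsed_ms) {
    assert(elapsed_ms > 0);
    return pixels_moved * 1000 / elapsed_ms;
}
```

Five pixels in 20 ms comes out as 250 pixels per second, with no rounding surprises along the way.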
