EvilWeebl

floating point precision problem


Hi all,
More than just solving this problem I'd love if someone could explain to me the cause for this:

I'm working on a WinForms project and have a picture box with an image. I drag a rectangle over that image and get the bounds of that rectangle. I need to store these bounds as UV coordinates, and since the picture box's Y starts at the top while UV's Y (or V) starts at the bottom, I need to flip them. I do this like so:

return new RectangleF(bounds.X, 1.0f - (bounds.Height + bounds.Y), bounds.Width, bounds.Height);

I should mention that X and Y range from 0.0000 - 1.0000.

Now the problem occurs when I select most if not all of the box and the equation looks something like:

return new RectangleF(bounds.X, 1.0f - (0.9886 + 0.0114), bounds.Width, bounds.Height);

Now obviously 1.0f minus this sum should make 0, but instead I'm getting -0.0000000158324838.

Now I've seen that a float has 7-digit accuracy, but as you can see I'm not using that many digits, so what is the problem?
Any help would be much appreciated.

Your operands get converted to double since 0.9886 and 0.0114 are doubles (no "f" at the end). Double also has limited precision; you just get roughly twice as many digits. Sure, 1.0f - 1.0f = 1.0 - 1.0 = 0, but the intermediate sum of bounds.Height and bounds.Y will almost never come out to exactly one because of limited precision.

Normally such values should be indistinguishable from zero in almost every situation. Is it actually a problem in your case? If all else fails, just clamp the result to the [0, 1] range, but really, you shouldn't need to in general.
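
A minimal sketch of the clamping approach, assuming the bounds are already normalized to the 0 to 1 range as described in the question. UvHelper, Clamp01 and FlipV are hypothetical names used only for illustration; RectangleF is the only type taken from System.Drawing.

```csharp
using System;
using System.Drawing;

static class UvHelper
{
    // Clamp a value to the [0, 1] range.
    public static float Clamp01(float value)
    {
        return Math.Max(0.0f, Math.Min(1.0f, value));
    }

    // Flip a normalized rectangle from a top-left origin (picture box)
    // to a bottom-left origin (UV). The new Y is clamped so that rounding
    // error cannot push it slightly outside [0, 1].
    public static RectangleF FlipV(RectangleF bounds)
    {
        float flippedY = 1.0f - (bounds.Y + bounds.Height);
        return new RectangleF(bounds.X, Clamp01(flippedY), bounds.Width, bounds.Height);
    }

    static void Main()
    {
        // The example values from the question: Y = 0.0114, Height = 0.9886.
        RectangleF flipped = FlipV(new RectangleF(0.0f, 0.0114f, 1.0f, 0.9886f));
        Console.WriteLine(flipped.Y >= 0.0f && flipped.Y <= 1.0f); // True
    }
}
```

Clamping only hides the tiny error at the edges of the range; it does not make the arithmetic itself exact.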

Thanks for the reply,
Sorry, I should have been more specific: the numbers I provided are examples of my bounds.Height and bounds.Y values. bounds is another RectangleF, so they will be floats and not doubles. It's not too important; it's just causing a crash when I set some other values that rely on it being clamped between 0 and 1, but as you say I can just clamp that myself. I was wondering if using decimal here would be preferable? Or at least casting them to decimal and then casting the result back to a float?


I was wondering if using decimal here would be preferable?


AFAIK, Decimal is used mainly for financial calculations - i.e. I believe it works best for base 10 numbers in a relatively limited range, and even it will not be infinitely precise of course.
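
To make that trade-off concrete, here is a small sketch (DecimalDemo and its method names are made up for illustration). decimal stores base-10 digits, so literals such as 0.9886m and 0.0114m are held exactly and their sum is exactly 1, but a value like 1/3 still has to be rounded.

```csharp
using System;

static class DecimalDemo
{
    // Base-10 literals are stored exactly in decimal, so this sum is exactly 1.
    public static decimal ExactSum()
    {
        return 0.9886m + 0.0114m;
    }

    // 1/3 has no finite base-10 expansion, so decimal has to round it;
    // multiplying back by 3 does not recover exactly 1.
    public static decimal ThirdTimesThree()
    {
        decimal third = 1m / 3m;
        return third * 3m;
    }

    static void Main()
    {
        Console.WriteLine(ExactSum() == 1m);        // True
        Console.WriteLine(ThirdTimesThree() == 1m); // False
    }
}
```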

Here is an article with some basic tips on using floating point: http://www.codeproje...int-Programming

Edited by laztrezort


Now I've seen that a float has 7-digit accuracy, but as you can see I'm not using that many digits, so what is the problem?

The result is in fact zero... to seven digits of accuracy.

The problem is, just as you appear to be aware, that you don't have infinite precision. A rule of thumb is roughly seven significant digits for a float, so you have that quite correct. Any value you store is only valid to that many significant digits. The values 0.9886 and 0.0114, while having four significant digits in decimal, would need infinitely many binary digits to be stored exactly. The values actually stored, in truncated binary form, are not those exact numbers but are as close as the truncated binary representation allows. The numbers are, however, accurate to seven digits, as you mentioned.

That means that 0.9886 is not exactly 0.9886, but something very close, and accurate to seven digits. That is, 0.988600xxxx, where xxxx is some residue, but small enough such that, when rounded to seven digits, the value is 0.9886000.

This residue is present in pretty much any number and will accumulate, because it is a part of the actual value being added.

To summarize: seven digits of accuracy does not mean that any value with seven or fewer significant digits can be perfectly represented; it means that values are not exact but accurate to seven digits.

The problem is that a decimal fraction (base 10) can't always be represented as a finite sequence of binary digits (base 2). You write a decimal fraction that looks fine, but the compiler has to use a binary representation, which is often only an approximation of the decimal number. For example, 0.1 is already an infinite sequence in binary.

When you do calculations with these approximations and display the result (converted back to decimal/base 10), you see the approximation error.
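
A short sketch of the difference, using double and invented names (BinaryFractionDemo and its two methods are only for illustration). Fractions built from powers of two are stored exactly, while tenths are not.

```csharp
using System;

static class BinaryFractionDemo
{
    // 0.5, 0.25 and 0.75 are sums of powers of two, so they are stored exactly.
    public static bool PowersOfTwoAreExact()
    {
        double a = 0.5, b = 0.25;
        return a + b == 0.75;
    }

    // 0.1, 0.2 and 0.3 are each rounded to a nearby binary value, and the
    // sum of the first two does not land on the value stored for 0.3.
    public static bool TenthsAreExact()
    {
        double a = 0.1, b = 0.2;
        return a + b == 0.3;
    }

    static void Main()
    {
        Console.WriteLine(PowersOfTwoAreExact()); // True
        Console.WriteLine(TenthsAreExact());      // False
    }
}
```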


To summarize: seven digits of accuracy does not mean that any value with seven or fewer significant digits can be perfectly represented; it means that values are not exact but accurate to seven digits.


Ahh I see where I went wrong now, thanks for summing that up for me.

Also laztrezort that link was a good read so thanks for that.

To add some details...

In IEEE single-precision floating point, the closest you can get to the two numbers is:

9.886000156402587890625E-1
1.1400000192224979400634765625E-2

When you add them you get:

1.000000015832483768463134765625

Here's a converter website that is quite useful: http://www.binaryconvert.com

I can go into further details for just why this limitation arises if you want.
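
These figures can be checked with a short sketch. One detail worth noting: the exact sum is 1 + 17/2^30, which is not itself a float and rounds to exactly 1 in single precision, so the -0.0000000158324838 from the question only appears if the addition is carried out at higher precision than float. The C# specification permits an implementation to do that; whether it is what happened in the original program is an assumption. The code below imitates it by widening to double by hand. FloatSumDemo and its method names are illustrative only.

```csharp
using System;

static class FloatSumDemo
{
    // The sum carried out in double. Both floats widen to double exactly,
    // and their exact sum, 1 + 17/2^30, also fits in a double.
    public static double SumAsDouble(float a, float b)
    {
        return (double)a + (double)b;
    }

    // The same sum rounded to float. The floats nearest to it are 1 and
    // 1 + 2^-23, and 17/2^30 is less than half of that gap, so it rounds to 1.
    public static float SumAsFloat(float a, float b)
    {
        return (float)SumAsDouble(a, b);
    }

    static void Main()
    {
        float height = 0.9886f, y = 0.0114f;
        Console.WriteLine(1.0 - SumAsDouble(height, y));         // -17/2^30, about -1.58e-8
        Console.WriteLine(1.0f - SumAsFloat(height, y) == 0.0f); // True
    }
}
```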


so maybe use long double?

long double, in C and C++, may provide more precision than double, but it may also be the same, so you should check your implementation (it will not have less precision than double, though). However, that doesn't get rid of the fact that you have to deal with floating point error, so the real answer is "understand floating point error and how to work with it," because float, double, and long double all suffer from floating point error. Just using a different data type doesn't really solve the problem. I'm surprised nobody's linked to this: What Every Computer Scientist Should Know About Floating-Point Arithmetic. It's rather long, but there's a lot of good stuff to learn in there, and you don't have to read it all to learn something useful.
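
One common way of working with the error, sketched here in C# to match the original question, is to compare against a tolerance instead of testing for exact equality. FloatCompare and NearlyEqual are hypothetical names, and the default tolerance of 1e-6f is an arbitrary assumption that only makes sense for values in roughly the 0 to 1 range.

```csharp
using System;

static class FloatCompare
{
    // Treat two floats as equal when they differ by no more than a tolerance.
    // The default of 1e-6f is an arbitrary choice for values near the 0..1 range.
    public static bool NearlyEqual(float a, float b, float tolerance = 1e-6f)
    {
        return Math.Abs(a - b) <= tolerance;
    }

    static void Main()
    {
        float v = -0.0000000158324838f; // the value from the original question
        Console.WriteLine(v == 0.0f);            // False
        Console.WriteLine(NearlyEqual(v, 0.0f)); // True
    }
}
```

A fixed tolerance like this does not scale with the magnitude of the inputs, so it would need rethinking for very large or very small values.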
