RegularKid

double precision floating point arithmetic errors


Hi, I'm aware that floating-point math will always introduce round-off error; however, I am running into this problem:

EXAMPLE:

In C++ / C#:
double x, y, x1, y1;

x = 209.4;
y = 148.8;
x1 = 203.0;
y1 = 145.0;

double ddx = x - x1;
double ddy = y - y1;

RESULT
ddx: 6.4000000000000057
ddy: 3.8000000000000114

In Flash ActionScript 2:
var x, y, x1, y1;

x = 209.4;
y = 148.8;
x1 = 203.0;
y1 = 145.0;

var ddx = x - x1;
var ddy = y - y1;

RESULT
ddx: 6.39999999999992
ddy: 3.80000000000024

From what I've read, a Flash / ActionScript "var" stores numerical values as doubles (8 bytes), just like doubles in C++ / C# (8 bytes). Why would there be a difference between the results of ddx and ddy? Are there different implementations of double-precision floating-point math? If so, is there a way I can mimic the Flash / ActionScript version in C++ / C#? Any help would be great! Thanks!
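
For anyone who wants to reproduce the C++ numbers above, here is a minimal sketch. The helper names format17 and difference are made up for illustration and are not from the post; it assumes IEEE 754 doubles and a C library whose printf rounds correctly. It prints with 17 significant digits, which is enough to distinguish any two doubles.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Format a double with 17 significant digits (hypothetical helper).
// 17 digits are enough to round-trip any IEEE 754 double.
std::string format17(double value) {
    char buffer[64];
    const int written = std::snprintf(buffer, sizeof buffer, "%.17g", value);
    assert(written > 0 && written < static_cast<int>(sizeof buffer));
    (void)written;  // silence unused-variable warnings when NDEBUG is set
    return std::string(buffer);
}

// The subtraction from the post, wrapped in a function (hypothetical helper).
double difference(double a, double b) {
    return a - b;
}
```

Calling format17(difference(209.4, 203.0)) and format17(difference(148.8, 145.0)) should give the ddx and ddy strings listed under the C++ / C# RESULT. The trailing digits are already present in the stored value of 209.4 and 148.8, since neither is exactly representable in binary.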

Hi,

I was really surprised when I read your post because, although I knew of rounding errors when using floating point values, I did not realise they could be introduced just by initialising a value, for example double a = 0.2;

So I did a test in VS 9 C++ and got the same results as you, and have spent the better part of an hour looking for rounding functions in C++ headers such as math.h and iostream.

The closest thing I can find on MSDN is numeric_limits in conjunction with _controlfp_s. From what I can gather from MSDN, _controlfp_s should in theory let you dictate how rounding occurs throughout your code, and if you knew which rounding Flash / ActionScript uses, you could apply the same rounding to get the same result. Unfortunately, for whatever reason _controlfp_s doesn't seem to work lol: it compiles and runs, but does not seem to change the rounding as reported by numeric_limits<double>::round_style, and because round_style is a const you can't set it through numeric_limits<double>::round_style.
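
A note on the experiment above: numeric_limits<double>::round_style is a compile-time constant, so it would not be expected to change after a runtime call such as _controlfp_s. The standard C++ counterpart for switching the rounding mode at runtime is std::fesetround from <cfenv>. The sketch below uses hypothetical helper names and assumes the compiler honours runtime rounding-mode changes (the volatile variables are there to discourage constant folding). It checks whether a subtraction gives the same result when rounded down and when rounded up. If it does, the exact result is representable and no rounding mode can change it.

```cpp
#include <cassert>
#include <cfenv>

// Compute a - b with the FPU in the given rounding mode (FE_DOWNWARD,
// FE_UPWARD, ...), then restore whatever mode was active before.
double subtract_with_mode(double a, double b, int mode) {
    const int saved = std::fegetround();
    const int rc = std::fesetround(mode);
    assert(rc == 0);  // nonzero means the requested mode is not supported
    (void)rc;
    volatile double va = a;  // volatile discourages compile-time folding
    volatile double vb = b;
    volatile double result = va - vb;
    std::fesetround(saved);
    return result;
}

// If rounding down and rounding up agree, the exact difference is
// representable as a double and the rounding mode cannot affect it.
bool difference_is_exact(double a, double b) {
    return subtract_with_mode(a, b, FE_DOWNWARD) ==
           subtract_with_mode(a, b, FE_UPWARD);
}
```

For the values in the original post, difference_is_exact(209.4, 203.0) and difference_is_exact(148.8, 145.0) should both be true. That suggests, though nothing in this thread confirms it, that the rounding mode of the subtraction is not what separates the two results, and that the discrepancy may arise elsewhere, for example where decimal text is converted to or from binary.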

There is probably a dead-easy-to-use rounding system in .NET, so you might wanna check that out.

So if anyone can figure this out, I too am interested to know, as it has stumped me lol.

Quote:
Original post by RegularKid
Why would there be a difference between the results of ddx and ddy? Are there different implementations of double floating point math? If so, is there a way I can mimic the Flash / ActionScript version in C++ / C#?

The FPU can be set to different precision levels. It's possible that Flash is doing that, to improve rasterization performance (it has to do a lot of math behind the scenes). With MSVC, you can use the _controlfp function. I gotta say, though, that trying to exactly mimic Flash's floating point behavior is not going to be reliable; its compiler may well reorder floating point operations in a way that the C++ compiler does not, introducing more (and harder to find) differences. Focus on making the differences not make a difference.
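
One common way to make the differences "not make a difference" is to compare with a tolerance instead of exact equality. This is only a sketch: the function name nearly_equal and the default tolerances are assumptions chosen for illustration, and suitable values depend on the magnitudes the program actually works with.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Treat two doubles as equal when they differ by no more than an absolute
// tolerance or a tolerance relative to the larger magnitude, whichever is
// larger. The default tolerances are illustrative assumptions.
bool nearly_equal(double a, double b,
                  double rel_tol = 1e-9, double abs_tol = 1e-12) {
    assert(rel_tol >= 0.0 && abs_tol >= 0.0);
    const double diff = std::fabs(a - b);
    const double scale = std::max(std::fabs(a), std::fabs(b));
    return diff <= std::max(abs_tol, rel_tol * scale);
}
```

With these defaults, nearly_equal(6.4000000000000057, 6.39999999999992) treats the C++ and Flash values of ddx as the same number, while values that differ in the first few decimal places still compare as different.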

Quote:
Original post by fanaticlatic
I was really surprised when I read your post because, although I knew of rounding errors when using floating point values, I did not realise they could be introduced just by initialising a value, for example double a = 0.2;


It's the same as when we try to write 1/3 in decimal. At some point one has to stop writing 3's, and the value you end up with isn't quite 1/3.
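
To see the same effect in binary, one can print more digits than the literal had. A minimal sketch follows; the helper name stored_digits is made up for illustration, and it assumes a C library that prints the exact decimal expansion of the stored value (glibc does; some older runtimes pad with zeros after about 17 significant digits).

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Print the value actually stored in a double, to 25 decimal places
// (hypothetical helper for illustration).
std::string stored_digits(double value) {
    char buffer[128];
    const int written = std::snprintf(buffer, sizeof buffer, "%.25f", value);
    assert(written > 0 && written < static_cast<int>(sizeof buffer));
    (void)written;  // silence unused-variable warnings when NDEBUG is set
    return std::string(buffer);
}
```

A value like 0.5 is a power of two, so stored_digits(0.5) shows nothing but zeros after the 5. For 0.2, nonzero digits appear around the 17th decimal place: the binary expansion of 0.2 repeats forever, just as 1/3 does in decimal, and the double holds the nearest value that fits.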

Quote:
Original post by RegularKid
After researching Flash / ActionScript "var" stores numerical values as doubles ( 8 bytes ) just like doubles are stored in C++ / C# ( 8 bytes ). Why would there be a difference between the results of ddx and ddy? Are there different implementations of double floating point math? If so, is there a way I can mimic the Flash / ActionScript version in C++ / C#?

Why? A correctly written program should be able to stomach rounding errors that go in different directions, so why even bother? At most, you can round the results correctly in both languages. Or you can use some high-precision library and accept a slowdown by a factor of 10.
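
As a sketch of the "round in both languages" option: round every result to a fixed number of decimal places before it is used or compared. The function name round_to_decimals and the choice of 6 decimals below are assumptions for illustration, and the same rounding would have to be applied on the Flash side for the two versions to agree.

```cpp
#include <cassert>
#include <cmath>

// Round a value to the given number of decimal places, with halfway cases
// rounded away from zero (hypothetical helper for illustration).
double round_to_decimals(double value, int decimals) {
    assert(decimals >= 0);
    const double scale = std::pow(10.0, decimals);
    return std::round(value * scale) / scale;
}
```

With 6 decimals, round_to_decimals(6.4000000000000057, 6) and round_to_decimals(6.39999999999992, 6) come out as the same double. This is not a guarantee in general: two values that straddle a rounding boundary can still land on different sides of it, so it reduces the mismatches rather than eliminating them.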

Re other posters.

Hi!

I'll try out the _controlfp function and see if I get any good results from it.

Raghar, the reason I need to mimic the Flash floating-point error is that I have a program I'm porting from Flash to C#. The minute differences in rounding errors can lead to inconsistencies between the two versions. Unfortunately, I'm unable to alter the Flash code base, so the only option is to mimic the Flash version's floating-point arithmetic so that the results are identical.
