double precision floating point arithmetic errors

4 comments, last by RegularKid 15 years, 9 months ago
Hi, I'm aware that floating point math always introduces round-off error, but I am running into this problem:

EXAMPLE: In C++ / C#:

double x, y, x1, y1;

x = 209.4;
y = 148.8;
x1 = 203.0;
y1 = 145.0;

double ddx = x - x1;
double ddy = y - y1;

RESULT
ddx: 6.4000000000000057
ddy: 3.8000000000000114

In Flash ActionScript 2:

var x, y, x1, y1;

x = 209.4;
y = 148.8;
x1 = 203.0;
y1 = 145.0;

var ddx = x - x1;
var ddy = y - y1;

RESULT
ddx: 6.39999999999992
ddy: 3.80000000000024

After some research, I found that a Flash / ActionScript "var" stores numerical values as doubles (8 bytes), just as doubles are stored in C++ / C# (8 bytes). Why would there be a difference between the results of ddx and ddy? Are there different implementations of double floating point math? If so, is there a way I can mimic the Flash / ActionScript version in C++ / C#? Any help would be great! Thanks!
Hi,

I was really surprised when I read your post because, although I knew of rounding errors when using floating point values, I did not realise they could be introduced just by initialising a value, for example double a = 0.2;

So I did a test in VS 9 C++ and got the same results as you, and have spent the better part of an hour looking for rounding functions in C++ libraries/headers such as math.h and iostream.

The closest thing I can find on MSDN is numeric_limits in conjunction with _controlfp_s. From what I can gather from MSDN, _controlfp_s should in theory let you dictate how rounding occurs throughout your code, and if you knew which rounding Flash/ActionScript uses, you could apply the same rounding to achieve the same result. Unfortunately, for whatever reason, _controlfp_s doesn't seem to work lol: it compiles and runs but does not seem to change the rounding reported by numeric_limits<double>::round_style, and because round_style is a const you can't set it through numeric_limits<double>::round_style.
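A minimal sketch of the distinction, assuming a C++11 compiler: numeric_limits<double>::round_style is a compile-time constant that only describes the type, so it would not be expected to change after a call that alters the FPU state. The run-time rounding mode is a separate setting. The sketch reads it with the standard std::fegetround from <cfenv>, which is swapped in here for the MSVC-specific _controlfp_s; the two function names are made up for illustration.

```cpp
#include <cfenv>
#include <limits>

// Compile-time property of the type: it reports a rounding style and
// cannot be assigned to. (Function name is illustrative.)
constexpr bool reports_round_to_nearest() {
    return std::numeric_limits<double>::round_style == std::round_to_nearest;
}

// Run-time rounding mode of the floating point environment, queried
// through the standard <cfenv> interface. (Function name is illustrative.)
bool runtime_mode_is_to_nearest() {
    return std::fegetround() == FE_TONEAREST;
}
```

On mainstream platforms both functions are expected to return true unless the program has changed the mode. Note that the rounding mode governs how the results of operations are rounded, so on its own it may not account for a difference like the one in the first post.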

There is probably a dead-easy-to-use rounding system in .NET, so you might wanna check that out.

So if anyone can figure this out, I too am interested to know, as it has stumped me lol.
"I have more fingers in more pies than a leper at a bakery!"
Quote:Original post by RegularKid
Why would there be a difference between the results of ddx and ddy? Are there different implementations of double floating point math? If so, Is there a way I can mimic the Flash / Actionscript version in C++ / C#?

The FPU can be set to different precision levels. It's possible that Flash is doing that, to improve rasterization performance (it has to do a lot of math behind the scenes). With MSVC, you can use the _controlfp function. I gotta say, though, that trying to exactly mimic Flash's floating point behavior is not going to be reliable; its compiler may well reorder floating point operations in a way that the C++ compiler does not, introducing more (and harder to find) differences. Focus on making the differences not make a difference.
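One common way to make "the differences not make a difference" is to compare with a tolerance instead of testing for exact equality. The following is a sketch only: the helper name nearly_equal and the default tolerances are assumptions chosen for illustration, and suitable values depend on the application.

```cpp
#include <algorithm>
#include <cmath>

// Returns true when a and b agree to within a relative tolerance, with an
// absolute floor so that values near zero can still compare equal.
// Name and default tolerances are illustrative, not from the thread.
bool nearly_equal(double a, double b,
                  double rel_tol = 1e-9, double abs_tol = 1e-12) {
    const double diff = std::fabs(a - b);
    const double scale = std::max(std::fabs(a), std::fabs(b));
    return diff <= std::max(abs_tol, rel_tol * scale);
}

// Usage:
//   nearly_equal(6.4000000000000057, 6.39999999999992)  // true
//   nearly_equal(6.4, 6.5)                              // false
```

With these defaults, the C++ and Flash values of ddx and ddy from the first post compare as equal.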
Quote:Original post by fanaticlatic
I was really surprised when i read your post because although I knew of rounding errors when using floating point values I did not realise they could be introduced through initialising a value like double a = 0.2; for example.


It's the same as when we try to write 1/3 in decimal. At one point one has to stop writing 3's, and the value you end up with isn't quite 1/3.
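The same effect can be seen by printing a double with 17 significant digits, which is enough to distinguish any two IEEE 754 doubles. This is a small sketch; the helper name stored_value is made up for illustration.

```cpp
#include <cstdio>
#include <string>

// Format a double with 17 significant digits so the value actually
// stored is visible. (Helper name is illustrative.)
std::string stored_value(double v) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.17g", v);
    return buf;
}

// Usage:
//   stored_value(0.5)            // "0.5" (a power of two, stored exactly)
//   stored_value(0.2)            // "0.20000000000000001"
//   stored_value(209.4 - 203.0)  // "6.4000000000000057", as in the first post
```

Values such as 0.5 terminate in binary and are stored exactly; 0.2 does not, so the nearest representable double is stored instead.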

Quote:Original post by RegularKid
After researching Flash / Actionscript "var" stores numerical values as doubles ( 8 bytes ) just like doubles are stored in C++ / C# ( 8 bytes ). Why would there be a difference between the results of ddx and ddy? Are there different implementations of double floating point math? If so, Is there a way I can mimic the Flash / Actionscript version in C++ / C#?

Why? A correctly written program can stomach rounding errors in either direction, so why even bother? At most, you can round correctly in both languages. Or you can use some high precision library and accept a slowdown by a factor of 10.
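A sketch of the rounding option on the C++ side, assuming both programs round intermediate results to the same number of decimal places. The helper name round_to and the choice of 9 decimals are assumptions for illustration.

```cpp
#include <cmath>

// Round v to the given number of decimal places.
// (Helper name is illustrative; the result is itself a double, so it is
// the nearest representable value to the rounded decimal.)
double round_to(double v, int decimals) {
    const double scale = std::pow(10.0, decimals);
    return std::round(v * scale) / scale;
}

// Usage:
//   round_to(6.4000000000000057, 9) == round_to(6.39999999999992, 9)  // true
```

This removes the discrepancy for the values in this thread, but two inputs that fall on opposite sides of a rounding boundary can still round differently, so it reduces mismatches rather than ruling them out.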

Re other posters.
Hi!

I'll try out the _controlfp function and see if I get any good results from it.

Raghar, the reason I need to mimic the Flash floating point error is that I have a program I'm porting to C# from Flash. The minute differences in rounding errors can lead to inconsistencies in the two versions. And unfortunately, I'm unable to alter the Flash code base. So the only option is to essentially attempt to mimic the Flash version floating point arithmetic so it's identical.
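Before trying to reproduce Flash's results, it may help to measure how far apart two results are in units in the last place (ULPs), that is, how many representable doubles lie between them. The sketch below uses a hypothetical helper, ulp_distance, and assumes both inputs are finite and have the same sign. For the operands in this thread the subtraction itself is exact in IEEE 754 double arithmetic, because 209.4 and 203.0 (and likewise 148.8 and 145.0) are within a factor of two of each other. That suggests the discrepancy arises in how the decimal literals are converted to doubles or how the result is printed, but this is an inference and is not confirmed anywhere in the thread.

```cpp
#include <cstdint>
#include <cstring>

// Number of representable doubles between a and b.
// Assumes both values are finite and have the same sign.
// (Helper name is illustrative.)
std::int64_t ulp_distance(double a, double b) {
    std::int64_t ia, ib;
    std::memcpy(&ia, &a, sizeof a);  // reinterpret the bits without undefined behaviour
    std::memcpy(&ib, &b, sizeof b);
    return ia > ib ? ia - ib : ib - ia;
}

// Usage:
//   ulp_distance(6.4, 6.4)            // 0
//   ulp_distance(209.4 - 203.0, 6.4)  // 6
```

Logging this distance for matching values in the two versions would show whether they differ by a few ULPs (a conversion or rounding issue) or by much more (a different precision or a different order of operations).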

