double Precision


What is wrong here? Multiplying variables of type double (less than 1 and greater than 10^10) is affecting the precision. The deviations in astronomical numbers are very noticeable over time, and this seems to be the problem. How can I fix this? Thanks.

double A = 1;
double B = 0.1;
double C;
C = B;             // C = 1.000000000000000e-001
C = 1 * B;         // C = 1.000000014901161e-001
C = 1 * 0.1;       // C = 1.000000000000000e-001
C = A * B;         // C = 1.000000014901161e-001
C = pow( 10, 10 ); // C = 1.000000000000000e+010
C = pow( 10, 11 ); // C = 9.999999795200000e+010
C = pow( 10, 20 ); // C = 1.000000020040877e+020
_stprintf( Text, "C = %.015e", C );


  
#include <stdio.h>
#include <math.h>

int main(void) {
    double A = 1;
    double B = 0.1;
    double C;
    C = B;           printf("%.015e\n", C);
    C = 1 * B;       printf("%.015e\n", C);
    C = 1 * 0.1;     printf("%.015e\n", C);
    C = A * B;       printf("%.015e\n", C);
    C = pow(10, 10); printf("%.015e\n", C);
    C = pow(10, 11); printf("%.015e\n", C);
    C = pow(10, 20); printf("%.015e\n", C);
    getchar();       /* keep the console window open until a key is pressed */
    return 0;
}



This outputs the following under gcc and Borland:

1.000000000000000e-01
1.000000000000000e-01
1.000000000000000e-01
1.000000000000000e-01
1.000000000000000e+10
1.000000000000000e+11
1.000000000000000e+20

...I don't see any problems here.

-edit- Note: This is off topic, it's a retort.

quote:
Original post by siaspete
doubles are more precise than floats.

I can't speak for all platforms, but a float is 4 bytes on Win32, and a double is 8 bytes.






doubles are more precise but the range is smaller than a float.

the reason why a float is called a float is because the decimal point floats around, so it makes the range bigger.

besides, on most modern computers (at least with Sparc, Intel and AMD; not sure about PowerPC chips, but probably the same if not better) floats are faster.

well anyway, when dealing with really screwy large or small numbers, go with a float. The best way is to make your own format though, so you get a more precise float. You do not want to be using doubles for really extreme maths though. Maybe if it was a 32-byte double, but again, why use a 32-byte double when you can use an 8-byte or even your normal 4-byte float?



Beer - the love catalyst
good ol' homepage

[edited by - Dredge-Master on March 26, 2002 5:50:23 PM]

k, I don't know what kind of smack you guys are on....

from the MSDN for VS.NET.

C Language Reference

Floating-point variables are represented by a mantissa, which contains the value of the number, and an exponent, which contains the order of magnitude of the number.

The following table shows the number of bits allocated to the mantissa and the exponent for each floating-point type. The most significant bit of any float or double is always the sign bit. If it is 1, the number is considered negative; otherwise, it is considered a positive number.

Lengths of Exponents and Mantissas

Type     Exponent length   Mantissa length
float    8 bits            23 bits
double   11 bits           52 bits

Range of Floating-Point Types

Type     Minimum value                 Maximum value
float    1.175494351 E – 38            3.402823466 E + 38
double   2.2250738585072014 E – 308    1.7976931348623158 E + 308

If precision is less of a concern than storage, consider using type float for floating-point variables. Conversely, if precision is the most important criterion, use type double.

Floating-point variables can be promoted to a type of greater significance (from type float to type double). Promotion often occurs when you perform arithmetic on floating-point variables. This arithmetic is always done in as high a degree of precision as the variable with the highest degree of precision. For example, consider the following type declarations:


therefore "double" IS MORE PRECISE than "float". EOD.

To the vast majority of mankind, nothing is more agreeable than to escape the need for mental exertion... To most people, nothing is more troublesome than the effort of thinking.

Double is a floating-point data type.

Doubles are floats with larger mantissas and larger exponents. The only thing that doesn't get more bits is the sign - and if you need more than one for that I don't know what kind of crazy math you're doing.

In other words: There is nothing you can do with a float that you can't do with a double. Doubles are more accurate.

The internal FPU often has higher precision than doubles, but aside from that, doubles are about as good as it gets, unless your compiler supports a long double type (and actually makes it something better than a synonym for double).



[edited by - TerranFury on March 26, 2002 6:24:27 PM]

quote:
Original post by Dredge-Master
doubles are more precise but the range is smaller than a float.

wtf? Doubles dedicate 3 more bits to the range of the number than a float does. Are you thinking of fixed-point numbers or something?

quote:
Original post by TerranFury
In other words: There is nothing you can do with a float that you can't do with a double.



You can't write fast games: if we replaced all our float code with double code the game would probably drop from 60 to well under 30 fps.

Floats are accurate to better than 1 part in a million, i.e. errors are less than 1 mm per km; I can't think of any gaming application that needs better precision than this. For comparison, science experiments are usually accurate to no more than 1 part in a thousand, and are often a lot less accurate.

quote:
Original post by johnb
quote:
Original post by TerranFury
In other words: There is nothing you can do with a float that you can't do with a double.



You can't write fast games: if we replaced all our float code with double code the game would probably drop from 60 to well under 30 fps.

Floats are accurate to better than 1 part in a million, i.e. errors are less than 1 mm per km; I can't think of any gaming application that needs better precision than this. For comparison, science experiments are usually accurate to no more than 1 part in a thousand, and are often a lot less accurate.

For single calculations, regular floats are fine if they are precise enough. However, the original poster complained of losing accuracy over time: if you lose a little accuracy per calculation, then over 100,000 calculations it adds up. So, use a double!

Brief overview of IEEE 754 floating point data types:

S is sign (1 bit)
E is exponent (8/11 bits)
M is mantissa (23/52 bits)

- 32 bits (aka float): ((-1)^S) * (2^(E-127)) * 1.M

- 64 bits (aka double): ((-1)^S) * (2^(E-1023)) * 1.M

...



[edited by - bloodscourge on March 28, 2002 12:34:51 PM]

You have two main options. One is to adjust the value periodically. Whether that is actually an option depends upon how accurately you can predict the error. With our calendar we can say we are going to be off by about a day every four years. We also know that if you make an adjustment every four years by a whole day, then after 400 years we would be off by 3 days. We know that if we make adjustments for that error, then after several thousand years we will be off by a day and need yet another adjustment.

The other option is to re-establish the number. Generally you iterate a calculation for performance: it takes a lot of work to calculate what the value should be, but it is relatively easy to say how it should have changed from the previous value. So you update the value for a while and then recalculate it. If the error grows too fast, then you have to break the calculation down. As an example, perhaps you are using some ratio and the error is coming mainly from the division. So instead you carry and update the numerator and denominator. The same error still occurs each time you do the division, but you are not compounding the error.

Guest Anonymous Poster
Floats and doubles have a fixed number of significant digits, and round anything past them.

Guest Anonymous Poster
If precision is more important than speed (and I assume that it is for you), then look into the GNU Multiple Precision Library: http://swox.com/gmp/
It should offer the precision you want.

Karg
