This topic is now archived and is closed to further replies.

CProgrammer

doubles and floats

Nervo    344
I believe float offers at least 6 significant digits whereas double is supposed to allow at least 10 significant digits, although at least 15 significant digits are more common across most implementations.

smart_idiot    1298
Doubles are a lot more accurate than floats. And don't assume that being twice the size makes them only twice as accurate. A long is generally 4 times the size of a char, but it can hold 16,777,216 times as many distinct values (2^32 versus 2^8, assuming a 32-bit long and an 8-bit char).

deepdene    292
From a game programmer's point of view, most games use floats almost exclusively because graphics hardware supports them natively.

The actual sizes of float, double and long double vary between compilers, so they are not fixed.

Regarding the precision of double and float, the C++ standard states (not the exact wording; I've simplified it, but if you're interested, look up section 3.9.1.8 of the standard):
float - Type float is the smallest floating type.

double - Type double is a floating type that is larger than or equal to type float, but shorter than or equal to the size of type long double.

long double - Type long double is a floating type that is larger than or equal to type double.

Here is some more information about data types and Visual C++. The C++ standard information is above while the actual sizes used in Visual C++ are below.

[edited by - deepdene on January 6, 2004 8:25:26 AM]

CWizard    127
quote:
Original post by deepdene
The actual sizes of float, double and long double vary between compilers, so they are not fixed.
I don't think that is the case. There is an IEEE standard, which I think spells out the exact bit layout.

emilk    216
quote:
Original post by CWizard
quote:
Original post by deepdene
The actual sizes of float, double and long double vary between compilers, so they are not fixed.
I don't think that is the case. There is an IEEE standard, which I think spells out the exact bit layout.


But the C++ Standard doesn't force you to use the IEEE Standard in your implementation, does it now? (It doesn't.)

Nervo    344
As I was implying above, 6 significant digits is the minimum for a float and 10 significant digits is the minimum for a double. The actual figure varies across implementations. In the case of a float it would be (where x is the number of digits):

6 <= x (there is no guaranteed upper bound; a float may be as precise as a double)

for double:

10 <= x

flangazor    516
quote:
Original post by CWizard
quote:
Original post by deepdene
The actual sizes of float, double and long double vary between compilers, so they are not fixed.
I don't think that is the case. There is an IEEE standard, which I think spells out the exact bit layout.
Only so far as saying sizeof(short)<=sizeof(int)<=sizeof(long)

deepdene    292
quote:
CWizard posted the following:
I don't think that is the case. There is an IEEE standard, which I think spells out the exact bit layout.



Yeah, as someone mentioned earlier, C++ doesn't necessarily follow the IEEE standard.

This is the EXACT text from the C++ standard -- can't get any more official than this:
3.9.1.8 - There are three floating point types: float, double, and long double. The type double provides at least as much precision as float, and the type long double provides at least as much precision as double. The set of values of the type float is a subset of the set of values of the type double; the set of values of the type double is a subset of the set of values of the type long double. The value representation of floating-point types is implementation-defined. Integral and floating types are collectively called arithmetic types. Specializations of the standard template numeric_limits (_lib.support.limits_) shall specify the maximum and minimum values of each arithmetic type for an implementation.


[edited by - deepdene on January 6, 2004 11:41:14 AM]

Doc    586
quote:
Original post by deepdene
This is the EXACT text from the C++ standard -- can't get any more official than this:
3.9.1.8 - There are three floating point types: float, double, and long double. The type double provides at least as much precision as float, and the type long double provides at least as much precision as double.


I was not aware that long double was actually standard. Learn something new every month.

reaction    100
I had a problem the other day...

I need to restrict a float to 2 decimal places of precision. I didn't know how to do it (it is not possible with masking), so I multiplied by 1000 and hoped for the best (so far).

Is there a standard method to solve this problem?

BTW...


[42702.658].[DoInit]........................sizeof(float) = 4
[42702.658].[DoInit]........................sizeof(double) = 8
[42702.658].[DoInit]........................sizeof(int) = 4
[42702.658].[DoInit]........................sizeof(long) = 4
[42702.658].[DoInit]........................sizeof(DWORD) = 4


... doubles use 8 bytes.

R





EDIT: pardon me... for a VC++6 Win32 application.

[edited by - reaction on January 7, 2004 9:19:25 AM]
