doubles and floats

I believe float offers at least 6 significant digits whereas double is supposed to allow at least 10 significant digits, although at least 15 significant digits are more common across most implementations.

Doubles are a lot more accurate than floats. And don't think that because they are twice the size they are only twice as accurate. A long is generally 4 times the size of a char, but it can hold 16,777,216 (2^24) times as many possible values.
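To make the "far more than twice as accurate" point concrete, here is a minimal sketch. It assumes the common case where float is a 32-bit IEEE 754 type with a 24-bit significand and double is a 64-bit one with 53 bits; the standard does not promise either. The function name is made up for illustration.

```cpp
// Returns true when adding 1 to 2^24 changes nothing after the result is
// stored back into a T, i.e. T cannot represent 2^24 + 1.
// Assumption: on IEEE 754 hardware this is true for float, false for double.
template <typename T>
bool absorbs_one_at_2_pow_24() {
    volatile T big = static_cast<T>(16777216.0);  // 2^24
    volatile T next = big + static_cast<T>(1.0);  // volatile forces a real store as T
    return next == big;
}
```

Calling `absorbs_one_at_2_pow_24<float>()` and `absorbs_one_at_2_pow_24<double>()` on a typical desktop compiler should show that the float has already run out of integer precision at 2^24, while the double has not.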

From a games programmer's point of view, most games use mostly floats because graphics hardware supports floats natively.

The actual sizes of float, double and long double can vary between compilers, so they are not fixed.

Regarding the precision of double and float, the C++ standard states the following (not the exact wording, I simplified it for you guys, but if you're interested look up section 3.9.1.8 of the standard):
float - Type float is the smallest floating type.

double - Type double is a floating type that is larger than or equal to type float, but shorter than or equal to the size of type long double.

long double - Type long double is a floating type that is larger than or equal to type double.

Here is some more information about data types and Visual C++. The C++ standard information is above while the actual sizes used in Visual C++ are below.

[edited by - deepdene on January 6, 2004 8:25:26 AM]

quote:
Original post by deepdene
The actual sizes of float, double and long double can vary between compilers, so they are not fixed.

I don't think that is the case. There is an IEEE standard, which I think lays out the exact bit structure.

quote:
Original post by CWizard
quote:
Original post by deepdene
The actual sizes of float, double and long double can vary between compilers, so they are not fixed.
I don't think that is the case. There is an IEEE standard, which I think lays out the exact bit structure.

But the C++ Standard doesn't force you to use the IEEE Standard in your implementation, does it now? (It doesn't.)

As I was implying above, 6 significant digits is the minimum for a float and 10 significant digits is the minimum for a double. It does vary across implementations. In the case of a float it would typically be (where x is the number of digits):

6 <= x < 10

for double:

10 <= x
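If you want to see what x actually is on your own compiler, something like the following sketch should do it. `std::numeric_limits<T>::digits10` is the standard way to ask; the function name `precision_report` is just something made up for this example.

```cpp
#include <limits>
#include <sstream>
#include <string>

// Builds a short report of how many decimal digits each floating type can
// hold without change on the current implementation (digits10).
std::string precision_report() {
    std::ostringstream out;
    out << "float:       " << std::numeric_limits<float>::digits10 << '\n'
        << "double:      " << std::numeric_limits<double>::digits10 << '\n'
        << "long double: " << std::numeric_limits<long double>::digits10 << '\n';
    return out.str();
}
```

Printing the returned string (for example with `std::cout << precision_report();`) shows the values for your platform; only the minimums of 6 and 10 are guaranteed.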

quote:
Original post by CWizard
quote:
Original post by deepdene
The actual sizes of float, double and long double can vary between compilers, so they are not fixed.
I don't think that is the case. There is an IEEE standard, which I think lays out the exact bit structure.

Only so far as saying sizeof(short) <= sizeof(int) <= sizeof(long)

quote:
CWizard posted the following:
I don't think that is the case. There is an IEEE standard, which I think lays out the exact bit structure.

Yeah, as someone mentioned earlier, C++ doesn't necessarily follow the IEEE standard.

This is the EXACT text from the C++ standard; it can't get any more official than this:
3.9.1.8 - There are three floating point types: float, double, and long double. The type double provides at least as much precision as float, and the type long double provides at least as much precision as double. The set of values of the type float is a subset of the set of values of the type double; the set of values of the type double is a subset of the set of values of the type long double. The value representation of floating-point types is implementation-defined. Integral and floating types are collectively called arithmetic types. Specializations of the standard template numeric_limits (_lib.support.limits_) shall specify the maximum and minimum values of each arithmetic type for an implementation.
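Since the representation is implementation-defined, the practical move is to ask the implementation through `numeric_limits`. A rough sketch follows; the helper names are invented for illustration, while `is_iec559` and `digits` are real `std::numeric_limits` members.

```cpp
#include <limits>

// True when T uses an IEC 559 (IEEE 754) representation on this
// implementation. The standard allows this to be false.
template <typename T>
bool follows_ieee754() {
    return std::numeric_limits<T>::is_iec559;
}

// Checks the one ordering the quoted paragraph does guarantee:
// double has at least float's precision, long double at least double's.
bool precision_is_ordered() {
    return std::numeric_limits<float>::digits <= std::numeric_limits<double>::digits
        && std::numeric_limits<double>::digits <= std::numeric_limits<long double>::digits;
}
```

`precision_is_ordered()` should hold everywhere, whereas `follows_ieee754<float>()` is something you have to check per compiler and per set of compiler options.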


[edited by - deepdene on January 6, 2004 11:41:14 AM]

quote:
Original post by deepdene
This is the EXACT text from the C++ standard; it can't get any more official than this:
3.9.1.8 - There are three floating point types: float, double, and long double. The type double provides at least as much precision as float, and the type long double provides at least as much precision as double.


I was not aware that long double was actually standard. Learn something new every month.

I had a problem the other day...

I need to restrict a float to 2 decimal places of precision. I didn't know how to do it (it is not possible with masking), so I multiplied by 1000 and hoped for the best (so far).

Is there a standard method to solve this problem?
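One common approach, sketched here under a few assumptions: scale by 100 (that is the factor for two decimal places), round, and either scale back or keep the value as an integer count of hundredths. This uses `std::round` and `std::lround` from `<cmath>`, which need a C++11 compiler; on an older one, `std::floor(x + 0.5f)` is the usual stand-in. The function names are made up for this example.

```cpp
#include <cmath>

// Rounds value to the nearest multiple of 0.01. The result is still a binary
// float, so it is only the closest representable value to that multiple.
float round_to_hundredths(float value) {
    return std::round(value * 100.0f) / 100.0f;
}

// Alternative: store the quantity as a whole number of hundredths, which is
// exact for as long as it stays an integer.
long to_hundredths(float value) {
    return std::lround(value * 100.0f);
}
```

For example, `to_hundredths(3.14159f)` gives 314. If the two decimals have to be exact (money, for instance), keeping the integer is usually the safer choice, since most decimal fractions have no exact float representation.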

BTW...


[42702.658].[DoInit]........................sizeof(float) = 4
[42702.658].[DoInit]........................sizeof(double) = 8
[42702.658].[DoInit]........................sizeof(int) = 4
[42702.658].[DoInit]........................sizeof(long) = 4
[42702.658].[DoInit]........................sizeof(DWORD) = 4


... doubles use 8 bytes.

R

EDIT: pardon me... that is for a VC++6 Win32 application.

[edited by - reaction on January 7, 2004 9:19:25 AM]
