

Samith

High precision floats


How would I go about making my own floating-point type with very high precision? If that's really hard, do any of you know of a free library I can use for higher-precision floats? I need it for a fractal drawer, because you can't zoom in very far with a normal float.
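As a rough illustration of the zoom limit described here: once the distance between neighbouring pixels drops below the spacing of representable values near the view centre, adding one pixel step no longer changes the coordinate. The helper name `next_pixel_is_distinct` and the sample coordinates are made up for this sketch; they are not from any fractal program.

```cpp
#include <cassert>

// Hypothetical helper: returns true if stepping one pixel to the right
// yields a different coordinate when the arithmetic is done in type T.
template <typename T>
bool next_pixel_is_distinct(T center, T pixel_step) {
    assert(pixel_step > 0);
    // volatile forces the sum to be rounded to T instead of being kept
    // in a wider register by the compiler.
    volatile T next = center + pixel_step;
    return next != center;
}
```

With a centre of -0.75 and a pixel step of 1e-9, the float version should report that the neighbouring pixel collapses onto the same coordinate, while the double version still tells them apart.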

Guest Anonymous Poster

float-------32 bit -> 1E-37 to 1E+37 with six digits of precision
double------64 bit -> 1E-37 to 1E+37 with ten digits of precision
long double-80 bit -> 1E-37 to 1E+37 with ten digits of precision

quote:
Original post by Anonymous Poster

float-------32 bit -> 1E-37 to 1E+37 with six digits of precision
double------64 bit -> 1E-37 to 1E+37 with ten digits of precision
long double-80 bit -> 1E-37 to 1E+37 with ten digits of precision


Then what's the difference between long double and normal doubles?

long double and double are not always different. In VC++ 6.0 both long double and double have 64 bits, with about 15 significant decimal digits, but on GCC for x86 long double has 80 bits (about 18 digits) while double has 64 bits.

Instead of float, try double or long double.


Colin Jeanne | Invader's Realm

long double is a Java thing; you will not find it in C++ (it's either not there or the same as double)...


edit: ohh gcc has it? wow...


T2k

[edited by - T2k on January 11, 2004 2:27:32 PM]

Actually, long double isn't a Java thing at all. It exists in C/C++, but in most standard compilers it's the same as a double.
GCC implements it with 80 bits, and I think ICC does as well?
Anyway, creating a floating-point class with very high precision seems kind of overkill, and it's not the easiest thing. Ever thought of using fixed point instead? For example 64:64. That would give you HUGE precision, but would of course be kind of slow.
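For anyone unfamiliar with the fixed-point idea mentioned above, here is a minimal sketch. It is scaled down to 16:16 (16 integer bits, 16 fraction bits) so that it fits in standard integer types; a 64:64 format works the same way but needs 128-bit arithmetic split across several machine words. The type `Fixed16` and the function names are invented for this illustration and do not come from any library.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 16:16 fixed-point value. The stored integer `raw`
// represents the real number raw / 65536.
struct Fixed16 {
    std::int32_t raw;
};

const std::int32_t kOne = 65536;  // 1.0 in 16:16 format (2^16)

Fixed16 fixed_from_double(double v) {
    // Only values in [-32768, 32768) fit in 16 integer bits.
    assert(v >= -32768.0 && v < 32768.0);
    Fixed16 f = { static_cast<std::int32_t>(v * kOne) };
    return f;
}

double fixed_to_double(Fixed16 f) {
    return static_cast<double>(f.raw) / kOne;
}

// Addition is plain integer addition because both operands share a scale.
Fixed16 fixed_add(Fixed16 a, Fixed16 b) {
    Fixed16 r = { a.raw + b.raw };
    return r;
}

// Multiplying two scaled values gives a result scaled by 2^32, so the
// product is computed in 64 bits and divided by 2^16 to restore the scale.
Fixed16 fixed_mul(Fixed16 a, Fixed16 b) {
    std::int64_t wide = static_cast<std::int64_t>(a.raw) * b.raw;
    Fixed16 r = { static_cast<std::int32_t>(wide / kOne) };
    return r;
}
```

The appeal for a fractal drawer is that the precision is the same everywhere in the range, and it can be increased by widening the integer. The cost is that multiplication needs a double-width intermediate and there is no overflow protection in this sketch.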


--
MFC is sorta like the swedish police... It's full of crap, and nothing can communicate with anything else.

quote:
Original post by tok_junior
Actually, long double isn't a Java thing at all. It exists in C/C++, but in most standard compilers it's the same as a double.
GCC implements it with 80 bits, and I think ICC does as well?
Anyway, creating a floating-point class with very high precision seems kind of overkill, and it's not the easiest thing. Ever thought of using fixed point instead? For example 64:64. That would give you HUGE precision, but would of course be kind of slow.


--
MFC is sorta like the swedish police... It's full of crap, and nothing can communicate with anything else.


I don't know what you mean by fixed point. What is it and how do I use it?

quote:
Original post by Anonymous Poster

float-------32 bit -> 1E-37 to 1E+37 with six digits of precision
double------64 bit -> 1E-37 to 1E+37 with ten digits of precision
long double-80 bit -> 1E-37 to 1E+37 with ten digits of precision


These ranges are off. The larger floating point types store bigger ranges as well as providing more precision.

float = 1 bit sign, 23 bits mantissa, 8 bit exponent
double = 1 bit sign, 52 bits mantissa, 11 bit exponent
long double = 1 bit sign, 64 bits mantissa, 15 bit exponent (this is the x87 80-bit extended format, which stores the leading mantissa bit explicitly instead of implying it)

The availability of "long double" is kind of a hardware thing, really. The rules for how the numbers behave (the allocation of bits to mantissa/exponent, etc.) are specified by the relevant IEEE standard, #754.
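Since the size of long double varies by compiler, one way to see what a given toolchain actually provides is to ask `std::numeric_limits`. The sketch below collects a few of its fields; the struct `FloatFormat` and the function `describe_format` are names made up for this example. Note that `digits` counts the implicit leading bit, so it reports 24 for float and 53 for double rather than the 23 and 52 stored bits listed above.

```cpp
#include <cassert>
#include <limits>

// Hypothetical summary of a floating-point type's format.
struct FloatFormat {
    int significand_bits;  // mantissa bits, counting any implicit leading bit
    int decimal_digits;    // decimal digits that always survive a round trip
    int max_exponent10;    // largest power of ten that is representable
};

template <typename T>
FloatFormat describe_format() {
    // The bit count below only makes sense for binary formats.
    assert(std::numeric_limits<T>::radix == 2);
    FloatFormat f = {
        std::numeric_limits<T>::digits,
        std::numeric_limits<T>::digits10,
        std::numeric_limits<T>::max_exponent10
    };
    return f;
}
```

Comparing `describe_format<double>()` with `describe_format<long double>()` shows whether a particular compiler treats the two types as the same format.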

However:
- Java doesn't provide access to a "long double" type.
- Some C/C++ compilers will interpret "long double" as "double", even though the hardware is capable (and almost all desktop PC hardware is, apparently)
- In the old days of K&R C, operations between two floats would always use double internally, and I think operations between two doubles would similarly use long double, but I could be wrong on that one. Now the type coercion rules are simplified; the shorter FP value is promoted to the type of the longer one, but two floats still mean the work is done in float values. The result is that errors can accumulate in the last bit. (this is from what I remember about the long PDF referenced at the end of this post.)

Numerical stability is not a simple bit of study, BTW; some of the rules of thumb like "you only need a couple more bits as 'guard' on your calculation" fail catastrophically for some formulas. It's not difficult to construct things where using double internally, when the initial values are floats, really is needed to get the right result.
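One simple case of the kind described above, as a sketch: repeatedly adding the float value 1.0f. A float accumulator has a 24-bit mantissa, so once the total reaches 2^24 = 16777216 the next +1 is rounded away and the total stops growing; a double accumulator fed the same float inputs keeps counting. The function names are invented for this example.

```cpp
#include <cassert>

// Adds 1.0f `count` times, keeping the running total in a float.
float count_in_float(long count) {
    assert(count >= 0);
    // volatile forces every partial sum to be rounded to float, even on
    // hardware that would otherwise keep it in a wider register.
    volatile float total = 0.0f;
    for (long i = 0; i < count; ++i) {
        total = total + 1.0f;
    }
    return total;
}

// Adds the same float inputs, but keeps the running total in a double.
double count_in_double(long count) {
    assert(count >= 0);
    double total = 0.0;
    for (long i = 0; i < count; ++i) {
        total = total + 1.0f;  // the float is promoted to double here
    }
    return total;
}
```

Calling both with a count of 20000000 should give 16777216 from the float version and 20000000 from the double version, even though every input is an exactly representable float.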

Interesting references on the subject:
http://cch.loria.fr/documentation/IEEE754/
www.cs.nyu.edu/cs/faculty/overton/book/docs/KahanTalk.pdf
www.cs.berkeley.edu/~wkahan/JAVAhurt.pdf (80 pages, but I read it all and so should you.)

Apparently this Kahan guy is authoritative on the subject. :s
