High precision floats



How would I go about making my own floating-point type with super high accuracy? If it's really hard, do any of you know of any free libraries or something that I can use for higher-precision floats? I need it for a fractal drawer, because you can't zoom in very far with a normal float.

Share on other sites
Just as an interim solution, have you tried doubles? I'm guessing prolly so.

-=[ Megahertz ]=-

Share on other sites

float-------32 bit -> 1E-37 to 1E+37 with six digits of precision
double------64 bit -> 1E-37 to 1E+37 with ten digits of precision
long double-80 bit -> 1E-37 to 1E+37 with ten digits of precision

Share on other sites
quote:
Original post by Anonymous Poster

float-------32 bit -> 1E-37 to 1E+37 with six digits of precision
double------64 bit -> 1E-37 to 1E+37 with ten digits of precision
long double-80 bit -> 1E-37 to 1E+37 with ten digits of precision

Then what's the difference between long double and normal doubles?

Share on other sites
long double and double are not always different. In VC++ 6.0, long double and double are both 64 bits, with about 15 decimal digits of accuracy, but on GCC (x86) long double is the 80-bit extended format with roughly 18-19 decimal digits, while double is still 64 bits.

Instead of float, try double or long double.

Share on other sites
long double is a Java thing; you will not find it in C++ (it's either not there or the same as double)...

edit: ohh gcc has it? wow...

T2k

[edited by - T2k on January 11, 2004 2:27:32 PM]

Share on other sites
Actually, long double isn't a Java thing at all. It exists in C/C++, but in most standard compilers it's the same as a double.
GCC implements it with 80 bits, and I think ICC does as well?
Anyway, creating a floating-point class with very high precision seems like overkill, and it's not the easiest thing. Ever thought of using fixed point instead? For example 64:64. That would give you HUGE precision, but would of course be kind of slow.

--
MFC is sorta like the Swedish police... It's full of crap, and nothing can communicate with anything else.

Share on other sites
quote:
Original post by tok_junior
Actually, long double isn't a Java thing at all. It exists in C/C++, but in most standard compilers it's the same as a double.
GCC implements it with 80 bits, and I think ICC does as well?
Anyway, creating a floating-point class with very high precision seems like overkill, and it's not the easiest thing. Ever thought of using fixed point instead? For example 64:64. That would give you HUGE precision, but would of course be kind of slow.

--
MFC is sorta like the Swedish police... It's full of crap, and nothing can communicate with anything else.

I don't know what you mean by fixed point. What is it, and how do I use it?

Share on other sites
Fixed point just means doing the math with integers and keeping an implied binary point at a fixed position: some of the bits are the integer part and the rest are the fraction.

Share on other sites
quote:
Original post by Anonymous Poster

float-------32 bit -> 1E-37 to 1E+37 with six digits of precision
double------64 bit -> 1E-37 to 1E+37 with ten digits of precision
long double-80 bit -> 1E-37 to 1E+37 with ten digits of precision

These ranges are off. The larger floating point types store bigger ranges as well as providing more precision.

float = 1 bit sign, 23 bits mantissa, 8 bit exponent
double = 1 bit sign, 52 bits mantissa, 11 bit exponent
long double = 1 bit sign, 64 bits mantissa, 15 bit exponent (the x87 80-bit extended format; unlike float and double, its leading mantissa bit is stored explicitly)

The availability of "long double" is really a hardware thing. The standards for how the numbers behave (the allocation of bits to mantissa/exponent, etc.) are specified by the relevant IEEE standard, #754.

However:
- Some C/C++ compilers will interpret "long double" as "double", even though the hardware is capable of more (and almost all desktop PC hardware is, apparently)
- In the old days of K&R C, operations between two floats would always use double internally, and I think operations between two doubles would similarly use long double, but I could be wrong on that one. The type coercion rules are simpler now: the shorter FP value is promoted to the type of the longer one, but an operation between two floats is still done in float. The result is that errors can accumulate in the last bits. (This is from what I remember of the long PDF referenced at the end of this post.)

Numerical stability is not a simple field of study, BTW; some rules of thumb like "you only need a couple of extra 'guard' bits on your calculation" fail catastrophically for some formulas. It's not difficult to construct cases where using double internally, when the initial values are floats, really is needed to get the right result.

Interesting references on the subject:
http://cch.loria.fr/documentation/IEEE754/
www.cs.nyu.edu/cs/faculty/overton/book/docs/KahanTalk.pdf
www.cs.berkeley.edu/~wkahan/JAVAhurt.pdf (80 pages, but I read it all and so should you.)

Apparently this Kahan guy is authoritative on the subject. :s
