
Printing Floating-Point Values

General and Gameplay Programming

Started by L. Spiro February 28, 2010 08:58 PM

9 comments, last by L. Spiro 14 years, 1 month ago

L. Spiro

25,818

Author

February 28, 2010 08:58 PM

I am printing floating-point numbers manually. Their integral and fractional components are too large to hold in any compiler-supported type. For an 80-bit floating-point number, the mantissa alone is already 64 bits. I can hold that in a 64-bit integer, but as soon as the exponent moves in either direction I lose bits at the high or low end.

Printing the result of a multiplication would be easy, except that the multiplier itself can be too large to store in a 64-bit integer. For example, it is easy to print 2269216741329525385 × 2^52 = 2269216741329525385 × 4503599627370496: I can perform the math directly on the output string, without needing to store the large result anywhere. This works because I can store both operands in a 64-bit type. But what if I am using an 80-bit (or 128-bit) float with an exponent of 1034 or so? 2269216741329525385 × 2^1034 = NaN; I cannot store 2^1034 in any compiler-defined type. The same problem applies to division, which would be easy if both numbers could always be represented in a compiler-defined type.

Is my only solution to develop a large-integer class and work with that, or is there a super-clever way I can print these large numbers after multiplication or division? Since I am only printing the numbers, I can work with methods that apply results directly to the output string. I would prefer something like that, because I expect a large-number class to perform poorly.

Thank you.
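To make "performing the math directly on the output string" concrete, here is a minimal sketch of one way it could work: the mantissa is written out as decimal digits once, and the digit string is then doubled in place once per unit of exponent, so 2^1034 itself is never stored. The function names `doubleDecimal` and `timesPowerOfTwo` are made up for this illustration, and `std::string` merely stands in for whatever output buffer is actually used.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Multiplies a decimal digit string (most significant digit first) by 2, in place.
void doubleDecimal(std::string& digits)
{
    int carry = 0;
    // Walk from the least significant digit to the most significant one.
    for (std::size_t i = digits.size(); i-- > 0; ) {
        int d = (digits[i] - '0') * 2 + carry;
        digits[i] = static_cast<char>('0' + d % 10);
        carry = d / 10;
    }
    if (carry != 0) {
        // The number grew by one digit.
        digits.insert(digits.begin(), static_cast<char>('0' + carry));
    }
}

// Returns mantissa * 2^exponent as a decimal string (non-negative exponents only).
std::string timesPowerOfTwo(std::uint64_t mantissa, unsigned exponent)
{
    std::string digits = std::to_string(mantissa);
    for (unsigned i = 0; i < exponent; ++i) {
        doubleDecimal(digits);
    }
    return digits;
}
```

Calling `timesPowerOfTwo(2269216741329525385ULL, 1034)` yields the full decimal expansion without any wide integer type. Each doubling touches every digit, so the cost grows with both the exponent and the length of the result; multiplying by a larger power of two per pass would reduce the number of passes.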

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid

Timptation

132

February 28, 2010 10:11 PM

Obviously 2^1034 is not going to fit in any existing types (and probably not even fit visually on your screen). If you really need to determine values of that precision (I can't imagine why, other than for the challenge of doing it), then you're on the right track, using string values. I would think it would be much harder to develop your own 2048-bit integer type... Good luck to ya :-)

iMalc

2,466

March 01, 2010 12:52 AM

Quote:Original post by YogurtEmperor
Is my only solution to develop a large-integer class and work with that, or is there a super-clever way I can print these large numbers after multiplication or division?

That's not your only solution.
You could use ones others have made, such as the simple varbigint, bigint, or megafloat classes I've already written. See the Useful Classes section of the link in my sig. I apologise in advance for how crappy my site looks.

"In order to understand recursion, you must first understand recursion."
My website dedicated to sorting algorithms

frob

46,221

March 01, 2010 03:40 AM

I'm failing to see the issue here. Assuming C++, set your output stream's format and precision to the desired type and size.

You can set your precision as high as you want, assuming you don't go above FLT_DIG or DBL_DIG or LDBL_DIG depending on your type. Any digits printed beyond the floating-point type's precision are just garbage.

If you want to display all 19 digits of 2269216741329525385, then simply set your output stream's format to fixed and precision to 19 and output the value.
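A minimal sketch of the stream-based approach, for readers who can use the standard library (the helper name `formatFixed` is made up for this example). One detail worth knowing: in fixed notation the precision is the number of digits after the decimal point, not the total number of significant digits.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Formats a long double in fixed notation with `precision` digits
// after the decimal point.
std::string formatFixed(long double value, int precision)
{
    std::ostringstream out;
    out << std::fixed << std::setprecision(precision) << value;
    return out.str();
}
```

For instance, `formatFixed(0.5L, 3)` gives "0.500". How many of the printed digits are meaningful still depends on what `long double` is on the compiler in use.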

(As a side note, you have to go through special gyrations on Visual Studio to get a long double; it silently treats them as a 64-bit double. Hopefully you knew that already if you are looking for 80-bit floats.)

L. Spiro

25,818

Author

March 03, 2010 02:31 AM

Thank you for all the replies.

Timptation: Seems I have to make a large integer class after all, but some parts can be done directly on the output buffer. Almost done with my own large-integer template class.

iMalc: The link is handy, but for this project I need to use entirely custom coding. I will keep your link around for other projects in the future.

frob: The issue is that I am printing the doubles manually, not using built-in functions at all (not even linking to the standard C libraries).
And indeed I do know about Microsoft Visual Studio®’s long double secrets; that makes some things an ass in the pain sometimes. I have to use tbyte, exposed only through ASM, and a 10-byte buffer to hold/manipulate them.


momotte

171

March 03, 2010 03:37 AM

I'm not sure I understand what you are trying to achieve.

you want to convert an 80-bit floating point value into a displayable string, right ?
if so, why do you need to convert it to an intermediate integer representation? can't you just recompose the mantissa into base-10 digits, bit by bit, and then place the decimal point wherever it needs to be, padding with zeroes as necessary, based on the exponent value ? it's pretty fast, you just need a couple shifts and an integer mul per output decimal digit in the inner loop.

L. Spiro

25,818

Author

March 03, 2010 04:48 AM

I had hoped it would be that simple too, but printing the fractional part is not as easy and 128-bit floating-point values have a mantissa of 112 bits, so I can not even store them in the largest compiler-defined type.
Besides, given the format of the data, it is impossible to tell how many zeros to use as padding. There probably is some trick to the integral part (notwithstanding 112-bit mantissas), but there definitely is not a trick to the fractional part.

As I mentioned before, even if I wanted to obtain the result of division digit-by-digit directly on the output buffer, I would still need operands capable of holding numbers as high as 2^16385 (the largest dividend in 128-bit floats).


momotte

171

March 03, 2010 08:11 AM

I currently have working code that does just this (not on 128 bit floats, but the algorithm can be generalized)

Quote:Besides, given the format of the data, it is impossible to tell how many zeros to use as padding. There probably is some trick to the integral part (notwithstanding 112-bit mantissas), but there definitely is not a trick to the fractional part.

with the base2 exponent, you can compute the base10 exponent. all this can be done in 32-bit types. the base10 exponent will give you the decimal point's position. this is all you need to be able to emit the padding zeroes and the decimal point.
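One common way to do this conversion in 32-bit integer math is a fixed-point multiply by an approximation of log10(2). The sketch below is not necessarily the exact formula used in the code described here; the name `log10Pow2` and the constant are assumptions for illustration. As far as I know, 78913 / 2^18 is close enough to log10(2) that the result is exact for 0 <= e <= 1650, but that range should be checked against a reference before relying on it, and negative exponents need separate handling.

```cpp
#include <cstdint>

// Returns floor(e * log10(2)), i.e. the decimal exponent of 2^e,
// using only 32-bit integer arithmetic.
// Assumed valid for 0 <= e <= 1650 (1650 * 78913 still fits in 32 bits).
std::uint32_t log10Pow2(std::uint32_t e)
{
    // 78913 / 2^18 is approximately 0.3010292, slightly below log10(2).
    return (e * 78913u) >> 18;
}
```

Note that this gives the decimal exponent of the power of two alone. Since the mantissa lies in [1, 2), the decimal exponent of the full value is either this result or one more.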

for the fractional part:
if the exponent is negative, you emit "0." + (abs(base10Exp) - 1) * "0".
otherwise, if the exponent is positive, you start decomposing the mantissa and emitting the decimal digits (more about this later).
once you have emitted 'base10Exp' digits, you drop in the decimal point and resume emitting digits. if you have consumed your whole mantissa and still have not reached the decimal point, you simply emit 'base10Exp - number of already printed digits' zeroes.

(by the way, you might not want to actually emit zeroes for padding. even after the mantissa has been entirely consumed, depending on the fp value, you can keep emitting nonzero digits for a while, to print the exact value this mantissa represents. for example, a 3-bit mantissa with the value 0b001 gives a real value of 0.125, which needs three decimal digits, well beyond the single decimal digit that would theoretically have been emitted if we stopped when the mantissa had been fully consumed. but that's a detail. most implementations just round the last digit and don't bother with the sub-digits that exceed the mantissa's precision...)

anyway...
about decomposing the mantissa...
if we forget about the whole integer storage story for the moment.
extracting digits from a mantissa will basically go like this:
(assuming a zero exponent, we'll generalize to nonzero exponents later)

N = 1 << mantissaBitCount
m = mantissa | (1 << mantissaBitCount) // add the implicit leading '1' to the mantissa, just above the stored bits. this will require special handling for denormals

digit0 = ((m / N) * 1) % 10
digit1 = ((m / N) * 10) % 10;
digit2 = ((m / N) * 100) % 10;
digit3 = ((m / N) * 1000) % 10;
...
digitn = ((m / N) * 10^n) % 10;

removing the modulos and divs:
mb = mantissaBitCount

digit0 = (m >> mb);
m = 10 * (m - (digit0 << mb));
digit1 = (m >> mb);
m = 10 * (m - (digit1 << mb));
digit2 = (m >> mb);
m = 10 * (m - (digit2 << mb));
...
digitn = (m >> mb);
m = 10 * (m - (digitn << mb));

this method will give you the real sub-digits mentioned earlier, even when you've consumed all the mantissa bits.
there are many details and tweaks that can be added to maximise precision. I don't have enough time right now to dive into the implementation details, but the rough algo should be enough.
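The shift-and-multiply loop above can be written out as a small function. This is only a sketch under the same assumptions as the pseudocode (zero exponent, normalized value, implicit leading 1 at bit `mantissaBits`), and the name `mantissaDigits` is made up for the example. It uses a 64-bit working value, so it only suits mantissas narrow enough that the remainder times 10 cannot overflow, such as the 23-bit mantissa of a float.

```cpp
#include <cstdint>
#include <string>

// Emits `count` decimal digits of the value 1.f, where `mantissaField` holds
// the stored fraction bits f and `mantissaBits` is their width.
// The first digit is the integer part; the rest are fractional digits.
std::string mantissaDigits(std::uint32_t mantissaField, unsigned mantissaBits, unsigned count)
{
    // Add the implicit leading 1 just above the stored fraction bits.
    std::uint64_t m = static_cast<std::uint64_t>(mantissaField)
                    | (std::uint64_t{1} << mantissaBits);
    std::string out;
    for (unsigned i = 0; i < count; ++i) {
        std::uint64_t digit = m >> mantissaBits;          // current integer part
        out.push_back(static_cast<char>('0' + digit));
        m = 10 * (m - (digit << mantissaBits));           // drop it, scale remainder by 10
    }
    return out;
}
```

For a float holding 1.5 the stored fraction is 0x400000, and `mantissaDigits(0x400000, 23, 4)` produces "1500", to be read as 1.500 once the decimal point is placed.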

now. the part that will be problematic for your specific case, is just that you will have to work on two u64 to represent your 112-bit mantissa:

m = 10 * (m - (digit2 << mb));

the shifts are trivial, and you just need a 128-bit integer multiply (really, a "128-bit times 10" operator), and a 128-bit integer subtraction.

with these two tools you should be able to pull it off without too much trouble.
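As a rough illustration of those two tools, here is a sketch of a 128-bit value built from two 64-bit halves, with a subtraction, a times-10 built from shifts and an add, and one digit step for a 112-bit mantissa. The type and function names (`U128`, `mul10`, `nextDigit112`) are invented for this example, and it assumes a zero exponent with the implicit 1 at bit 112, so the working value never exceeds 128 bits.

```cpp
#include <cstdint>

struct U128 {
    std::uint64_t hi;
    std::uint64_t lo;
};

// Left shift by n bits, valid for 0 < n < 64.
U128 shl(U128 x, unsigned n)
{
    return { (x.hi << n) | (x.lo >> (64 - n)), x.lo << n };
}

// 128-bit addition; the carry out of the low half goes into the high half.
U128 add(U128 a, U128 b)
{
    U128 r;
    r.lo = a.lo + b.lo;
    r.hi = a.hi + b.hi + (r.lo < a.lo ? 1 : 0);
    return r;
}

// 128-bit subtraction; a borrow from the low half comes out of the high half.
U128 sub(U128 a, U128 b)
{
    U128 r;
    r.lo = a.lo - b.lo;
    r.hi = a.hi - b.hi - (a.lo < b.lo ? 1 : 0);
    return r;
}

// x * 10 computed as (x << 3) + (x << 1).
U128 mul10(U128 x)
{
    return add(shl(x, 3), shl(x, 1));
}

// One step of the digit loop for a 112-bit mantissa:
// bit 112 of the 128-bit value is bit 48 of the high half.
unsigned nextDigit112(U128& m)
{
    std::uint64_t digit = m.hi >> 48;
    m = mul10(sub(m, U128{ digit << 48, 0 }));
    return static_cast<unsigned>(digit);
}
```

Because the remainder after removing a digit is below 2^112, multiplying it by 10 stays below 2^116, which is why a plain 128-bit pair is wide enough here.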

if you're concerned about speed, this is a pretty fast method, and should stay pretty fast even if you roll a custom (128-bit,10) mul, and 128 bit subtraction:

timings (in cycles) of sprintf() and this method:

sprintf       custom       format   value
2517.3916     135.63612    %.f      42.123f
2669.46143    157.22189    %.3f     42.123f
2696.28394    181.10466    %.6f     42.123f
2842.25659    214.67934    %.10f    42.123f
2662.26904    132.54713    %.e      42.123f
2722.87842    153.83556    %.3e     42.123f
2827.36914    179.39142    %.6e     42.123f
2846.64575    215.57584    %.10e    42.123f
3239.98413    160.91776    %.3e     4.2123e-20f
2521.19873    151.42184    %.3e     4.2123e-3f
2472.59106    151.64955    %.3e     4.2123e-2f
2025.72388    152.0253     %.3e     4.2123e-1f
2701.29175    127.41381    %.3e     4.2123e+0f
2695.72241    152.35506    %.3e     4.2123e+1f
2716.10889    154.34775    %.3e     4.2123e+2f
3290.89038    160.06815    %.3e     4.2123e+20f

I don't know if anybody else uses something like this, but it works pretty well for me.

cache_hit

614

March 03, 2010 08:16 AM

Quote:Original post by YogurtEmperor
iMalc: The link is handy, but for this project I need to use entirely custom coding. I will keep your link around for other projects in the future.

Why would a project require that you use entirely custom coding, regardless of the license granted with any existing piece of code and regardless of how suitable a piece of code is for your project? That doesn't make a lot of sense.

swiftcoder

18,997

March 03, 2010 09:05 AM

Quote:Original post by cache_hit
Why would a project require that you use entirely custom coding, regardless of the license granted with any existing piece of code and regardless of how suitable a piece of code is for your project? That doesn't make a lot of sense.

Generally, I would assume a homework project.

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]


This topic is closed to new replies.
