Printing Floating-Point Values

This topic is 2841 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

I am manually printing floating-point numbers. Their integral and fractional components are too large to hold in any compiler-supported format. For an 80-bit floating-point number, the mantissa alone is already 64 bits. I can hold that in a 64-bit integer, but as soon as the exponent moves in either direction, I lose bits at the high or low end.

Printing the result of a multiplication would be easy, except that one of the operands can itself be too large to store in a 64-bit integer. For example, it is easy to print 2269216741329525385 × 2^52 = 2269216741329525385 × 4503599627370496: both numbers fit in a 64-bit type, so I can perform the math directly on the output string without needing to store the large result anywhere. But what if I am using an 80-bit float (or a 128-bit float) with an exponent of 1034 or so? I cannot compute 2269216741329525385 × 2^1034 the same way, because 2^1034 cannot be stored in any compiler-defined type. The same problem applies to division, which would be easy if both numbers could always be represented with a compiler-defined type.

Is my only solution to develop a large-integer class and work with that, or is there a super-clever way I can print these large numbers after multiplication or division? Since I am only printing the numbers, I can work with methods that apply results directly to the output string. I would prefer something like that, because I expect a large-number class to perform poorly. Thank you.
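
To make "math directly on the output string" concrete, here is a minimal sketch of one possible approach, not something taken from the post: a power of two never has to be stored at all, because multiplying by 2^n is n doublings of the decimal string. The names `doubleDecimal` and `mulPow2` are hypothetical, and `std::string` stands in for whatever output buffer the project really uses.

```cpp
#include <cassert>
#include <string>

// Doubles a non-negative decimal number stored as ASCII digits, in place.
void doubleDecimal(std::string& digits) {
    assert(!digits.empty());
    int carry = 0;
    // Walk from the least significant digit to the most significant one.
    for (std::size_t i = digits.size(); i-- > 0;) {
        int v = (digits[i] - '0') * 2 + carry;
        digits[i] = static_cast<char>('0' + v % 10);
        carry = v / 10;
    }
    if (carry != 0) {
        digits.insert(digits.begin(), static_cast<char>('0' + carry));
    }
}

// Hypothetical helper: returns digits * 2^exponent as a decimal string.
// The exponent is only a loop count, so 2^1034 is never materialised.
std::string mulPow2(std::string digits, unsigned exponent) {
    for (unsigned i = 0; i < exponent; ++i) {
        doubleDecimal(digits);
    }
    return digits;
}
```

Each doubling touches every digit, so the cost grows with both the exponent and the length of the string; this is meant to illustrate the idea, not to be fast.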

Obviously 2^1034 is not going to fit in any existing types (and probably not even fit visually on your screen). If you really need to determine values of that precision (I can't imagine why, other than for the challenge of doing it), then you're on the right track, using string values. I would think it would be much harder to develop your own 2048-bit integer type... Good luck to ya :-)

Quote:
Original post by YogurtEmperor
Is my only solution to develop a large-integer class and work with that, or is there a super-clever way I can print these large numbers after multiplication or division?
That's not your only solution.
You could use ones others have made, such as the simple varbigint, bigint, or megafloat classes I've already written. See the Useful Classes section of the link in my sig. I apologise in advance for how crappy my site looks.

I'm failing to see the issue here. Assuming C++, set your output stream's format and precision to the desired type and size.



You can set your precision as high as you want, provided you don't go above FLT_DIG, DBL_DIG, or LDBL_DIG, depending on your type. Any digits printed beyond the floating-point type's precision are just garbage.

If you want to display all 19 digits of 2269216741329525385, then simply set your output stream's format to fixed and precision to 19 and output the value.
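
The code that originally accompanied this post did not survive, so the following is only a sketch of what the stream-based approach could look like. `formatFixed` and `kMaxReliableDigits` are hypothetical names. Note that with `std::fixed`, the precision counts digits after the decimal point.

```cpp
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Formats a value in fixed notation with `precision` digits after the point.
std::string formatFixed(long double value, int precision) {
    std::ostringstream out;
    out << std::fixed << std::setprecision(precision) << value;
    return out.str();
}

// Same quantity as LDBL_DIG: the number of decimal digits that survive a
// round trip through long double on the current platform.
constexpr int kMaxReliableDigits = std::numeric_limits<long double>::digits10;
```

For example, `formatFixed(42.125L, 3)` yields `42.125`, since that value is exactly representable in binary.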




(As a side note, you have to go through special gyrations in Visual Studio to get a true long double; it silently treats long double as a 64-bit double. Hopefully you knew that already if you are looking for 80-bit floats.)

Thank you for all the replies.

Timptation: Seems I have to make a large integer class after all, but some parts can be done directly on the output buffer. Almost done with my own large-integer template class.

iMalc: The link is handy, but for this project I need to use entirely custom coding. I will keep your link around for other projects in the future.

frob: The issue is that I am printing the doubles manually, not using built-in functions at all (not even linking to the standard C libraries).
And indeed I do know about Microsoft Visual Studio®’s long double secrets; that makes some things an ass in the pain sometimes. I have to use tbyte, exposed only through ASM, and a 10-byte buffer to hold/manipulate them.

I'm not sure I understand what you are trying to achieve.

you want to convert an 80-bit floating point value into a displayable string, right?
if so, why do you need to convert it to an intermediate integer representation? can't you just recompose the mantissa into base-10 digits, bit by bit, and then place the decimal point wherever it needs to be, padding with zeroes as necessary, based on the exponent value? it's pretty fast, you just need a couple of shifts and an integer mul per output decimal digit in the inner loop.

I had hoped it would be that simple too, but printing the fractional part is not as easy and 128-bit floating-point values have a mantissa of 112 bits, so I can not even store them in the largest compiler-defined type.
Besides, given the format of the data, it is impossible to tell how many zeros to use as padding. There probably is some trick to the integral part (notwithstanding 112-bit mantissas), but there definitely is not a trick to the fractional part.

As I mentioned before, even if I wanted to obtain the result of division digit-by-digit directly on the output buffer, I would still need operands capable of holding numbers as high as 2^16385 (the largest dividend in 128-bit floats).
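
For what it is worth, division by a power of two can also be done on the digits themselves. This is a sketch under assumptions, not the poster's code: the divisor 2^k is never stored, because dividing by 2^k is k halvings of the decimal fraction. The names `halveFraction` and `pow2NegFraction` are hypothetical.

```cpp
#include <cassert>
#include <string>

// Halves the decimal fraction 0.d1d2d3... whose digits (without the
// leading "0.") are stored as ASCII in `frac`. The result stays exact
// because a trailing '5' is appended whenever a remainder is left over.
void halveFraction(std::string& frac) {
    int rem = 0;
    for (char& c : frac) {
        int v = rem * 10 + (c - '0');
        c = static_cast<char>('0' + v / 2);
        rem = v % 2;
    }
    if (rem != 0) {
        frac.push_back('5');
    }
}

// Hypothetical helper: the exact digits of 2^-k after "0.", for k >= 1.
std::string pow2NegFraction(unsigned k) {
    assert(k >= 1);
    std::string frac = "5";  // 2^-1 = 0.5
    for (unsigned i = 1; i < k; ++i) {
        halveFraction(frac);
    }
    return frac;
}
```

Every halving adds one digit, so the exact expansion of 2^-k has k fractional digits and the loop is quadratic in k.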

I currently have working code that does just this (not on 128 bit floats, but the algorithm can be generalized)

Quote:
Besides, given the format of the data, it is impossible to tell how many zeros to use as padding. There probably is some trick to the integral part (notwithstanding 112-bit mantissas), but there definitely is not a trick to the fractional part.


with the base2 exponent, you can compute the base10 exponent. all this can be done in 32-bit types. the base10 exponent will give you the decimal point's position. this is all you need to be able to emit the padding zeroes and the decimal point.
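
One common way to do that conversion in 32-bit integer math is to multiply by a fixed-point approximation of log10(2). The sketch below assumes that approach; the post does not say which formula it uses. The constant 78913 / 2^18 is about 0.30103, and `floorDiv` and `estimateBase10Exp` are hypothetical names. The result is the decimal exponent of 2^e itself. Since the mantissa lies in [1, 2), the exponent of the full value can be one larger, and near exact powers of ten the estimate can be off by one for very large exponents, so treat it as a starting point that may need a correction step.

```cpp
#include <cassert>
#include <cstdint>

// floor(n / d) for d > 0; plain integer division truncates toward zero.
std::int32_t floorDiv(std::int32_t n, std::int32_t d) {
    std::int32_t q = n / d;
    if (n % d != 0 && n < 0) {
        --q;
    }
    return q;
}

// Estimates floor(base2Exp * log10(2)) using only 32-bit arithmetic.
std::int32_t estimateBase10Exp(std::int32_t base2Exp) {
    // Keep the product below the int32 limit; this range still covers the
    // exponents of 80-bit and 128-bit floats.
    assert(base2Exp > -27000 && base2Exp < 27000);
    return floorDiv(base2Exp * 78913, 262144);  // 262144 == 2^18
}
```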

for the fractional part:
if the exponent is negative, you emit "0." + (abs(base10Exp) - 1) * "0".
otherwise, if the exponent is positive, you start decomposing the mantissa, and emitting the decimal digits (more about this later).
once you have emitted 'base10Exp' digits, you drop the decimal point and resume emitting digits. if you have consumed your whole mantissa and still have not reached the decimal point, you simply emit 'base10Exp - number of already printed digits' zeroes.
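
A rough sketch of the padding rules, assuming the decimal digits have already been produced. It uses the scientific-notation convention that the value is d0.d1d2... × 10^base10Exp, so the integer part has base10Exp + 1 digits; the post may count from a slightly different origin. `placeDecimalPoint` is a hypothetical name.

```cpp
#include <cassert>
#include <string>

// Lays out significant digits d0d1d2... as plain decimal text, where the
// value is d0.d1d2... * 10^base10Exp.
std::string placeDecimalPoint(const std::string& digits, int base10Exp) {
    assert(!digits.empty());
    if (base10Exp < 0) {
        // "0." followed by (|exp| - 1) zeroes, then the digits.
        std::size_t zeroes = static_cast<std::size_t>(-base10Exp - 1);
        return "0." + std::string(zeroes, '0') + digits;
    }
    std::size_t intLen = static_cast<std::size_t>(base10Exp) + 1;
    if (digits.size() <= intLen) {
        // Digits ran out before the decimal point: pad with zeroes.
        return digits + std::string(intLen - digits.size(), '0');
    }
    return digits.substr(0, intLen) + "." + digits.substr(intLen);
}
```

With the digits `42123`, an exponent of 1 gives `42.123` and an exponent of -3 gives `0.0042123`.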

(by the way, you might not want to actually emit zeroes for padding. even when the mantissa will be entirely consumed, depending on the fp value, you can keep emitting nonzero digits for a while, to print the exact value this mantissa represents (a 3 bit mantissa with the value 0b001 will give a real value of 0.125, way above the single decimal digit that would have theoretically been emitted if we stopped emitting digits when the mantissa had been fully consumed. but that's a detail. most implementations just round the last digit and don't bother with the sub-digits that exceed the mantissa's precision...))

anyway...
about decomposing the mantissa...
if we forget about the whole integer storage story for the moment.
extracting digits from a mantissa will basically go like this:
(assuming a zero exponent, we'll generalize to nonzero exponents later)

N = 1 << mantissaBitCount
m = mantissa | (1 << mantissaBitCount) // add the implicit leading '1' just above the stored mantissa bits, so that m / N lies in [1, 2). this will require special handling for denormals

// here m / N is a real-valued division, not an integer one
digit0 = ((m / N) * 1) % 10
digit1 = ((m / N) * 10) % 10
digit2 = ((m / N) * 100) % 10
digit3 = ((m / N) * 1000) % 10
...
digitn = ((m / N) * 10^n) % 10

removing the modulos and divs:
mb = mantissaBitCount

digit0 = (m >> mb);
m = 10 * (m - (digit0 << mb));
digit1 = (m >> mb);
m = 10 * (m - (digit1 << mb));
digit2 = (m >> mb);
m = 10 * (m - (digit2 << mb));
...
digitn = (m >> mb);
m = 10 * (m - (digitn << mb));

this method will give you the real sub-digits mentioned earlier, even when you've consumed all the mantissa bits.
there are many details and tweaks that can be added to maximise precision. I don't have enough time right now to dive into the implementation details, but the rough algo should be enough.
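
The shift-and-multiply loop above can be written out as follows. This is a sketch for the zero-exponent case only, with a mantissa small enough to fit a `std::uint64_t`; `extractDigits` is a hypothetical name, not the poster's actual code. The implicit leading 1 is placed at bit `mb`, so the value represented is m / 2^mb and lies in [1, 2).

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Emits the leading digit plus `fractionDigits` decimals of the value
// (2^mb + mantissa) / 2^mb, i.e. a normalised mantissa with exponent 0.
std::string extractDigits(std::uint64_t mantissa, unsigned mb,
                          unsigned fractionDigits) {
    assert(mb >= 1 && mb <= 59);  // leaves headroom for the "times 10"
    assert(mantissa < (std::uint64_t{1} << mb));
    std::uint64_t m = mantissa | (std::uint64_t{1} << mb);
    std::string out;
    for (unsigned i = 0; i <= fractionDigits; ++i) {
        std::uint64_t digit = m >> mb;  // integer part, always 0..9
        out.push_back(static_cast<char>('0' + digit));
        if (i == 0 && fractionDigits > 0) {
            out.push_back('.');
        }
        m = 10 * (m - (digit << mb));   // drop the digit, scale the rest
    }
    return out;
}
```

With a 3-bit mantissa of 0b001, this yields `1.125`, including the sub-digits past the point where the mantissa bits run out. Rounding of the last digit is left out.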

now. the part that will be problematic for your specific case, is just that you will have to work on two u64 to represent your 112-bit mantissa:

m = 10 * (m - (digit2 << mb));

the shifts are trivial, and you just need a 128-bit integer multiply (really, a "128-bit times 10" operator), and a 128-bit integer subtraction.

with these two tools you should be able to pull it off without too much trouble.
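
A minimal sketch of those two tools on a pair of 64-bit halves. The type `U128` and the helpers `shl`, `add`, `sub`, and `mul10` are hypothetical names chosen for illustration. The multiply by ten is built from shifts and one addition, 10x = 8x + 2x, and silently drops any overflow past 128 bits.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned 128-bit value stored as two 64-bit halves.
struct U128 {
    std::uint64_t hi;
    std::uint64_t lo;
};

// Left shift by n bits, for 0 < n < 64.
U128 shl(U128 x, unsigned n) {
    assert(n > 0 && n < 64);
    return U128{(x.hi << n) | (x.lo >> (64 - n)), x.lo << n};
}

// Addition with a carry from the low half into the high half.
U128 add(U128 a, U128 b) {
    U128 r;
    r.lo = a.lo + b.lo;
    r.hi = a.hi + b.hi + (r.lo < a.lo ? 1 : 0);
    return r;
}

// Subtraction with a borrow from the high half; assumes a >= b.
U128 sub(U128 a, U128 b) {
    U128 r;
    r.lo = a.lo - b.lo;
    r.hi = a.hi - b.hi - (a.lo < b.lo ? 1 : 0);
    return r;
}

// The "128-bit times 10" operator: 10 * x == (x << 3) + (x << 1).
U128 mul10(U128 x) {
    return add(shl(x, 3), shl(x, 1));
}
```

A 112-bit mantissa plus its implicit bit occupies 113 bits, and the running remainder times ten stays below 2^116, so 128 bits leave enough headroom for the inner loop.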

if you're concerned about speed, this is a pretty fast method, and should stay pretty fast even if you roll a custom (128-bit,10) mul, and 128 bit subtraction:

timings (in cycles) of sprintf() and this method:

sprintf       custom       format   value
2517.3916     135.63612    %.f      42.123f
2669.46143    157.22189    %.3f     42.123f
2696.28394    181.10466    %.6f     42.123f
2842.25659    214.67934    %.10f    42.123f
2662.26904    132.54713    %.e      42.123f
2722.87842    153.83556    %.3e     42.123f
2827.36914    179.39142    %.6e     42.123f
2846.64575    215.57584    %.10e    42.123f
3239.98413    160.91776    %.3e     4.2123e-20f
2521.19873    151.42184    %.3e     4.2123e-3f
2472.59106    151.64955    %.3e     4.2123e-2f
2025.72388    152.0253     %.3e     4.2123e-1f
2701.29175    127.41381    %.3e     4.2123e+0f
2695.72241    152.35506    %.3e     4.2123e+1f
2716.10889    154.34775    %.3e     4.2123e+2f
3290.89038    160.06815    %.3e     4.2123e+20f



I don't know if anybody else uses something like this, but it works pretty well for me.

Quote:
Original post by YogurtEmperor
iMalc: The link is handy, but for this project I need to use entirely custom coding. I will keep your link around for other projects in the future.


Why would a project require that you use entirely custom coding, regardless of the license granted with any existing piece of code and regardless of how suitable a piece of code is for your project? That doesn't make a lot of sense.

Quote:
Original post by cache_hit
Why would a project require that you use entirely custom coding, regardless of the license granted with any existing piece of code and regardless of how suitable a piece of code is for your project? That doesn't make a lot of sense.
Generally, I would assume a homework project.

momotte, I have actually already finished my implementation using large integers, but it is slow as hell.

My implementation produces output with much greater accuracy than sprintf(), but my timings (in ticks) are more like:
Mine: 960
sprintf: 29

Mine is over 30 times slower!

On the other hand, it is also significantly more accurate.
I can see why your method is so fast, but also less accurate.

I have to decide now which I want more.


Since my large-integer class has all the same operators as a regular integer, I can basically swap my method for yours using a lot of my existing code, then convert to compiler-defined types once it is working.



cache_hit: If you needed to know the reason behind every instance of a wheel being re-invented you would go mad.
Let’s just say that the standard C functions do not work the same on all platforms. Microsoft has extensions that do not exist on Linux/Macintosh, but which I need to use not only on Windows, Linux, and Macintosh, but also on PlayStation 3, Nintendo Wii, Xbox 360, and iPhone/iPod touch.
On the other hand, Macintosh has vsscanf() while Microsoft has nothing. Again, I need vsscanf(), and I need it to work exactly the same on all platforms I am supporting.

_vscprintf() exists on Windows, but only some versions of Windows. And I really need this one.

The list goes on and on.



swiftcoder: Last time I was in a school was when I was teaching C++ and game programming at some university.
Those were the days.

