Extremely Large numbers in C++

Started by
9 comments, last by Narf the Mouse 11 years, 10 months ago
Hey, I want to make a datatype for myself that can hold very large numbers, like a googol, or maybe even a little bigger.

A googol is 10,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000


I want a datatype that can hold this number and be able to perform mathematical operations on it. And possibly, if it's not too hard, have fractional values as well that go 100 digits past the decimal point.


Basically, I want to make a datatype that is at least 512 bits (64 bytes), or larger if need be.


The only way I've thought of doing this is to make an array of chars and then calculate where in the array each digit would go (for the number 35, arry[1] = 3 and arry[0] = 5). But I realize that would be very hard on the CPU, plus very complex.

I don't want any libraries that are out there, I'm looking for a way to do this myself.

Does anyone know of a way to do this?
It'd be easier to just say 10^100... saves you counting the zeros ;)

A standard 64-bit datatype that covers this number already exists. It's called a double! It'll give you numbers up to around 1.8*10^308 and down to around 2.2*10^-308, with a precision of 15-17 significant digits.

But if you want to cover 10^100 + 1 with exact precision, then you'll need a heck of a big data type... and it's beyond me how you'd do it.

I suggest a bit of reading on wikipedia:
http://en.wikipedia....ng-point_format
http://en.wikipedia....mputer_science)

And the recent altdevblog series on float:
http://www.altdevblo...-of-odd-floats/ (this article has links to all the previous ones, start with the first one )
Thanks, I'll look into those, but if I can't figure anything out within the next couple of days I'll probably be back here :P

And yeah, I'm looking for exact precision.
You can take a look at the GMP library. It is written in C, but there are wrappers for several languages, including C++ (you can of course also use the C API directly). It is used in many computer algebra systems like Mathematica and Maple, in computational geometry libraries like CGAL, in programming languages supporting big numbers like Haskell, and so on.
I believe you would do it by having an array of ints of some size, and performing the operations one int pair at a time, like a1[0] + a2[0]. That gives you a result and possibly an overflow, and you carry the overflow into the calculation between the next pair of ints, something like that.

You can also google for big number libraries.

o3o

DISCLAIMER: this is a massive post and I probably made several computational errors along the way - bear with me.
EDIT: god damn it, why does the latex "times" command get mangled by the website?
----

I don't think Muzzy wants a floating-point version but rather an arbitrary precision integer arithmetic version. Well let's see - how are numbers stored when we write them down in everyday life? Say 25 - that's a 2, representing the number of "tens", and a 5 representing the number of "units". That's nice, but how do computers do it? Surely they don't use decimal (base 10) notation, do they? No, they don't, they use binary notation which is more convenient for them to process. In binary, each digit represents whether a specific power of two is a "part" of the number being represented. So basically, binary notation represents an arbitrary number as a sum of powers of two! For 25 we get:

[eqn]25 = 2^0 + 2^3 + 2^4[/eqn]

To make it more explicit we could write it as this:

[eqn]25 = 1 \times 2^0 + 0 \times 2^1 + 0 \times 2^2 + 1 \times 2^3 + 1 \times 2^4 + 0 \times 2^5 + 0 \times \cdots[/eqn]

Clearly when viewed in this way, the exponents will be the same regardless of the number being represented, so we can remove them as they are implied by binary notation. So we are left with the string "10011000....". By convention, this string is actually written backwards, so "....00011001". And the leading zeroes can be removed (just as in decimal notation), so we are left with "11001". This is binary for 25!

You will notice a similarity with decimal notation. Take the 25 example again, we can write it as:

[eqn]25 = 5 \times 10^0 + 2 \times 10^1 + 0 \times 10^2 + 0 \times \cdots[/eqn]

As in binary, the exponents are implied by decimal notation, so we are left with "5200...", written backwards "...0025", and with leading zeroes removed, we get "25", which is base 10 notation for 25 (obviously this sounds weird, this is because humans assume base 10 notation when they talk, so they don't specifically state it usually). This pattern works for any integer base you could think of, like base 17 and base 3441 (and it's also possible to have real bases, but I won't mention those as they are not relevant to this question). Except base 0 (which is meaningless) and base 1 (which can't work for obvious reasons). More info here.
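To make the pattern concrete, the digits of a number in any base can be peeled off by repeated division: the remainder is the next digit, the quotient is what's left. A minimal sketch (my own illustration, not anyone's library code):

```cpp
#include <cstdint>
#include <vector>

// Extract the digits of n in the given base, least significant digit first
// (the "backwards" order described above). Works for any base >= 2.
std::vector<uint64_t> to_base(uint64_t n, uint64_t base) {
    std::vector<uint64_t> digits;
    do {
        digits.push_back(n % base);  // the next digit is the remainder...
        n /= base;                   // ...then shift the number down one digit
    } while (n > 0);
    return digits;
}
```

For example, to_base(25, 2) yields {1, 0, 0, 1, 1}; reverse it and you get "11001", the binary notation for 25 worked out above.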

Now how does this relate to the issue at hand? We could just keep adding higher and higher exponents to reach higher and higher numbers, right? True, but CPU registers have a practical limit on the number of bits they can handle simultaneously for operations such as addition. This is 32 and 64 bits respectively for a 32-bit and 64-bit processor. So they can't directly do math with bigger numbers using the native operators available.

So clearly a single CPU register is a no-go for big numbers. But hey, this means numbers are represented, inside the CPU, as an array of bits, right? And we know that in software, we can make arrays of bytes, of words (16-bit), of longwords (32-bit integers), or even of quadwords (64-bit) if the CPU supports it, right! Well, what's keeping us from representing numbers in base 256 (for bytes), or base 65536 (for words), in software? The math is sound, and we could implement our own addition/multiplication/etc... functions, right?

Well, let's try it with bytes. Represent, say, 441 in base 256:

[eqn]441 = 185 \times 256^0 + 1 \times 256^1 + 0 \times 256^2 + 0 \times \cdots[/eqn]

Which gives the base 256 notation for 441 of "1:185" (I use colons to delimit the "digits" here since there are 256 possible values per digit, more than we have base 10 digits available - note this is a logistical problem rather than a mathematical one).

Of course, this sucks - 441 could be represented by any CPU. So let's try a larger number, say 4984994854993448178545902948759435848758439853487834939846735869. I'll spare you the working (there is an efficient algorithm to change bases, I'll let you look it up), but the base 256 notation for this massive integer is:

12:30:43:127:128:97:166:128:64:235:249:121:134:81:206:20:59:158:109:60:129:25:132:133:42:183:253

This is more manageable - each digit can be individually handled by a CPU, and addition with digit carry can be implemented efficiently. Similarly, subtraction, multiplication, and all other operations can be done. But this could be made better - a byte is quite wasteful considering CPUs can work with 32 or 64 bits at a time. What does it give in base 2^32 instead of base 2^8 (base 256)? We get:

794155:2139120038:2151738361:2038845902:339451501:1015093636:2234169341

Much better. Now let's take another huge number, and add the two together, as an example. So let's say 193849494855949383389292884 is our second number. Its form in base 2^32 is:

10508602:742607992:2470062420

Now the least significant digit is the last one, so that's where we want to start to do addition (remember primary school addition?). So let's take the last digit of both numbers, and add them together:

2234169341 + 2470062420 = 4704231761

Now remember primary school addition again: if you add, say, 7 and 8 together, you get 15, so you keep 5 and you carry 1, right? Same here - if the result is bigger than the maximum digit allowed (which is 2^32 - 1), we simply take the remainder as the resulting digit and carry 1. And 2^32 - 1 is 4294967295. Our result was 4704231761, which is bigger. So we remember to carry 1 on the next digit addition, and use the remainder as the final digit, which is 4704231761 - 4294967296 = 409264465.
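This single digit-pair step is easy to express in code by doing the addition in a 64-bit type, so the true sum is visible before splitting it into remainder and carry (a sketch of the idea, not library code):

```cpp
#include <cstdint>
#include <utility>

// One digit-pair addition in base 2^32, performed in 64-bit arithmetic
// so the true sum never wraps. Returns { result digit, carry (0 or 1) }.
std::pair<uint64_t, uint64_t> add_digit_pair(uint32_t a, uint32_t b) {
    const uint64_t BASE = 1ULL << 32;          // one past the maximum digit
    uint64_t sum = uint64_t(a) + uint64_t(b);  // at most 2^33 - 2, fits in 64 bits
    return { sum % BASE, sum / BASE };         // remainder digit, carry out
}
```

Calling add_digit_pair(2234169341, 2470062420) gives {409264465, 1}, matching the worked step above.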

Ok, next digits are 1015093636 and 742607992. We add them together, and we don't forget the carry:

1015093636 + 742607992 + 1 = 1757701629

This is smaller than our maximum digit 4294967295, so we just use that as the final digit and we have no carry for the next digit.

Next up are 339451501 and 10508602. Add them up, with no carry:

339451501 + 10508602 + 0 = 349960103

Smaller than the maximum digit, use that as the final digit. Now we have no digits left in the second number, and we have no carry left, so we can just copy the remaining digits of the first number. You could do it manually by padding the second number with leading zero digits and working through it, if you wanted. So the sum is:

794155:2139120038:2151738361:2038845902:349960103:1757701629:409264465

This is the correct sum, and you can convert it back to base 10 to display it for a human to read if you wish.

That's basically it. That's how GMP, Java's BigInteger, or any other such library works. They work by defining arithmetic in a specific base that the CPU is comfortable with (the best choice is almost always the CPU's native word size, e.g. 32 or 64 bits for 32-bit and 64-bit CPUs respectively), and doing math on each "digit" in that base individually (which the CPU can do). All imaginable operations (addition, division, multiplication, modulus, integer square root, etc...) can be implemented in this way, for numbers whose sizes are bounded only by the available memory in the system (to represent them in memory) instead of the CPU register size.

Of course the point is to make it work FAST, since now the CPU will handle individual digits instead of the whole number in a single shot, so performance will suffer as numbers get bigger and bigger. You can look up the GMP documentation for more information - I think they call the digits "limbs".

I hope this made sense :)

PS: in the addition example, if your CPU was 32-bit, it would seem that you wouldn't be able to know whether the sum of two individual digits exceeded the maximum digit size 2^32 - 1, as the sum would simply wrap around to zero because of the CPU's limited register size. How do you handle this situation? There is a trivial solution, which just involves checking the CPU's carry flag - if it is set, then the addition wrapped around, and you get the remainder for free too. There is also a comparison-based solution, but it only works for addition/subtraction, not for multiplication.
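The comparison-based solution mentioned above relies on unsigned wraparound being well defined in C++: if the wrapped sum is smaller than either operand, the addition must have carried. A sketch, again just illustrating the idea:

```cpp
#include <cstdint>

// Add two base-2^32 digits without needing a wider type or the carry flag.
// Unsigned overflow wraps modulo 2^32, and a wrapped result is necessarily
// smaller than either operand, so the comparison detects the carry.
uint32_t add_digits(uint32_t a, uint32_t b, uint32_t& carry_out) {
    uint32_t sum = a + b;           // may wrap around modulo 2^32
    carry_out = (sum < a) ? 1 : 0;  // wrapped iff sum < a (equivalently, sum < b)
    return sum;                     // the wrapped value is already the remainder
}
```

Note that folding in an incoming carry needs one extra comparison, since a + b + 1 can also wrap.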

“If I understand the standard right it is legal and safe to do this but the resulting value could be anything.”

It might be trivial, but Google is not useful for finding out about carry flags - At least not for me.
hey bacterious, I'm going to go ahead and say, "I love you man!". You just knocked out several hours of troubleshooting for me in that one post. I can't thank you enough for that. Now teach me some quantum physics =). Just playin, lol. Thanks a lot for your help, and you should be a teacher at some college, cause you're good. ;)

It might be trivial, but Google is not useful for finding out about carry flags - At least not for me.

Sorry about my use of the word trivial. I was mostly using it to say that the algorithm isn't complicated (it's just a bit lookup in a CPU register, instead of an elaborate procedure to detect overflow in a sum). Of course knowing about it isn't trivial, I wasn't being condescending :) And yes, I agree, I noticed "carry flag" was not a good term to search on Google, I don't know why! It might be too short or something.



hey bacterious, I'm going to go ahead and say, "I love you man!". You just knocked out several hours of troubleshooting for me in that one post. I can't thank you enough for that. Now teach me some quantum physics =). Just playin, lol. Thanks a lot for your help, and you should be a teacher at some college, cause you're good. ;)

Thanks! But I'm still a student so becoming a teacher would be quite impossible at the moment :P

“If I understand the standard right it is legal and safe to do this but the resulting value could be anything.”

At least when working with x86 assembly, there are the ADC and SBB instructions, implementing respectively addition with carry and subtraction with borrow. These instructions greatly simplify the implementation of addition and subtraction between large numbers. You clearly have to write some assembly to use them, though. This is probably one of the few cases where it is easier to write the code in assembly than in C/C++. Multiplication is more complicated, and there exist several different algorithms for it, as you can see here.
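As an aside, on x86 you don't strictly need hand-written assembly to reach ADC: MSVC, GCC and Clang all expose it through the _addcarry_u64 intrinsic in <immintrin.h>. A sketch (x86-specific, and assuming a fixed 4-limb / 256-bit number purely for illustration):

```cpp
#include <cstdint>
#include <immintrin.h>  // x86 only; provides _addcarry_u64 on MSVC/GCC/Clang

// Add two 4-limb (256-bit) numbers stored least significant limb first,
// chaining the carry from limb to limb through the ADC instruction.
void add256(const uint64_t a[4], const uint64_t b[4], uint64_t out[4]) {
    unsigned char carry = 0;
    for (int i = 0; i < 4; ++i) {
        // _addcarry_u64 computes a[i] + b[i] + carry, stores the low 64 bits
        // into out[i], and returns the carry out of the addition.
        carry = _addcarry_u64(carry, a[i], b[i],
                              reinterpret_cast<unsigned long long*>(&out[i]));
    }
}
```

A variable-length version would simply loop over however many limbs the numbers have, propagating the final carry into a new limb if it is set.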

This topic is closed to new replies.
