DISCLAIMER: this is a massive post and I probably made several computational errors along the way - bear with me.EDIT: god damn it, why does the latex "times" command get mangled by the website?----
I don't think Muzzy wants a floating-point version but rather an arbitrary precision integer arithmetic version. Well let's see - how are numbers stored when we write them down in everyday life? Say 25 - that's a 2, representing the number of "tens", and a 5 representing the number of "units". That's nice, but how do computers do it? Surely they don't use decimal (base 10) notation, do they? No, they don't, they use binary notation which is more convenient for them to process. In binary, each digit represents whether a specific power of two is a "part" of the number being represented. So basically, binary notation represents an arbitrary number as a sum of powers of two! For 25 we get:
Which gives the base 256 notation for 441 of "1:185" (I use colons to delimit the "digits" here since there are 256 possible values per digit, more than we have base 10 digits available - note this is a logistical problem rather than a mathematical one).
Of course, this sucks - 441 could be represented by any CPU. So let's try a larger number, say 4984994854993448178545902948759435848758439853487834939846735869. I'll spare you the working (there is an efficient algorithm to change bases, I'll let you look it up), but the base 256 notation for this massive integer is:
This is more manageable - each digit can be individually handled by a CPU, and a formula for addition with digit carry can be efficiently done. Similarly, subtraction, multiplication, and all other operations can be done. But this could be made better - a byte is quite wasteful considering CPU's can work with 32 or 64 bits at a time. What does it give in base 2^32 instead of base 2^8 (base 256)? We get:
Much better. Now let's take another huge number, and add the two together, as an example. So let's say 193849494855949383389292884 is our second number. Its form in base 2^32 is:
Now the least significant digit is the last one, so that's where we want to start to do addition (remember primary school addition?). So let's take the last digit of both numbers, and add them together:
2234169341 + 2470062420 = 4704231761
Now remember primary school addition again, if you add, say, 7 and 8 together, you get 15, so you keep 5 and you carry 1, right? Same here - if the result is bigger than the maximum digit allowed (which is 2^32 - 1), we simply take the remainder as the resulting digit and carry 1. And 2^32 - 1 is 4294967295. Our result was 4704231761, which is bigger. So we remember to carry 1
on the next digit addition, and use the remainder as the final digit, which is 4704231761 - 4294967296 = 409264465
Ok, next digits are 1015093636 and 742607992. We add them together, and we don't forget the carry:
1015093636 + 742607992 + 1
This is smaller than our maximum digit 4294967295, so we just use that as the final digit and we have no carry for the next digit.
Next up are 339451501 and 10508602. Add them up, with no carry:
339451501 + 10508602 + 0
Smaller than the maximum digit, use that as the final digit. Now we have no digits left in the second number, and we have no carry left, so we can just copy the remaining digits of the first number. You could do it manually by padding the second number with leading zero digits and working through it, if you wanted. So the sum is:
This is the correct sum, and you can convert it back to base 10 to display it for a human to read if you wish.
That's basically it. That's how GMP, Java's bignum, or any other such library works. They work by defining arithmetic on a specific base that the CPU is comfortable with (the best choice is almost always the CPU's native word size, e.g. 32 bits or 64 bits for 32-bit and 64-bit CPU's respectively), and doing math on each "digit" in that base individually (which the CPU can do). All imaginable operations (addition, division, multiplication, modulus, integer square root, etc...) can be implemented in this way, for numbers whose sizes are bounded only by the available memory in the system (to represent it in memory) instead of the CPU register size.
Of course the point is to make it work FAST, since now the CPU will handle individual digits instead of the whole number in a single shot, so performance will suffer as numbers get bigger and bigger. You can look up the GMP documentation for more information - I think they call the digits "limbs".
I hope this made sense
PS: in the addition example, if your CPU was 32-bit, it would seem that you wouldn't be able to know if the sum of two individual digits would exceed the maximum digit size 2^32 - 1, as the sum would naturally wrap around to zero because of the CPU's limited register size. How do you handle this situation? There is a trivial solution, which just involves checking the CPU's overflow flag - if it is set, then the operation overflowed, and you get the remainder for free too. There is also a comparison-based solution, but it only works for addition/subtraction, not for multiplication.
Edited by Bacterius, 09 June 2012 - 04:59 AM.