Sign in to follow this  
_Sauce_

implementing an arbitrary size integer class

Recommended Posts

I'm working on an integer class (C++) that can contain numbers of arbitrary size, ie. the programmer should be able to use numbers of infinite length, limited only by memory. I am aware of the existance of many classes which already handle this sort of stuff, but I am doing this as a programming exercise so those are pretty useless to me ;) Obviously I need to define the behaviour for all of the basic operators such as +, -, *, /, % and =, as well as the behaviour for equality operators (+=, -=, *=, /=, %=, ==), and finally conversion operators, etc... I will begin with operator+ as it's behaviour is relatively simple. My question is, what is the most efficient way to do this? Ideally I could store the number within the class as a vector of words and perform all my calculation in binary. Is there a better way? Are there any optimizations I can make when calculating in binary (like some sort of tricky binary bitshift magic?) I'm not too concerned with the other operators for the time-being - it would be nice to work on them one at a time and get them optimised before moving onto the rest. It's also likely that I'll have to come up with some sort of square root function... is there any way to calculate the square root of such large numbers, or would I have to rely on a lookup table? (virtually impossible for me to do)

Share this post


Link to post
Share on other sites
not really trying to optimise it, so much as making it easier to write... The algorithm for adding base10 numbers seems simple enough when you're adding numbers on paper, but actually implementing it is what escapes me. I'll keep at it.

Share this post


Link to post
Share on other sites
1) You don't need a big int class to solve the linked problem. There is a much, much easier way to do it. (Apologies if you already know that.)

2) To answer your more direct question, I would implement a software ALU. While I have not done it myself, you could consider keeping an array of 64-bit numbers. You'll have to do the carrying manually (i.e. add the low order pair of 64-bit numbers together.. if you detect an overflow manually add 1 to the next higher order 64-bit numbers. It may help to look up some very basic computer engineering stuff to see how an ALU works so that you can imitate it.. you'd use the hardware though to do 64 bits at a time.. certainly not bit-by-bit :)

Detecting the overflow is easy.. C = A + B. If C < A, you overflowed. Addition is the easy operation though. I'd suggest taking a look at how Melekor's linked code works.

3) Wikipedia has a page specifically about how to find square roots, but using Newton's Method is just dead simple. Your equation would be 0 = x^2 - Y where x is your answer and Y is the number you're taking the square root of. I'm sure an algorithm at http://en.wikipedia.org/wiki/Methods_of_computing_square_roots is much faster though (particularly the one using digit by digit work on binary numbers.)

BTW, don't try to think in base 10. Think in base 2 or 16. A + 9 = 14. That's an overflow to the next digit (the 1) and you have 4 left for the ones column. The next column is the 16s column, the next is the 256s column, etc. F00 is 15*16*16 + 0*16 + 0*1 = 3840

Share this post


Link to post
Share on other sites
The problem at projecteuler doesn't need explicit bignums, you can just use the high-school number adding algorithm to calculate one digit at a time. Since there are 100 numbers, and you work in base 10, the sum of a single column easily fits in 32 bits, and similarly since there are only 50 columns, carries won't get too large.

char * numbers[] =
{"37107287533902102798797998220837590246510135740250",
"46376937677490009712648124896970078050417018260538",
"74324986199524741059474233309513058123726617309629",
"91942213363574161572522430563301811072406154908250",
"23067588207539346171171980310421047513778063246676",
"89261670696623633820136378418383684178734361726757",
"28112879812849979408065481931592621691275889832738",
"44274228917432520321923589422876796487670272189318",
"47451445736001306439091167216856844588711603153276",
"70386486105843025439939619828917593665686757934951",
"62176457141856560629502157223196586755079324193331",
"64906352462741904929101432445813822663347944758178",
"92575867718337217661963751590579239728245598838407",
"58203565325359399008402633568948830189458628227828",
"80181199384826282014278194139940567587151170094390",
"35398664372827112653829987240784473053190104293586",
"86515506006295864861532075273371959191420517255829",
"71693888707715466499115593487603532921714970056938",
"54370070576826684624621495650076471787294438377604",
"53282654108756828443191190634694037855217779295145",
"36123272525000296071075082563815656710885258350721",
"45876576172410976447339110607218265236877223636045",
"17423706905851860660448207621209813287860733969412",
"81142660418086830619328460811191061556940512689692",
"51934325451728388641918047049293215058642563049483",
"62467221648435076201727918039944693004732956340691",
"15732444386908125794514089057706229429197107928209",
"55037687525678773091862540744969844508330393682126",
"18336384825330154686196124348767681297534375946515",
"80386287592878490201521685554828717201219257766954",
"78182833757993103614740356856449095527097864797581",
"16726320100436897842553539920931837441497806860984",
"48403098129077791799088218795327364475675590848030",
"87086987551392711854517078544161852424320693150332",
"59959406895756536782107074926966537676326235447210",
"69793950679652694742597709739166693763042633987085",
"41052684708299085211399427365734116182760315001271",
"65378607361501080857009149939512557028198746004375",
"35829035317434717326932123578154982629742552737307",
"94953759765105305946966067683156574377167401875275",
"88902802571733229619176668713819931811048770190271",
"25267680276078003013678680992525463401061632866526",
"36270218540497705585629946580636237993140746255962",
"24074486908231174977792365466257246923322810917141",
"91430288197103288597806669760892938638285025333403",
"34413065578016127815921815005561868836468420090470",
"23053081172816430487623791969842487255036638784583",
"11487696932154902810424020138335124462181441773470",
"63783299490636259666498587618221225225512486764533",
"67720186971698544312419572409913959008952310058822",
"95548255300263520781532296796249481641953868218774",
"76085327132285723110424803456124867697064507995236",
"37774242535411291684276865538926205024910326572967",
"23701913275725675285653248258265463092207058596522",
"29798860272258331913126375147341994889534765745501",
"18495701454879288984856827726077713721403798879715",
"38298203783031473527721580348144513491373226651381",
"34829543829199918180278916522431027392251122869539",
"40957953066405232632538044100059654939159879593635",
"29746152185502371307642255121183693803580388584903",
"41698116222072977186158236678424689157993532961922",
"62467957194401269043877107275048102390895523597457",
"23189706772547915061505504953922979530901129967519",
"86188088225875314529584099251203829009407770775672",
"11306739708304724483816533873502340845647058077308",
"82959174767140363198008187129011875491310547126581",
"97623331044818386269515456334926366572897563400500",
"42846280183517070527831839425882145521227251250327",
"55121603546981200581762165212827652751691296897789",
"32238195734329339946437501907836945765883352399886",
"75506164965184775180738168837861091527357929701337",
"62177842752192623401942399639168044983993173312731",
"32924185707147349566916674687634660915035914677504",
"99518671430235219628894890102423325116913619626622",
"73267460800591547471830798392868535206946944540724",
"76841822524674417161514036427982273348055556214818",
"97142617910342598647204516893989422179826088076852",
"87783646182799346313767754307809363333018982642090",
"10848802521674670883215120185883543223812876952786",
"71329612474782464538636993009049310363619763878039",
"62184073572399794223406235393808339651327408011116",
"66627891981488087797941876876144230030984490851411",
"60661826293682836764744779239180335110989069790714",
"85786944089552990653640447425576083659976645795096",
"66024396409905389607120198219976047599490197230297",
"64913982680032973156037120041377903785566085089252",
"16730939319872750275468906903707539413042652315011",
"94809377245048795150954100921645863754710598436791",
"78639167021187492431995700641917969777599028300699",
"15368713711936614952811305876380278410754449733078",
"40789923115535562561142322423255033685442488917353",
"44889911501440648020369068063960672322193204149535",
"41503128880339536053299340368006977710650566631954",
"81234880673210146739058568557934581403627822703280",
"82616570773948327592232845941706525094512325230608",
"22918802058777319719839450180888072429661980811197",
"77158542502016545090413245809786882778948721859617",
"72107838435069186155435662884062257473692284509516",
"20849603980134001723930671666823555245252804609722",
"53503534226472524250874054075591789781264330331690"};

#include <vector>
#include <cstdio>

int digit(int i, int j) {
return numbers[i][50-j-1] - '0';
}

void main()
{
std::vector<int> sum_digits;

int r = 0, q = 0;
for(int j = 0; j < 50 || q > 0; ++j) {
// pretend that we have as many extra columns of all zeros as needed
// to flush out all the leftover bits in our accumulator.
int column_sum = 0;
for(int i = 0; j < 50 && i < 100; ++i) {
column_sum += digit(i,j);
}
r = (column_sum+q)%10;
q = (column_sum+q)/10;
sum_digits.push_back(r);
}

printf("The first 10 digits are: ");
for(int i = 0; i < 10; ++i) {
printf("%d", sum_digits.back());
sum_digits.pop_back();
}
printf("\n");
}

Share this post


Link to post
Share on other sites
Quote:
Original post by Melekor
The problem at projecteuler doesn't need explicit bignums, you can just use the high-school number adding algorithm to calculate one digit at a time. Since there are 100 numbers, and you work in base 10, the sum of a single column easily fits in 32 bits, and similarly since there are only 50 columns, carries won't get too large.

*** Source Snippet Removed ***

2 ^ 32 = 4294967296...

the largest datatype supported in visual studio is an unsigned long long or an __int64; 2 ^ 64 = 18446744073709551616. That's a far cry from 50 digits. What makes you think 32 bits is enough for 50 digits?

edit: nevermind, I misread your post. You meant that the sum of a column would fit in 32 bits, not the entire sum :)

Share this post


Link to post
Share on other sites
Lookign at the problem it's only asking for the first ten digits (ie: the most significant?). So if you do some sort of controlled overflow where you add in the next column and then divide out the least significant digits, you should (might?) be able to handle the sum with a supported data type.

Share this post


Link to post
Share on other sites
I have solved that problem using haskell that support big integers. Others have done the same thing. You can probably do it yourself if you like but I think there are others more interesting problems and the use of a library like GMP is acceptable.

Share this post


Link to post
Share on other sites
Quote:
Original post by apatriarca
I have solved that problem using haskell that support big integers. Others have done the same thing. You can probably do it yourself if you like but I think there are others more interesting problems and the use of a library like GMP is acceptable.

That would be a total cop-out. What's the point in even attempting the problem if you're just going to use an existing big-int library? Then all you have to do is find the first ten digits, and load the numbers from a file (if you wanted to make things neat)

Where is the challenge in that? I'm not so interested in project Euler for the mathematics side of things, more for the chance to flex my programming muscles.

Share this post


Link to post
Share on other sites
My favorite way of implementing big integer type classes is to not implement them at all. Instead, implement a polynomial class. This has much wider ranging implications, and after all, an integer in base 10 is simply a polynomial in x, with x=10. For example, the number 32484 = 3*10^4 + 2*10^3 + 4*10^2 + 8*10^1 + 4*10^0. the beauty of this is that every operation is the same, except that with "numbers in base k" all the digits have to be mod k. But this is easily solved once you've implemented polynomial arithmetic.

For example, let's try to multiply the numbers 37 and 64 in, say, base 13.

37 x 64 = (3*13^1 + 7*13^0) x (6*13^1 + 4*13^0)
= 18*13^2 + 30*13^1 + 28*13^0

Now we have digits that need to be normalized since they're too large. We can do this with modular arithmetic.

28 % 13 = 2, so

28*13^0
= 2*13^0 + 26*13^0
= 2*13^0 + 2*13^1

As you can see, you're essentially just "carrying the 2". Continuing in this way you end up with the number

1*13^3 + 7*13^2 + 6*13^1 + 2*13^0
= 1762

(I hope I didn't make any stupid mistakes).

So basically you write a polynomial class, implement addition, subtraction, multiplication, and division (division is a little harder admittedly), and decide how to implement normalization (perhaps as a separate member function such as 'void normalize(int base)' is the most appropriate since this is, after all, a generic integer class.

One side-benefit of this approach is that if you store your coefficients in an std::vector or otherwise contiguous array of memory, you are set up perfectly to use the SIMD integer instructions such as PMULHUW etc.

I don't know that this is necessarily the -fastest- approach, but it's the coolest (IMO)


BTW, since this is for Project Euler, I can tell you that should you get further along in the problem set, a lot of the problems there will benefit from having a generic polynomial class handy. It's good to build up a project_euler library while you work on the problems, and you'll find that as you progress you'll use lots of the stuff you're previously written time and time again.

Share this post


Link to post
Share on other sites
Quote:
Original post by _Sauce_
Where is the challenge in that? I'm not so interested in project Euler for the mathematics side of things, more for the chance to flex my programming muscles.

I started project euler to learn haskell better and in that case there was no challenge at all with that language. A lot of other languages have similar built-in data type. I think it's important to learn to use the right tools in programming and in that case C++ isn't probably the best language to use.
I think you should try to solve quickly the first few problems and start solving the harder and more interesting ones.

Share this post


Link to post
Share on other sites
Quote:
Original post by apatriarca
Quote:
Original post by _Sauce_
Where is the challenge in that? I'm not so interested in project Euler for the mathematics side of things, more for the chance to flex my programming muscles.

I started project euler to learn haskell better and in that case there was no challenge at all with that language. A lot of other languages have similar built-in data type. I think it's important to learn to use the right tools in programming and in that case C++ isn't probably the best language to use.
I think you should try to solve quickly the first few problems and start solving the harder and more interesting ones.


Well I figured I would do them in order of ID (I know they aren't necessarily arranged in order of difficulty) and I have completed the first 12 problems. It's just this one that has me stuck.

Thanks for the tip on the polynomial stuff, cache_hit. I'll analyse your post in a bit greater detail when I haven't had anything to drink ;)

Share this post


Link to post
Share on other sites
Quote:
Original post by cache_hit
My favorite way of implementing big integer type classes is to not implement them at all. Instead, implement a polynomial class. [...] I don't know that this is necessarily the -fastest- approach, but it's the coolest (IMO)


...and since multiplication of polynomials is convolution of their coefficients, then by the convolution theorem you can do it in O(n log n) time using the FFT. :-)

(I should add that although this is super-cool, I think I've read that in practice the Karatsuba algorithm is faster.)

Share this post


Link to post
Share on other sites
Suppose you break up the numbers into the first 10 digits, then 3 more digits, then the rest:

3710728753 390 2102...
4637693767 749 0009...
7432498619 952 4741...

It seems to me that the "error" from the smaller bits is limited in how much effect it can have. If you add up only the first 13 digits, then the worst-case scenario is if the other 47 digits are all 9. ie:


3710728753 390 2102... (first number in the sequence for reference)
0000000000 000 9999...*100 is less than
0000000000 100 0000...


The only way for this effect to propagate into the first 10 digits is if the 3-digit sum of the middle column has a 9 in the hundreds decimal place, so that 9 will add with the 100 000 and flip a higher decimal place. In other words, check if

390 + 749 + 952 + ... = xx9xx.

Now this is unlikely (1/9 chance), but it could still happen. In that case, instead of summing the first 13 digits you could sum the first 14, then the same argument applies, except shifted over by 1, so you would need to do another 3 digit sum to check that

902 + 490 + 524 + ... =/= xx9xx

If that fails, just keep moving over by 1 digit until you find a block of 3 whose sum doesn't have a 9 in the hundreds place.

Share this post


Link to post
Share on other sites
So you consider it cheating to use the big num class to solve this problem, but don't consider it cheating to go online and ask a bunch of people how to solve the problem?

Not sure if you realize it, but being able to assimilate a large software tool like GMP bigint into your project and effectively use it to solve a real-world problem (or even a BS one like the one on Project Euler), is not cheating. Its called software engineering and its a career.

Nobody writes all their computer code in a vacuum. Being able to integrate other people's software libraries/classes/tools into your projects and reach your own goals is the mark of a true professional. I vote that you take this opportunity to integrate GMP's bignum classes into your code as a learning experience, not as cheating.

Share this post


Link to post
Share on other sites
I'm not at all asking for a solution, Steve_Segrato. I'm looking for tips.

Quote:
Original post by Steve_Segreto

Not sure if you realize it, but being able to assimilate a large software tool like GMP bigint into your project and effectively use it to solve a real-world problem (or even a BS one like the one on Project Euler), is not cheating. Its called software engineering and its a career.

I'm aware of this, however I don't believe doing such a thing is in the spirit of the problem presented on Project Euler. In the real world I would happily go ahead and use an existing bigInt library (unless of course, I already had my own one which was better suited to the task at hand - unlikely), but for a learning exercise, I think it's perfectly acceptable to want to attempt to replicate the functionality of such a class. I know you don't have to know how a class works in order to use it well, but it's certainly nice to know how it works.

Share this post


Link to post
Share on other sites
Once you get to the later ProjectEuler problems, you'll definitely not want to use C++ (unless you happen to be a masochist).

Anyways, did you figure it out yet or is there still something you need help with?

Share this post


Link to post
Share on other sites
I've actually been working on some uni stuff, so I haven't had time to come back to this yet. I'll resurrect the thread when I've had another go at it. Feel free to continue discussing :)

Share this post


Link to post
Share on other sites
Quote:
Original post by Melekor
Once you get to the later ProjectEuler problems, you'll definitely not want to use C++ (unless you happen to be a masochist).

I have successfully used C++ to solve several later ProjectEuler problems. There is no reason to use a single language for all the ProjectEuler problems. I think I have solved about 150 problems and used Haskell, C++, Python, Maple, PARI/GP. There are problems easier in a particular language and others in another language.

Share this post


Link to post
Share on other sites
Yea, as long as you have a solid grasp of dynamic programming, you can use most languages. I remember someone posted a solution to one of the later problems somewhere in the 200s that took me a realy long time to solve and my solution ended up being quite complex. Their solution was 5 lines in C++ using a brilliant recursive function.

Share this post


Link to post
Share on other sites
Quote:
Original post by apatriarca
Quote:
Original post by Melekor
Once you get to the later ProjectEuler problems, you'll definitely not want to use C++ (unless you happen to be a masochist).

I have successfully used C++ to solve several later ProjectEuler problems. There is no reason to use a single language for all the ProjectEuler problems. I think I have solved about 150 problems and used Haskell, C++, Python, Maple, PARI/GP. There are problems easier in a particular language and others in another language.


I didn't say it wasn't possible to do it, only that you'd have to be a masochist to want to :-)

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this