Started by Oct 28 2004 08:35 PM

,
20 replies to this topic

Posted 28 October 2004 - 08:35 PM

Ok, i've been building a small integer factorizer.
It is currenly very slow.
It takes 2.8 seconds to factorize 102,812,541,257 into 53 * 6,199 * 312,931.
It is currently using a few things,
First, it checks the common divisors (2,3,5,7 and 11). if it is divisible by those, then it has the factors almost instantly.
Then it sqrt's the number. if it is < 200 then it does a trial division search (all numbers < 200, takes < 1 second)
And finally, if it is > 200, then it uses a modified sieve of Eratosthenes.
Basically, it checks n. if n isn't a factor, then 2n, 3n, 4n, 5n, on. will not be a factor, so you can rule them off your list.
It is quite slow for big numbers, but spends most of its time removing all the common factors (2, 3, 5,7 and 11) and i don't know how to get rid of them beforehand.
My question is: What is a method, which is faster then my sieve, and which is not too hard (without explanation, some things i just don't understand yet...), to understand or code.
From,
Nice coder

Posted 29 October 2004 - 05:12 AM

That is actually the sieve of Eratosthenese[edit can't spell]..;)

To make it as fast as possible, implement an Extended Euclidean Algorithm. That is about as fast as you'll find. It's also what is used for a vast majority of encryption/decryption techniques, so you*know* it is fast.

You will have to adapt it to your situation, though.

Here's a link for javascript implementation:

http://en.wikipedia.org/wiki/Extended_Euclidean_algorithm

Here's a link with a more algorithmic-orientation:

http://www-math.cudenver.edu/~wcherowi/courses/m5410/exeucalg.html

Hope this helps ya.

To make it as fast as possible, implement an Extended Euclidean Algorithm. That is about as fast as you'll find. It's also what is used for a vast majority of encryption/decryption techniques, so you

You will have to adapt it to your situation, though.

Here's a link for javascript implementation:

http://en.wikipedia.org/wiki/Extended_Euclidean_algorithm

Here's a link with a more algorithmic-orientation:

http://www-math.cudenver.edu/~wcherowi/courses/m5410/exeucalg.html

Hope this helps ya.

Posted 29 October 2004 - 07:23 PM

Quote:

Original post by Alex Swinney

That is actually the sieve of Eratosthenese[edit can't spell]..;)

To make it as fast as possible, implement an Extended Euclidean Algorithm. That is about as fast as you'll find. It's also what is used for a vast majority of encryption/decryption techniques, so you know it is fast.

You will have to adapt it to your situation, though.

Here's a link for javascript implementation:

http://en.wikipedia.org/wiki/Extended_Euclidean_algorithm

Here's a link with a more algorithmic-orientation:

http://www-math.cudenver.edu/~wcherowi/courses/m5410/exeucalg.html

Hope this helps ya.

Thanks, it does help me a bit :)

I'm limited to 100 trillion (15 digits before vb stuffs up the numbering system.) so, any algorith which isn't too slow should work.

What i was wondering about, tho is how to use that to get the factors...

I also wouldn't mind dumping the seive, for something a bit more elegent (my sieve is a little hacky... well, maybe more then a little).

From,

Nice coder

Posted 29 October 2004 - 08:52 PM

? i was logged in before...

Above post is mine

From,

Nice coder

Above post is mine

From,

Nice coder

Posted 30 October 2004 - 06:29 AM

You can get rid of them by taking their product and checking +/-1 and +/- all the primes greater than the highest prime in the product and less than half the product. To eliminate all multiples of the first five primes takes a list of 184 primes. So for example say your product was 2 then you check 2n+1, for 2*3 you check 6n+/-1, for 2*3*5 you check 30n+/-{1,7,11,13}, for 2*3*5*7 you check 210n+/-{1,11,13,...,97,101,103).

Posted 30 October 2004 - 04:00 PM

Quote:

Original post by LilBudyWizer

You can get rid of them by taking their product and checking +/-1 and +/- all the primes greater than the highest prime in the product and less than half the product. To eliminate all multiples of the first five primes takes a list of 184 primes. So for example say your product was 2 then you check 2n+1, for 2*3 you check 6n+/-1, for 2*3*5 you check 30n+/-{1,7,11,13}, for 2*3*5*7 you check 210n+/-{1,11,13,...,97,101,103).

Thank you!

Now if only i could find a way to implement one of the faster approaches... (so that i wouldn' need to do this at all)

From,

Nice coder

Posted 30 October 2004 - 04:24 PM

This should do the trick.

#include <stdlib.h> #include <iostream.h> int main() { __int64 number, j, k; short int again; system("cls"); cout << "\n\n\t\t\t Prime Finder\n\n"; cout << "\n\n\t\t\t By: Jared Stewart.\n\n\n\n\n\t\t "; system("PAUSE"); system("cls"); do { system("cls"); cout << "Please enter a number to get the prime factors: "; cin >> number; j = 2; system("cls"); cout << "The prime factors are: "; do { if (number == 2) break; if (number % j == 0 ) //check if its a factor { cout << j << " "; number = number/j; //make the number to check smaller } else j++; } while (j < 3); for (j =3; j <= sqrt(number); ) { if (number % j == 0 ) //check if its a factor { cout << j << " "; number = number/j; //make the number to check smaller } else j+=2; } cout << number; cout << "\n\n"; cout << "To test a new number, press 1. To quit, press 2.\n\n\n"; cin >> again; } while (again ==1); }

Posted 31 October 2004 - 05:24 PM

Quote:

Original post by Jareds411

This should do the trick.

Trial division,

very slow!

ok, i got rid of my sieve (its slower then trial division!).

And i replaced it with very simple trial division

Dim sn As Double

Dim i As Double

Dim j As Double

Dim fin As Double

Dim t As Double

Dim k As Double

Dim out(1) As Double

sn = Int(Sqr(n)) + 1

fin = Int(sn / 2) + 1

i = 12

j = sn + 1

Do

j = j - 1

i = i + 1

k = number / i

t = number / j

If Int(k) = k Then

out(0) = k

out(1) = i

factorize = out

Exit Function

End If

If Int(t) = t Then

out(0) = t

out(1) = j

factorize = out

Exit Function

End If

DoEvents

Loop Until j < fin

out(0) = 1

out(1) = number

factorize = out

Very simple, not very effective (but it still finds factors that are close to 13 and close to sqr(n), quite fast.

Is there any faster way?

From,

Nice coder

Posted 31 October 2004 - 08:59 PM

There are faster ways, but they involve fun and games with number theory. An example is Quadratic Sieve.

Posted 31 October 2004 - 09:29 PM

!!!!!

Thats a little... (ok way) over my head.

Ok, Now a few question about that link...

First, what is p?

Second, How Do you solve the congruencies on the factor base?

What is Dixons factorization?

What is Gausian elemination?

The quadratic prime sieve is nice...

But the factorization sieve is very complicated.

From,

Nice coder

Thats a little... (ok way) over my head.

Ok, Now a few question about that link...

First, what is p?

Second, How Do you solve the congruencies on the factor base?

What is Dixons factorization?

What is Gausian elemination?

The quadratic prime sieve is nice...

But the factorization sieve is very complicated.

From,

Nice coder

Posted 01 November 2004 - 03:08 AM

If you can afford having heuristic algorithm, there is a lot of great stuff (I mean really reliable):

- Rabbin-Miller's test - to check if the number(n) is prime:

the entry can be any small number(d), and the chance to discover, that 'n' is prime is 1/2. Not much, but you can repeat the test several times with different 'd' as needed. My teachers were joking, that the probability of the processor to mistake is larger than this algorithm to fail (and it's true).

But the usage of this for factorization is a thing to rethink...

- Rho-heuristic - to find a divisor of n. While common sieve takes O(n^(1/2)), the Rho takes up to O(d^(1/2)), where d is a smallest divisor of n (so at worse O(n^(1/4)) ).

But it may work forever with a small probability, so when it works too long, just stop it and try something else...

Search for them, or look up in 'well known' Thomas Cormen's "Introduction to algorithms"

/def

- Rabbin-Miller's test - to check if the number(n) is prime:

the entry can be any small number(d), and the chance to discover, that 'n' is prime is 1/2. Not much, but you can repeat the test several times with different 'd' as needed. My teachers were joking, that the probability of the processor to mistake is larger than this algorithm to fail (and it's true).

But the usage of this for factorization is a thing to rethink...

- Rho-heuristic - to find a divisor of n. While common sieve takes O(n^(1/2)), the Rho takes up to O(d^(1/2)), where d is a smallest divisor of n (so at worse O(n^(1/4)) ).

But it may work forever with a small probability, so when it works too long, just stop it and try something else...

Search for them, or look up in 'well known' Thomas Cormen's "Introduction to algorithms"

/def

Posted 01 November 2004 - 08:50 AM

The three top factorization algorithms are (in order of goodness):

The General Number Field Sieve

The Elliptic Curve Method

The Quadratic Sieve

The GNFS is the best overall algorithm, whereas ECM is better than GNFS only for log-base10(n) <= ~80 which makes GNFS more useful in practical applications (such as factoring 'n' in RSA). The ECM is also much easier to implement if you're not terribly great with number theory.

The General Number Field Sieve

The Elliptic Curve Method

The Quadratic Sieve

The GNFS is the best overall algorithm, whereas ECM is better than GNFS only for log-base10(n) <= ~80 which makes GNFS more useful in practical applications (such as factoring 'n' in RSA). The ECM is also much easier to implement if you're not terribly great with number theory.

Posted 01 November 2004 - 07:54 PM

Quote:

Original post by deffer

If you can afford having heuristic algorithm, there is a lot of great stuff (I mean really reliable):

- Rabbin-Miller's test - to check if the number(n) is prime:

the entry can be any small number(d), and the chance to discover, that 'n' is prime is 1/2. Not much, but you can repeat the test several times with different 'd' as needed. My teachers were joking, that the probability of the processor to mistake is larger than this algorithm to fail (and it's true).

But the usage of this for factorization is a thing to rethink...

- Rho-heuristic - to find a divisor of n. While common sieve takes O(n^(1/2)), the Rho takes up to O(d^(1/2)), where d is a smallest divisor of n (so at worse O(n^(1/4)) ).

But it may work forever with a small probability, so when it works too long, just stop it and try something else...

Search for them, or look up in 'well known' Thomas Cormen's "Introduction to algorithms"

/def

Ok, i've heard of the Rabbin Millers test (but don't know how to do it, i guess i'll have to [google])

The rho-heuristic is new to me, would you mind explaining more?

jperalta:

How exactly does the eleptic curve factorization work? (i'm limited to 100 trillion, so i guess it'll be one of the best for that).

From,

Nice coder

Posted 01 November 2004 - 08:20 PM

Quote:

Original post by Nice Coder

The rho-heuristic is new to me, would you mind explaining more?

It's Rho-Pollard's heuristic in fact.

I don't remember the details well (so gog. as always), but the general idea is, that 'n' and it's divisor 'd' show some common properties, while taking some numbers modulo 'n'.

If you take (almost) any starting number and start iterating it with the_function_I_do_not_remember_at_the_moment (which is as simple as x^2, btw), modulo n, then it will sooner or later fall into a loop. Then you will know, from the numbers from that loop, what the divisor you were looking for is.

\\As far as I remember(correct me!):

While prooving the method's functionality, we show, that if we knew 'd', and were taking the same numbers, but modulo d, we would fall into a loop in the same time.

As you can see, I am unable to give you precise answer without any sources, that I do not have anywhere near me. But I am pretty sure it is all in Cormen.

/def

Posted 01 November 2004 - 09:26 PM

Quote:

Original post by defferQuote:

Original post by Nice Coder

The rho-heuristic is new to me, would you mind explaining more?

It's Rho-Pollard's heuristic in fact.

I don't remember the details well (so gog. as always), but the general idea is, that 'n' and it's divisor 'd' show some common properties, while taking some numbers modulo 'n'.

If you take (almost) any starting number and start iterating it with the_function_I_do_not_remember_at_the_moment (which is as simple as x^2, btw), modulo n, then it will sooner or later fall into a loop. Then you will know, from the numbers from that loop, what the divisor you were looking for is.

\\As far as I remember(correct me!):

While prooving the method's functionality, we show, that if we knew 'd', and were taking the same numbers, but modulo d, we would fall into a loop in the same time.

As you can see, I am unable to give you precise answer without any sources, that I do not have anywhere near me. But I am pretty sure it is all in Cormen.

/def

ok.

I found This And This

From google.

I still don't understand it much, is there any chance of finding an algarithm? (in psudocode, basic, or just a walk-through)

From,

Nice coder

[Edited by - Nice Coder on November 2, 2004 4:26:26 AM]

Posted 02 November 2004 - 05:23 AM

Quote:

Original post by Nice Coder

How exactly does the eleptic curve factorization work? (i'm limited to 100 trillion, so i guess it'll be one of the best for that).

This is how I did it (in C#):

// Elliptic curve factorization methods

/// <summary>

/// This class describes a point on an elliptic curve that is described

/// in the form y^2 congruent to x^3 + ax + b (mod n)

/// </summary>

public class EllipticPoint

{

public BigInteger x;

public BigInteger y;

public BigInteger a;

public BigInteger b;

public BigInteger modulus;

public EllipticPoint(EllipticPoint p)

{

x = p.x;

y = p.y;

a = p.a;

b = p.b;

modulus = p.modulus;

}

public EllipticPoint()

{

x = new BigInteger();

y = new BigInteger();

a = new BigInteger();

b = new BigInteger();

modulus = new BigInteger();

}

public static EllipticPoint operator +(EllipticPoint p1, EllipticPoint p2)

{

if ((p1.modulus != p2.modulus) || (p1.a != p2.a) || (p1.b != p2.b))

throw new Exception("P1 and P2 must be defined on the same elliptic curve.");

EllipticPoint p3 = new EllipticPoint();

p3.a = p1.a;

p3.b = p1.b;

p3.modulus = p1.modulus;

BigInteger dy = p2.y - p1.y;

BigInteger dx = p2.x - p1.x;

if (dx < 0)

dx += p1.modulus;

if (dy < 0)

dy += p1.modulus;

if (GCD(dx, p1.modulus) != 1)

{

p3.a = GCD(dx, p1.modulus);

p3.modulus = -1;

return p3;

}

BigInteger m = (dy * dx.modInverse(p1.modulus)) % p1.modulus;

if (m < 0)

m += p1.modulus;

p3.x = (Power(m, 2) - p1.x - p2.x) % p1.modulus;

p3.y = m * (p1.x - p3.x) - p1.y % p1.modulus;

if (p3.x < 0)

p3.x += p1.modulus;

if (p3.y < 0)

p3.y += p1.modulus;

return p3;

}

public static EllipticPoint operator -(EllipticPoint p)

{

EllipticPoint p2 = new EllipticPoint(p);

p2.x = p.x;

p2.y = -p.y;

return p2;

}

public static EllipticPoint operator -(EllipticPoint p1, EllipticPoint p2)

{

EllipticPoint p3 = new EllipticPoint(p1);

p3 = p1 + (-p2);

return p3;

}

public static EllipticPoint Double(EllipticPoint p)

{

EllipticPoint p2 = new EllipticPoint();

p2.a = p.a;

p2.b = p.b;

p2.modulus = p.modulus;

BigInteger dy = 3 * Power(p.x, 2) + p.a;

BigInteger dx = 2 * p.y;

if (dx < 0)

dx += p.modulus;

if (dy < 0)

dy += p.modulus;

if (GCD(dx, p.modulus) != 1)

{

p2.a = GCD(dx, p.modulus);

p2.modulus = -1;

return p2;

}

BigInteger m = (dy * dx.modInverse(p.modulus)) % p.modulus;

p2.x = (Power(m, 2) - p.x - p.x) % p.modulus;

p2.y = (m * (p.x - p2.x) - p.y) % p.modulus;

if (p2.x < 0)

p2.x += p.modulus;

if (p2.y < 0)

p2.y += p.modulus;

return p2;

}

}

public static BigInteger Factor(EllipticPoint p, int n)

{

p.b = (Math.Power(p.y, 2) - (Math.Power(p.x, 3) + p.a * p.x)) % p.modulus;

if (p.b < 0)

p.b += p.modulus;

EllipticPoint p2 = EllipticPoint.Double(p);

if (p2.modulus == -1)

return p2.a;

for (int i = 1; i < n; i++)

{

p2 = p2 + p;

if (p2.modulus == -1)

return p2.a;

}

return -1;

}

Basically, go ahead and choose an elliptic curve and a point on that curve. You then feed this algorithm your number and voila, out pops the smallest factor of it. It is an... O(e^((1 + o(1))sqrt(2ln(p)ln(ln(p)))) algorithm where p is the smallest factor of n. This makes it sub-exponentional, but super-polynomial.

Edit:

For comparison-

QS is O(e^((1 + o(1))sqrt(ln(n)ln(ln(n)))))

GNFS is O(e^((1.92 + o(1))ln(n)^(1/3)ln(ln(n))^(2/3)))

More info... my C# implementation of this takes about 1.3 seconds to factor a 12-digit product of two 6-digit primes.

Posted 02 November 2004 - 05:37 PM

The following pseudocode is called the Brent variation of the Pollard Rho algorithm. For an explanation, I highly recommend chapter 5 of this book:

Bressoud, David M., Factorization and Primality Testing, Springer-Verlag, 1989.

This algorithm is very nice for factoring moderate-size numbers for which trial division would be prohibitively slow. It's a probabilistic algorithm, however, and could have a very long running time for some input numbers.

A typical generalized factoring program would first perform trial division by all of the primes less than, say, a million. (You'll probably want to have a table of these primes available as a data file that you've precomputed using the sieve of Eratosthenes.) Once trial division has removed all of the small factors, the program would move on to more powerful, but still relatively simple, algorithms such as Pollard Rho. If those fail to completely factor the input number, then you have to pull out the big guns like the Multiple Polynomial Quadratic Sieve.

Anyway, here's the Pollard Rho algorithm:

The input value of c needs to be an integer that makes the polynomial x^2 + c irreducible. Ordinarily, you just start with c=1 and count up for cases 1 and 2 below.

This algorithm will return one of three possible values:

1) 0 means that the maximum number of iterations ran without a factor being found. Try choosing another value of c.

2) The number n itself means that the gcd did not find a proper factor. Try choosing another value of c.

3) Any other number means you found a nontrivial factor of n.

-- Eric Lengyel

Bressoud, David M., Factorization and Primality Testing, Springer-Verlag, 1989.

This algorithm is very nice for factoring moderate-size numbers for which trial division would be prohibitively slow. It's a probabilistic algorithm, however, and could have a very long running time for some input numbers.

A typical generalized factoring program would first perform trial division by all of the primes less than, say, a million. (You'll probably want to have a table of these primes available as a data file that you've precomputed using the sieve of Eratosthenes.) Once trial division has removed all of the small factors, the program would move on to more powerful, but still relatively simple, algorithms such as Pollard Rho. If those fail to completely factor the input number, then you have to pull out the big guns like the Multiple Polynomial Quadratic Sieve.

Anyway, here's the Pollard Rho algorithm:

n = number to be factored

c = constant value (see comment below)

max = maximum iterations (depends on how long you're willing to wait)

check = how often to check gdc (something around every 10 iterations or so)

x1 = 2

x2 = 4 + c

range = 1

product = 1

terms = 0

for (int i = 0; i < max; i++)

{

for (int j = 0; j < range; j++)

{

x2 = (x2 * x2 + c) mod n

product *= (x1 - x2) mod n

if (++terms == check)

{

g = gdc(product, n)

if (g > 1) return (g)

product = 1

terms = 0

}

}

x1 = x2

range *= 2

for (int j = 0; j < range; j++)

{

x2 = (x2 * x2 + c) mod n

}

}

return (0)

The input value of c needs to be an integer that makes the polynomial x^2 + c irreducible. Ordinarily, you just start with c=1 and count up for cases 1 and 2 below.

This algorithm will return one of three possible values:

1) 0 means that the maximum number of iterations ran without a factor being found. Try choosing another value of c.

2) The number n itself means that the gcd did not find a proper factor. Try choosing another value of c.

3) Any other number means you found a nontrivial factor of n.

-- Eric Lengyel

Posted 02 November 2004 - 06:21 PM

Quote:

Original post by Eric Lengyel

The following pseudocode is called the Brent variation of the Pollard Rho algorithm. For an explanation, I highly recommend chapter 5 of this book:

Bressoud, David M., Factorization and Primality Testing, Springer-Verlag, 1989.

This algorithm is very nice for factoring moderate-size numbers for which trial division would be prohibitively slow. It's a probabilistic algorithm, however, and could have a very long running time for some input numbers.

A typical generalized factoring program would first perform trial division by all of the primes less than, say, a million. (You'll probably want to have a table of these primes available as a data file that you've precomputed using the sieve of Eratosthenes.) Once trial division has removed all of the small factors, the program would move on to more powerful, but still relatively simple, algorithms such as Pollard Rho. If those fail to completely factor the input number, then you have to pull out the big guns like the Multiple Polynomial Quadratic Sieve.

Anyway, here's the Pollard Rho algorithm:

*** Source Snippet Removed ***

The input value of c needs to be an integer that makes the polynomial x^2 + c irreducible. Ordinarily, you just start with c=1 and count up for cases 1 and 2 below.

This algorithm will return one of three possible values:

1) 0 means that the maximum number of iterations ran without a factor being found. Try choosing another value of c.

2) The number n itself means that the gcd did not find a proper factor. Try choosing another value of c.

3) Any other number means you found a nontrivial factor of n.

-- Eric Lengyel

!!!!!!

It workes!!!

I'm going to turn it into a function soon, and integrate it into my factorizor!

:) :) :) :) :) :)

[grin][grin][wink][grin]

jperalta: How exactly do you make an eliptic curve?

From,

Nice coder

Posted 03 November 2004 - 06:03 PM

Quote:

Maurer’s algorithm (Algorithm 4.62) generates random provable primes that are almost

uniformly distributed over the set of all primes of a specified size. The expected time for

generating a prime is only slightly greater than that for generating a probable prime of equal

size using Algorithm 4.44 with security parameter t = 1. (In practice, one may wish to

choose t > 1 in Algorithm 4.44; cf. Note 4.49.)

If you're interested in random generation of