Sign in to follow this  

How can I make this use less memory? [Sieve of Eratosthenes]

This topic is 4531 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

#include <iostream>
#include <iomanip>
#include <set>
void Sieve(std::set<unsigned int>& s ,unsigned int n)
{
	unsigned int m,i;
	s.erase(s.begin(),s.end());
    
	std::cout << "Filling set: " << 1 << " to " << n << std::endl << std::endl;
	s.insert(1);
	s.insert(2);
	for(m=3; m<=n; m+=2)
		s.insert(m);
	
	std::cout << "Set Filled, " /*<< s.size() << " Items."<< std::endl*/ ;
	std::cout << "Removing non-primes:" << std::endl << std::endl;
	
	for(m=3; m*m <= n; m++)
	{
		if(s.find(m) != s.end())
		{
			i = 2 * m;
			while (i <= n)
			{
				s.erase(i);
				i += m;
			}
		}
	}
	std::cout << "Non-primes removed: " << s.size() << " items remaining" << std::endl;

}

int main()
{
	unsigned int count = 0;
	std::set<unsigned int> primes; 
	std::set<unsigned int>::iterator itr;
	Sieve(primes,35000000);

	std::cin >> count;
	itr = primes.begin();
	if(!count)
	{
		while(itr != primes.end())
		{
			count++;
			std::cout << " " << *itr << " ";
			if(count % 10 == 0)
				std::cout << std::endl;
			itr++;
		}
		std::cout << std::endl;

		std::cin >> count;
	}
	return 0;
}


The Sieve of Eratosthenes is used to calculate prime numbers in a given set of integers. The above program works great except for one thing. The problem with this is that I can only calculate about 35 million before it uses up my entire memory, about 1GB, at which point it starts using the swap file and the program slows to a snails pace while my hard drive thrashes around. With 35 million it takes about a minute to complete but with 40 million I let it run several hours before I killed it. I wanted to calculate all primes in the unsigned int range 0 to 0xFFFFFFFF which is 4 billion ints. So I need to make this use a lot less memory.

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
how about just using one bit per number in a static array, i'm not sure if that is actually smaller as you have to store 1 bit per non-prime as well. You can for sure save some memory by using a mark and sweep approach on a static array of unsigned ints.

Share this post


Link to post
Share on other sites
Instead of std::set, I'd use array of bits. One for each odd number bigger than 1. First set all the bits, then go through the array, and for each bit set, you clear all the bits further in the array kind of like this:

for(int i=0;i<N;i++)
if (bitset[i])
for(int k=i+i;k<N;k+=i)
bitset[i]=0;


If you use one bit per odd integer, you'd need 2^28 bytes = 256 Megs to find all primes less than 2^32.

AP: It would take less memory, as he's initializing his std::set with all the odd integers, taking 2^31*4 bytes = 16 Gigs for all up to 2^32, plus the overhead from std::set, which is in some proportion to the amount of elements in it.

Share this post


Link to post
Share on other sites
Quote:
Original post by Anonymous Poster
how about just using one bit per number in a static array, i'm not sure if that is actually smaller as you have to store 1 bit per non-prime as well. You can for sure save some memory by using a mark and sweep approach on a static array of unsigned ints.


That should use ~512MiB (e.g. 1/8th of 4GiB) to cover 0..0xFFFFFFFF, so that would indeed come out to a smaller amount.

Share this post


Link to post
Share on other sites
Firstly, do not populate the set first. If a number isn't prime, the sieve doesn't need to remember it. Instead, generate your set at runtime using a for loop and test each number as it comes but do not store non-primes in the memory for more than one iteration.

Also, on a mathematical point, the higher up in the set of integers you go, it becomes exponentially less likely that each number is prime. If I understand your code correctly, you're using less memory as time goes on, since you populate the set at the start and then remove the non-prime elements. The problem is that primes become incredibly sparse when you enter the millions, so getting from 35 million primes to 40 million is going to take a very long time.

Share this post


Link to post
Share on other sites
Quote:
Original post by Grain
How to I make an array of bits? There is no bit type as far as I know.


There's bound to be a bitvector class somewhere (probably in boost), but if you can't find one, here's the basics:


class mybits
{
std::vector<unsigned char> vec;

public:
mybits(int bits) { vec.resize(bits/8); };

set(int bit,bool on=true) {
int b=bit%8;
int p=bit/8;
int mask=1<<b;
vec[p]=(vec[p]&(~mask))|(on?mask:0);
}
clear(int bit) { set(bit,false); };
}



I won't guarantee that that'll work (since I just typed it from the top of my head), but the idea should be clear enough.

Share this post


Link to post
Share on other sites
You'll have to do it manually, probably create a custom container class. Check out the bitwise logic and shift operators. You basically want to have a bunch of 32bit integers, shift the required bit into place and "and" it with 1.

Here is an interesting concept which you could use: Ranges (from here).


One bit for each possible odd number of a 32 bit integer is 256 megabytes of data. It should fit nicely [smile].

Share this post


Link to post
Share on other sites
Just for fun, here's a class. This "should" work [wink]. It will automatically return "not-set" for even numbers. Uses a puny 256 megabytes [smile]. You should probably test it before use, though.

class oddbits
{
private:
// enough space for every odd unsigned int in 32 bits
static const size_t SIZE = 0x4000000;
uint32 d[SIZE];

public:
oddbits() { for(size_t i=0; i<SIZE; i++) d[i]=0; }

bool get(unsigned int i)
{
return (i&1==1) && ((d[i>>6]>>(i>>1))&1==1);
}

void set(unsigned int i, bool v)
{
if(i&1==0) return;

uint32 x = (1<<((i>>1)&1F));
d[i>>6] = (d[i>>6]&~x)|(v?x:0);
}
};

Share this post


Link to post
Share on other sites
Quote:
Original post by Motz
Firstly, do not populate the set first. If a number isn't prime, the sieve doesn't need to remember it. Instead, generate your set at runtime using a for loop and test each number as it comes but do not store non-primes in the memory for more than one iteration.

Also, on a mathematical point, the higher up in the set of integers you go, it becomes exponentially less likely that each number is prime. If I understand your code correctly, you're using less memory as time goes on, since you populate the set at the start and then remove the non-prime elements. The problem is that primes become incredibly sparse when you enter the millions, so getting from 35 million primes to 40 million is going to take a very long time.


Quoted for value. Instead of storing all the numbers and then removing the composite values, you can consider each number for prime-ness as you get to it, and insert it. Since you will then not have to remove values from the middle, you can use a more space-efficient container as well: std::vector.


#include <iostream>
#include <iomanip>
#include <vector>

void Sieve(std::vector<unsigned int>& s ,unsigned int n) {
unsigned int m,i;
s.erase(s.begin(),s.end());
s.push_back(1);
s.push_back(2);
for(m=3; m<=n; m+=2) {
int count = s.size();
bool prime = true;
for (int i = 0; i < count; ++i) {
if (!(m % s[i])) { // current value is divisible by a former prime;
// discard it
prime = false; break;
// In case you're wondering, solving this with a goto is just as tricky;
// you'd need to put it at the *end* of the outer loop, with no
// action following it, which is rather bizarre.
}
}
if (prime) s.push_back(m);
}
}



Of course, this isn't the Sieve of Eratosthenes any more (pedantically speaking anyway), but it's how it's normally done :s

Share this post


Link to post
Share on other sites
If you just want to make heaps and heaps of primes theres a pretty easy way to do it.

Make a list of small primes up to the suqare root of the number you want the primes to. (so if you want primes to 2^32, grab them to 2^16.)

Load up an array, call it 'counters'.
Set primes up the same way.
In counters shove your primes.
Set powarr as the two powers from 1 to 32


For i = 0 to n / 32
Testbyte = ~0

For p = 0 to number_of_Primes_in_list
if counters[p] /32 = i then 'You can do something to speed this up, like precomputing it in another table, but thats another story.
Testbyte &= ~powarr[counters[p] % i] //Get rid of that byte
counters[p] = counters[p] + primes[p] 'Move the counters to the next one
end if

If testbyte = 0 then goto next_loop 'If there are no more possible primes,
' then why bother testing any more? I know goto's are evil, but this is the only
' way i can do it

Next p

For tmp = 1 to 32 'Check the bits on the tmpbyte to see if we have anything.

if testbyte & powarr[tmp] then
print i * 32 + tmp " Is a PRIME!!!"
end if

next tmp

next_loop:

Next i



This code should work (well, its psudocode), but i'm not making any guarentees.

Simply, its letting you sieve a bit slower, but with o(sqr(n)) memory. (which will help heaps if you ever want to find every prime to 2^64).
You might be able to speed it up quite a bit some some 'upgrades'. (things like storing counters[p] /32 when you make it.)

Hth.
From,
Nice coder

Share this post


Link to post
Share on other sites
[quote]Original post by Zahlman


for (int i = 0; i < count; ++i) {
if (!(m % s[i])) { // current value is divisible by a former prime;
// discard it
prime = false; break;
// In case you're wondering, solving this with a goto is just as tricky;
// you'd need to put it at the *end* of the outer loop, with no
// action following it, which is rather bizarre.
}
}



[/quote]

I don’t see how that will work. You only compare the current number to be considered with the existing primes. Won’t you need to compare it with ALL the numbers from 2 to half it self to see if it’s prime, and not just the known primes smaller than it self?

Share this post


Link to post
Share on other sites
No, you don't need to. All of the non-primes smaller than N are made up of the primes in that set. For example, 24 is not a prime, it's quite obviously divisible by 6. As such, it must also be divisible by 3 and 2, so why make more work for yourself by checking it against 6 once you know that it divides by 2 and 3? By the same token, 23 is a prime number. 10 is just under half of 23. However, 10 cleanly divides by 5 and 2. If 23 doesn't divide by 5 or 2, then it can't be divisible by 10, so why check it?

There's a mathematical law (can't remember which) that states that every number N is the product of two primes, p and p'. Thus, if a number is not divisible by any primes, then it won't be divisible by non-primes either, since they themselves are the products of smaller primes.

Share this post


Link to post
Share on other sites
Quote:
Original post by Motz
No, you don't need to. All of the non-primes smaller than N are made up of the primes in that set. For example, 24 is not a prime, it's quite obviously divisible by 6. As such, it must also be divisible by 3 and 2, so why make more work for yourself by checking it against 6 once you know that it divides by 2 and 3? By the same token, 23 is a prime number. 10 is just under half of 23. However, 10 cleanly divides by 5 and 2. If 23 doesn't divide by 5 or 2, then it can't be divisible by 10, so why check it?

There's a mathematical law (can't remember which) that states that every number N is the product of two primes, p and p'. Thus, if a number is not divisible by any primes, then it won't be divisible by non-primes either, since they themselves are the products of smaller primes.


0k. but for some reason that method painfully slow. I just copied and pasted Zahlmans code and compared it side by side with the bit array method, and the difference is ridiculous. That method takes about a minute and a half for just 1 million ints. As where the actual Sieve of Eratosthenes using bit aray takes less than a second.

Just for reference here is the entire program as it is now

#include <iostream>
#include <iomanip>
#include <vector>
#include "bits.h"

void Sieve(bits& s ,unsigned int n)
{
unsigned int m,i;

std::cout << "Filling set: " << 1 << " to " << n << std::endl << std::endl;

s.resize(n);
s.set(); // set(), clear() with no argument sets/clears all elements in the bit object;
s.clear(0);
s.clear(1);

std::cout << "Set Filled, " /*<< s.size() << " Items."<< std::endl /**/;
std::cout << "Removing non-primes:" << std::endl << std::endl;

m = 0;

for(i=0; i < n; i++)
if(s[i])
{
m++;
for(int k = (i+i); k < n; k+=i)
s.clear(k);
}

std::cout << "Non-primes removed: " << m << " items remaining" << std::endl;
std::cout << "Primes ratio: " << (float)m/(float)n << std::endl;

}
void Sieve(std::vector<unsigned int>& s ,unsigned int n) {
unsigned int m,i;
std::cout << "Calculating primes: " << std::endl << std::endl;
s.erase(s.begin(),s.end());
//s.push_back(1);
s.push_back(2);
for(m=3; m<=n; m+=2) {
int count = s.size();
bool prime = true;
for (int i = 0; i < count; ++i) {
if (!(m % s[i])) { // current value is divisible by a former prime;
// discard it
prime = false; break;
// In case you're wondering, solving this with a goto is just as tricky;
// you'd need to put it at the *end* of the outer loop, with no
// action following it, which is rather bizarre.
}
}
if (prime) s.push_back(m);
}
std::cout << s.size() << " primes calculated." << std::endl;
}

int main()
{

unsigned int count = 0;
unsigned int N = 1000000;
bits Bit(N);
std::vector<unsigned int> primes;
std::vector<unsigned int>::iterator itr;

Sieve(Bit,N);
Sieve(primes,N);
}









//FILE: bits.h
#include <vector>
class bits
{
std::vector<unsigned char> B;
public:
bits(unsigned int size)
{
B.resize(1 + size/8);
}
void resize(unsigned int size)
{
B.resize(1 + size/8);
}
unsigned int size()
{
return B.size()*8;
}
void set(unsigned int bit)
{
unsigned int p = bit/8;
unsigned int b = bit%8;
unsigned char m = 1 << b;
B[p] = B[p] | m;
}
void set()
{
for(unsigned int i = 0; i < this->size(); i++)
set(i);
}
void clear(unsigned int bit)
{
unsigned int p = bit/8;
unsigned int b = bit%8;
unsigned char m = 1 << b;
m = ~m;
B[p] = B[p] & m;
}
void clear()
{
for(unsigned int i = 0; i < this->size(); i++)
clear(i);
}

bool operator [](unsigned int bit)
{
unsigned int p = bit/8;
unsigned int b = bit%8;
unsigned char m = 1 << b;
m = m & B[p];
m = m >> b;
return m;
}
};



EDIT: Using the Bit array is insanely fast. Way faster than my original set method. I just rechecked my original size of 35 million ints which took about 1 minuet using std::set. But with the bit array it took only a couple seconds!!

[Edited by - Grain on July 4, 2005 8:17:39 AM]

Share this post


Link to post
Share on other sites
Quote:
Original post by Motz
There's a mathematical law (can't remember which) that states that every number N is the product of two primes, p and p'.


Thats wrong, just take 24 which is 2*3*4, which are more than two primes.
The theorem you remebered may be this one:
http://mathworld.wolfram.com/FundamentalTheoremofArithmetic.html


To Grain:
Are you looking to find ALL the primes, or are you just interested if a number is prime ?
For instance: (from here)
With the exception of 2 and 3, all primes are of the form p=6n+1 or p=6n-1.

And there are other tests for primes.

For the case of the Sieve, you may think about implementing your own virtual memory manager. Since on a 32 bit machine, your address space is as small as 2 or 3 GB (you need some space for the OS), you may want to manage the primes found in a big file on the harddisk. Then you should group your tests in blocks, i.e. 512byte/sector times 8primes/byte are 4096primes/sector. So you may test many new numbers against all 4096 primes of a sector, before you advance to the next sector. This way you can virtually increase your address space to many gigabytes. Also the OS will cache accesses, you only need to make sure that you work on the same data as long as possible.

Share this post


Link to post
Share on other sites
Also remember that all primes above 2 end in 1,3,7 or 9.

So, instead of populating your array with loads of numbers, just populate them with those that finish in these digits.

Share this post


Link to post
Share on other sites
Quote:
Original post by Grain
0k. but for some reason that method painfully slow. I just copied and pasted Zahlmans code and compared it side by side with the bit array method, and the difference is ridiculous. That method takes about a minute and a half for just 1 million ints. As where the actual Sieve of Eratosthenes using bit aray takes less than a second.

Just for reference here is the entire program as it is now
*** Source Snippet Removed ***

*** Source Snippet Removed ***
EDIT: Using the Bit array is insanely fast. Way faster than my original set method. I just rechecked my original size of 35 million ints which took about 1 minuet using std::set. But with the bit array it took only a couple seconds!!


Try compiling with optimisations and bound your tests in the second version and you'll find they're much closer (although the bit array is still faster):
void Sieve(std::vector<unsigned int>& s ,unsigned int n) {
unsigned int m;
std::cout << "Calculating primes: " << std::endl << std::endl;
s.erase(s.begin(),s.end());
//s.push_back(1);
s.push_back(2);
int root = 2;
for(m=3; m<=n; m+=2) {
int count = s.size();
bool prime = true;
while (root * root < m)
{
++root;
}
for (int i = 0; i < count && s[i] <= root; ++i) {
if (!(m % s[i])) { // current value is divisible by a former prime;
// discard it
prime = false; break;
// In case you're wondering, solving this with a goto is just as tricky;
// you'd need to put it at the *end* of the outer loop, with no
// action following it, which is rather bizarre.
}
}
if (prime) s.push_back(m);
}
std::cout << s.size() << " primes calculated." << std::endl;
}



Enigma

[Edited by - Enigma on July 4, 2005 9:05:20 AM]

Share this post


Link to post
Share on other sites
Quote:
Original post by nmi
Quote:
Original post by Motz
There's a mathematical law (can't remember which) that states that every number N is the product of two primes, p and p'.


Thats wrong, just take 24 which is 2*3*4, which are more than two primes.
The theorem you remebered may be this one:
http://mathworld.wolfram.com/FundamentalTheoremofArithmetic.html


4 isn't a prime, it's the product of 2 and 2. But I was wrong anyway, just checked back on a simulation I did a while ago ;) Sorry about that. Anyhow, the point it supported still stands, since every non-prime is divisable by 1 or more primes and thus you needn't try dividing by non-primes.

Share this post


Link to post
Share on other sites
Quote:
Original post by Prozak
Also remember that all primes above 2 end in 1,3,7 or 9.

So, instead of populating your array with loads of numbers, just populate them with those that finish in these digits.


Think of it in binary.
All primes but 2 end with the last bit set to 1.

Share this post


Link to post
Share on other sites
Quote:
Original post by Motz
Quote:
Original post by nmi
Quote:
Original post by Motz
There's a mathematical law (can't remember which) that states that every number N is the product of two primes, p and p'.


Thats wrong, just take 24 which is 2*3*4, which are more than two primes.
The theorem you remebered may be this one:
http://mathworld.wolfram.com/FundamentalTheoremofArithmetic.html


4 isn't a prime, it's the product of 2 and 2. But I was wrong anyway, just checked back on a simulation I did a while ago ;) Sorry about that. Anyhow, the point it supported still stands, since every non-prime is divisable by 1 or more primes and thus you needn't try dividing by non-primes.


Sorry for that, i wrote too fast ;-)
Should have written 2*3*5=30. Anyways, a number may have more than only two prime factors.

For the thing that you need not have to divide by non primes, thats correct. But isnt the sieve working like that anyways ? If a prime is found, all multiples of that prime are marked as non-prime.
Or the other way around, if you check a number N=n*n for being prime, you check the prime numbers from 2 to n, if they can divide that number N. If they don't divide it without remainder, then N is a prime.

Share this post


Link to post
Share on other sites
Quote:
Original post by nmi
Quote:
Original post by Prozak
Also remember that all primes above 2 end in 1,3,7 or 9.

So, instead of populating your array with loads of numbers, just populate them with those that finish in these digits.


Think of it in binary.
All primes but 2 end with the last bit set to 1.


Ah, but of course, those that end in '5' are not primary [smile] So if you just take those with the last bit set, you'll end up having to prune out the '5's later anyway, so why not just do it to begin with?

Share this post


Link to post
Share on other sites
Quote:
Original post by Enigma
Quote:
Original post by Grain
0k. but for some reason that method painfully slow. I just copied and pasted Zahlmans code and compared it side by side with the bit array method, and the difference is ridiculous. That method takes about a minute and a half for just 1 million ints. As where the actual Sieve of Eratosthenes using bit aray takes less than a second.

Just for reference here is the entire program as it is now
*** Source Snippet Removed ***

*** Source Snippet Removed ***
EDIT: Using the Bit array is insanely fast. Way faster than my original set method. I just rechecked my original size of 35 million ints which took about 1 minuet using std::set. But with the bit array it took only a couple seconds!!


Try compiling with optimisations and bound your tests in the second version and you'll find they're much closer (although the bit array is still faster):
*** Source Snippet Removed ***

Enigma

Which particular optimizations do you mean? And that change of code did dramatically speed it up, but it is still no where near as fast as the bit array.

Share this post


Link to post
Share on other sites
"with optimizations" == "in release mode" here, I believe. :/

Another improvement:


// OLD
// while (root * root < m)
// {
// ++root;
// }
// for (int i = 0; i < count && s[i] <= root; ++i) {
// good
// we can't reach the end of the prime list before hitting the square root limit
// (AFAIK?), so that check is redundant. Also, instead of squaring every number
// up to our root to find the limit, we should just be squaring every prime:
// NEW
for (int i = 0; (s[i] * s[i]) < m; ++i) {

Share this post


Link to post
Share on other sites
Quote:
Original post by FlowingOoze
Quote:
Original post by Grain
How to I make an array of bits? There is no bit type as far as I know.


There's bound to be a bitvector class somewhere (probably in boost)

I've found that std::bitset is quite easy to work with. Its part of the Standard Library as well.

Share this post


Link to post
Share on other sites
This is a good read for dealing with primes, complete with source code.

http://www.troubleshooters.com/codecorn/primenumbers/primenumbers.htm

The author uses paging to disk to get rid of the memory barrier.


Chapter lists...:

1. Introduction
2. The Obvious Method for Finding Primes
3. Using Only Primes As Divisors
4. Odd Candidates Only
5. The Array Method (Sieve of Eratosthenes)
6. The Bit Array Improvement
7. Design Musings: Scalability Through Paging
8. Implementation: Scalability Through Paging
9. Design Musings: The Cluster Solution
10.Conclusion

-Chris

Share this post


Link to post
Share on other sites

This topic is 4531 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this