From charlesreid1

This page describes a segmented prime number sieve (i.e., a sieve that can look for prime numbers over an arbitrary interval [M,N]). This is related to Project Euler/501. Also see Sieve of Eratosthenes.

Theory

Segmented Prime Sieve

The idea behind a segmented prime number sieve is to search for prime numbers over an arbitrary range (M,N). To properly implement the segmented sieve, we restrict M > \sqrt{N}, although we generate primes less than \sqrt{N} in the process so these can be saved out if this is not the case.

The algorithm is straightforward, but for one component. What we do is, we split the interval [M,N] into sub-segments of size S, so that we search for prime numbers on the intervals [M, M+S-1], [M+S, M+2S-1], \dots, [N-S, N]. We then generate a list of prime numbers between 2 and \sqrt{N} using a regular Sieve of Eratosthenes algorithm. Once we have that list, we can iterate through each sub-segment, and for each sub-segment we eliminate any numbers that happens to be a multiple of that prime, using a bit vector technique similar to the Sieve of Eratosthenes. For example, if we are running a segmented sieve for the interval [1000, 1100], we have:


\sqrt{1100} \approx 33.166

So we run a Sieve of Eratosthenes to find all primes up to the nearest floored integer, 33. Now, for this list of numbers, we want to eliminate any numbers between [1000,1100] that are multiples of 2, then any that are multiples of 3, then any that are multiples of 5, and so on.

Let's keep things simple and consider a single sub-segment of size 100 that covers the entire segment. Then after crossing out multiples of 2, 3, and 5, we have:

X 1000
  1001
X 1002
  1003
X 1004
X 1005
X 1006
  1007
X 1008
  1009
X 1010
X 1011
X 1012
  1013
X 1014
X 1015
X 1016

...

After running through this process with each prime in the range [2,\sqrt{N}] (optionally stopping when you reach \sqrt{M+S} or the upper boundary of the segment, if the segment is particularly large), the primes can be collected.

The Subtlety: Starting Points

The algorithm described above is great, but for one thing: how can we determine, quickly, where to begin crossing out factors?

In other words, let's suppose we're on the step where we want to cross out factors of 7 in our sub-segment. We want to start at some multiple of 7 in the bit vector; then we walk 7 elements and cross out the number at that position; and we repeat that process until we get to the end of the sub-segment.

But finding the start position takes care. If we start at the first number 1,000, we eliminate 1000, 1007, 1014, \dots, which are not multiples of 7. In fact, we need to start at 1,001, which is the first multiple of 7 in our sub-segment: 7 \times 143 = 1,001.

To generalize this, we can use a formula which forms the core of the segmented sieve algorithm, which I found at Pragmatic Programming and verified with The Segmented Sieve of Eratosthenes and Primes in Arithmetic Progressions to 10^12.

We have a set of primes P and are considering integers in a sub-segment, which we will call S. Then we wish to define a vector Q that is the length of the set of primes, and consists of the index at which to begin crossing out factors (that is, it contains the index of the first multiple of that prime).

For example, if we are sieving prime numbers from 100 to 122, then the nearest prime to \sqrt{122} is 13. We use the Sieve of Eratosthenes to generate the set of primes up to 13 (the practical implementation of this will exclude 2 and even numbers from the sieve, so we do that here too):

P = \{3, 5, 7, 11, 13\}

The segment we are considering is:


S = \{101, 103, 105, 107, 109, 111, 113, 115, 117, 119, 121\}

The associated Q vector for the set S is:


Q = \{3, 3, 3, 11, 9\}

as the first factor of 3 is element 3 (105 = 3 * 5 * 7), the first factor of 5 is element 3 (105), the first factor of 7 is element 3 (105), the first factor of 11 is element 11 (121), and the first factor of 13 is element 9 (117 = 13 * 9).

To obtain this Q vector, we can use the following expression for the Q vector value for the kth prime:


Q_k = - \dfrac{1}{2} \left( L+1+p_{k} \right) \mod p_k

where L is the left-hand side of the interval - the interval starting point. This can then be reset at the next interval using the formula:


Q_k = (Q_k - B) \mod p_k

where B denotes the beginning of your block or sub-segment.

When these formulas are implemented, care must be taken if your language does not handle negative numbers modulo some base (as with Java). If that is the case, you can always multiply by multiples of p, the modular base, to get to a positive number, as in the implementation below:

	protected static int qInit(long left, int p) {
		// first term here is because modOfThis must be positive.
		long modOfThis = (left/2)*p - (left+1+p)/2;
		int result = (int)(modOfThis%((long)p));
		return result;
	}

Likewise, here is the Q vector reset function:

	protected static int qReset(int p, int q, int delta) {
		int result = (q-delta)%p;
		int j = 1;
		while(result < 0) { 
			result += j*p;
		}
		return result;
	}

Code

PrimeGenerator Class

The following class implements the segmented Sieve of Eratosthenes, together with the segmented Sieve of Eratosthenes, to create a PrimeGenerator object that continually generates new prime numbers:

import java.math.BigDecimal;
import java.util.List;
import java.util.Queue;
import java.util.LinkedList;
import java.util.ArrayList;

/** Prime number generator class 
 * implemented using Longs.
 *
 * Uses a segmented sieve of Eratosthenes
 * algorithm.
 */
public class PrimeGeneratorLong { 

	/**
	 * Prime Generator class implementation.
	 */

	private Queue<Long> q; // The queue stores the prime numbers this generator has found
	private long max;
	private long lo, hi;
	private int rootmax;

	private int cycle;

	public PrimeGeneratorLong(long max) {
		this.max = max;
		this.rootmax = (int)(Math.sqrt(this.max));

		// Making rootmax even makes life easier.
		if(rootmax%2!=0) {
			rootmax++;
		}

		this.q = new LinkedList<Long>();

		this.lo = 1;
		this.hi = 1;

		// Look ahead. This enables hasNext() check.
		repopulateQueue();
	}

	public void reset() {
		this.q = new LinkedList<Long>();
		this.lo = 1;
		this.hi = 1;
		repopulateQueue();
	}

	public Long next() { 
		Long result = q.remove();

		// Look ahead. This enables hasNext() check.
		if(q.isEmpty()) {
			repopulateQueue();
		}

		return result;
	}



	/** Repopulate queue with prime number starting at lo and going to hi. */
	public void repopulateQueue() { 

		int ACCORDION = (int)(Math.round(Math.sqrt(rootmax)));

		// Value of ACCORDION*sqrt(MAX) is the stride 
		// of our segmented Sieve of Eratosthenes algorithm.
		if(lo==1) {
			this.cycle = 1;
			// First time through,
			// use normal Sieve of Eratosthenes.

			// Add everything up to sqrt(MAX).
			// This ensures we don't call segmented Sieve of Eratosthenes 
			// when low < sqrt(hi), because we already begin 
			// with low above sqrt(MAX).
			q.addAll( primeSieve(rootmax) );

			// set new lo and hi.
			// lo must be even. 
			// only need to do this once.
			lo = rootmax;
			hi = rootmax + ACCORDION*rootmax;

		} else {
			this.cycle++;

			// Every other time through,
			// use segmented Sieve of Eratosthenes.
			q.addAll( primesRange(lo, hi) );
			if(q.isEmpty()) {
				// This is only a problem with small integers.
				// But it's still a problem.
				hi *= 2;
				q.addAll( primesRange(lo, hi) );
			}

			lo = hi;
			hi = hi + ACCORDION*rootmax;
		}
	}




	/** Find all prime numbers between left and right as longs. 
	 * */
	public static List<Long> primesRange(long left, long right) { 
		if(left%2!=0 || right%2!=0) { 
			throw new IllegalArgumentException("Error calling primesRange(): left and right must be even.");
		}

		// Return this
		ArrayList<Long> result = new ArrayList<Long>();

		// Ensure that we get precisely the correct range of primes.
		int delta;
		delta = (int)(right-left);
		delta /= 2;
		delta += 1;

		boolean[] sieve = new boolean[delta-1];

		// could make these longs... if sqrt(right) < 10^18, which, obviously.
		// could make these ints... if sqrt(right) < 10^9, which... also has to be.
		// these compromises are needed to keep the class working.
		// 
		// ...let future self deal with those and hack away at this
		// when we actually get there. For now, just get this done.
		//
		List<Integer> Ps = primeSieveInt( (int)(Math.sqrt(right)) );

		// Skip 2
		Ps.remove(Ps.indexOf(2));

		// Create Qs starting index vector
		List<Integer> Qs = new ArrayList<Integer>();
		for(int i=0; i<Ps.size(); i++ ) {
			Qs.add( qInit(left, Ps.get(i)) );
		}

		// Loop over each cluster
		while(left<right){

			// Assume everybody prime
			for(int k=0; k<(delta-1); k++) { 
				sieve[k] = true;
			}

			// Cross out multiple of each prime
			for(int i=0; i<Ps.size(); i++) {
				int p = Ps.get(i);
				int q = Qs.get(i);

				for(int j=q; j<(delta-1); j+=p) { 
					sieve[j] = false;
				}
			}

			// Reset for next cluster
			for(int k=0; k<Ps.size(); k++ ) {
				Qs.set(k, qReset(Ps.get(k), Qs.get(k), delta) );
			}

			// Have knocked out non-primes, so gather the primes
			long t = left + 1;
			for(int i=0; i<delta-1; i++) { 
				// t ends at right
				if(sieve[i]) { 
					result.add( new Long(t) );
				}
				t += 2;
			}
			left += 2*delta; 
		}
		return result;
	}

	protected static int qInit(long left, int p) {
		// modOfThis must be positive.
		//long modOfThis = (((left-1)*(p-1))/2)-1;
		long modOfThis = (left/2)*p - (left+1+p)/2;
		int result = (int)(modOfThis%((long)p));
		return result;
	}


	protected static int qReset(int p, int q, int delta) {
		int result = (q-delta)%p;
		int j = 1;
		while(result < 0) { 
			result += j*p;
		}
		return result;
	}




	/** Find all prime numbers below n.
	 *
	 * Return them as Longs.
	 */
	public static List<Long> primeSieve(int n) { 

        // initially assume all integers are prime 
        boolean[] isPrime = new boolean[n+1];
        for (int i = 2; i <= n; i++) {
            isPrime[i] = true;
        }

        // mark non-primes <= n using Sieve of Eratosthenes
        for (int factor = 2; factor*factor <= n; factor++) {

            // if factor is prime, then mark multiples of factor as nonprime
            // suffices to consider mutiples factor, factor+1, ...,  n/factor
            if (isPrime[factor]) {
                for (int j = factor; factor*j <= n; j++) {
                    isPrime[factor*j] = false;
                }
            }
        }

		// Add primes to list
		ArrayList<Long> primes = new ArrayList<>();
		int nprimes = 0;
		for(int k=2; k<=n; k++) {
			if(isPrime[k]) { 
				primes.add( new Long(k) );
				nprimes++;
			}
		}

		return primes;
	}



	/** Find all prime numbers below n.
	 *
	 * Return then as Integers.
	 */
	public static List<Integer> primeSieveInt(int n) { 

        // initially assume all integers are prime 
        boolean[] isPrime = new boolean[n+1];
        for (int i = 2; i <= n; i++) {
            isPrime[i] = true;
        }

        // mark non-primes <= n using Sieve of Eratosthenes
        for (int factor = 2; factor*factor <= n; factor++) {

            // if factor is prime, then mark multiples of factor as nonprime
            // suffices to consider mutiples factor, factor+1, ...,  n/factor
            if (isPrime[factor]) {
                for (int j = factor; factor*j <= n; j++) {
                    isPrime[factor*j] = false;
                }
            }
        }

		// Add primes to list
		ArrayList<Integer> primes = new ArrayList<>();
		int nprimes = 0;
		for(int k=2; k<=n; k++) {
			if(isPrime[k]) { 
				primes.add( new Integer(k) );
				nprimes++;
			}
		}

		return primes;
	}




	/** The prime generator is agnostic to the maximum;
	 * we use hasNext to determine when to stop. 
	 * Change this to true for an infinite prime generator.
	 * */
	public boolean hasNext() { 
		// This is possible because we always ensure there is something in the queue
		// before we finish operations on the queue.
		if(q.peek().compareTo(max)>0){ 
			return false;
		} else {
			return true;
		}
	}

}


Flags