From charlesreid1

Prime number sieve


The basic concept behind the Sieve of Eratosthenes is this: create an array of N boolean switches, with switch k indicating that an integer k is prime (on) or composite (off). Starting at switch k, move k switches down, and if it is on, flip it to off (the number is composite because it is divisible by k). In the end, all the switches that are left on are prime numbers. Because of the symmetry of multiplication and factoring, k ranges from 2 to the square root of N.

Various improvements can be made (e.g., only storing odd numbers), with some improvements leading to surprising performance improvements.

Algorithm Analysis

See also: Project Euler/501

The Sieve of Eratosthenes requires O(N) space, so it can be problematic for finding very large prime numbers, larger than a trillion, beyond the working memory of a computer.

In this case we can utilize the segmented Sieve of Eratosthenes algorithm, which generates prime numbers between two numbers m and n.

The savings from implementing a segmented Sieve of Eratosthenes method come from the fact that the amount of space required is only as large as the square root of N, the end point, and that is a substantially smaller storage requirement.

This means M must be larger than the square root of N, however, otherwise we have overlap.



See Project Euler/501

Segmented Sieve

For a more mathy explanation, see Segmented Sieve.

A words-only explanation:

The segmented sieve algorithm is structured similarly to the regular sieve - each time, we start with a list of prime numbers up to sqrt(N); then we make multiple passes through the array and eliminate numbers when they are composite.

What's different about the segmented sieve is that we already start at a number M that has some integers as factors, and some integers that are not factors. This changes where we start counting to mark integers as composite.

To illustrate: imagine we are using a segmented sieve to find prime numbers between 1 billion and 2 billion. The segmented sieve might start at 1,000,000,000 or at 1,000,000,001 - and that will change where we should start counting when we consider k = 2 and eliminate all even numbers. (In reality, we would only be considering odd numbers, but this is just an illustrative example.)

This offset is a special value called a Q value that indicates, for each prime, how far into our sequence (starting at the start index) we need to go to get to a number that that prime divides.

Let's look at a more detailed example.

If we are performing the segmented Sieve of Eratosthenes on the interval 100 to 200, we start at 101 (considering odd numbers only), then 103, then 105 - and 105 is divisible by 3. So the Q value for the prime 3 is the number of steps we took to get to that multiple - or, 2.

Same thing for 5, since 105 is also a multiple of 5, we get a Q value of 2.

Likewise with 7, since 105 is also a multiple of 7, we get a Q value of 2.

For the next prime, 11, the number 121 is a multiple of 11, and 121 is (121-101)/2 = 10 elements down, so we get a Q value of 10.

And so on.

Now, each time we are looking at a segment, we start at that Q value, and we jump forward p units, and eliminate the number p units ahead as a potential factor. For 3, we started with a Q value of 2, so we start at 105. Now we jump ahead 3 units, to 111 (remember we are jumping in intervals of 2, since we are not considering the even numbers). 111 is also divisible by 3, and we can also eliminate it from our list of possible primes.

We are only performing this operation over a relatively short interval - then we collect the results and move on. In the result above, going from 100 to 200, we might perform the above operations in intervals of 10 odd numbers each (covering 20 total numbers, which means we will take 5 steps to cover the whole range of 100 to 200). If we were doing a really big range, like 2 trillion to 3 trillion, we might take steps in intervals of a million.


Python implementation of segmented sieve of eratosthenes:

Programming Praxis description of algorithm:

More details and coverage:

Original reference: