Because I can, this post is about finding prime numbers with the sieve of Eratosthenes.
In Python 3 code, it looks like this:
def eratos(end):
"finds primes by Eratosthenes' sieve" # from Orthallelous
end += 1; m = set(range(3, end, 2)); d = m.difference_update
for p in range(3, int(end ** .5) + 1, 2):
if p in m: d(range(p * p, end, p + p))
return ([2] + sorted(m)) if end > 2 else []
That’s it, that’s the whole self-contained thing.
But wait, there’s more! How does it work?!
There’s just a set, a for loop and an if statement! Eratosthenes’ sieve works like so: Take all the digits between 2 and some upper limit (or end), and toss them into a pile. Start at the first digit, 2. Keep that one and then throw (sieve) out all the multiples of 2 in your pile. So toss out 4, 6, 8, 10, 12… up to the end. Now go onto the next biggest digit – which is 3. Keep the 3, then toss out all multiples of 3: 6, 9, 12, 15, etc. Now head to the next biggest digit in the pile – it’s not 4 since that was tossed out, it’s 5. Keep 5 and toss out all multiples of 5. Keep saving the next biggest digit you come across and throwing away all of its multiples.
There are some things that can be done to make that easier to do. The first step is to just use the odd digits! Why? The only even prime number is 2, so no need to check any other even value. Start your un-sieved pile at three and use every other digit – all the odds – up to your limit. That is what the range(3, end, 2) in the code above is, start at three, go to end with a step of two. This cuts the amount of numbers you need to deal with in half. end is increased by one as the range function goes up to, but does not include end. If your end is ten million, now you only have five million (minus one) to look at.
The range is additionally placed in a set which is an unordered collection of unique items – in this case, all the odd numbers. The range does not actually have all the odd numbers in memory, whereas the set will so the function above will use a lot of RAM, especially when the end gets large, over 100 million or so. The properties of the set are useful here to remove the multiples.
The .difference_update will update the set to have only the difference between what it had and what was passed to it. In other words, it only removes from the set the digits you give it. If you have 3, 5, 7, 9, 11, 13 and 15 in the set and give the difference_update 4, 5 and 6 – after checking to see if each value is in the set, only 5 is removed as 4 and 6 weren’t there to be removed in the first place. But now you have your collection of odd digits. So it’s time to cycle through them and sieve out multiples.
Since filtering every odd digit is still a lot of work, the second step to make it easier to do is only cycle up to the square root of your end/limit. Why? Any multiples larger than that would produce values larger than your limit. If your limit is 49, check only up to 7 (the square root of 49). When you toss out the multiples of 3, that includes 3*7, when you toss out multiples of 5, that includes 5*7, and when you toss out multiples of 7, that includes 7*7. What about 11? Well 7*11 is larger than your limit of 49.
What about 7*6? That’s 42, an even number which you already discounted because you used odd digits. This is the for p in range(3, int(end ** .5) + 1, 2): line of code above. When you filter out multiples only up to the square of your limit, the remaining digits between the root and the limit can only make multiples bigger than your limit. Cycling through 1,580 digits (all the odds between three and the square root of ten million) is far easier and faster than cycling through 4,999,999 digits (all the odds between three and ten million)!
This leads right into the third step for easiness. Only filter out the multiples starting at the square of the digit you found. Why? All the smaller ones were already filtered out! You find 5, there’s no reason to look for 10, 15, or 20 in the pile to toss out. Found 7? Start looking for multiples of 7 starting at 49 – you’ve already sieved out 14, 21, 28, 35 and 42 as those are multiples of 2, 3, 4(2*2), 5 and 6 (2*3). The smallest remaining multiple will be the square.
The fourth step to easy street is to toss out only every other multiple starting from the square. Why? Because those are the even multiples. Multiples of 7 starting at 49 are 49, 56, 63, 70, 77, 84, etc. Skip the evens by going twice the step of the found digit. This is the if p in m: d(range(p * p, end, p + p)) line of code in the above function. If 7 is in the pile, remove all digits from pile starting from 49 to end at a step of 14. Odd plus odd is always even, so skip the evens with odd + even. While you may make multiples to sieve out that already were removed, try not to make them in the first place. The fewest number of digits to deal with, the better (easier and faster)!
Then finally, because the set is unordered, the function needs to sort the primes before return them – it also adds in 2 because that’s a prime and it wasn’t in the set of odd digits. But if you don’t care if the primes are in order or not, you can change the last line of code (return ([2] + sorted(m)) if end > 2 else []) to m.add(2); return m if end > 2 else set().
There’s many other ways to write this sieve. For instance, here’s a version that forgoes the set and instead uses a list that represents if an odd number is prime (True), then sets the multiples to be False:
def eratos2(n):
"finds primes by Eratosthenes' sieve with list"
h = n // 2 + n % 2; m = [True] * h # from Orthallelous
for p in range(1, int(n ** .5) // 2 + 1):
if m[p]: q=p+p; m[q*(p+1)::q+1] = [False]*((h-q*p)//(q+1))
return ([2] + [2*p + 1 for p in range(1, h) if m[p]]) if n>1 else []
It’s more complicated, but skips checking if a multiple is in a set and runs faster. This could also be written using wheel factors, or not collecting all the digits in the first place and only keeping track of the next upcoming multiple of a found prime making it even more convoluted… or just don’t bother figuring it out and use someone else’s work. But I like this first function at the top here for its simplicity.
Leave a comment