How to generate 200,000 primes as fast as possible in Python?

Question

How to generate 200,000 primes as fast as possible in Python?

Asked 7 years, 11 months ago

Viewed 4,052 times

47

Heed: I’m not looking for ready-made pieces of code. I just want someone to help me think about in some way to generate these 200,000 primes as efficiently as possible, in Python.

I’m working on it #61 of the Codeabbey, but I can’t think of how to generate the cousins that exercise asks for.

The statement basically considers that there is a list that contains all the prime numbers in the world, starting with 2. Based on this list, the statement gives a series of indexes (indexes of this list), and the program has to return the prime that occupies this index in this imaginary list.

Ex.: In the list [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, ...], index 1 is 3; 2 is 5, ...

The statement ensures that the series of indices it will provide randomly has no index higher than 200,000 (so 200,000 primes, at most)

Example of input and output:

input data:
7 1 199999 4    # são os índices da list imaginária

answer:
19 3 2750131 11    # são os primos dessa list, correspondentes aos índices acima

At the end of the statement, Codeabbey warns that there is a way to generate the result in +/- 1 second. I could leave my code the way it is now, wait two days and give the answer (hahaha), but that would be cheating and I wouldn’t have learned anything from the exercise.

I left my program running for 10 minutes and it didn’t work. Maybe a second is too little time, but surely you have a way to solve it in, um, less than 20 seconds?

To try to solve this, I am deploying the Sieve of Eratosthene (or Sieve of Eratosthenes)

Follow my current code:

from math import sqrt

def primos(n, maxIndex):
    maiorCheck = int(sqrt(n))
    lista = []
    i, x, qtdPrimos = 0, 0, 0
    for m in range(2, n+1):
        lista.append(m)
    while x <= maiorCheck:
        x = lista[i]
        i += 1
        for numero in lista:
            if numero != x:
                if numero % x == 0:
                    lista[lista.index(numero)] = 0
        while 0 in lista:
            del lista[lista.index(0)]
        if len(lista) == maxIndex:
            x = maiorCheck + 1
    return lista

qtdPrimos = int(input('Entre com a quantidade de primos a serem impressos: ').strip())

indexes = input('Entre com os índices para os quais serão retornados primos: ').strip().split()
result = []

for n in range(len(indexes)):
    indexes[n] = int(indexes[n])

arrayPrimos = primos(2750131, max(indexes))

for m in range(len(indexes)):
    result.append(str(arrayPrimos[indexes[m]-1]))

print('\n--- R E S U L T A D O ---\n'+' '.join(result))

3

Good question. I managed to do it in 12 seconds on the Chrome console myself, I’d like to know how to do it faster.

– Oralista de Sistemas

2017/08/23 at 17:34
2

The question is one of the best made of the site, the title not so much. This time requirement is complicated.

– Maniero

2017/08/23 at 17:37
1

@Bigown improved the title, leaving the requirement of less rigorous time.

– santosmarco_

2017/08/23 at 17:39
1

Little brother, I have an example in Java that produces cousins very quickly, maybe I can help you link: https://github.com/HallefBruno/Estrutura-dados/blob/master/primo.java

– Bruno

2017/08/23 at 17:47
1

So, it doesn’t say you can do it in 1 second in Python. It might take 3 times, or 10 times just because it’s Python. I’m not saying it does, but it might be. Ali makes a naive statement. There’s a way to make it go faster. Does something that distributes the calculation on 1000 machines or at least colors. When they wrote this was a time, the machines evolved. Imagine that written in the 90s today would give with a horrible algorithm.

– Maniero

2017/08/23 at 17:47
4

One of the most complex problems there is is generating primes, the problem gets worse the more numbers you want. There are complex algorithms that can be improved, I haven’t even seen your code, but you don’t have to check everything. I didn’t even choose to mess with not generating the cousins but to provide them :) See https://en.wikipedia.org/wiki/Sieve_of_Atkin

– Maniero

2017/08/23 at 18:03

Show 1 more comment

8 answers

Browser other questions tagged python mathematics

You are not signed in. Login or sign up in order to post.

by Woss • **73,416** points · Answer 1 · 2017-08-23T20:54:28+00:00

With the very Sieve of Eratosthenes I achieved very satisfactory results. It is worth remembering that, as the sieve returns the list of all primes smaller than the input value, we will have to inform the highest prime value that the program will accept. The statement states that it is guaranteed that no number will be requested after the 200,000°, so initially we must know what this value is: 2.750.159.

A very simple implementation of the sieve is:

def sieve_of_eratosthene(N):

    # Cria-se uma lista referente a todos os inteiros entre 0 e N:
    A = [True] * (N+1)

    # Define os números 0 e 1 como não primos:
    A[0] = A[1] = False

    # Percorra a lista até encontrar o primeiro número primo:
    for value, prime in enumerate(A):

        # O número é primo?
        if prime:

            # Retorna o número primo:
            yield value

            # Remova da lista todos os múltiplos do número enontrado:
            for i in range(value**2, N+1, value):
                A[i] = False

So we can get all 200,000 first prime numbers by making:

primes = list(sieve_of_eratosthene(2750159))

And generate the output as follows:

print("Saída:", [primes[i] for i in [7, 1, 199999, 4]]) # Saída: [19, 3, 2750159, 11]

Being [7, 1, 199999, 4] the entry of the program.

Note: 199.999° prime number will be 2.750.131, as expected, only if number 2 occupies index 1. Whereas the implementation sets index 0 for value 2, the expected output value would refer to index 199.998°.

Using the module timeit for the measurement of time, I obtained an average time, in 100 executions, of 0.69539769177063s.

See working on Ideone.

A brief explanation of the code

The first line of the function, A = [True] * (N+1) creates a list of N+1 elements, all defined as True, initially indicating that all values between 0 and N are prime numbers. Soon after, the values 0 and 1 are defined as False, indicating that these are not prime numbers. And with a repeat loop, all other values are traversed and whenever you find a prime value, return it and delete it from the list, defining as False, all numbers that are multiples of that prime found. For a better view, the list A would be something like:

A = [False, False, True, True, False, True, False, ...]

Indicating that 0 is not prime, 1 is not prime, 2 is prime, 3 is prime, 4 is not prime, 5 is prime, 6 is not prime, so on. What the function enumerate does is to return a pair of values where the first represents the index in the list and the second the value itself. Thus, doing enumerate(A), would be returned something similar to:

[(0, False), (1, False), (2, True), (3, True), (4, False), (5, True), (6, False), ...]

And that’s why in for there are two values. The first value returned from enumerate is assigned to the value and the second to prime, so when prime is true, we know that value will be a prime number.

for value, prime in enumerate(A):
    ...

Already the yield plays the role of return, however, for a generator. I confess that I ended up complicating more than necessary in this code using a generator, because since the generator should be converted to a list, I could generate it directly. The code would look like this:

def sieve_of_eratosthene(N):

    # Lista de números primos:
    numbers = []

    # Cria-se uma lista referente a todos os inteiros entre 0 e N:
    A = [True] * (N+1)

    # Define os números 0 e 1 como não primos:
    A[0] = A[1] = False

    # Percorra a lista até encontrar o primeiro número primo:
    for value, prime in enumerate(A):

        # O número é primo?
        if prime:

            # Retorna o número primo:
            numbers.append(value)

            # Remova da lista todos os múltiplos do número enontrado:
            for i in range(value**2, N+1, value):
                A[i] = False

    return numbers

See working on Ideone.

Note that the main difference is that where before I used the yield, now used the append from a list.

An extra implementation

As commented by LINQ in its reply, the fact of having to know beforehand the n-nth prime value can be considered a limiting of the solution. A practical alternative is to apply the mathematical concept presented at the end of this answer to calculate a prime value close to the desired one. Knowing that the n-nth prime number, Pn, is less than n ln(n) + n ln(ln(n)), if we calculate all the prime values up to this value we are sure we have calculated the Pn value. Something like:

def sieve_of_eratosthene(N):

    N = floor(N*log(N) + N*(log(log(N))))

    # Lista de números primos:
    numbers = []

    # Cria-se uma lista referente a todos os inteiros entre 0 e N:
    A = [True] * (N+1)

    # Define os números 0 e 1 como não primos:
    A[0] = A[1] = False

    # Percorra a lista até encontrar o primeiro número primo:
    for value, prime in enumerate(A):

        # O número é primo?
        if prime:

            # Retorna o número primo:
            numbers.append(value)

            # Remova da lista todos os múltiplos do número enontrado:
            for i in range(value**2, N+1, value):
                A[i] = False

    return numbers

See working on Ideone.

Just by adding the first line, we can now call the function sieve_of_eratosthene(2e5) to get the first 200,000 prime numbers. In fact, you get even more, so the running time can increase.

Possible alternative

It is proved mathematically, if I understand correctly, that the n-nth prime number, being n greater than or equal to 6, shall belong to the set of numbers defined by:

For example, for n = 199999, the following set is obtained:

2741586 < P_n < 2941585

That contains the expected value 2750159; however, this will not be the only prime number in this range, so the challenge following this logic would be to correctly identify the value within the range.

by Jéf Bueno • **67,331** points · Answer 2 · 2017-08-23T19:26:57+00:00

My contribution is a somewhat implementation naive. It takes (on my machine) approximately 71 seconds to generate 200,000 primes. Despite this, it is a basic implementation, uses nothing from third parties and is very easy to understand.

In this implementation, no need to know beforehand which is the nth prime number, ie it is possible to create a list with any quantity prime-number.

Note that it is not an adaptation of your code, it was created from scratch. The algorithm consists of searching "raw form", checking all odd numbers.

The algorithm is based on the statement that a number is composed (not-prime) if, if only there is some divisor cousin less than or equal to its square root.

from math import sqrt, ceil
import time

def checkPrime(num):
    if num % 2 == 0: return False
    i = 3
    while i <= ceil(sqrt(num)):
        if num % i == 0: 
            return False
        i += 2
    return True

def primes(q):
    primelist = [2]
    number = 3
    while len(primelist) < q:
        if(checkPrime(number)):
            primelist.append(number)        
        number += 2    
    return primelist

Version 2

It takes 24 seconds (on the same machine) to generate the 200,000 primes. The difference between this and the other is that, when checking whether a number is prime or not, only the known prime numbers are used.

This is because, as I said above, a number is composed (not-prime) when it has some divisor cousin less than or equal to its square root.

def checkPrime(num, baseList):    
    for p in baseList:
        if(p > ceil(sqrt(num))): break

        if num % p == 0:
            return False
    return True

def primes(q):
    primelist, number = [2], 3

    while len(primelist) < q:
        if checkPrime(number, primelist):
            primelist.append(number)
        number += 2
    return primelist

The use would be like this:

def main():
    lista = primes(200000)
    print("Saída: ", [lista[i] for i in [199999, 1, 7]])

Exit: [2750159, 3, 19]

by nosklo • **5,801** points · Answer 3 · 2018-09-25T01:48:18+00:00

I arrived late in answering. However, I will contribute an implementation that is very interesting and useful, because it NO LIMIT NEEDED.

It uses the 'Erastóthenes' sieve algorithm, but in a reversed, mobile way, storing only 1 future multiple of each combination of primes and releasing memory as it advances, so that it is not necessary to define beforehand which is the end of the calculation! And so you don’t need to set a huge list in memory to be marked.

def eratosthenes():
    D = {}  # dicionario para armazenar os futuros primos
    q = 2   # primeiro inteiro a testar
    while True:
        if q not in D:
            yield q       # nao esta marcado entao eh primo
            D[q*q] = [q]  # armazena somente o primeiro multiplo 
                          # que ainda nao esta armazenado
        else:
            for p in D[q]:  # todos os nao-primos armazenados aqui precisam andar
                D.setdefault(p+q,[]).append(p)
            del D[q]        # libera memoria do que ja passou
        q += 1

Therefore, you can call this function and stop at any time, for example

for p in eratosthenes():
    if p > 200000:
        break
    print(p)

by Lacobus • **13,510** points · Answer 4 · 2017-08-23T19:31:35+00:00

Code:

p = []

def gerar_primos():

    limite = 2750159 + 1

    primos = []
    nao_primo = set()

    for n in range( 2, limite ):

        if n in nao_primo:
            continue

        for f in range( n * 2, limite, n ):
            nao_primo.add( f )

        primos.append( n )

    return primos;


def primo(idx):

    global p

    if not p:
        p = gerar_primos();

    return p[ idx - 1 ]


print primo(7)
print primo(1)
print primo(199999)
print primo(4)

Exit:

$ time python primos.py 
17
2
2750131
7

real    0m1.963s
user    0m1.891s
sys 0m0.072s

by Bruno Rodrigues • 21 points · Answer 5 · 2018-10-11T01:11:41+00:00

I know a little bit about Java and I’m learning Python. I made this algorithm and it calculates all the odd numbers extremely quickly. In my notebook it calculates and prints all prime numbers from 0 to 1,000,000 in less than 3 seconds. And from 0 to 4,000,000 makes in about 17 seconds.

import time

num2 = int(input("Informe até qual número procurar: "))
inicio = time.time()
print("Esses são os número primos entre 0 e %i"%num2)
# lista com o primeiro número primo.
primos = [2,]
print(primos[0])

# só testa números impares
for x in range(3,num2+1,2):
num_divisoes = 0
    # pega o primeiro número e tenta dividi-lo pelos número primos menores que ele.
    for y in primos:
        # Quando o cociente for menor que o divisor para de testar números primos maiores.
        if x//y < y:
            break

        #se der para dividir o número por algum número primo já descoberto então ele não é primo.
        if x % y == 0:
            #marca que ele não é primo.
            num_divisoes = 1
            break

    # se não tiver nenhuma divisão adiciona o número na lista de primos.
    if num_divisoes == 0:
        primos.append(x)
        print(x)
        num_divisoes = 0

print("Existe %i números primos entre 0 e %i"%(len(primos),num2))
fim = time.time()
print("Executado em %.3f segundos"%(fim-inicio))

What made my code get extremely fast was this part:

# Quando o cociente for menor que o divisor para de testar números primos maiores.
if x//y < y:
   break

Without this excerpt the prime number 999,983 would make about 78,497 tests to verify if it is prime. As this code it makes only 169 tests.

by Mário César Fracalossi Bais • 26 points · Answer 6 · 2020-12-05T10:12:56+00:00

All cousins up to 2,000,000 - takes reasonable time; for 200,000 is fast.

m = 200000 # para calcular primos até esse número m
# constrói a list comprehension excluindo os divisíveis pelos menores números primos
P = [2] + [3] + [5] + [x for x in range(7,m,2) if x % 3 != 0 and x % 5 != 0]
PL = len(P)
i = 15 # menor composto a excluir tem índice 15: é o 49
while i < PL:
    # Os divisíveis por 2, 3, e 5 já foram excluídos.
    # Começamos a excluir os divíveis de 7, cujo índice na list é 3
    j = 3 # P[3] = 7, menor primo para iniciar teste
    while i < PL and P[j] * P[j] <= P[i]:
    if P[i] % P[j] == 0:
        del P[i]
        PL = len(P)
        i = i - 1
        break
    else:
        j += 1
    i += 1
print(P)

I just put together more efficient code that takes 60 seconds:

    import time
    inicio = time.time()
    print(inicio)
    m = 2702000
    P = [2] + [3] + [5] + [x for x in range(7,m,2) if x % 3 != 0 and x % 5 != 0]
    PL = len(P)
    C = {P[i] for i in range(15, PL) for j in range(3, int(P[i] ** (1/2))) if P[i] % P[j] == 0}
    P = [P[i] for i in range(PL) if not(P[i] in C)]
    print(P)
    fim = time.time()
    print(fim)
    print("duração: ", fim - inicio)

The following code assembles the series of cousins up to 2,000,000 in half a second. For the number suggested in the problem (2.750.159), burst the memory of my jupyter. It is possible that in other noteook run normal:

    inicio = time.time()
    print(inicio)
    m = 2000000
    P = [2] + [3] + [5] + [x for x in range(7,m,2) if x % 3 != 0 and x % 5 != 0]
    PL = len(P)
    C = {P[i] * P[j] for i in range(3, int(m ** (1/2))) for j in range(i, int(m ** (1/2)))}
    P = [P[i] for i in range(PL) if not(P[i] in C)]
    print(P)
    fim = time.time()
    print(fim)
    print("duração: ", fim - inicio)

by Rafael Setton • **139** points · Answer 7 · 2020-05-18T12:31:16+00:00

I think the most efficient way is, for each number test whether it is divisible by any of the smaller cousins than it. Example:

primos = [2, 3, 5]
...  # Para cada número entre 6 e 200000 define se ele é ou não primo
if num_e_primo:
    primos.append(num)

For 6 just test whether it is divisible by 2, 3, or 5. For 10 test for 2, 3, 5, and 7 So on...

It is not necessary to test by all numbers, only cousins

Example:

If x is divisible by 30, it will be divisible by 2, 3, 5, 6, 10, 15.

by tiago • 1 point · Answer 8 · 2019-06-24T18:43:13+00:00

test this one:

"""
Created on Mon Jun 24 14:55:40 2019

@author: Tiago C. Coura
"""

def primeirosprimos(x):


    seq=[2,3,5]
    a=0
    controler=0
    while True:

        a+=1

        if a%2==0:
            r1=3*a+5
        else:
            r1=3*a+4
        if r1%5==0:
            pass
        else:
            z= len(seq)/2
            for loop in seq[0:z]:
                if r1%loop==0 :
                   controler+=1
                   break
            if controler > 0:
                controler=0
                pass
            else:
                seq.append(r1)

        if len(seq)>x-1:
            break

    return seq