How to generate 200,000 primes as fast as possible in Python?

Asked

Viewed 4,052 times

47

Heed: I’m not looking for ready-made pieces of code. I just want someone to help me think about in some way to generate these 200,000 primes as efficiently as possible, in Python.

I’m working on it #61 of the Codeabbey, but I can’t think of how to generate the cousins that exercise asks for.

The statement basically considers that there is a list that contains all the prime numbers in the world, starting with 2. Based on this list, the statement gives a series of indexes (indexes of this list), and the program has to return the prime that occupies this index in this imaginary list.

Ex.: In the list [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, ...], index 1 is 3; 2 is 5, ...

The statement ensures that the series of indices it will provide randomly has no index higher than 200,000 (so 200,000 primes, at most)

Example of input and output:

input data:
7 1 199999 4    # são os índices da list imaginária

answer:
19 3 2750131 11    # são os primos dessa list, correspondentes aos índices acima

At the end of the statement, Codeabbey warns that there is a way to generate the result in +/- 1 second. I could leave my code the way it is now, wait two days and give the answer (hahaha), but that would be cheating and I wouldn’t have learned anything from the exercise.

I left my program running for 10 minutes and it didn’t work. Maybe a second is too little time, but surely you have a way to solve it in, um, less than 20 seconds?

To try to solve this, I am deploying the Sieve of Eratosthene (or Sieve of Eratosthenes)

Follow my current code:

from math import sqrt

def primos(n, maxIndex):
    maiorCheck = int(sqrt(n))
    lista = []
    i, x, qtdPrimos = 0, 0, 0
    for m in range(2, n+1):
        lista.append(m)
    while x <= maiorCheck:
        x = lista[i]
        i += 1
        for numero in lista:
            if numero != x:
                if numero % x == 0:
                    lista[lista.index(numero)] = 0
        while 0 in lista:
            del lista[lista.index(0)]
        if len(lista) == maxIndex:
            x = maiorCheck + 1
    return lista

qtdPrimos = int(input('Entre com a quantidade de primos a serem impressos: ').strip())

indexes = input('Entre com os índices para os quais serão retornados primos: ').strip().split()
result = []

for n in range(len(indexes)):
    indexes[n] = int(indexes[n])

arrayPrimos = primos(2750131, max(indexes))

for m in range(len(indexes)):
    result.append(str(arrayPrimos[indexes[m]-1]))

print('\n--- R E S U L T A D O ---\n'+' '.join(result))
  • 3

    Good question. I managed to do it in 12 seconds on the Chrome console myself, I’d like to know how to do it faster.

  • 2

    The question is one of the best made of the site, the title not so much. This time requirement is complicated.

  • 1

    @Bigown improved the title, leaving the requirement of less rigorous time.

  • 1

    Little brother, I have an example in Java that produces cousins very quickly, maybe I can help you link: https://github.com/HallefBruno/Estrutura-dados/blob/master/primo.java

  • 1

    So, it doesn’t say you can do it in 1 second in Python. It might take 3 times, or 10 times just because it’s Python. I’m not saying it does, but it might be. Ali makes a naive statement. There’s a way to make it go faster. Does something that distributes the calculation on 1000 machines or at least colors. When they wrote this was a time, the machines evolved. Imagine that written in the 90s today would give with a horrible algorithm.

  • 4

    One of the most complex problems there is is generating primes, the problem gets worse the more numbers you want. There are complex algorithms that can be improved, I haven’t even seen your code, but you don’t have to check everything. I didn’t even choose to mess with not generating the cousins but to provide them :) See https://en.wikipedia.org/wiki/Sieve_of_Atkin

Show 1 more comment

8 answers

48

With the very Sieve of Eratosthenes I achieved very satisfactory results. It is worth remembering that, as the sieve returns the list of all primes smaller than the input value, we will have to inform the highest prime value that the program will accept. The statement states that it is guaranteed that no number will be requested after the 200,000°, so initially we must know what this value is: 2.750.159.

A very simple implementation of the sieve is:

def sieve_of_eratosthene(N):

    # Cria-se uma lista referente a todos os inteiros entre 0 e N:
    A = [True] * (N+1)

    # Define os números 0 e 1 como não primos:
    A[0] = A[1] = False

    # Percorra a lista até encontrar o primeiro número primo:
    for value, prime in enumerate(A):

        # O número é primo?
        if prime:

            # Retorna o número primo:
            yield value

            # Remova da lista todos os múltiplos do número enontrado:
            for i in range(value**2, N+1, value):
                A[i] = False

So we can get all 200,000 first prime numbers by making:

primes = list(sieve_of_eratosthene(2750159))

And generate the output as follows:

print("Saída:", [primes[i] for i in [7, 1, 199999, 4]]) # Saída: [19, 3, 2750159, 11]

Being [7, 1, 199999, 4] the entry of the program.

Note: 199.999° prime number will be 2.750.131, as expected, only if number 2 occupies index 1. Whereas the implementation sets index 0 for value 2, the expected output value would refer to index 199.998°.

Using the module timeit for the measurement of time, I obtained an average time, in 100 executions, of 0.69539769177063s.

See working on Ideone.

A brief explanation of the code

The first line of the function, A = [True] * (N+1) creates a list of N+1 elements, all defined as True, initially indicating that all values between 0 and N are prime numbers. Soon after, the values 0 and 1 are defined as False, indicating that these are not prime numbers. And with a repeat loop, all other values are traversed and whenever you find a prime value, return it and delete it from the list, defining as False, all numbers that are multiples of that prime found. For a better view, the list A would be something like:

A = [False, False, True, True, False, True, False, ...]

Indicating that 0 is not prime, 1 is not prime, 2 is prime, 3 is prime, 4 is not prime, 5 is prime, 6 is not prime, so on. What the function enumerate does is to return a pair of values where the first represents the index in the list and the second the value itself. Thus, doing enumerate(A), would be returned something similar to:

[(0, False), (1, False), (2, True), (3, True), (4, False), (5, True), (6, False), ...]

And that’s why in for there are two values. The first value returned from enumerate is assigned to the value and the second to prime, so when prime is true, we know that value will be a prime number.

for value, prime in enumerate(A):
    ...

Already the yield plays the role of return, however, for a generator. I confess that I ended up complicating more than necessary in this code using a generator, because since the generator should be converted to a list, I could generate it directly. The code would look like this:

def sieve_of_eratosthene(N):

    # Lista de números primos:
    numbers = []

    # Cria-se uma lista referente a todos os inteiros entre 0 e N:
    A = [True] * (N+1)

    # Define os números 0 e 1 como não primos:
    A[0] = A[1] = False

    # Percorra a lista até encontrar o primeiro número primo:
    for value, prime in enumerate(A):

        # O número é primo?
        if prime:

            # Retorna o número primo:
            numbers.append(value)

            # Remova da lista todos os múltiplos do número enontrado:
            for i in range(value**2, N+1, value):
                A[i] = False

    return numbers

See working on Ideone.

Note that the main difference is that where before I used the yield, now used the append from a list.

An extra implementation

As commented by LINQ in its reply, the fact of having to know beforehand the n-nth prime value can be considered a limiting of the solution. A practical alternative is to apply the mathematical concept presented at the end of this answer to calculate a prime value close to the desired one. Knowing that the n-nth prime number, Pn, is less than n ln(n) + n ln(ln(n)), if we calculate all the prime values up to this value we are sure we have calculated the Pn value. Something like:

def sieve_of_eratosthene(N):

    N = floor(N*log(N) + N*(log(log(N))))

    # Lista de números primos:
    numbers = []

    # Cria-se uma lista referente a todos os inteiros entre 0 e N:
    A = [True] * (N+1)

    # Define os números 0 e 1 como não primos:
    A[0] = A[1] = False

    # Percorra a lista até encontrar o primeiro número primo:
    for value, prime in enumerate(A):

        # O número é primo?
        if prime:

            # Retorna o número primo:
            numbers.append(value)

            # Remova da lista todos os múltiplos do número enontrado:
            for i in range(value**2, N+1, value):
                A[i] = False

    return numbers

See working on Ideone.

Just by adding the first line, we can now call the function sieve_of_eratosthene(2e5) to get the first 200,000 prime numbers. In fact, you get even more, so the running time can increase.


Possible alternative

It is proved mathematically, if I understand correctly, that the n-nth prime number, being n greater than or equal to 6, shall belong to the set of numbers defined by:

inserir a descrição da imagem aqui

For example, for n = 199999, the following set is obtained:

2741586 < P_n < 2941585

That contains the expected value 2750159; however, this will not be the only prime number in this range, so the challenge following this logic would be to correctly identify the value within the range.

  • Excellent! Very good! Now it was very fast.

  • But I didn’t understand a lot of things you did on the job sieve_of_eratosthene... Can you point out the concepts you used for me to research? I’m new yet...

  • @santosmarco_ what concepts would be those that did not understand?

  • So... there’s a few things I haven’t learned yet, but I’d love to do some research on. In the function you created, from the for line (inclusive) to the last.

  • For example: 1) Because two values, separated by a comma in the first is; 2) What is the enumerate(A); 3) How does it identify that it is prime only with "if prime"? ; 4) What is Yield; 5) and the last is I did not understand... Because it starts with value ** 2, which is this value variable, and because turning the list item into False eliminates it?

  • @Andersoncarloswoss If you combine this algorithm to find a set where the nth prime is found together with the sieve algorithm, you can make the solution more generic. Working for "any" prime number.

  • @santosmarco_ as soon as possible I add to the answer the explanations

  • According to the statement of the link index 7 should not return 17 ?

  • @Magichat no, same 19. The index starts at 0, so it is the eighth value of the list.

  • Ah, yes I have, your list has range from 0 to 199999...

  • @santosmarco_ Forgive me, I ended up getting lost in my schedules and forgot to edit the answer that day. I have already added a brief explanation and an alternative solution to yield. See if it clears up your doubts, if you still have them.

  • 2

    I was thinking about this question yesterday :), then I will test the proposals

  • 3

    This answer is fantastic. I put a reward on the question to get attention and at the end of the deadline, I will give the +100 for you.

Show 8 more comments

18

My contribution is a somewhat implementation naive. It takes (on my machine) approximately 71 seconds to generate 200,000 primes. Despite this, it is a basic implementation, uses nothing from third parties and is very easy to understand.

In this implementation, no need to know beforehand which is the nth prime number, ie it is possible to create a list with any quantity prime-number.

Note that it is not an adaptation of your code, it was created from scratch. The algorithm consists of searching "raw form", checking all odd numbers.

The algorithm is based on the statement that a number is composed (not-prime) if, if only there is some divisor cousin less than or equal to its square root.

from math import sqrt, ceil
import time

def checkPrime(num):
    if num % 2 == 0: return False
    i = 3
    while i <= ceil(sqrt(num)):
        if num % i == 0: 
            return False
        i += 2
    return True

def primes(q):
    primelist = [2]
    number = 3
    while len(primelist) < q:
        if(checkPrime(number)):
            primelist.append(number)        
        number += 2    
    return primelist

Version 2

It takes 24 seconds (on the same machine) to generate the 200,000 primes. The difference between this and the other is that, when checking whether a number is prime or not, only the known prime numbers are used.

This is because, as I said above, a number is composed (not-prime) when it has some divisor cousin less than or equal to its square root.

def checkPrime(num, baseList):    
    for p in baseList:
        if(p > ceil(sqrt(num))): break

        if num % p == 0:
            return False
    return True

def primes(q):
    primelist, number = [2], 3

    while len(primelist) < q:
        if checkPrime(number, primelist):
            primelist.append(number)
        number += 2
    return primelist

The use would be like this:

def main():
    lista = primes(200000)
    print("Saída: ", [lista[i] for i in [199999, 1, 7]])

Exit: [2750159, 3, 19]

  • The latter the output is not wrong ? For the indexes 199999 and 7, according to the enunciation link...]

  • @Magichat will only be like that if you start counting from 1

  • "They will be in range from 1 to 200000. ";)

  • @Magichat Just decrease 1 user input and be happy, I do not see why this invalidate the answer. The main point is how to generate the numbers in an acceptable time and not do the whole exercise for AP

  • I drew, in case your answer is legal tmb, why as the lacobus it is not necessary to know the value of the last cousin...

  • @Magichat In fact both Anderson’s and Lacobus' response need to know beforehand which is the last cousin. Mine is more generic, but obviously has a much higher cost

  • Um truth I had not repaired that of the lacobus... For me yours is in front, would be perfect (for me) if the input and output were identical to the statement, but I already understood the reason..;)

  • @Magichat Actually, if you notice my main method main is separated from the rest of the code precisely to enable this, I did not post the code that makes this Parsing because I think it’s a secondary issue.

  • @LINQ I ended up adding a line of code in my solution that allows not knowing the nth prime number beforehand, without greatly affecting the performance. Maybe it’s in your interest.

  • Perhaps it is more efficient to avoid the square root and compare i**2 <= num (first version) or p**2 > num (second version)

Show 5 more comments

10

I arrived late in answering. However, I will contribute an implementation that is very interesting and useful, because it NO LIMIT NEEDED.

It uses the 'Erastóthenes' sieve algorithm, but in a reversed, mobile way, storing only 1 future multiple of each combination of primes and releasing memory as it advances, so that it is not necessary to define beforehand which is the end of the calculation! And so you don’t need to set a huge list in memory to be marked.

def eratosthenes():
    D = {}  # dicionario para armazenar os futuros primos
    q = 2   # primeiro inteiro a testar
    while True:
        if q not in D:
            yield q       # nao esta marcado entao eh primo
            D[q*q] = [q]  # armazena somente o primeiro multiplo 
                          # que ainda nao esta armazenado
        else:
            for p in D[q]:  # todos os nao-primos armazenados aqui precisam andar
                D.setdefault(p+q,[]).append(p)
            del D[q]        # libera memoria do que ja passou
        q += 1

Therefore, you can call this function and stop at any time, for example

for p in eratosthenes():
    if p > 200000:
        break
    print(p)
  • Writing in the standard output took 1:22 seconds on my WSL, playing in the /dev/null took 5.2s. Woss solution (printing all numbers too) took 43s writing in the standard output, and 1.7s playing in the /dev/null. I made a change to search only half the numbers in your answer (you need to go through all to "walk" the sieve, but I don’t test the pairs, only the odd ones), the times were 53s and 5.1s

  • But this solution is infinite and Woss’s is not @Jeffersonquesado

7

Code:

p = []

def gerar_primos():

    limite = 2750159 + 1

    primos = []
    nao_primo = set()

    for n in range( 2, limite ):

        if n in nao_primo:
            continue

        for f in range( n * 2, limite, n ):
            nao_primo.add( f )

        primos.append( n )

    return primos;


def primo(idx):

    global p

    if not p:
        p = gerar_primos();

    return p[ idx - 1 ]


print primo(7)
print primo(1)
print primo(199999)
print primo(4)

Exit:

$ time python primos.py 
17
2
2750131
7

real    0m1.963s
user    0m1.891s
sys 0m0.072s

2

I know a little bit about Java and I’m learning Python. I made this algorithm and it calculates all the odd numbers extremely quickly. In my notebook it calculates and prints all prime numbers from 0 to 1,000,000 in less than 3 seconds. And from 0 to 4,000,000 makes in about 17 seconds.

import time

num2 = int(input("Informe até qual número procurar: "))
inicio = time.time()
print("Esses são os número primos entre 0 e %i"%num2)
# lista com o primeiro número primo.
primos = [2,]
print(primos[0])

# só testa números impares
for x in range(3,num2+1,2):
num_divisoes = 0
    # pega o primeiro número e tenta dividi-lo pelos número primos menores que ele.
    for y in primos:
        # Quando o cociente for menor que o divisor para de testar números primos maiores.
        if x//y < y:
            break

        #se der para dividir o número por algum número primo já descoberto então ele não é primo.
        if x % y == 0:
            #marca que ele não é primo.
            num_divisoes = 1
            break

    # se não tiver nenhuma divisão adiciona o número na lista de primos.
    if num_divisoes == 0:
        primos.append(x)
        print(x)
        num_divisoes = 0

print("Existe %i números primos entre 0 e %i"%(len(primos),num2))
fim = time.time()
print("Executado em %.3f segundos"%(fim-inicio))

What made my code get extremely fast was this part:

# Quando o cociente for menor que o divisor para de testar números primos maiores.
if x//y < y:
   break

Without this excerpt the prime number 999,983 would make about 78,497 tests to verify if it is prime. As this code it makes only 169 tests.

0

All cousins up to 2,000,000 - takes reasonable time; for 200,000 is fast.

m = 200000 # para calcular primos até esse número m
# constrói a list comprehension excluindo os divisíveis pelos menores números primos
P = [2] + [3] + [5] + [x for x in range(7,m,2) if x % 3 != 0 and x % 5 != 0]
PL = len(P)
i = 15 # menor composto a excluir tem índice 15: é o 49
while i < PL:
    # Os divisíveis por 2, 3, e 5 já foram excluídos.
    # Começamos a excluir os divíveis de 7, cujo índice na list é 3
    j = 3 # P[3] = 7, menor primo para iniciar teste
    while i < PL and P[j] * P[j] <= P[i]:
    if P[i] % P[j] == 0:
        del P[i]
        PL = len(P)
        i = i - 1
        break
    else:
        j += 1
    i += 1
print(P)

I just put together more efficient code that takes 60 seconds:

    import time
    inicio = time.time()
    print(inicio)
    m = 2702000
    P = [2] + [3] + [5] + [x for x in range(7,m,2) if x % 3 != 0 and x % 5 != 0]
    PL = len(P)
    C = {P[i] for i in range(15, PL) for j in range(3, int(P[i] ** (1/2))) if P[i] % P[j] == 0}
    P = [P[i] for i in range(PL) if not(P[i] in C)]
    print(P)
    fim = time.time()
    print(fim)
    print("duração: ", fim - inicio)

The following code assembles the series of cousins up to 2,000,000 in half a second. For the number suggested in the problem (2.750.159), burst the memory of my jupyter. It is possible that in other noteook run normal:

    inicio = time.time()
    print(inicio)
    m = 2000000
    P = [2] + [3] + [5] + [x for x in range(7,m,2) if x % 3 != 0 and x % 5 != 0]
    PL = len(P)
    C = {P[i] * P[j] for i in range(3, int(m ** (1/2))) for j in range(i, int(m ** (1/2)))}
    P = [P[i] for i in range(PL) if not(P[i] in C)]
    print(P)
    fim = time.time()
    print(fim)
    print("duração: ", fim - inicio)
  • I don’t know if I entndi bm your algorithm - the weather is good, but I don’t know why it consumes so much memory. Probably Durant the creation of the set Python stores the repeated results temporarily. I made it here using the "Erastotenes sieve" - instead of generating all possible non-prime numbers and see which numbers are not on the list, I start with a list of all cut numbers which are multiples of the previous ones.

-1

I think the most efficient way is, for each number test whether it is divisible by any of the smaller cousins than it. Example:

primos = [2, 3, 5]
...  # Para cada número entre 6 e 200000 define se ele é ou não primo
if num_e_primo:
    primos.append(num)

For 6 just test whether it is divisible by 2, 3, or 5. For 10 test for 2, 3, 5, and 7 So on...

It is not necessary to test by all numbers, only cousins

Example:

If x is divisible by 30, it will be divisible by 2, 3, 5, 6, 10, 15.

-3

test this one:

"""
Created on Mon Jun 24 14:55:40 2019

@author: Tiago C. Coura
"""

def primeirosprimos(x):


    seq=[2,3,5]
    a=0
    controler=0
    while True:

        a+=1

        if a%2==0:
            r1=3*a+5
        else:
            r1=3*a+4
        if r1%5==0:
            pass
        else:
            z= len(seq)/2
            for loop in seq[0:z]:
                if r1%loop==0 :
                   controler+=1
                   break
            if controler > 0:
                controler=0
                pass
            else:
                seq.append(r1)

        if len(seq)>x-1:
            break

    return seq

Browser other questions tagged

You are not signed in. Login or sign up in order to post.