So, the problem is that the programs above are not parallel: they are serial.
Parallelizing code is not trivial; for some problems it can be quite difficult, and above all, there is no magic involved.
Its "main algorithm", let’s say, to check whether a number is prime is the innermost - these two lines:
for i in range (2,numero):
if numero % i == 0: break
Leaving aside other possible optimizations of this algorithm (you can step by 2, and you only need to check up to the square root of the number), notice that one iteration of the for does not depend on the values of the previous iterations, and that is the key to whether a computation can be parallelized or not. In this case, if each divisibility check ran in a distinct thread or process, the algorithm could benefit from parallelization. But nothing here is running in parallel: in the two cases where you think you are parallelizing something, you call a single function that walks through all the numbers in sequence and, for each number, walks through all the potential divisors. That function itself is one sequential block: it will run on a single core!
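A minimal sketch of why that independence matters (the function names here are my own, for illustration only): since no iteration depends on a previous one, the range of candidate divisors can be split into disjoint chunks, each chunk tested separately, and the partial answers combined with a logical OR, giving the same result as the sequential loop:

```python
def tem_divisor(numero, inicio, fim):
    """True if any i in [inicio, fim) divides numero."""
    return any(numero % i == 0 for i in range(inicio, fim))

def composto_sequencial(numero):
    # the sequential version: one pass over all candidates
    return tem_divisor(numero, 2, numero)

def composto_em_partes(numero):
    # the two halves could run on different cores;
    # OR-ing the partial results reproduces the sequential answer
    meio = numero // 2
    return tem_divisor(numero, 2, meio) or tem_divisor(numero, meio, numero)
```

Each half could be handed to a different worker; that is exactly the property the rest of this answer exploits.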
Parallelizing the prime check itself is not so simple, though: having 50000 processes, for example, one to check each divisor of one of the numbers, would not be effective. The computation required is tiny (just the remainder of one division), while the resources needed to create a process, or even another thread, are large, involving tens of thousands of native instructions (against a few dozen for checking a remainder in Python, or a single instruction if that code were in C).
Now, breaking the code into smaller functions, one that checks whether a single number is prime and another that calls it for each number, would make parallelization easier. Even this isn't free: you need to create a reasonable number of threads and/or processes (the optimal number is close to the number of cores you have), distribute the tasks so that the processes and threads already created are reused for new numbers, and collect the results of all this for your final tally.
Fortunately, Python solves this side of the parallelization problem with the concurrent.futures module of the standard library.
So, before refactoring your code along those lines, two other considerations:
(1) Fabiano's comment is spot on: tasks executed in distinct processes do not share global variables, in contrast to tasks in multiple threads. concurrent.futures lets each task return a value, and it takes care of transporting those values between processes.
(2) Precisely to guarantee the integrity of variables and of structures such as lists and dictionaries, Python does not run threads truly in parallel: unless a thread is doing input/output, only one thread runs Python code at a time. The Python runtime enforces this with a global interpreter lock, the "GIL". Extensions in C, or written with Cython, programmed so that the parallel code cannot leave data structures inconsistent, can release the GIL and take advantage of multi-threading.
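A quick way to see the GIL's effect (my own toy example, with made-up function names): split a CPU-bound prime count across two threads. The answer comes out correct, because the GIL serializes access to the shared list, but there is no speedup to be had, since the two threads take turns on one core instead of running simultaneously:

```python
import threading

def conta_primos_ingenuo(inicio, fim, saida, indice):
    """CPU-bound work: count primes in [inicio, fim) by trial division."""
    total = 0
    for n in range(inicio, fim):
        if n > 1 and all(n % i for i in range(2, int(n ** 0.5) + 1)):
            total += 1
    saida[indice] = total

saida = [0, 0]
t1 = threading.Thread(target=conta_primos_ingenuo, args=(2, 5000, saida, 0))
t2 = threading.Thread(target=conta_primos_ingenuo, args=(5000, 10000, saida, 1))
t1.start(); t2.start()
t1.join(); t2.join()

# Correct result - but, because of the GIL, the wall-clock time is
# roughly the same as running the two halves one after the other.
total = sum(saida)
```

That is why the code further down uses processes (ProcessPoolExecutor) rather than threads for this kind of work.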
So, first your program, just refactoring out the function that checks a single number for primality, but also with the optimizations I mentioned above: if you find no factor of a number up to its square root, there will be no factor beyond it (for example, for 17 the root is ~4.1; if 2, 3 and 4 do not divide 17, any factor from 5 onward, multiplied by any other factor from 5 onward, would result in a number greater than 17).
For numbers up to 50000 this alone gets about 100 times faster; if you check larger numbers, that factor will increase:
    import time

    def checa_primo_raiz(numero):
        if numero == 2:
            return True
        if not numero % 2:
            return False
        raiz = numero ** 0.5
        if raiz == int(raiz):
            return False
        for i in range(3, int(raiz + 1), 2):
            if not numero % i:
                return False
        return True

    def checa_primo_nutella(numero):
        for i in range(2, numero):
            if not numero % i:
                return False
        return True
    def ePrimo(checa_primo):
        global primos
        limite = 50000
        primos = 0
        for numero in range(2, limite + 1):
            if checa_primo(numero):
                primos += 1
    def teste():
        for func in (checa_primo_raiz, checa_primo_nutella):
            tic = time.time()
            ePrimo(func)
            toc = time.time()
            print("Função {}:".format(func.__name__))
            print("total de números primos no intervalo definido = " + str(primos))
            print("tempo gasto para processar = " + str(toc - tic) + "s")

    if __name__ == "__main__":
        teste()
And finally, let's go for a version of your code using concurrent.futures and multiple processes. I did not mention it above, but splitting off each task and making the interprocess call also takes time: even dispatching one number at a time, Python has to serialize the call, coordinate the execution in the other process, and serialize the result back, which is a lot of processing compared to the trivial task of trying divisors between 2 and the square root of the number (at most sqrt(50000) ≈ 224). So I took advantage of having changed the code to compare grouping the numbers in slices of different sizes, i.e. to compare the parallelization over different intervals. My machine has 2 physical cores (4 counting the "hyperthreads"), and I saw a speed-up of exactly 2× with the parallelization in larger slices, and almost no gain with smaller slices (the number passed as a parameter to ProcessPoolExecutor is the number of worker processes that will handle the tasks):
    import time
    from concurrent.futures import ProcessPoolExecutor, as_completed

    def checa_primo(numero):
        if numero == 2:
            return True
        if not numero % 2:
            return False
        raiz = numero ** 0.5
        if raiz == int(raiz):
            return False
        for i in range(3, int(raiz + 1), 2):
            if not numero % i:
                return False
        return True

    def checa_intervalo(inicio, tamanho):
        return sum(int(checa_primo(numero)) for numero in range(inicio, inicio + tamanho))

    def ePrimo(limite, intervalo=100):
        primos = 0
        processos = ProcessPoolExecutor(4)
        checagens = set()
        for numero in range(2, limite + 1, intervalo):
            # min(...) keeps the last slice inside the limit
            # (note the "+ 1" so that "limite" itself is also checked)
            final = min(limite - numero + 1, intervalo)
            checagens.add(processos.submit(checa_intervalo, numero, final))
        for checagem in as_completed(checagens):
            primos += checagem.result()
        return primos
    def teste():
        for intervalo in (4, 10, 50, 100, 1000, 5000, 10000):
            tic = time.time()
            primos = ePrimo(100000, intervalo)
            toc = time.time()
            print("\nAgrupando {} números por tarefa:".format(intervalo))
            print("total de números primos no intervalo definido = " + str(primos))
            print("tempo gasto para processar = " + str(toc - tic) + "s")

    if __name__ == "__main__":
        teste()
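As an aside, concurrent.futures can also do the grouping for you: executor.map accepts a chunksize parameter that batches the inputs per interprocess call, playing the same role as the manual interval grouping. A minimal sketch under that approach (conta_primos_map is my own name for it, not from the code above):

```python
from concurrent.futures import ProcessPoolExecutor

def checa_primo(numero):
    # same square-root trial division as before
    if numero < 2:
        return False
    if numero == 2:
        return True
    if not numero % 2:
        return False
    for i in range(3, int(numero ** 0.5) + 1, 2):
        if not numero % i:
            return False
    return True

def conta_primos_map(limite, chunksize=1000):
    # chunksize= batches the numbers sent to each worker process,
    # amortizing the serialization cost over many checks
    with ProcessPoolExecutor(4) as executor:
        return sum(executor.map(checa_primo, range(2, limite + 1),
                                chunksize=chunksize))

if __name__ == "__main__":
    print(conta_primos_map(100000))
```

The trade-off is the same as with the manual intervals: a tiny chunksize recreates the per-call overhead problem, a huge one limits how evenly work spreads across the workers.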
Conclusion
Parallelization speeds things up, but in order to parallelize, the programmer has to know what they are doing: how to choose the method, and how to choose the parameters, and often the way to parallelize will not be trivial at all. On the other hand, just by adjusting the algorithm, we got a gain of close to 100× in the running speed of your initial program. That is why one of the exponents of computer science once said that "premature optimization is the root of all evil" (Donald Knuth): someone who decides "it's taking a while, let's parallelize" may suddenly consume a lot of resources and improve almost nothing, if they don't know what they are doing.
multiprocessing uses processes which, unlike threads, do not share memory. The primos variable is being incremented only in one process. Try putting the print inside the for loop.
– Fabiano