Python serial vs multiprocessing vs threading code

Question

Python serial vs multiprocessing vs threading code

Asked 8 years, 4 months ago

Viewed 1,356 times

3

I’m using python version 3.4

How do I find out that my codes are actually working and doing what they’re supposed to? When looking at the System Monitor (linux) I noticed that codes using multiprocessing and threading do not use another CPU, they keep forcing the same CPU to 100% always. Shouldn’t they automatically divide the work between the other Cpus? I’m doing the correct code implementation?

This is the example of how the code should be output, and the results should be the same for the three codes, however, the code using multiprocessing always returns 0 in the total number of primes:

total de números primos no intervalo definido = 5133
tempo gasto para processar = 23.84818319s (*tempo varia óbvio*)

The problem itself of saying what the total number of primes is not even the big question, because the larger goal is to compare the execution time of the 3 implementations. But I would like to understand why this behavior of multiprocessing code does not return me anything in total

Code using a multiprocessing:

from multiprocessing import Process
import time

primos=0
def ePrimo():
    global primos
    limite=50000
    for numero in range (2,limite+1):
        for i in range (2,numero):
            if numero % i == 0: break
        else:
            primos+=1
tic = time.time()
if __name__ == '__main__':
    p = Process(target=ePrimo)
    p.start()
    p.join()
toc = time.time()


print("total de números primos no intervalo definido = "+ str(primos))
print("tempo gasto para processar = " + str(toc-tic)+"s")

Code using threading:

import threading
import time

primos=0
def ePrimo():
    global primos
    limite=50000
    for numero in range (2,limite+1):
        for i in range (2,numero):
            if numero % i == 0: break
        else:
            primos+=1

t = threading.Thread(target=ePrimo)
tic = time.time()   
t.start()
t.join()
toc = time.time()


print("total de números primos no intervalo definido = "+ str(primos))
print("tempo gasto para processar = " + str(toc-tic)+"s")

Serial code:

import time

primos=0
def ePrimo():
    global primos
    limite=50000
    for numero in range (2,limite+1):
        for i in range (2,numero):
            if numero % i == 0: break
        else:
            primos+=1

tic = time.time()   
ePrimo()
toc = time.time()




print("total de números primos no intervalo definido = "+ str(primos))
print("tempo gasto para processar = " + str(toc-tic)+"s")

1

multiprocessing uses processes that, unlike threads, do not support memory. The 'primes' variable is being incremented only in one process. Try putting the print under the for loop.

– Fabiano

2017/04/10 at 21:08

1 answer

Browser other questions tagged python python-3.x thread parallelism runtime

You are not signed in. Login or sign up in order to post.

by jsbueno • **30,668** points · Answer 1 · 2017-04-11T22:06:12+00:00

So - the problem is that your above programs are not parallel - they are serial.

The question of parallelizing a code is not so trivial, for some problems it can be quite difficult, and above all, there is no magic involved.

Its "main algorithm", let’s say, to check whether a number is prime is the innermost - these two lines:

    for i in range (2,numero):
        if numero % i == 0: break

The part of other possible optimizations of this algorithm (you can count from 2 to 2, and you only need to check to the square root of the number) - realize that, in fact, an interaction of the for does not depend on the previous "for" values - so this is the key to whether a calculation is parallelized or not. That is, in this case, if each division check ran on a distinct thread or process, this algorithm could benefit from parallelization. Only that it is not circling in parallel - in the two cases where you think you are parallelizing something, you call A function that traverses all numbers in sequence, and for each number it traverses all potential divisors - but that function itself is a single sequential block: it will run in a single core!

Parallelizing the checking of primes itself is not so simple - because having 50000 processes, for example, one to check each divisor of one of its numbers, would not be effective: the computation required is small (just check the rest of a division)while the resources needed to create a process, or even another thread, are many - involving running tens of thousands of native instructions (in contrast to a few dozen for checking rest in Python or with a single instruction if that code was in C)

Now, breaking the functions a little bit more: one that checks if a single number is prime - and another that makes the call to those functions, would facilitate the parallelization there. Even this isn’t free: you need to create a reasonable number of threads and/or processes - the optimal number is something close to the number of cores you have - and distribute the tasks so that processes and threads already created are repurposed to process new numbers - and - save the result of all this, for your final accounting.

Fortunately Python solves the parallelization problem by this path with the module concurrent.futures of the standard library.

So before refactoring your code to use this path, two other considerations:

(1) Fabiano’s comment is perfect: tasks performed in distinct processes do not share global variables - in contrast to tasks in multiple threads. Concurrent.Utures allows the task to return values, and it is in charge of transposing them between processes.

(2) Precisely to ensure the integrity of variables and structures such as lists and dictionaries, Python does not run true threads in parallel - unless there is some input/output task, only one thread runs Python code at a time - Runtime to Python does this using a global interpreter lock - the "GIL". Extensions in C, or using Cython, programmed so that parallel code will not leave other data structures inconsistent, can release GIL and take advantage of multi-threading.

So, first your program, just refactoring the function that checks a single prime number - but also with the otimations I mentioned above: if you don’t find any factor of a number until you get to its square root, there won’t be any factor after it (for example, 17, the root is 4.1 - if "4" is not a divisor of 17, the factors from 5 onwards, multiplied by any other factor from 5 onwards, will result in a number greater than 17).

For numbers up to 50000 this gets 100 times faster - if you look for larger numbers, this factor will increase:

import time

def checa_primo_raiz(numero):
    if numero == 2:
        return True
    if not numero % 2:
        return False
    raiz = numero ** 0.5
    if raiz == int(raiz):
        return False
    for i in range(3, int(raiz + 1), 2):
        if not numero % i:
            return False
    return True


def checa_primo_nutella(numero):
    for i in range(2, numero):
        if not numero % i:
            return False
    return True


def ePrimo(checa_primo):
    global primos
    limite=50000
    primos = 0
    for numero in range(2,limite + 1 ):
        if checa_primo(numero):
            primos += 1
            m1.add(numero)


def teste():
    for func in (checa_primo_raiz, checa_primo_nutella):
        tic = time.time()
        ePrimo(func)
        toc = time.time()

        print("Função {}:".format(func.__name__))
        print("total de números primos no intervalo definido = "+ str(primos))
        print("tempo gasto para processar = " + str(toc-tic)+"s")

if __name__ == "__main__":
    teste()

And finally, let’s go for a version of your code using Current.Multiple and Multiple Processes - I didn’t mention above, but breaking each task and making the interprocess call also takes time - so even if we do this for each number after the fact that Python has to serialize the call, coordinate the method call in the other process, and serialize the result back, uses a lot of processing compared to the trivial task of fetching divisors between 2 and sqrt(50000) (which is ~71. ) So I took advantage that I changed the code to compare different methods, and compare the parallelization at different intervals. My machine has 2 physical cores and 4 if count the "hyperthreads" and I noticed a speed-up of exact twice with the parallelization in "larger slices" - and almost no gain with smaller slices (the number passed in the parameter for Processpoolexecutor is the number of "Workers" processes that will work on the tasks):

import time

from concurrent.futures import ProcessPoolExecutor, as_completed

def checa_primo(numero):
    if numero == 2:
        return True
    if not numero % 2:
        return False
    raiz = numero ** 0.5
    if raiz == int(raiz):
        return False
    for i in range(3, int(raiz + 1), 2):
    #for i in range(2, numero):
        if not numero % i:
            return False
    return True


def checa_intervalo(inicio, tamanho):
    return sum(int(checa_primo(numero)) for numero in range(inicio, inicio + tamanho))


def ePrimo(limite, intervalo=100):

    primos = 0

    processos  = ProcessPoolExecutor(4)
    checagens = set()
    for numero in range(2,limite + 1, intervalo):
        final = min(limite - numero, intervalo)
        checagens.add(processos.submit(checa_intervalo, numero, final))

    for checagem in as_completed(checagens):
        primos += checagem.result()

    return primos

def teste():
    for intervalo in (4, 10, 50, 100, 1000, 5000, 10000):
        tic = time.time()
        primos = ePrimo(100000, intervalo)
        toc = time.time()

        print("\n Agrupando {} números por tarefa:".format(intervalo))
        print("total de números primos no intervalo definido = "+ str(primos))
        print("tempo gasto para processar = " + str(toc-tic)+"s")

if __name__ == "__main__":
    teste()

Completion

Parallelization speeds things up - but in order to parallelize the programmer has to know what I’m doing, know how to choose the method, and know how to choose the parameters - and often the method to parallelize won’t be trivial at all. On the other hand, just by adjusting the algorithm, we had a gain of close to 100 times in the running speed of your initial program. That’s why one of the exponents of computer science already said, "Premature Optimization is the root of all evil" (Donald Knuth) - someone might say that "it’s taking a while, let’s parallelize" will suddenly pull a lot of resources and not improve almost anything, if you don’t know what you’re doing.