How to run a Python script at full speed?

I was playing with a method to approximate π and realized that the code runs very slowly in the Python 3.6 interpreter, with the process using only 30% of the processor.

Is there a way to run this script at full speed?

from math import sqrt
from random import random

piavg = 0
samples = 10000
scale = 1000
progress = 0
debug_rate = 5
debug_clock = 0

def pis(scale):
    pi = 0
    points = []
    for i in range(scale):
        points.append([random(),random()])
    for p in points:
        if sqrt(p[0] ** 2 + p[1] ** 2) <= 1:
            pi += 1
    pi /= scale
    pi *= 4
    return pi

for i in range(samples):
    cpi = pis(scale)
    progress += 1/samples*100
    if debug_rate > 0 and debug_clock == 0:
        print(str(int(progress)) + "%  encontrado: " + str(cpi))
        debug_clock = debug_rate
    piavg += cpi
    debug_clock -= 1
piavg /= samples

print(piavg)

3 answers

Without analyzing your code in depth, just glancing over it quickly, I already noticed a critical point that can easily be improved: the call to sqrt when computing the "distance".

Computing a square root is quite costly, and you do it very often. Avoiding it will greatly improve the performance of your algorithm. In your case that is quite simple, because you compare against the value 1, whose square is also 1! So you can simply drop the call to sqrt (if you compared against another value, you would compare against its square instead).

With that single change, the execution time already falls by half (I also used the time-measurement method suggested by our colleague @Miguel in his answer).
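The effect is easy to measure in isolation. Here is a minimal sketch (the point count and repetition count are arbitrary choices for the benchmark, not values from the answer):

```python
import timeit
from math import sqrt
from random import random

# 1,000 random points in the unit square (arbitrary size for the test)
points = [(random(), random()) for _ in range(1000)]

def with_sqrt():
    return sum(1 for x, y in points if sqrt(x * x + y * y) <= 1)

def without_sqrt():
    return sum(1 for x, y in points if x * x + y * y <= 1)

# Both count the same points: for non-negative d, sqrt(d) <= 1 iff d <= 1
assert with_sqrt() == without_sqrt()

print("with sqrt   :", timeit.timeit(with_sqrt, number=1000))
print("without sqrt:", timeit.timeit(without_sqrt, number=1000))
```

Absolute numbers will vary by machine, but the version without sqrt should come out consistently faster.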

. . .
def pis(scale):
    pi = 0
    points = []
    for i in range(scale):
        points.append([random(),random()])
    for p in points:
        if (p[0] * p[0] + p[1] * p[1]) <= 1: # <=== change here
            pi += 1
    pi /= scale
    pi *= 4
    return pi
. . .

In my tests, running your original code took approximately 14.5 seconds and produced:

. . .
99%  encontrado: 3.188
99%  encontrado: 3.108
99%  encontrado: 3.128
99%  encontrado: 3.008
3.1419439999999845

While running it with the change above took approximately 8.6 seconds and produced:

. . .
99%  encontrado: 3.032
99%  encontrado: 3.208
99%  encontrado: 3.144
99%  encontrado: 3.176
3.1421183999999878

P.S.: This is just the point I found most critical for performance. It is also worth following the guidelines in our colleague Miguel's answer.

  • Very well, Luiz; what a lapse not to have noticed that.

  • By combining our solutions we managed to cut the execution time roughly in half, to about 4 seconds, relative to the question's code, and your suggestion was the one that contributed the most

  • Wow, I had forgotten the property sqrt(x) <= y == x <= y*y, well thought out!

Running with Python 3.5, I was able to shave about 1.2 seconds off the running time. To measure the running time, I used:

from math import sqrt
from random import random
import time

start_t = time.time()

# ALL YOUR CODE HERE, except the imports

print(time.time() - start_t)

Your code takes between 8.0 and 8.3 seconds; the version I optimized now takes between 6.9 and 7.2. Here is the code:

from math import sqrt
from random import random
import time

start_t = time.time()

piavg = 0
samples = 10000
scale = 1000
progress = 0
debug_rate = 5
debug_clock = 0

def pis(scale):
    pi = 0
    points = ((random(), random()) for _ in range(scale)) # points is a generator instead of a list of lists
    for p1, p2 in points: # unpack the points, so we avoid indexing all the time
        if sqrt(p1 ** 2 + p2 ** 2) <= 1:
            pi += 1
    return pi / scale * 4 # everything in a single expression

for _ in range(samples): # by convention, when i is unused we write _ ; I don't know if it helps performance, but it is good practice
    cpi = pis(scale)
    progress += 1/samples*100
    if debug_rate > 0 and debug_clock == 0:
        print("{}%  encontrado: {}".format(int(progress), cpi)) # no need to convert to string if you format like this; two fewer function calls here
        debug_clock = debug_rate
    piavg += cpi
    debug_clock -= 1
piavg /= samples

print(piavg)

print(time.time() - start_t)

I left comments on what I did so you can understand it. I'm still studying ways to optimize further, but this is a good start.
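The generator-versus-list change in this answer can be checked in isolation: a generator does not materialize all the points up front, so its memory footprint stays constant regardless of the sample size (a small sketch; the size 100,000 is an arbitrary choice):

```python
import sys
from random import random

n = 100000

# A list of pairs materializes all n points up front...
as_list = [(random(), random()) for _ in range(n)]
# ...while a generator produces pairs lazily, one at a time.
as_gen = ((random(), random()) for _ in range(n))

print(sys.getsizeof(as_list))  # grows with n
print(sys.getsizeof(as_gen))   # small and constant
```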

  • Very interesting expression optimization; I'll study that further. But I was looking for a way to make the interpreter use all the processing power, since it is only using 30 percent of my computer's maximum capacity.

  • You would have to do it with multiple threads; this is just a single process, @Jeacom. I'm already trying to help you

  • If you want to use more of your computer's capacity, remove the print calls while it runs. The operating system "takes advantage" of IO operations to free the CPU for other tasks. So besides those calls making your program take longer, they create interruptions in the execution flow (time during which your CPU is probably idle anyway). That is the same reason an infinite while loop without a sleep (which gives the CPU some slack) quickly takes the CPU to 100%.

  • Anyway, don't worry about increasing the apparent CPU usage. In fact, the lower the apparent usage, the better your algorithm.

  • @Luizvieira, I am aware of the print calls; that is why I created the debug_rate variable, which, if set to zero, stops all print calls except the final one and, if set to another value, makes the script call print only at intervals determined by the variable. But I found that remark about while interesting: does a while loop use more CPU than a for loop?

  • @Jeacom That is not what I meant. What I meant is that a loop (any loop!) will naturally use more of the processor if it has no interruptions. If in the middle of that loop you call sleep or perform an IO operation, the operating system releases the processor during that interval (while sleeping, or while the IO operation is performed), and so you don't see the processor at 100%.

  • @Jeacom Also remember to check whether your processor has more than one core. If it does, you won't see the graph reach 100% unless you use multiple threads. : ) More here: http://superuser.com/questions/679679/how-to-increase-pythons-cpu-usage

  • @Jeacom And regarding what I said about testing the CPU at 100% (not considering multiple cores): http://raspberrypi.stackexchange.com/questions/8077/how-can-i-lower-the-usage-of-cpu-for-this-python-program
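The multiple-cores idea discussed in these comments can be sketched with the standard multiprocessing module. This is not code from the answers, just a minimal illustration: pis is a compact rewrite of the question's estimator, and the sample counts are arbitrary:

```python
from multiprocessing import Pool
from random import random

def pis(scale):
    # One Monte Carlo estimate of pi: the fraction of random points in the
    # unit square that fall inside the quarter circle, times 4.
    hits = sum(1 for _ in range(scale) if random() ** 2 + random() ** 2 <= 1)
    return 4 * hits / scale

if __name__ == "__main__":
    samples = 2000
    scale = 1000
    # Pool() starts one worker process per CPU core by default, so the work
    # is spread across all cores instead of a single, partially used one.
    with Pool() as pool:
        estimates = pool.map(pis, [scale] * samples)
    print(sum(estimates) / samples)
```

Because of the CPython GIL, threads would not help a pure-computation loop like this; separate processes are what let it use all cores.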


Thanks, with your help I managed to reach this result:

from random import random

scale = int(input("how many samples? "))
repeat = int(input("how many repetitions? "))
pi = 0

def pic(sc):
    p = 0
    points = ([random(), random()] for i in range(sc))

    def dist(p1, p2, limit):
        # compare the squared distance against the limit
        # (the limit is 1 here, so squaring it changes nothing)
        d = p1 ** 2 + p2 ** 2
        return d - limit

    for p1, p2 in points:
        if dist(p1, p2, 1) < 0:
            p += 1
    p = p * 4 / sc
    return p

print("computing...")

for _ in range(repeat):
    pi += pic(scale)
pi /= repeat
print(pi)

This algorithm is considerably faster than my previous ones!!

  • Run some tests, but it may get better if you replace the exponentiation operator (**) with a simple multiplication (i.e., try exchanging p1 ** 2 for p1 * p1). The gain tends to be minimal over a few calls, but it can become considerable in longer loops (e.g., testing with ipython's %timeit, the average difference was 304 ns with ** versus 76.9 ns with * in a loop of only 1,000 iterations).
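The comparison in this comment can be reproduced with timeit from the standard library (a sketch with arbitrary loop sizes; the exact numbers will vary by machine and Python version):

```python
import timeit

# Square 1,000 random floats with ** versus plain multiplication
setup = "from random import random; xs = [random() for _ in range(1000)]"

t_pow = timeit.timeit("for x in xs: x ** 2", setup=setup, number=1000)
t_mul = timeit.timeit("for x in xs: x * x", setup=setup, number=1000)

print("** :", t_pow)
print("*  :", t_mul)
```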
