Poor performance when traversing an array

I'm traversing a matrix but am facing performance problems. The idea is to go through this matrix twice: in the second pair of loops over matriz_original, if the condition is true, the new matrix receives the value at that position; otherwise, it receives 0.

Method where the performance problem is

def segmentar(caminhoImagem) :

        largura, altura, imagem = Processamento.get_propriedade_imagem(caminhoImagem)
        matriz_original = Processamento.get_matriz_mae(largura, altura, imagem)

        matriz_segmentada = [[0 for x in range(altura)] for y in range(largura)]

        for x in range(0, altura) :
                for y in range(0, largura) :

                        # MUMFORD–SHAH FUNCTION
                        constante_pixel = matriz_original[y][x]

                        # Start of the performance problem
                        for a in range(0, altura) :
                                for l in range(0, largura) :

                                        pixel = matriz_original[l][a]

                                        if Processamento.verifica_pixel(pixel, constante_pixel, CONST_MEDIA) :
                                                matriz_segmentada[l][a] = pixel
                                        else :
                                                matriz_segmentada[l][a] = 0
                        # ---------------------------------------------------------------------

        Processamento.montar_imagem_array(matriz_segmentada)

Method that evaluates the condition

def verifica_pixel(pixel, constante_pixel, media) :

        return constante_pixel - media <= pixel and constante_pixel + media >= pixel

1 answer

Your code has two loops to go through each pixel in the "original matrix", which by itself is already a problem for some applications - but then, in two more inner loops, you go through every pixel again. That makes several blocks of your program execute a number of times equal to the square of the pixel count: for a simple VGA image of 640 × 480 pixels, that is (640 × 480)² = 307,200² = 94,371,840,000 - about 94 billion executions of the code block. Even if each execution were a single multiplication in native code, it would take roughly 10-20 seconds on a modern CPU - and here each execution is dozens of lines manipulating very complex objects.

Worse yet: on each pass of the two outer loops, you fully recreate matriz_segmentada in the two inner loops - which indicates a flaw in how you wrote this algorithm. Do you really need to traverse all the pixels once for every pixel? (If you don't, the program is incorrect as written, since each pass simply overwrites the previous one.)
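To make that point concrete: because the inner loops overwrite every element of matriz_segmentada, only the last pass of the outer loops survives, so the whole function is equivalent to a single pass using the last pixel visited as the reference. A minimal sketch of that equivalence (the name segmentar_uma_passada and this single-pass reading are mine, not from the question; the Processamento helpers are kept as-is):

def segmentar_uma_passada(caminhoImagem):
    largura, altura, imagem = Processamento.get_propriedade_imagem(caminhoImagem)
    matriz_original = Processamento.get_matriz_mae(largura, altura, imagem)

    # The last iteration of the original outer loops uses this pixel as
    # the reference - every earlier pass was overwritten and wasted.
    constante_pixel = matriz_original[largura - 1][altura - 1]

    matriz_segmentada = [[0] * altura for _ in range(largura)]
    for a in range(altura):
        for lin in range(largura):
            pixel = matriz_original[lin][a]
            if Processamento.verifica_pixel(pixel, constante_pixel, CONST_MEDIA):
                matriz_segmentada[lin][a] = pixel

    Processamento.montar_imagem_array(matriz_segmentada)

If this is what was intended, the cost drops from quadratic in the pixel count to linear.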

Setting aside this nested processing, some general tips on numerical programming with Python:

Some general considerations

In theory your code may be correct. In practice, for code that has to actually run, it is far from workable.

Python is a very high-level language - so it is simple to write code like what you want: the language does not stand between you and the problem. You read the data from an image, you get it in an object that behaves like a two-dimensional matrix, then you use what you learned in programming and write two loops over the height and width of the image, and you can access the value of each pixel. The equivalent code in C would require several extra, non-trivial steps, starting with choosing an image library, and so on...

Except that it would run about a hundred to a thousand times faster than this code of yours - possibly even another one or two orders of magnitude beyond that, because of some specific issues in this code.

So - when we do these exercises on the blackboard, or on a notebook page, it is fine to think of the image as a matrix: it will have 5 × 5 elements, 10 × 10 at most, and the theory looks great.

Your code is wrong on two very different levels. The first is almost purely aesthetic: even for algorithms implemented 100% in Python, all operations included, loops rarely - almost never - use a for over numeric values obtained from the length of objects. Python's for, unlike that of other languages, specializes in traversing sequences - the contents of the sequences, not their indices! (In general programming this is thought of as a "for each".) When the index is needed in addition to the content, the usual construction is not for index in range(len(sequencia)): but rather for index, conteudo in enumerate(sequencia): - this second form retrieves two variables at once on each pass of the loop: the index and the corresponding content.
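For instance, a tiny illustrative fragment (hypothetical data) contrasting the two forms:

linha = [10, 20, 30]

# index-based form - works, but is not idiomatic Python:
for index in range(len(linha)):
    print(index, linha[index])

# idiomatic form - index and content retrieved together:
for index, conteudo in enumerate(linha):
    print(index, conteudo)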

Rearranging your loops into something like that, plus changing the indentation style to 4 spaces instead of 8, and several other little things, would make your code more presentable and maybe 2 times faster. (And a final readability hint: avoid at all costs a variable named simply "l" - it is quite hard to distinguish from "1" in many contexts, and you spend mental energy on that distinction that has better uses on other fronts.)

But here comes the second level at which your program is conceived wrong -

In the Python language, everything is an object - including integers. That means the overhead of building an integer in the language is considerable: the language runtime has to allocate memory, initialize a data area, maintain a reference count for the number, and so on. Furthermore, Python is a language that allows classes to be redefined at runtime and accepts dynamic operator overloading. The implication is that any operation on a Python object - whether a sum or retrieving an element from a list - is internally equivalent to a function call. It may be a Python function, or a highly optimized function in native code, as in the case of getting an element from a list - but still, there will be a context switch in the CPU, etc.
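A tiny demonstration of that dispatch, with a made-up class: even a simple "+" turns into a method call that must be resolved at runtime:

class Pixel:
    def __init__(self, valor):
        self.valor = valor
    def __add__(self, outro):
        # any "pixel + n" on this type ends up calling this method
        print("__add__ called")
        return Pixel(self.valor + outro)

p = Pixel(10) + 5   # prints "__add__ called"; p.valor is now 15

Built-in integers go through an analogous (native-code) lookup for every operation, which is why tight pixel loops in pure Python are so expensive.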

But if that is so, how is Python so popular for manipulating data, images, and so on?

Here is how: everything that involves heavy processing and repetition in Python must be written so that as much of it as possible runs in native code. For matrices bigger than "whiteboard size" (which includes real images), that means the numpy library, for example.

So, to give you an idea, this line of code:

 matriz_segmentada = [[0 for x in range(altura)] for y in range(largura)]

calls the internal __next__ method of the object returned by range(altura) once for each pixel - and, for each line of the image, creates a new object of type list. For a 640 × 480 image, the interactive prompt lets us measure the time:

In [57]: altura, largura = 640, 480

In [61]: %time matriz_segmentada = [[0 for x in range(altura)] for y in range(largura)]
CPU times: user 34 ms, sys: 903 µs, total: 34.9 ms
Wall time: 34 ms

34 milliseconds - it seems like little, but if this were a game, for example, that would be a painful frame rate at a very low resolution, just to clear the screen.

Python "matrix" object example "cute":

If I create a specialized class that keeps all the data in a flat list in memory, and lets me access individual elements by customizing the access operator [] - implementing the __getitem__ method - that is considerably better. This is an intermediate level: in pure Python it might give a gain of up to 5x over your code's time. And if you cannot "vectorize" the algorithms you need with numpy, you can take an approach like this plus the "Cython" extension, which lets you compile Python-like code to native code - that can bring gains of about 3 or 4 orders of magnitude.

Example in pure Python, still using normal lists and integers:

class Img:
    def __init__(self, altura, largura):
        self.altura = altura
        self.largura = largura
        # all pixels in a single flat list, row by row
        self.dados = [0] * largura * altura
    def __getitem__(self, pos):
        # pos is an (x, y) tuple, so pixels are read as img[x, y]
        return self.dados[pos[0] + pos[1] * self.largura]
    def __setitem__(self, pos, valor):
        self.dados[pos[0] + pos[1] * self.largura] = valor
    def __repr__(self):
        return f"Img <{self.largura} x {self.altura}>"

This code avoids pixel-by-pixel loops when creating the empty matrix, simply multiplying a list created with a single initial "0" element by the number of pixels. The time to create the same object drops to:

In [64]: %time Img(640, 480)
CPU times: user 2.54 ms, sys: 0 ns, total: 2.54 ms
Wall time: 2.59 ms
Out[64]: <__main__.Img at 0x7f35c8f81668>

3 ms - an order of magnitude gained, and code that can become much more concise than what you had:

In [65]: img = Img(640, 480)

In [66]: img[100, 100] = 255

In [67]: img[99,100]
Out[67]: 0

In [68]: img[100, 100]
Out[68]: 255

Okay, but this is still window dressing - unfortunately, despite the elegance Python allows for this kind of object with custom classes, this code would not give that great a speed gain compared to yours.

Example using Numpy

What is actually done in "production" programs is this: create the data as a numpy array - numpy has optimized functionality to process it element by element in an internal loop, in native code, and the values in each cell of a numpy array can be chosen from among the machine's native data types. The savings in memory, and in the time to create and manipulate the data, then amount to orders of magnitude:

In [69]: import numpy as np

In [70]: %time img_numpy = np.zeros((640, 480), dtype=np.float64)
CPU times: user 232 µs, sys: 31 µs, total: 263 µs
Wall time: 196 µs

In [71]: img_numpy[100,100]
Out[71]: 0.0

The previous 3 ms of my custom class drops to 0.2 ms (196 µs). Another order of magnitude gained - and another order of magnitude in memory usage, since Python "float" numbers, being full Python objects, occupy a few dozen bytes each (the integer "0" of the earlier examples, however, is reused - measuring the memory used by these structures is not so trivial). numpy, in this case, uses 64-bit floating-point values: 8 bytes per pixel. If your algorithms will only use integers between 0 and 255, just pass dtype=np.uint8 above, and memory use falls another order of magnitude, to 1 byte/pixel.
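As a quick sanity check anyone can run, the buffer sizes follow directly from the shape and dtype - ndarray.nbytes reports them:

import numpy as np

img64 = np.zeros((640, 480), dtype=np.float64)
print(img64.nbytes)   # 2457600 bytes: 8 bytes per pixel

img8 = np.zeros((640, 480), dtype=np.uint8)
print(img8.nbytes)    # 307200 bytes: 1 byte per pixel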

But beyond creation time - let's take the list of lists, the specialized class, and the numpy array, and apply to each an operation that adds 1 to the value of every pixel. Watch the times:

Matrix using lists of lists:

In [72]: img_lista = [[0 for x in range(altura)] for y in range(largura)]

In [73]: def add_1(img_lista):
    ...:     for x, col in enumerate(img_lista):
    ...:         for y, value in enumerate(col):
    ...:             col[y] += 1
    ...:             

In [74]: %time add_1(img_lista)
CPU times: user 43.5 ms, sys: 3.91 ms, total: 47.4 ms
Wall time: 43.5 ms

In [75]: img_lista[100][100]
Out[75]: 1

Using the specialized class 'Img' above:

In [76]: img = Img(640, 480)

In [77]: %time for index, value in enumerate(img.dados): img.dados[index] += 1
CPU times: user 70.2 ms, sys: 4.72 ms, total: 74.9 ms
Wall time: 71.6 ms

Note that it was actually slower than the list of lists - because of implementation details of how attribute access works in Python.

Using numpy:

In [78]: img_numpy = np.zeros((640, 480), dtype=np.float64)

In [79]: %time img_numpy += 1
CPU times: user 1.44 ms, sys: 4.81 ms, total: 6.25 ms
Wall time: 4.88 ms

In [80]: img_numpy[100, 100]
Out[80]: 1.0

There you go - numpy overloads the += operator for its matrix types (the documentation always calls them "arrays"), so that all the loops execute in native code - the operation that changes every pixel now runs in under 5 milliseconds.

This kind of numpy operation, where an explicit Python for is avoided, is called "broadcasting" in the numpy documentation - ideally you rewrite as much of your algorithm as possible to take advantage of it. It is practically a "super-language" to be learned on top of Python, but it is worth it.
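Tying this back to the question: the test in verifica_pixel is just "is the pixel within media of the reference value?", which broadcasts directly. A minimal sketch, assuming matriz_original has already been loaded as a numpy array and that a single reference value is wanted (the function name segmentar_numpy is illustrative, not from the question's Processamento class):

import numpy as np

def segmentar_numpy(matriz_original, constante_pixel, media):
    # cast to a wide integer type so the subtraction cannot wrap
    # around if the image uses an unsigned dtype such as uint8
    diff = np.abs(matriz_original.astype(np.int32) - constante_pixel)
    # keep the pixel where the condition holds, 0 elsewhere -
    # the same test as verifica_pixel, applied to all pixels at once
    return np.where(diff <= media, matriz_original, 0)

The two inner loops of the question collapse into these two broadcast expressions, which run in native code - on a 640 × 480 image this takes milliseconds rather than seconds.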

Virtually all use of Python in science, data analysis (big data), and machine learning is built on numpy - so it is not "just another library": it is practically part of the language.

The documentation is on numpy's official website - more specific questions that come up while converting your program can be clarified in new questions.
