Break a list into chunks by size in Python

Today I needed to split a list of more than 100 values into a list of lists, each holding at most 100 values.

After a lot of research, I ended up writing this code:

remove_list = [[]]
n = 0
for i in remove_alarms:
    remove_list[n].append(i)
    # once the current sublist holds 100 items, start a new one
    if len(remove_list[n]) > 99:
        n = n + 1
        remove_list.append([])
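Wrapping that loop in a function shows one quirk. The name split_into_lists is hypothetical, used only for illustration. When the input length is an exact multiple of 100, the loop appends a new empty list after the last full one, so the result ends with an empty sublist.

```python
def split_into_lists(remove_alarms, max_size=100):
    """The question's loop wrapped in a function (hypothetical name)."""
    remove_list = [[]]
    n = 0
    for i in remove_alarms:
        remove_list[n].append(i)
        # a full sublist immediately triggers a new, empty one
        if len(remove_list[n]) >= max_size:
            n = n + 1
            remove_list.append([])
    return remove_list


sizes_250 = [len(part) for part in split_into_lists(list(range(250)))]
# sizes_250 == [100, 100, 50]

sizes_200 = [len(part) for part in split_into_lists(list(range(200)))]
# sizes_200 == [100, 100, 0]  (trailing empty list)
```

If an empty list would be a problem for the API call, it has to be filtered out, or avoided altogether by slicing the list as in the answer below.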

In my case, I have a list of alarm names that must be removed on Amazon, passed as a parameter to an API that accepts at most 100 values per call.

The thing is, I have been away from this language for a long time, so I would like some hints on how to improve this, or on whether there is a built-in Python function for it.

1 answer


This is a typical case where Python's "generators" feature comes in handy.

Generators implement the "iterator protocol", and as such they are objects that can be used in a Python for statement.

One way to create a generator is to write a function that has the keyword yield in its body. The value of the expression given to yield becomes the value of one iteration of the for loop. In this case, you can write a generator that takes the big list as a parameter and produces slices of 100 elements:

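def slice(biglist, slice_size=100):
    # note: this name shadows Python's built-in slice type
    for i in range(0, len(biglist), slice_size):
        # hand one slice to the caller's for loop, then pause here
        yield biglist[i: i + slice_size]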

And to use it:

for list_slice in slice(biglist):
    # code that uses the sublist of (at most) 100 elements
    ...
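For example, assuming a list of 250 hypothetical alarm names, the generator produces two slices of 100 and a final slice of 50. Unlike the loop in the question, it never produces an empty trailing slice, because range stops before len(biglist).

```python
def slice(biglist, slice_size=100):
    # note: this name shadows Python's built-in slice type
    for i in range(0, len(biglist), slice_size):
        yield biglist[i: i + slice_size]


# hypothetical alarm names, just for illustration
alarm_names = [f"alarm-{n}" for n in range(250)]

chunk_sizes = [len(chunk) for chunk in slice(alarm_names)]
# chunk_sizes == [100, 100, 50]

# an exact multiple of 100 gives no empty chunk at the end
even_sizes = [len(chunk) for chunk in slice(alarm_names[:200])]
# even_sizes == [100, 100]
```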

How generators work:

The idea is as follows: a normal function is called, receives its parameters, creates its local variables, runs its code (which may or may not call other functions), and returns a value when it reaches a return statement (in Python, a function without a return implicitly returns None).

When the function returns, all local variables it created are discarded, including the parameters it received. If it is called again, execution starts over at the first line of the function, and all the previous calculations have to be redone. In programming terms, such a function is said to have no "state": every time it is called with the same parameters, it redoes everything and returns the same result (provided, of course, that it does not itself make calls whose results vary, such as reading files, fetching data from the internet, or reading user input with input()).

With object-oriented programming, one of the big changes is that objects have attributes: between one method call and the next on the same object, the object's state is kept, and it can change.

In our case, for example, we want something that "remembers" the full big list and which slice was used last, that can produce the next slice, and that signals when the big list has been exhausted.

With plain object orientation, without the "generator" feature that Python provides, this could be written like this:
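A tiny illustration of that difference, with hypothetical names: a function's local variable starts over on every call, while an object's attribute survives between method calls.

```python
def count_with_function():
    # `counter` is a brand-new local variable on every call
    counter = 0
    counter += 1
    return counter


class Counter:
    def __init__(self):
        # the attribute lives as long as the object does
        self.counter = 0

    def count(self):
        self.counter += 1
        return self.counter


function_results = [count_with_function(), count_with_function()]
# function_results == [1, 1]

c = Counter()
method_results = [c.count(), c.count()]
# method_results == [1, 2]
```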

Sentinela = None

class Fatia:
    def __init__(self, listagrande, tam_fatia=100):
        self.listagrande = listagrande
        self.tam_fatia = tam_fatia
        # position where the next slice starts
        self.indice = 0

    def proximo(self):
        # list exhausted: return the sentinel instead of a slice
        if self.indice >= len(self.listagrande):
            return Sentinela
        resultado = self.listagrande[self.indice: self.indice + self.tam_fatia]
        self.indice += self.tam_fatia
        return resultado

This class can be used to slice the list much like in the first example, but in a more "manual" way:

fatiador = Fatia(listagrande)
while True:
    pedaco = fatiador.proximo()
    if pedaco is Sentinela:
        # leave the while loop
        break
    # code that uses this sublist
    ...
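As an aside, the built-in iter() also has a two-argument form, iter(callable, sentinel), which keeps calling callable until it returns a value equal to sentinel. That lets a for statement drive the proximo method directly. The sketch repeats the class so it runs on its own; the 250-element list is only sample data.

```python
Sentinela = None


class Fatia:
    def __init__(self, listagrande, tam_fatia=100):
        self.listagrande = listagrande
        self.tam_fatia = tam_fatia
        self.indice = 0

    def proximo(self):
        # returns the next slice, or the sentinel when the list is exhausted
        if self.indice >= len(self.listagrande):
            return Sentinela
        resultado = self.listagrande[self.indice: self.indice + self.tam_fatia]
        self.indice += self.tam_fatia
        return resultado


listagrande = list(range(250))
fatiador = Fatia(listagrande)

# iter(callable, sentinel) calls fatiador.proximo() until it returns None
sizes = [len(pedaco) for pedaco in iter(fatiador.proximo, Sentinela)]
# sizes == [100, 100, 50]
```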

Now, Python's for statement already knows how to traverse sequences and iterators, so it knows how to "make use" of objects similar to the ones this class creates. Every time we write for numero in sequencia: in Python, it takes the object given by the expression after the keyword in (here, the variable sequencia) and calls that object's __iter__ method. (If the object has no __iter__ method, for has a "plan B": it calls the object's __getitem__ with the integer indices 0, 1, 2, ... until an IndexError is raised. But that is not the case here.)
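The "plan B" can be seen with a hypothetical class that defines only __getitem__: the for statement (through the built-in iter()) calls it with 0, 1, 2, ... and stops at the first IndexError.

```python
class OnlyGetItem:
    """Hypothetical class with __getitem__ but no __iter__."""

    def __init__(self, data):
        self.data = data

    def __getitem__(self, index):
        # self.data[index] raises IndexError past the end, which ends the loop
        return self.data[index]


letters = [letter for letter in OnlyGetItem("abc")]
# letters == ['a', 'b', 'c']
```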

The value returned by __iter__ must itself be an object with a __next__ method. The for statement then calls __next__ once per iteration, and the value it returns is assigned to the loop variable. When __next__ has no more items to return, it raises a StopIteration exception. Unlike the exceptions we are more used to seeing, such as ValueError and TypeError, StopIteration does not indicate an error: it is precisely the signal that for uses to know the iterator has no more results. The for loop then ends, and execution continues on the first line after it.

So we can adapt the class above to work with for just by adding an __iter__ method and renaming proximo to __next__ (which now raises StopIteration instead of returning the sentinel):
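Stepping through a plain list by hand shows the protocol the for statement follows: iter() calls __iter__, each next() calls __next__, and the call after the last item raises StopIteration.

```python
numbers = [10, 20]

iterator = iter(numbers)      # calls numbers.__iter__()
first = next(iterator)        # calls iterator.__next__() -> 10
second = next(iterator)       # -> 20

try:
    next(iterator)            # nothing left
except StopIteration:
    exhausted = True          # the signal that makes a for loop stop
else:
    exhausted = False
```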

class Fatia:
    def __init__(self, listagrande, tam_fatia=100):
        self.listagrande = listagrande
        self.tam_fatia = tam_fatia
        self.indice = 0

    def __iter__(self):
        # Since this class itself has a suitable
        # __next__ method, it is enough to return
        # the object itself
        return self

    def __next__(self):
        if self.indice >= len(self.listagrande):
            raise StopIteration()
        resultado = self.listagrande[self.indice: self.indice + self.tam_fatia]
        self.indice += self.tam_fatia
        return resultado

That's it. With this simple change, instead of returning a sentinel value that had to be checked with a hand-written if, __next__ now raises StopIteration. Objects of this class can be used just like the generator in the first example of the answer:

for list_slice in Fatia(biglist):
    # code that uses the sublist of (at most) 100 elements
    ...
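One consequence of __iter__ returning self is that a Fatia object can be traversed only once: the index is never reset, so a second for over the same object finds it already exhausted. The sketch repeats the class so it runs standalone; the sample list is arbitrary.

```python
class Fatia:
    def __init__(self, listagrande, tam_fatia=100):
        self.listagrande = listagrande
        self.tam_fatia = tam_fatia
        self.indice = 0

    def __iter__(self):
        # the object is its own iterator
        return self

    def __next__(self):
        if self.indice >= len(self.listagrande):
            raise StopIteration()
        resultado = self.listagrande[self.indice: self.indice + self.tam_fatia]
        self.indice += self.tam_fatia
        return resultado


fatias = Fatia(list(range(250)))
first_pass = [len(pedaco) for pedaco in fatias]   # [100, 100, 50]
second_pass = [len(pedaco) for pedaco in fatias]  # [] (index already at the end)
```

Generators behave the same way: each call to slice(biglist) creates a new generator object, and each one can be consumed only once.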

Noticing that this was a common usage pattern, the language maintainers created the yield keyword. Whenever it appears in the body of a function, the function stops being an ordinary function and becomes what we call a "generator". Internally it works like the class above: when we call such a function, Python does not run a single line of its body; instead, it returns a generator object that has a __next__ method (and whose __iter__ returns the object itself). Only when __next__ is called (or rather, when the for statement calls it internally) does the body of the function run, up to the first yield. At that point, the value given to yield is returned as the result of the __next__ call. The next time __next__ is called, the function resumes on the line after the yield, with all its local variables "remembered": yield acts as a "pause" in the function. This tends to make the generator more efficient than the class-based version, where each __next__ call is a regular method call that has to look up and update attributes on the object.

So, revisiting our generator to understand it better:
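A small sketch makes the "pause" visible. The hypothetical generator below records each step in a list instead of printing, so one can check that nothing runs at call time and that execution resumes right after each yield.

```python
events = []


def demo():
    events.append("started")
    yield 1
    events.append("resumed")
    yield 2
    events.append("finished")


gen = demo()
after_call = list(events)    # []: calling demo() ran none of its body
is_self = iter(gen) is gen   # a generator's __iter__ returns the generator itself

first = next(gen)            # runs up to the first yield -> 1
after_first = list(events)   # ['started']

second = next(gen)           # resumes after the first yield -> 2
after_second = list(events)  # ['started', 'resumed']

try:
    next(gen)                # runs to the end of the body
except StopIteration:
    pass
after_end = list(events)     # ['started', 'resumed', 'finished']
```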

def slice(biglist, slice_size=100):
    for i in range(0, len(biglist), slice_size):
        yield biglist[i: i + slice_size]

When Python reaches for fatia in slice(minhalista):, the "magic" happens. Python first creates a generator object with the arguments passed, then calls its __next__. That runs the for line inside the slice function, initializing the variable i, and so on; on the next line, the slice of the list from i to i + 100 is yielded and used "outside the function", in the body of the outer for. The function's local variables behave like attributes of a class, preserving the generator's "internal state". The notable difference is that, unlike object attributes in Python, these internal variables are not public: they can be inspected through introspection, but ordinary code outside the generator function cannot change them.

You can also consume the generator "manually", without a for loop, like this:
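The standard library's inspect.getgeneratorlocals() lets you look at those local variables, which shows i advancing between calls. The 250-element list is only sample data.

```python
import inspect


def slice(biglist, slice_size=100):
    for i in range(0, len(biglist), slice_size):
        yield biglist[i: i + slice_size]


gen = slice(list(range(250)))

next(gen)
i_after_first = inspect.getgeneratorlocals(gen)["i"]   # 0

next(gen)
i_after_second = inspect.getgeneratorlocals(gen)["i"]  # 100

list(gen)  # consume the rest; the generator is now finished
locals_at_end = inspect.getgeneratorlocals(gen)        # {} (no frame left)
```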

fatiador = slice(biglist)
while True:
    try:
        fatia = next(fatiador)
    except StopIteration:
        break
    # code that handles the slice goes here
    ...
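The built-in next() also accepts a default value, next(iterator, default), which it returns instead of raising StopIteration. That brings back the sentinel style of the first Fatia class without a try/except. A sketch, assuming None never appears as a real slice (slices are always lists, so that holds here):

```python
def slice(biglist, slice_size=100):
    for i in range(0, len(biglist), slice_size):
        yield biglist[i: i + slice_size]


fatiador = slice(list(range(250)))
sizes = []
while True:
    # returns None instead of raising StopIteration when exhausted
    fatia = next(fatiador, None)
    if fatia is None:
        break
    sizes.append(len(fatia))  # stand-in for the real work on each slice
# sizes == [100, 100, 50]
```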

  • It worked, and I think the function is beautiful, but I don't understand how it works. I only have a few months of programming experience, so I'll study it until I do. Unfortunately I can't upvote, since I don't have 15 reputation points yet. Thank you very much indeed.

  • I updated the answer with more information on how generators work.

  • Thank you very much, I managed to understand how it works, and since it performs better I will always use it.
