This is a typical case where Python's "generators" feature shines. Generators implement the "iterator protocol", and as such are objects that can be used directly in a Python for statement.
One way to create a generator is to write a function that contains the yield keyword in its body. The expression given to yield is passed as the value for one iteration of the for loop. In this case, you can create a generator that takes the large list as a parameter and yields slices of 100 elements at a time:
def slice(biglist, slice_size=100):
    for i in range(0, len(biglist), slice_size):
        yield biglist[i: i + slice_size]
and to use it:
for list_slice in slice(biglist):
    # code that uses the 100-element sublist
    ...
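For instance, a minimal self-contained run (using a hypothetical list of 250 integers as sample data) behaves like this:

biglist = list(range(250))  # hypothetical sample data
for list_slice in slice(biglist):
    print(len(list_slice), list_slice[0])
# prints "100 0", "100 100" and "50 200" - two full slices and one partial one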
What generators are:
The idea is as follows: a normal function, whenever it is called, receives parameters, creates its local variables, executes its code (which may or may not call other functions), and returns a value when it reaches a return statement (functions without a return in Python return None implicitly).
At the moment of a return, all local variables created by the function are destroyed, including the parameters it received. If the function is called again, execution goes back to its first line, and it has to redo all the calculations it did before. In programming terms, a function is said to have no "state" - that is, every time it is called with the same parameters, it redoes everything and returns the same results (unless, of course, it itself triggers calls that depend on variable factors, such as file reads, network access, or data entered by the user with input()).
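As an illustration of what "no state" means in practice (soma_ate is a hypothetical function, not from the original answer):

def soma_ate(n):
    total = 0              # created from scratch on every call
    for i in range(n + 1):
        total += i
    return total           # after this, "total", "i" and "n" are destroyed

soma_ate(100)  # 5050
soma_ate(100)  # 5050 again - all the work is redone from zero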
When we talk about object-oriented programming, one of the big changes is that objects can have attributes - that is, between one method call and the next on the same object, its state can change.
So, in this case, we want something that "remembers" the big, full list, "remembers" which slice of the list was used last, can generate the next slice, and can signal when the big list ends.
In plain object-oriented code, without the "generators" machinery that Python adds on top, this could be written like this:
Sentinela = None

class Fatia:
    def __init__(self, listagrande, tam_fatia=100):
        self.listagrande = listagrande
        self.tam_fatia = tam_fatia
        self.indice = 0

    def proximo(self):
        # signal the end of the big list with the sentinel value
        if self.indice >= len(self.listagrande):
            return Sentinela
        resultado = self.listagrande[self.indice: self.indice + self.tam_fatia]
        self.indice += self.tam_fatia
        return resultado
And this class could be used to slice the list in a way similar to the first example, but more "manually":
fatiador = Fatia(listagrande)
while True:
    pedaco = fatiador.proximo()
    if pedaco is Sentinela:
        # exits the while loop
        break
    # code that uses this sublist
    ...
Now, the for statement in Python, being a command that already knows how to traverse sequences and iterators, knows how to "make use of" objects like the one defined in this class. Every time we write for numero in sequencia: in Python, it takes the object in the expression after the keyword in (in this case, the variable sequencia) and calls that object's __iter__ method. (If the object does not have an __iter__ method, the for command has a "plan B" that uses the object's length and numeric indices, but that is not the case here.)
The value returned by this __iter__ method must itself be an object that has a __next__ method. If it does, the for command calls this __next__ method once per iteration, and the value returned by it is what gets used inside the for body. When __next__ has no more objects to return, it raises an exception of type StopIteration. Unlike the exceptions we are more used to seeing, like ValueError and TypeError, StopIteration does not indicate an error - rather, it is precisely the signal the for command uses to know that the iterator has no more results. The for then ends, and execution continues on the first line after the for block.
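This same protocol can be exercised by hand with the built-ins iter() and next(), which simply call __iter__ and __next__ - a small sketch:

numeros = [1, 2, 3]
iterador = iter(numeros)   # calls numeros.__iter__()
print(next(iterador))      # calls iterador.__next__() -> 1
print(next(iterador))      # 2
print(next(iterador))      # 3
# one more next(iterador) would raise StopIteration -
# exactly the signal the for command watches for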
So, for example, we can adapt the class above to work with for, by adding an __iter__ method and renaming the method proximo to __next__:
class Fatia:
    def __init__(self, listagrande, tam_fatia=100):
        self.listagrande = listagrande
        self.tam_fatia = tam_fatia
        self.indice = 0

    def __iter__(self):
        # since this class itself has an
        # appropriate __next__ method,
        # just return the object itself
        return self

    def __next__(self):
        if self.indice >= len(self.listagrande):
            raise StopIteration()
        resultado = self.listagrande[self.indice: self.indice + self.tam_fatia]
        self.indice += self.tam_fatia
        return resultado
Done - with this simple change: before, a sentinel value was returned and had to be compared in a hand-written if; now the class raises the StopIteration exception instead. Objects of this class can be used just like in the first example of this answer:
for list_slice in Fatia(biglist):
    # code that uses the 100-element sublist
    ...
So the language maintainers, noticing that this was a common usage pattern, created the keyword yield. Whenever it appears in the body of a function, that function stops being an ordinary function and becomes what we call a "generator": internally, it works like the class above. When we call such a function, the language already executes the equivalent of __iter__, but does not run a single line inside the function body; instead, it returns an object that has the __next__ method. When we call __next__ (or rather, when the for command makes this call internally), then the code of the function body is executed, up to the point where it finds the first yield. At that point, the value given to yield is returned as the result of the call to __next__. When __next__ is called the next time, the function resumes running on the line following the yield; all local variables are "remembered" - yield acts as a "pause" in the function. This also makes the internal machinery much more efficient than that of a "real" class like the one in the examples above, where instance attributes have to be created and looked up on every call.
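To make the "pause" behavior visible, here is a tiny sketch (gerador_exemplo is a hypothetical name, used only for this illustration):

def gerador_exemplo():
    print("started")    # runs only on the first call to __next__
    yield 1             # pauses the function here
    print("resumed")    # runs on the second call to __next__
    yield 2

g = gerador_exemplo()   # no line of the body has run yet
print(next(g))          # prints "started", then 1
print(next(g))          # prints "resumed", then 2
# a third next(g) would raise StopIteration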
So, revisiting our generator to better understand:
def slice(biglist, slice_size=100):
    for i in range(0, len(biglist), slice_size):
        yield biglist[i: i + slice_size]
when Python finds for fatia in slice(minhalista): the magic happens - Python first creates a "generator" object with the parameters passed, and then calls its __next__ internally - which executes the line with the for inside the slice function: the variable i is initialized, and so on. On the next line, the slice of the list from i to i + 100 is returned, and is used "outside the function", in the body of the outer for. The function's local variables work as if they were attributes of a class, preserving the generator's "internal state". (With the notable difference that, unlike object attributes in Python, these internal variables are not public: they can still be read, but cannot be written by code outside the generator function.)
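For the curious, the standard library's inspect module offers exactly that read-only peek at the internal variables (a sketch, assuming the slice generator above):

import inspect

fatiador = slice(list(range(250)))
next(fatiador)  # runs the body up to the first yield
print(inspect.getgeneratorlocals(fatiador))
# something like: {'biglist': [...], 'slice_size': 100, 'i': 0}
# readable, but there is no way to assign to these from outside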
You can also use the generator "manually", without a for, as follows:
fatiador = slice(biglist)
while True:
    try:
        fatia = next(fatiador)
    except StopIteration:
        break
    # code that handles the slice goes here
    ...
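As an aside, the built-in next() also accepts a second argument to be returned when the iterator is exhausted, which mirrors the Sentinela idea from the class example - a sketch:

fatiador = slice(biglist)
while True:
    fatia = next(fatiador, None)  # None plays the role of the sentinel
    if fatia is None:
        break
    # code that handles the slice goes here
    ...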
It worked and I thought the function was beautiful, but I don't understand how it works; I only have a few months of programming, so I'll study it until I do. Unfortunately I can't upvote because I don't have 15 points. Thank you very much indeed.
– Yuri Pastore Aranha
I updated the answer with more information on how generators work.
– jsbueno
Thank you very much, I managed to understand how it works, and since it performs better I will always use it.
– Yuri Pastore Aranha