Understanding what’s going on
The yield internally does a lot of things:
For starters: the mere fact that a function has the keyword yield anywhere in its body causes it to be treated differently by Python. The function ceases to be an ordinary function and becomes a "generator function". This change is only comparable to functions that are explicitly declared asynchronous, with async def instead of plain def.
What happens is relatively easy to understand: when you call a function that has a yield in its body - even if that yield is never reached - no line of the function is executed immediately. Instead, Python creates a special object of type generator and returns it as if it were the "return value" of the function.
A "generator", in turn, is an object that has the special method __next__ (and also send and throw - more on those further down).
When a generator is used in a for statement, the language itself calls the __next__ method once for each iteration of the for.
The first time __next__ is called, then yes, the function starts running from its first line - just as happens with normal functions: it receives the parameters that were passed in the initial call and executes line by line until it reaches the first yield. At that point execution is "suspended" - the values of all the generator's local variables are saved, and the value given to yield is returned as the result of the call to __next__. On the next call to __next__, processing does not start again at the first line of the function, but at the exact point of the yield - execution continues from there, line by line, until a new yield is found, or a return statement (or the end of the function body, which in Python is equivalent to return None).
When the generator comes to an end, instead of returning the value in the return statement, it raises an exception of type StopIteration. The for statement automatically catches this StopIteration and ends the for block.
It’s easier to understand if we create a function with a yield inside and drive it "manually", without the for:
In [275]: def exemplo():
...: print("primeira parte")
...: yield 0
...: print("segunda parte")
...: yield 1
...: print("parte final")
...: return 2
...:
In [276]: type(exemplo)
Out[276]: function
In [278]: gen = exemplo()
In [279]: gen, type(gen)
Out[279]: (<generator object exemplo at 0x7f5533e2a6d8>, generator)
In [281]: gen.__next__()
primeira parte
Out[281]: 0
In [282]: gen.__next__()
segunda parte
Out[282]: 1
In [283]: gen.__next__()
parte final
---------------------------------------------------------------------------
StopIteration Traceback (most recent call last)
<ipython-input-283-d5d004b357fe> in <module>
----> 1 gen.__next__()
StopIteration: 2
Note that nothing was printed after input "278" - the call exemplo() returns a generator, as we can see from its representation and type in entry "279" - and the line printing "primeira parte" only runs when we call __next__ for the first time.
Using the same example function in a for, the output is:
In [284]: for v in exemplo():
...: print(v)
...:
primeira parte
0
segunda parte
1
parte final
Another nice detail: the special methods, with the double __ as prefix and suffix, very rarely need to be called directly - in general they are called by the language itself. So instead of calling .__next__ directly, the most common practice is to use Python's built-in next function, passing the generator as a parameter.
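A quick sketch of next in action, using a simplified exemplo redefined here without the prints: besides next(gen), the built-in next accepts a second argument - a default value that is returned, instead of raising StopIteration, when the generator is exhausted:

```python
def exemplo():
    yield 0
    yield 1

gen = exemplo()
print(next(gen))      # 0
print(next(gen))      # 1
print(next(gen, -1))  # -1: generator exhausted, the default is returned
```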
Thus the for statement in Python, when used with a generator function, is equivalent to this sequence using while:
In [286]: gen = exemplo()
In [287]: while True:
...: try:
...: v = next(gen)
...: except StopIteration:
...: break
...: print(v)
...:
primeira parte
0
segunda parte
1
parte final
(Python's for is actually even smarter than this, because it works with other kinds of objects: in addition to generators, it also accepts iterables - objects that have the __iter__ method - and even objects that implement only __getitem__ accepting integer indexes starting at 0, the legacy sequence protocol.)
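To illustrate that legacy protocol, here is a minimal sketch (the class name Contagem is made up for the example): for iterates over an object that has only __getitem__, calling it with 0, 1, 2... until an IndexError is raised:

```python
class Contagem:
    # no __iter__ here: only the legacy sequence protocol
    def __getitem__(self, index):
        if index >= 3:
            raise IndexError  # signals to "for" that the sequence ended
        return index * 10

for valor in Contagem():
    print(valor)  # 0, 10, 20
```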
In your question you use global variables in the example generator function: a global variable is global, and its value is preserved between consecutive calls to the same generator, or between interleaved calls to different generator instances.
How Python distinguishes a generator function from an ordinary function:
As stated above, it is Python's own compiler that transforms a function into a "generator function". The type of a function containing a yield remains function - as can be seen in output "276" above. What Python does is set a flag in the __code__ object of a generator function, marking it as such - and that makes the language's behavior when it is called completely different.
That is, it is not "easy" to tell that a function is a generator function without looking at its code and spotting the yield there - but with Python's introspection mechanisms, we can see that the flag named "GENERATOR" is set in the function's .__code__.co_flags attribute. The value of this flag can be seen in the dis module:
In [288]: def exemplo():
...: yield
...:
In [289]: def contra_exemplo():
...: return None
...:
In [290]: import dis
In [291]: dis.COMPILER_FLAG_NAMES
Out[291]:
{1: 'OPTIMIZED',
2: 'NEWLOCALS',
4: 'VARARGS',
8: 'VARKEYWORDS',
16: 'NESTED',
32: 'GENERATOR',
64: 'NOFREE',
128: 'COROUTINE',
256: 'ITERABLE_COROUTINE',
512: 'ASYNC_GENERATOR'}
In [292]: bool(exemplo.__code__.co_flags & 32)
Out[292]: True
In [293]: bool(contra_exemplo.__code__.co_flags & 32)
Out[293]: False
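The standard library also wraps this same check: the inspect module has an isgeneratorfunction function that reads that flag, so we don't have to test co_flags by hand:

```python
import inspect

def exemplo():
    yield

def contra_exemplo():
    return None

# inspect checks the GENERATOR flag in __code__.co_flags for us
print(inspect.isgeneratorfunction(exemplo))         # True
print(inspect.isgeneratorfunction(contra_exemplo))  # False
```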
Internal details
Note that if you create more than one generator from the same "generator function" and use them interleaved, each one has its own local variables - they don't mix:
In [294]: def exemplo3():
...: counter = 0
...: yield counter
...: counter += 1
...: yield counter
...:
In [295]: gen1 = exemplo3()
In [296]: gen2 = exemplo3()
In [297]: next(gen1)
Out[297]: 0
In [298]: next(gen2)
Out[298]: 0
In [299]: next(gen2)
Out[299]: 1
In [300]: next(gen1)
Out[300]: 1
Where are these local variables stored, then?
Whenever a block of code runs in Python - be it a normal function, a generator, the body of a module, or the body of a class - Python creates an object of type Frame. The language exposes these Frames as perfectly normal Python objects - and inside them you can find the local and global variables of any block of running code. In a program that does not use generators or asynchronous functions, a new Frame object is created each time a function is called - and the most recent frame always holds a reference to the previous one. This forms a "stack" - what we call the "call stack" in Python. These Frame objects are not particularly small or cheap to create, which is one reason we avoid deeply recursive functions in Python, except in didactic code or where recursion really is the best solution. One attribute of a frame is .f_back: a direct reference to the previous frame - and another is f_locals, a dictionary that mirrors the local variables of the running code (f_locals, however, only works for reading those variables, not for writing their values).
A recursive function with some prints can show normal use of frames, without generators:
In [306]: import sys
In [307]: def exemplo4(count):
...: if count < 4:
...: print("entrando")
...: exemplo4(count + 1)
...: print("saindo")
...: else:
...: print(f"count: {count}")
...: frame = sys._getframe()
...: frame_count = count
...: while frame_count:
...: print(frame, frame.f_locals["count"])
...: frame = frame.f_back
...: frame_count -= 1
...:
...:
In [308]: exemplo4(1)
entrando
entrando
entrando
count: 4
<frame at 0x564291236028, file '<ipython-input-307-165f77ce3bd1>', line 11, code exemplo4> 4
<frame at 0x564291308638, file '<ipython-input-307-165f77ce3bd1>', line 4, code exemplo4> 3
<frame at 0x56429132ead8, file '<ipython-input-307-165f77ce3bd1>', line 4, code exemplo4> 2
<frame at 0x5642910e4ff8, file '<ipython-input-307-165f77ce3bd1>', line 4, code exemplo4> 1
saindo
saindo
saindo
When a generator is paused at the yield, its execution frame is removed from that stack - the frame at the top of the stack becomes, again, that of the function that called __next__. The generator's frame is then stored in the .gi_frame attribute of the generator itself. Through its f_locals we can inspect the values of the variables inside it at the moment the yield was executed:
In [319]: def exemplo5():
...: v = 10
...: yield v
...: v += 10
...: yield v
...:
In [320]: gen = exemplo5()
In [321]: gen.gi_frame.f_locals["v"]
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-321-645eee9080b0> in <module>
----> 1 gen.gi_frame.f_locals["v"]
KeyError: 'v'
In [322]: next(gen)
Out[322]: 10
In [323]: gen.gi_frame.f_locals["v"]
Out[323]: 10
In [324]: next(gen)
Out[324]: 20
In [325]: gen.gi_frame.f_locals["v"]
Out[325]: 20
Emulating a Generator with a class:
Nothing prevents any class in Python from behaving exactly like a generator. In this case, the internal variables must be stored, between one iteration and the next, as instance attributes - whereas Python preserves a generator's local variables by saving its execution frame.
To do this, just write a class that has the special method __next__ explicitly, plus the method __iter__, which for executes before the first call to __next__ (the class can even be split into two stages - the object returned by __iter__ can be an instance of another class, implementing only __next__). Note that the values a generator would produce with yield must be returned with an ordinary return in this method.
So the code to generate the squares of the numbers from 0 to n can be written as a "generator function" like this:
def squares(n):
    for i in range(n):
        yield i ** 2
or as a class like this:
class Squares:
    def __init__(self, n):
        self.n = n
        self.i = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.i >= self.n:
            raise StopIteration()
        result = self.i ** 2
        self.i += 1
        return result
Using this class in interactive mode:
In [331]: for s in Squares(4):
...: print(s)
...:
0
1
4
9
These classes are not called "generators" - that name is used only for the objects created by calling a function that contains a yield (the "generator functions"). Classes like this go by the more generic name of iterable - any object that can produce an iterator - and an iterator, in turn, is the generic name for any object that has the __next__ method.
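A small sketch of that distinction: a list is an iterable but not an iterator - calling the built-in iter() on it produces the iterator, which is the object that actually has __next__:

```python
numeros = [10, 20]
iterador = iter(numeros)  # the list's __iter__ produces a list_iterator

print(hasattr(numeros, "__next__"))   # False: the list is not an iterator
print(hasattr(iterador, "__next__"))  # True: this is the iterator
print(next(iterador))                 # 10
print(next(iterador))                 # 20
# one more next(iterador) would raise StopIteration
```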
Other generator methods and "advanced information":
Besides the __next__ method, generators also have the methods .send and .throw - these are never called automatically by for. Instead, they can be used when driving a generator "manually": to send values into a generator that is already running, or to raise an exception of a particular type inside it with .throw - in this case, the argument to throw is an exception object, and it is raised at the point where the yield is.
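A minimal sketch of .send (the running_total name is made up for the example): note that a generator must first be advanced to its first yield with next before send can deliver a value:

```python
def running_total():
    total = 0
    while True:
        # yield both produces "total" and receives the value sent in
        received = yield total
        total += received

gen = running_total()
print(next(gen))     # 0 - "primes" the generator up to the first yield
print(gen.send(10))  # 10 - received inside as the value of the yield
print(gen.send(5))   # 15
```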
These features are rarely used explicitly in day-to-day code; they were added because, with them, Python generators can be used as "coroutines". This is different from normal functions, which are always "subroutines". Coroutines can run concurrently, in a cooperative way, driven by a specialized system.
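A minimal sketch of such a "specialized system" - a round-robin scheduler that alternates between plain generators (the task and run names are made up for the example); each yield hands control back to the scheduler:

```python
import collections

def task(name, steps, log):
    # a "coroutine": does one unit of work, then yields control
    for i in range(steps):
        log.append((name, i))
        yield

def run(tasks):
    # cooperative round-robin scheduler driving the generators
    queue = collections.deque(tasks)
    while queue:
        current = queue.popleft()
        try:
            next(current)        # resume the task until its next yield
        except StopIteration:
            continue             # task finished: drop it
        queue.append(current)    # not finished: back of the queue

log = []
run([task("A", 2, log), task("B", 3, log)])
print(log)  # [('A', 0), ('B', 0), ('A', 1), ('B', 1), ('B', 2)]
```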
Another associated expression is yield from - it lets a generator "yield" values coming from another, inner generator, without ever processing those values itself - which allows, for example, recursive generators.
I have a "toy" project in which I use generator functions as "coroutines" without any asynchronous programming - it simulates the falling-characters effect from "The Matrix" in the terminal. It only works in a terminal with ANSI codes enabled - special print sequences that position the cursor and change the color of the letters - which is not the default on Windows. The project is here and works well on Linux and Mac: https://github.com/jsbueno/terminal_matrix/blob/master/matrix.py (and to enable ANSI codes in the Windows terminal, see: https://stackoverflow.com/questions/16755142/how-to-make-win32-console-recognize-ansi-vt100-escape-sequences)
The combination of features provided by the .send and .throw methods and by the yield from expression is what was used to enable asynchronous programming in Python: that is, many functions executing in a single thread, but concurrently, handing control of the program over to another "coroutine" every time a call has to access an external operating-system resource that will take time to complete (a network request, reading data from a file, a time.sleep-style pause, etc.).
Asynchronous programming, its syntax, and its use are topics that can tie knots in the heads of even advanced programmers - obviously it would not fit to describe it all in this answer - but it is worth mentioning that until Python 3.4, when the asyncio module was introduced into the language, the way to do asynchronous programming in Python was with generators and yield from, and the execution, pausing, and resumption of the coroutines was (and still is) controlled by asyncio's event loop using the __next__, send, and throw methods. Starting with Python 3.5, a dedicated syntax was introduced for asynchronous functions - async def, await, and others such as async for, etc. - but the internal mechanisms Python uses are the very same ones used for generators.
In short:
A function that contains a yield or a yield from is a "generator function". When it is called, it is not executed immediately - it returns an object of type "generator". Generator objects have a __next__ method which, when called, executes the code of the original function until a yield is found - at that point the function is "paused", its execution suspended "where it is", and the program's execution returns to the point where __next__ was called - directly, or implicitly through the for statement. When __next__ is called again, the generator is "un-paused", and execution continues from the point of the yield. If, instead of .__next__, the generator's .send method is called, the value passed as a parameter to .send becomes the value the yield expression assumes inside the generator's code (otherwise the yield evaluates to None). There is also the .throw method: its parameter must be an exception - Python makes that exception happen at the point where the yield is.
Other questions with more information about how yield works:
Python asynchronous generators
Python reserved word Yield
What is Yield for?