How does Python treat the "Yield" command internally?


I was reading about Python's yield statement, and it seems to me that it creates a generator, which would be a kind of data sequence whose values are produced on demand, as if the last "state" of the iteration were somehow "memorized".

To test this, consider this function, which yields three letters:

def letras():
    yield 'A'
    yield 'B'
    yield 'C'

Calling the function letras() in a for to obtain the values:

for letra in letras():
    print(letra)

Note that the output is:

A
B
C

Now, if I modify the function letras() to increment a global variable v:

def letras():
    global v
    v += 1
    print(v)
    yield 'A'
    yield 'B'
    yield 'C'

the output is:

1
A
B
C

See that v holds the value 1: this shows that letras() did not memorize the state of v, only the values returned by yield, as if the function had been called only once. Consequently, I still can't see clearly how yield works; the function's behavior was different from what I expected and left me even more confused about yield. Maybe understanding how Python deals with it internally can help.

Question

Therefore, I would like to know: how does Python treat the yield statement internally? Which structure or mechanism does yield use?

  • I believe the language records where the last return took place and continues execution from there

2 answers



Understanding what’s going on

yield does quite a lot internally. For starters: the mere fact that a function has the keyword yield anywhere in its body causes it to be treated differently by Python. It ceases to be an ordinary function and becomes a "generator function".

This change is only comparable to what happens with functions explicitly declared asynchronous with async def instead of plain def.

What happens is more or less easy to understand: when you call a function that has a yield in its body (even if that yield is never executed), no line of the function runs immediately. What Python does is create a special object of type generator, which is returned as if it were the "return value" of the function.

A generator, in turn, is an object that has the special method __next__ (and also send and throw, covered further down).

When a generator is used in a for statement, the language itself calls the __next__ method once for each iteration of the for.

The first time __next__ is called, then yes, the function starts running at its first line, as happens with normal functions: it receives the parameters passed in the initial call and executes line by line until it reaches the first yield. At that point execution is "suspended": the values of all the generator's local variables are saved, and the value given to yield is returned as the result of the call to __next__. On the next call to __next__, processing does not start again at the first line of the function, but at the exact point of the yield; it continues from there, line by line, until a new yield is found, or a return statement (or the end of the function body, which in Python is equivalent to return None).

When the generator reaches its end, instead of returning the value given to return, it raises an exception of type StopIteration. The for statement automatically catches this StopIteration and closes the for block.

It's easier to understand if we create a function with yield inside and use it in "manual" mode, without the for:

In [275]: def exemplo(): 
     ...:     print("primeira parte") 
     ...:     yield 0 
     ...:     print("segunda parte") 
     ...:     yield 1 
     ...:     print("parte final") 
     ...:     return 2 
     ...:                                                                                                                        

In [276]: type(exemplo)                                                                                                          
Out[276]: function

In [278]: gen = exemplo()                                                                                                        

In [279]: gen, type(gen)                                                                                                         
Out[279]: (<generator object exemplo at 0x7f5533e2a6d8>, generator)

In [281]: gen.__next__()                                                                                                         
primeira parte
Out[281]: 0

In [282]: gen.__next__()                                                                                                         
segunda parte
Out[282]: 1

In [283]: gen.__next__()                                                                                                         
parte final
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-283-d5d004b357fe> in <module>
----> 1 gen.__next__()

StopIteration: 2

Note that nothing was printed after input "278": the call exemplo() returns a generator, as we can see from its representation and type in entry "279", and the line printing "primeira parte" only runs when we call __next__ for the first time.

Using the same example function in a for, the output is:

In [284]: for v in exemplo(): 
     ...:     print(v) 
     ...:                                                                                                                        
primeira parte
0
segunda parte
1
parte final

Another useful detail: special methods, with the double __ prefix and suffix, very rarely have to be called directly; in general they are called by the language itself. So instead of calling .__next__ directly, the most common approach is to use Python's built-in next function and pass the generator as a parameter.
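In practice, driving a generator by hand with the built-in looks like this (a small sketch; note that next also accepts an optional default, which is returned instead of raising StopIteration when the generator is exhausted):

```python
def letras():
    yield 'A'
    yield 'B'

gen = letras()
print(next(gen))             # A -- equivalent to gen.__next__()
print(next(gen))             # B
print(next(gen, 'acabou'))   # acabou -- the default, instead of StopIteration
```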

So Python's for statement, when used with a generator, is equivalent to this sequence using while:

In [286]: gen = exemplo()                                                                                                        

In [287]: while True: 
     ...:     try: 
     ...:         v = next(gen) 
     ...:     except StopIteration: 
     ...:         break 
     ...:     print(v) 
     ...:                                                                                                                        
primeira parte
0
segunda parte
1
parte final

(Python's for is smarter still, because it works with other kinds of objects: in addition to generators, it works with any iterable, that is, objects that have an __iter__ method, and it also falls back to the legacy sequence protocol for objects that only implement __getitem__ with integer indices.)
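As an illustration of that fallback, a class with only __getitem__ can also be used in a for: the loop calls __getitem__ with 0, 1, 2... until an IndexError is raised. (The Letras class below is a hypothetical sketch, not part of the original question.)

```python
class Letras:
    """No __iter__ here: 'for' falls back to calling __getitem__
    with successive indices (the legacy sequence protocol)."""
    def __getitem__(self, index):
        return 'ABC'[index]   # raises IndexError past the end, ending the loop

for letra in Letras():
    print(letra)   # A, B, C
```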

In your question you add a global variable to the example generator function: a global variable is global, and its value is preserved between consecutive calls to the same generator, or shared among several interleaved generator instances. Only local variables are saved and restored as part of the generator's own state.
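To see the contrast with local variables, here is a minimal sketch (the contador function below is hypothetical, not from the question): each time the generator pauses, its locals are saved in its frame and restored on the next call.

```python
def contador():
    v = 0            # local variable: lives in the generator's own frame
    while v < 3:
        v += 1
        yield v      # pauses here; v keeps its value until the next __next__()

gen = contador()
print(next(gen))   # 1
print(next(gen))   # 2 -- v really was "memorized" between the two calls
print(next(gen))   # 3
```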

How Python distinguishes a normal function from a generator function:

As stated above, it is Python's own compiler that transforms a function into a generator function. The type of a function containing a yield remains function, as can be seen in output "276" above. What Python does is mark it in the flags of the function's __code__ object: a generator function is flagged as such, and that makes the language's behavior when it is called completely different.

That is, it is not "easy" to see that a function is a generator function without looking at its code and spotting the yield there; but with Python's introspection mechanisms, we can see that the flag named "GENERATOR" is set in the function's .__code__.co_flags attribute. The value of this flag can be seen in the dis module:

In [288]: def exemplo(): 
     ...:     yield 
     ...:                                                                                                                        

In [289]: def contra_exemplo(): 
     ...:     return None 
     ...:                                                                                                                        

In [290]: import dis                                                                                                             

In [291]: dis.COMPILER_FLAG_NAMES                                                                                                
Out[291]: 
{1: 'OPTIMIZED',
 2: 'NEWLOCALS',
 4: 'VARARGS',
 8: 'VARKEYWORDS',
 16: 'NESTED',
 32: 'GENERATOR',
 64: 'NOFREE',
 128: 'COROUTINE',
 256: 'ITERABLE_COROUTINE',
 512: 'ASYNC_GENERATOR'}

In [292]: bool(exemplo.__code__.co_flags & 32)                                                                                   
Out[292]: True

In [293]: bool(contra_exemplo.__code__.co_flags & 32)                                                                            
Out[293]: False
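For day-to-day code, the standard library's inspect module wraps this same flag check, so the bit arithmetic isn't needed:

```python
import inspect

def exemplo():
    yield

def contra_exemplo():
    return None

# inspect checks the GENERATOR bit of __code__.co_flags under the hood
print(inspect.isgeneratorfunction(exemplo))         # True
print(inspect.isgeneratorfunction(contra_exemplo))  # False
```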

Internal details

Note that if you create more than one generator from the same generator function and use them interleaved, each will have its own local variables; they don't mix:

In [294]: def exemplo3(): 
     ...:     counter = 0 
     ...:     yield counter 
     ...:     counter += 1 
     ...:     yield counter 
     ...:                                                                                                                        

In [295]: gen1 = exemplo3()                                                                                                      

In [296]: gen2 = exemplo3()                                                                                                      

In [297]: next(gen1)                                                                                                             
Out[297]: 0

In [298]: next(gen2)                                                                                                             
Out[298]: 0

In [299]: next(gen2)                                                                                                             
Out[299]: 1

In [300]: next(gen1)                                                                                                             
Out[300]: 1

Where are these local variables stored, then? Whenever Python runs a block of code, be it a normal function, a generator, the body of a module or the body of a class, it creates an object of type Frame. The language exposes these frames as perfectly normal Python objects, and inside them you can find the local and global variables of any block of running code. In a program that does not use generators or asynchronous functions, a new Frame object is created each time a function is called, and the most recent frame always holds a reference to the previous one. This forms a "stack", which we call the "call stack". Frame objects are neither small nor cheap to create, which is one reason deep recursion is avoided in Python, except in didactic code or where it really is the best solution. One of a frame's attributes is .f_back, a direct reference to the previous frame; another is f_locals, a dictionary that mirrors the local variables of the running code (f_locals, however, only works for reading those variables, not for writing their values).

A recursive function with some prints can show normal use of frames, without generators:

In [306]: import sys                                                                                                             

In [307]: def exemplo4(count): 
     ...:     if count < 4: 
     ...:         print("entrando") 
     ...:         exemplo4(count + 1) 
     ...:         print("saindo") 
     ...:     else: 
     ...:         print(f"count: {count}") 
     ...:         frame = sys._getframe() 
     ...:         frame_count = count 
     ...:         while frame_count: 
     ...:             print(frame, frame.f_locals["count"]) 
     ...:             frame = frame.f_back 
     ...:             frame_count -= 1 
     ...:              
     ...:                                                                                                                        

In [308]: exemplo4(1)                                                                                                            
entrando
entrando
entrando
count: 4
<frame at 0x564291236028, file '<ipython-input-307-165f77ce3bd1>', line 11, code exemplo4> 4
<frame at 0x564291308638, file '<ipython-input-307-165f77ce3bd1>', line 4, code exemplo4> 3
<frame at 0x56429132ead8, file '<ipython-input-307-165f77ce3bd1>', line 4, code exemplo4> 2
<frame at 0x5642910e4ff8, file '<ipython-input-307-165f77ce3bd1>', line 4, code exemplo4> 1
saindo
saindo
saindo

When a generator is paused at a yield, the frame of its execution leaves that stack; the frame at the top of the stack goes back to being that of the function that called __next__. The generator's frame is then stored in the generator's own .gi_frame attribute, and its f_locals attribute lets us inspect the values of the variables inside it at the moment the yield was executed:

In [319]: def exemplo5(): 
     ...:     v = 10 
     ...:     yield v 
     ...:     v += 10 
     ...:     yield v 
     ...:                                                                                                                        

In [320]: gen = exemplo5()                                                                                                       

In [321]: gen.gi_frame.f_locals["v"]                                                                                             
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-321-645eee9080b0> in <module>
----> 1 gen.gi_frame.f_locals["v"]

KeyError: 'v'

In [322]: next(gen)                                                                                                              
Out[322]: 10

In [323]: gen.gi_frame.f_locals["v"]                                                                                             
Out[323]: 10

In [324]: next(gen)                                                                                                              
Out[324]: 20

In [325]: gen.gi_frame.f_locals["v"]                                                                                             
Out[325]: 20
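The inspect module can also report at which stage of this life cycle a generator is; note that once the generator finishes, its frame is released and gi_frame becomes None. (The exemplo6 function below is a fresh sketch, not one of the examples above.)

```python
import inspect

def exemplo6():
    yield 10

gen = exemplo6()
print(inspect.getgeneratorstate(gen))   # GEN_CREATED: no line has run yet
next(gen)
print(inspect.getgeneratorstate(gen))   # GEN_SUSPENDED: paused at the yield
try:
    next(gen)
except StopIteration:
    pass
print(inspect.getgeneratorstate(gen))   # GEN_CLOSED
print(gen.gi_frame)                     # None: the frame was released
```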

Emulating a Generator with a class:

Nothing prevents any Python class from behaving exactly like a generator. In that case, the internal variables must be stored between one iteration and the next as instance attributes, whereas Python saves a generator's local variables by storing its execution frame.

To do this, just write a class with an explicit __next__ special method, plus an __iter__ method that for calls before the first call to __next__ (the class can even be split into two stages: the object returned by __iter__ can be an instance of another class, implementing only __next__). Note that the values a generator would produce with yield must here be returned with an ordinary return from __next__.

Then the code to generate the squares of the numbers from 0 to n can be written as a generator function like this:

def squares(n):
    for i in range(n):
        yield i ** 2

or as a class like this:

class Squares:
    def __init__(self, n):
        self.n = n
        self.i = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.i >= self.n:
            raise StopIteration()
        result = self.i ** 2
        self.i += 1
        return result

Using this class in interactive mode:

In [331]: for s in Squares(4): 
 ...:     print(s) 
 ...:                                                                                                                        
0
1
4
9

Such classes are not called "generators": that name is reserved for the objects created by calling a function that contains a yield (a "generator function"). This kind of class goes by the more generic name of iterable: any object that can produce an iterator. An iterator, in turn, is the more generic name for any object that has a __next__ method.
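The collections.abc module formalizes these two names, and the Squares class above satisfies both (an iterator is always also an iterable, but not vice versa). A quick sketch, repeating the class so the snippet is self-contained:

```python
from collections.abc import Iterable, Iterator

class Squares:
    def __init__(self, n):
        self.n, self.i = n, 0
    def __iter__(self):
        return self
    def __next__(self):
        if self.i >= self.n:
            raise StopIteration()
        result = self.i ** 2
        self.i += 1
        return result

sq = Squares(3)
print(isinstance(sq, Iterable), isinstance(sq, Iterator))          # True True
print(isinstance([1, 2], Iterable), isinstance([1, 2], Iterator))  # True False: a list is iterable, not an iterator
```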

Other generator methods and "advanced" information:

Besides __next__, generators also have the methods .send and .throw; these are never called automatically by for. Instead, they can be used when driving a generator "manually", to send values into a generator that is already running, or to raise an exception of a particular type with .throw (in this case, the argument to throw is an exception object, and it is raised at the point where the yield is).
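A minimal sketch of both methods (the eco generator below is hypothetical):

```python
def eco():
    recebido = None
    while True:
        # the 'yield' expression evaluates to whatever .send() passed in
        # (or None, when the generator is advanced with plain next())
        recebido = yield recebido

gen = eco()
next(gen)               # advance to the first yield ("prime" the generator)
print(gen.send('oi'))   # oi -- the yield expression took the sent value
try:
    gen.throw(ValueError('erro'))   # raised inside eco, at the paused yield
except ValueError as exc:
    print('capturado:', exc)
```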

These features are not used explicitly in "day-to-day" code; they were added because, with them, Python generators can be used as "co-routines". This is different from normal functions, which are always "subroutines". Co-routines can run in parallel, cooperatively, under the control of a specialized scheduler.

Another associated expression is yield from: it lets a generator "yield" values coming from another, inner generator, without each value having to be handled by the outer one explicitly. This allows, for example, recursive generators.
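For example, a recursive generator that flattens nested lists (a hypothetical sketch; without yield from, the code would have to loop over the recursive call and re-yield each item by hand):

```python
def achatar(itens):
    for item in itens:
        if isinstance(item, list):
            yield from achatar(item)   # delegate to a nested, recursive generator
        else:
            yield item

print(list(achatar([1, [2, [3, 4]], 5])))   # [1, 2, 3, 4, 5]
```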

I have a "toy" project in which I use generator functions as "co-routines" without asynchronous programming: it simulates the "digital rain" effect from "The Matrix" in the terminal. It only works on a terminal with ANSI codes enabled, which allow special print sequences to position the cursor and change the color of the letters (not yet the default on Windows). The project is here and works well on Linux and Mac: https://github.com/jsbueno/terminal_matrix/blob/master/matrix.py (to activate ANSI codes in the Windows terminal, see: https://stackoverflow.com/questions/16755142/how-to-make-win32-console-recognize-ansi-vt100-escape-sequences)

The combination of features provided by the methods .send and .throw and by the yield from expression is what was used to enable asynchronous programming in Python: that is, many functions executing in a single thread, but in parallel, handing control of the program over to another "co-routine" every time a call has to access an external operating-system resource that will take time to complete (a network request, reading data from a file, a time.sleep-style pause, etc.).

Asynchronous programming, its syntax and its use are topics that can tie knots in the heads of even advanced programmers, and obviously it all doesn't fit in this answer. But it is worth mentioning that until Python 3.4, when the asyncio module was introduced into the language, the way to do asynchronous programming in Python was with generators and yield from, and the execution, pausing and resumption of the co-routines was (and still is) controlled by asyncio's event loop using the methods __next__, send and throw. Starting with Python 3.5, a dedicated syntax was introduced for asynchronous functions (async def, await, and others such as async for), but the internal mechanisms Python uses are the very same ones used for generators.
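This shared machinery can be seen with the same co_flags inspection used earlier: an async def function gets the COROUTINE flag (bit 128 in dis.COMPILER_FLAG_NAMES) instead of the GENERATOR flag:

```python
import inspect

async def exemplo_async():
    return 1

# bit 128 is the COROUTINE flag, just as bit 32 was GENERATOR
print(bool(exemplo_async.__code__.co_flags & 128))   # True
print(bool(exemplo_async.__code__.co_flags & 32))    # False: not a generator
print(inspect.iscoroutinefunction(exemplo_async))    # True
```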

In short:

A function that contains a yield or a yield from is a "generator function". When called, it is not executed immediately; it returns an object of type "generator". Generator objects have a __next__ method that, when called, executes the code of the original function until a yield is found; at that point the function is "paused" where it is, and execution returns to the point where __next__ was called, directly or implicitly through the for statement. When __next__ is called again, the generator is "un-paused" and execution continues at the point of the yield. If the generator's .send method is called instead of .__next__, the value passed to .send becomes the value the yield expression assumes inside the generator's code (otherwise the yield evaluates to None). There is also the .throw method: its parameter must be an exception, and Python makes that exception happen at the point of the yield.

Other questions with more information about how yield works:

Python asynchronous generators

Python reserved word Yield

What is Yield for?


In fact, when you have a yield in the function, it does not return that value: it returns a generator. This generator is an object that stores the state necessary for its control, so it knows where it stopped and can continue from there the next time it is called.

The for has its own mechanism that manages this. Calling the generator by hand, you need to take care of accessing it yourself; the next() function is used to drive it.

Execute this code:

def letras():
    yield 'A'
    yield 'B'
    yield 'C'

gerador = letras()
print(next(gerador))   # A
print(next(gerador))   # B
print(next(gerador))   # C
print(next(letras()))  # A: letras() creates a brand-new generator
print(next(letras()))  # A again, for the same reason


The variable gerador holds the generator, carrying the execution state, so every time you call next on it, it looks at where the generator stopped and advances its internal state by one step. How it does this is an implementation detail, but conceptually it stores the code to be executed and a record of the current position. In standard Python little extra machinery is needed, because the VM already has an internal mechanism that controls the execution stack, so it just encapsulates that in a generator control object.

Note that if you call the function without saving the result, it always starts a new generator.

You don't see it inside the for, but a generator is created there that runs from beginning to end; the for is an abstraction over the design pattern called Iterator. And the function is built to create the generator object; because it's an abstraction, you don't see the state being created.

Try calling next(gerador) 4 times there instead of three. The iterator will raise an exception (StopIteration) because it has no more data to produce.
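That exhaustion can be seen explicitly (a small sketch reusing the letras() function from above):

```python
def letras():
    yield 'A'
    yield 'B'
    yield 'C'

gerador = letras()
print(next(gerador), next(gerador), next(gerador))   # A B C
try:
    next(gerador)   # fourth call: nothing left to yield
except StopIteration:
    print('gerador esgotado')
```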

Objects are not iterated implicitly: an object has a method that returns its iterator, and since this is very common, there is a standard way to obtain it. Objects that can be iterated have the __iter__() method, and the iterator object must have its own implementation of __next__(). Python's native objects already provide these.

In the case of a function with yield, an internal object is created that is itself iterable and handles the function's progress and current position, but this is no different from having state plus the ability to deliver the iterator and say which item comes next.
