What is the actual implementation of "in" in Python?


6

This week I found myself wondering what the actual implementation of "in" in Python is. To try to answer that, I looked at the official documentation for details, and all it says is:

For container types such as list, tuple, set, frozenset, dict, or collections.deque, the expression x in y is equivalent to any(x is e or x == e for e in y).

I particularly disagree with this (if I'm wrong, why?), because the ideal would be to stop execution as soon as the first comparison succeeds. That is, I believe this would be the real implementation:

def _in(a, b):
    for e in b:
        if e == a or e is a:
            return True
    return False

To test this theory, I timed the alternatives with time.time() on a BIG list (5 million integers), placing the element I want to find last. Here is the code:

import time

def generate_huge_list():
    _list = []
    for i in range(5000):
        _list.extend([i,]*1000)

    _list.append(-10)
    return _list

def _in_for(a, b):
    for e in b:
        if e == a or e is a:
            return True
    return False

def _using_in(x, list_of_integers):
    return x in list_of_integers

def _using_any(x, list_of_integers):
    return any(x is e or x == e for e in list_of_integers)

list_of_integers = generate_huge_list()

x = -10

print("Number of items on the list: %d" % len(list_of_integers))

old_time = time.time()
does_it_exist = _using_in(x, list_of_integers)
elapsed_time = time.time() - old_time
print("Using Python's 'in', it took %f and returned %s" % (elapsed_time, does_it_exist))

old_time = time.time()
does_it_exist = _in_for(x, list_of_integers)
elapsed_time = time.time() - old_time
print("Using my implementation, it took %f and returned %s" % (elapsed_time, does_it_exist))

old_time = time.time()
does_it_exist = _using_any(x, list_of_integers)
elapsed_time = time.time() - old_time
print("Using the documentation's implementation, it took %f and returned %s" % (elapsed_time, does_it_exist))

It returned this:

Number of items on the list: 5000001
Using Python's 'in', it took 0.131924 and returned True
Using my implementation, it took 0.457717 and returned True
Using the documentation's implementation, it took 0.949417 and returned True

The speed discrepancy surprised me: in is by far the fastest, with my function in second place (which may support my theory). I decided to look at the source code, and, searching for the magic method behind in, I found this function:

static int
list_contains(PyListObject *a, PyObject *el)
{
    Py_ssize_t i;
    int cmp;

    for (i = 0, cmp = 0 ; cmp == 0 && i < Py_SIZE(a); ++i)
        cmp = PyObject_RichCompareBool(el, PyList_GET_ITEM(a, i),
                                           Py_EQ);
    return cmp;
}

From how easy it is to read, I concluded that this really should be the function. But I was still left with doubts:

Why is it so much faster? Is it only because it is written in C? If so, are there other, better-known implementations that would be preferable as the original?

  • 1

I believe the interpreter already has optimized bytecodes for Python's built-in functions and operators; by the way, I believe that code is from CPython. https://en.wikipedia.org/wiki/CPython

  • 1

"the ideal would be to stop the execution at the time when the first comparison is valid" - But doesn't any stop when it finds an element? At least the documentation says it does: https://docs.python.org/3/library/functions.html#any - And to measure run time, it is better to use the timeit module: https://docs.python.org/3.7/library/timeit.html

  • So why does it take twice as long??

Ah, right, it must be because it traverses twice (first to assemble the boolean list and then to go through it).

  • 1

(x is e or x == e for e in list_of_integers) does not traverse the entire list; it only creates a generator, which is lazy (it does not produce all the booleans at once). When any iterates over this generator, one element is requested at a time, until one satisfies the condition. What takes time is the creation of the generator and the calls to next(), as @jsbueno already mentioned in the other comment.

2 answers

7


The Python interpreter has been maintained by hundreds of volunteers over decades. Although it is complex software, and of course far from perfect, in general simple and obvious things, like "stop the search the moment the first result is found", are optimized (again: hundreds of contributors, over almost three decades).

And yes, the C function you found is the one that does the standard search for __contains__ - and it does stop at the first equal value. The thing is that whoever takes C classes in college memorizes the C for with its standard i = 0; i < X; i++ pattern, and forgets that those are actually arbitrary expressions that can be put to very good use - and that is the case here:

Py_ssize_t i;
int cmp;

for (i = 0, cmp = 0 ; cmp == 0 && i < Py_SIZE(a); ++i)
    cmp = PyObject_RichCompareBool(el, PyList_GET_ITEM(a, i),
                                       Py_EQ);
return cmp;

Note the continuation condition of the for (the middle part, after the first ;): while cmp == 0 and i < Py_SIZE(a). So it is not only the second part; the first counts too. The moment PyObject_RichCompareBool returns true, the for condition becomes false and the loop stops. The value of cmp is then true, and is returned by the C function.

The other noteworthy detail in this analysis is precisely this: to respect Python semantics, the comparison between objects is made by the function PyObject_RichCompareBool, and not directly with == on top of one or more fields of the object. It is this function that returns True if the objects are the same object (the x is e part of the documentation) or compare equal (x == e). Keeping all that consistency - and simplicity, from the programmer's point of view - is Python's trump card: at this point Python checks whether the objects implement the __eq__ method and calls it (and that method can even be written in Python). That is, even in this high-speed comparison inside native code, within RichCompareBool, if the object is a custom Python object with a specific __eq__, that Python code is executed. In the case of integers, the integers' version of __eq__ is called - heavily optimized, but still prepared to compare integers with an arbitrary number of digits.
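Both halves of this behavior can be observed from Python itself (an illustrative snippet, not CPython's own code): float("nan") compares unequal to everything, including itself, yet in still finds it in a list, because the identity check comes before __eq__; and a custom Python-level __eq__ really is called by the native loop:

```python
nan = float("nan")

print(nan == nan)         # False: NaN is not equal to anything, not even itself
print(nan in [1.5, nan])  # True: the identity check (x is e) matches before __eq__

class Loud:
    """Counts how many times 'in' actually calls our Python-level __eq__."""
    calls = 0
    def __eq__(self, other):
        Loud.calls += 1
        return False

needle = Loud()
print(needle in [1, 2, 3])  # False: no element compares equal
print(Loud.calls)           # 3: one __eq__ call per element of the list
```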

And even though the Python expression is equivalent to in, executing the operator is much faster, because simply evaluating x is e or x == e involves several operations in the Python VM (VM == virtual machine, the stack machine that interprets the bytecode) - and each VM operation is equivalent to roughly a hundred native CPU operations - while the comparisons of a C for like the one above take very few native CPU instructions, on the order of 20.

The function dis.dis lets us visualize the steps of this expression translated into bytecode:

In [430]: dis.dis("x is e or x == e")
  1           0 LOAD_NAME                0 (x)
              2 LOAD_NAME                1 (e)
              4 COMPARE_OP               8 (is)
              6 JUMP_IF_TRUE_OR_POP     14
              8 LOAD_NAME                0 (x)
             10 LOAD_NAME                1 (e)
             12 COMPARE_OP               2 (==)
        >>   14 RETURN_VALUE

Why is using any slower?

any is slower than the little function you created with the direct comparison because it requires, for each element, a context switch in the Python virtual machine - any, just like RichCompareBool above, has to respect Python's protocols for objects, and in this case that means calling the __next__ method of the generator expression you pass as a parameter.

Python's VM is a "large switch case" in native code, which executes C expressions and functions for each bytecode of the Python machine. It is very efficient, and it keeps variables to track which point of the bytecode is executing, plus some other state of the running function. Each time any fetches a new item (which in this case is the result of the expression x is e or x == e, always True or False), the VM has to "switch context" - that is, update all the internal variables about which code object is running, which point of the bytecode it is at, and so on. That is much more "expensive" in clock cycles than computing the True or False of the same expression in the same code block, as happens in your function.
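The cost of those context switches is easy to measure with timeit, as one of the comments suggested. A rough sketch (the absolute numbers and the exact ratios will vary by machine and Python version):

```python
import timeit

# Worst case: the element we search for is the last one
data = list(range(1_000_000)) + [-10]
x = -10

def direct_loop():
    # Comparison happens in the same code block: no context switch per element
    for e in data:
        if e is x or e == x:
            return True
    return False

def with_any():
    # Each element forces a switch into the generator's frame and back
    return any(e is x or e == x for e in data)

t_in = timeit.timeit(lambda: x in data, number=3)
t_loop = timeit.timeit(direct_loop, number=3)
t_any = timeit.timeit(with_any, number=3)

print(f"'in' operator : {t_in:.3f}s")
print(f"direct loop   : {t_loop:.3f}s")
print(f"any + genexp  : {t_any:.3f}s")
```

On a typical CPython the three lines come out in that order, fastest first, mirroring the measurements in the question.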

Why does Python remain an excellent choice, even with this discrepancy?

The purpose of the language is to bring simplicity to the implementation of complex algorithms by delegating compute-intensive loops to native code. And in this case it does that very well: if a person uses in, which is the "obvious" way to search a sequence, the code path taken uses plenty of native code and is about 5 times faster than rewriting the same search in Python (I actually expected the difference to be much bigger).

So if you take into account that the in comparison in Python does everything at the maximum possible speed while respecting the dynamic nature of Python objects - the comparison also works for classes that reimplement the "==" operation and do something crazy in their code - the implementation is very good.

If you know you will be searching for whole numbers, in a collection of homogeneous data types that fit in the CPU, such as 32- or 64-bit integers, and you need speed, the way to go is to use specialized libraries, for example numpy. numpy will do the comparison over an integer array not only in native code, but using SIMD instructions, and threads across multiple CPUs where applicable.

So, on a machine faster than yours - I just pasted exactly the code you put in the question, then added the code needed to do the search with numpy and displayed the time:

Using Python's 'in', it took 0.044108 and returned True
Using my implementation, it took 0.157202 and returned True
Using the documentation's implementation, it took 0.262640 and returned True

In [443]: import numpy as np                                                                                                      

In [444]: np_list  = np.array(list_of_integers, dtype="int32")                                                                    

In [445]: %time x in np_list                                                                                                      
CPU times: user 1.79 ms, sys: 997 µs, total: 2.79 ms
Wall time: 2.21 ms
Out[445]: True

2.21 ms, or 0.00221 seconds, compared with 0.044 s for the implementation with native integers: a 20x speed gain. This is possible, again, thanks to the language's flexibility, which allows numpy arrays to implement the __contains__ method. So instead of the generic C code for arbitrary Python objects that we were analyzing, numpy uses code optimized for searching integers of the desired type.
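The hook numpy relies on is available to any Python class: defining __contains__ changes what the in operator does for that type. A tiny illustrative sketch (not numpy's actual code):

```python
class EvenNumbers:
    """A 'virtual' infinite container: membership is computed, not stored."""
    def __contains__(self, n):
        # 'n in EvenNumbers()' routes here instead of doing a linear search
        return isinstance(n, int) and n % 2 == 0

evens = EvenNumbers()
print(4 in evens)   # True
print(7 in evens)   # False
print(-2 in evens)  # True
```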

So the interesting thing to learn is things like this: Python ensures faster developer speed, and, once you know what you are doing, it lets you delegate the most critical parts of the program to run in native code, combining the best of both worlds.

But wait, there's more

If time really is critical when searching for one element among many, there is something still faster than using numpy to search native numbers: using the right algorithm. In lists and numpy arrays, in does a linear search, comparing element by element. It is easy to see that any algorithm that, instead of traversing the list linearly, goes "straight to the point" and tells you whether an element is there or not will be much faster. In Python this is done with sets:

In [446]: conjunto_grande = {i for i in range(5_000_000)}                                                                         

In [447]: conjunto_grande.add(x)                                                                                                  
In [448]: %time x in conjunto_grande                                                                                              
CPU times: user 3 µs, sys: 0 ns, total: 3 µs
Wall time: 5.72 µs
Out[448]: True

Done: the time went from 2 ms with the linear search in numpy to 3 microseconds - a gain of about 20,000x over in on a list - without resorting to another language, assembler, or anything else: just by using the optimized high-level data types the language provides, with the programmer "knowing what they are doing".

So when organizations like Google and NASA (without any "mythology") choose to use Python in a project, yes, the teams take all these factors into account: how much they gain in programmer-hours by using a very high-level language for development, and when and which parts to optimize in native code. The result is projects such as TensorFlow: exceptionally optimized machinery for operations involving neural networks and machine learning, yet controllable through a high-level API that saves a gigantic amount of time that would otherwise be spent on boilerplate code if the same library had to be used from a lower-level language.

Excellent answer. Do you know whether numpy is written in Python or C/C++?

  • 2

The core of numpy - the part where it really just "runs a for and executes the operations" - is in C. There are wrappers and all the object-handling logic still in Python, which pass everything already digested to the C code.

I added another section to the answer, explaining why any is slower

3

Essentially yes, the big difference is that the operation runs in C. It makes a difference mainly because C is a compiled, not interpreted, language. In fact, if it did not have to respect Python semantics and deal with the characteristics of the language's VM, it would be even faster. I don't know if that is the function that ends up being executed when you use in, but it's a possibility.

What surprises me is any() being so slow; it makes me wonder whether it is actually written in Python. I imagine there is some kind of abstraction that makes it worse than a simple loop, which is why I always say there are times when abstraction may not be a good idea.

Where the documentation talks about any(), it is just showing how the implementation could be expressed in the language, not saying that it is actually written that way.

Your perception that the search should end as soon as it finds something is correct, and that is exactly what any() does, even without looking at the code - after all, if it did not work that way, it could already be considered a bug.
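This can be confirmed without reading any source code - a small demonstration with a generator that records how far any() actually consumes it:

```python
def noisy(values, seen):
    """Yields values while recording each one that is actually consumed."""
    for v in values:
        seen.append(v)
        yield v

seen = []
result = any(noisy([False, False, True, False, False], seen))
print(result)  # True
print(seen)    # [False, False, True]: iteration stopped at the first truthy value
```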

About any(): after I read @hkotsubo's comment, I believe it's because it has to assemble the list of booleans (in this case, passed as argument to any) and only then does any() look for a True result. It would need another argument (to compare against "this is the string" instead of True, for example).

  • 2

    I think it’s the cost of abstraction

For projects that require speed, then, Python would not be an excellent choice. (Although NASA and Google sometimes bet on it.)

  • 2

No, not at all - and there you are already entering mythology territory

  • 3

"any" is slow because it involves a context switch in the Python VM - the code of any has to consume the next element of a generator. That probably involves resetting some state in the Python VM, and it happens for each element in the list. There is certainly room for optimization there - even so, it was only twice as slow as doing the entire operation in the same context.
