The Python interpreter has been maintained by hundreds of volunteers over decades. Although it is a complex piece of software, and of course far from perfect, in general simple and obvious things, like "stop the search the moment the first result is found", are optimized (again: hundreds of contributors, over almost 3 decades).
And yes, the C function you found does do the standard search for contains - and it stops as soon as it finds the first equal value. The point is that we take C classes in college, memorize the C for with the standard i = 0; i < X; i++ pattern, and forget that those three parts are actually arbitrary expressions that can very well be put to use - and that is the case here:
Py_ssize_t i;
int cmp;

for (i = 0, cmp = 0 ; cmp == 0 && i < Py_SIZE(a); ++i)
    cmp = PyObject_RichCompareBool(el, PyList_GET_ITEM(a, i),
                                   Py_EQ);
return cmp;
Note the stopping condition of the for (the middle part, after the first ;): it keeps running while cmp == 0 and i < Py_SIZE(a) - so it is not only the second part that matters, the first one does too: as soon as PyObject_RichCompareBool returns true, the for condition evaluates to false and the loop stops. The value of cmp is then true and is returned by the C function.
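For reference, here is a rough Python sketch of what that C loop does - the function name list_contains and this translation are mine, just for illustration; the real thing runs entirely in native code:

def list_contains(a, el):
    # mirrors the C loop: keep going while no match was found and items remain
    cmp = False
    i = 0
    while not cmp and i < len(a):
        item = a[i]
        # PyObject_RichCompareBool checks identity first, then equality,
        # eventually calling the objects' __eq__ methods
        cmp = (el is item) or (el == item)
        i += 1
    return cmp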
The other noteworthy detail in this analysis is precisely this: to respect Python semantics, the comparison between objects is made by this function, PyObject_RichCompareBool - and not directly with == on one or another field of the object. It is this function that returns true if the objects are the same object (the x is e part in the documentation) or if they are equal - x == e. And here, to keep everything consistent and simple - for the programmer - comes Python's clever trick: at this point Python checks whether the objects implement the __eq__ method and calls those methods (which can even be written in Python). That is, even in this high-speed comparison, inside the native code of PyObject_RichCompareBool, if the object is a custom Python object with a specific __eq__, that code is executed. In the case of integers, the integer version of __eq__ - very optimized, but still prepared to compare integers with an arbitrary number of digits - is the one called.
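As a small illustration (a made-up class, not part of the original question), an __eq__ written in Python really is called back from that native loop when in searches a list:

class MyNumber:
    def __init__(self, value):
        self.value = value
    def __eq__(self, other):
        print("custom __eq__ called")
        return isinstance(other, MyNumber) and self.value == other.value

items = [MyNumber(1), MyNumber(2), MyNumber(3)]
print(MyNumber(2) in items)
# prints "custom __eq__ called" for each element checked until the match,
# then True - the native search loop called back into Python code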
And even though the Python expression is equivalent to in, the operator executes much faster, because simply evaluating x is e or x == e involves several operations in the Python VM (VM == virtual machine, the stack machine that interprets the bytecode) - and each VM operation is equivalent to roughly a hundred native CPU operations - while each iteration of a C for like the one above requires very few native CPU instructions, on the order of 20.
The function dis.dis lets us visualize the steps of this expression translated into bytecode:
In [430]: dis.dis("x in e or x == e")
  1           0 LOAD_NAME                0 (x)
              2 LOAD_NAME                1 (e)
              4 COMPARE_OP               6 (in)
              6 JUMP_IF_TRUE_OR_POP     14
              8 LOAD_NAME                0 (x)
             10 LOAD_NAME                1 (e)
             12 COMPARE_OP               2 (==)
        >>   14 RETURN_VALUE
Why is the use of any slower?
any is slower than the small function you created, which makes the comparison directly, because it requires, for each element, a context switch in the Python virtual machine - any, just like PyObject_RichCompareBool above, has to respect the Python protocols for the objects involved, and in this case that means calling the __next__ method of the generator expression you pass as a parameter.
Python’s VM is a "big switch case" in native code, which executes C expressions and functions for each bytecode of the Python machine. It is very efficient, and it keeps variables there to track which point of the bytecode is executing and some other state of the running function. Each time any fetches a new item (which in this case is the result of the expression x is e or x == e, always either True or False), the VM has to "switch context" - that is, update all the internal variables about which code object is running, at which point of the bytecode it is, etc. - and that is much more "expensive" in clock cycles than evaluating the same True or False expression in the same code block, as happens in your function.
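To make that cost concrete, a timing sketch along these lines can be used (the list size and the value searched for are arbitrary choices here, and the absolute numbers will vary by machine):

import timeit

list_of_integers = list(range(5_000_000))
x = 4_999_999          # assumed worst case: the last element of the list

t_in = timeit.timeit("x in list_of_integers", globals=globals(), number=10)
t_any = timeit.timeit("any(x is e or x == e for e in list_of_integers)",
                      globals=globals(), number=10)
print(f"in: {t_in:.3f}s   any: {t_any:.3f}s")
# 'in' usually wins by a comfortable margin: its loop runs entirely in C,
# while any() has to switch into the generator's frame for every element.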
Why does Python remain an excellent choice, even with this discrepancy?
The purpose of the language is to bring simplicity to the implementation of complex algorithms by delegating compute-intensive loops to native code. And in this case it does that very well - if a person uses in, which is the "obvious" way to do a search in a sequence, the code path taken uses plenty of native code and is, as seen, about 5 times faster than rewriting the same search in Python (I thought this difference would be much bigger, actually).
So if you take into account that the in comparison in Python does everything at the maximum possible speed while respecting the dynamic nature of Python objects - the comparison also works for classes that reimplement the "==" operation and do something crazy in their code - the implementation is very good.
If you know you are searching for whole numbers, in a collection of homogeneous data types that fit natively in the CPU, such as 32 or 64 bit integers, and you need speed, the way to do this is to use specialized libraries, for example numpy - numpy will do the comparison on an integer array, and will do it not only in native code but using SIMD instructions, and threads on multiple CPU cores where applicable.
So, on a machine that may be faster than yours - I just pasted exactly the code you put in the question, then added the code needed to do the search using numpy and displayed the times:
Using Python's 'in', it took 0.044108 and returned True
Using my implementation, it took 0.157202 and returned True
Using the documentation's implementation, it took 0.262640 and returned True
In [443]: import numpy as np
In [444]: np_list = np.array(list_of_integers, dtype="int32")
In [445]: %time x in np_list
CPU times: user 1.79 ms, sys: 997 µs, total: 2.79 ms
Wall time: 2.21 ms
Out[445]: True
2.21 ms, or 0.00221 seconds, compared to the 0.044 of the implementation with Python's native integers - a 20x gain in speed. This is possible, again, thanks to the flexibility of the language, which allows numpy arrays to implement the __contains__ method. So instead of the generic C code that can handle any Python object, numpy uses code optimized to search for integers of the desired type.
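Just to illustrate the mechanism involved (a toy class, not numpy's actual code): any class can take over the in operator by defining __contains__, and that is the hook that lets numpy plug in its optimized search:

class EvenNumbers:
    """Behaves as if it contained every even integer, without storing any."""
    def __contains__(self, item):
        # 'x in EvenNumbers()' is routed straight here, no linear scan involved
        return isinstance(item, int) and item % 2 == 0

evens = EvenNumbers()
print(10 in evens)   # True
print(7 in evens)    # False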
So the interesting thing is learning things like this: Python ensures faster development, and once you know what you are doing, it lets the developer delegate the most critical parts of the program to native code, combining the best of both worlds.
But don’t go away just yet
If time really is critical when searching for an element among millions of others, there is something still faster than using numpy to search native numbers, which is using the right algorithm - in lists and numpy arrays, in does a linear search, comparing element by element. It is easy to see that any algorithm that, instead of walking the list linearly, goes "straight to the point" and tells you whether an element is there or not will be much faster. In Python this is done with sets (the set type):
In [446]: conjunto_grande = {i for i in range(5_000_000)}
In [447]: conjunto_grande.add(x)
In [448]: %time x in conjunto_grande
CPU times: user 3 µs, sys: 0 ns, total: 3 µs
Wall time: 5.72 µs
Out[448]: True
There you go: the time went from 2 ms in the linear search with numpy to around 3 microseconds - a gain of roughly 20000x over in on a list, and without having to resort to another language, assembler, or anything else - just by making use of the optimized high-level data types the language provides, with the programmer "knowing what they are doing".
So when organizations like "Google and NASA" (without any "mythology") choose to use Python in a project, yes, the teams take all these factors into account: how much they gain in programmer-hours by using a very high level language for development, and when and which parts to optimize in native code. The result is projects such as TensorFlow, an exceptionally optimized piece of machinery for operations involving neural networks and machine learning, which can nonetheless be driven through a high-level API that saves a gigantic amount of time that would otherwise be spent on boilerplate code if the same library were used from a lower-level language.
I believe the interpreter already has optimized bytecodes for Python's built-in functions and operators; by the way, I believe that code is in CPython. https://en.wikipedia.org/wiki/CPython
– Ronald Rodrigues
"the ideal would be to stop the execution at the time when the first comparison is valid" - But the
any
Doesn’t it stop when you find an element? At least the documentation says that it stops at: https://docs.python.org/3/library/functions.html#any - And to test the runtime, it is best to use the moduletimeit
: https://docs.python.org/3.7/library/timeit.html– hkotsubo
So why does it take twice as long??
– Breno
Ah yes, it must be because it goes through the data twice (once to build the boolean list and then to traverse it).
– Breno
(x is e or x == e for e in list_of_integers) does not go through the entire list; it only creates a generator, which is lazy (it does not produce all the booleans at once). When any iterates over this generator, one element is requested at a time, until one is found that satisfies the condition. What takes time is the creation of the generator and the calls to next(), as @jsbueno already mentioned in the other comment – hkotsubo