Python is not so "smart" for redundant operations

-1

Unlike C++ in many cases, Python is apparently not smart enough to optimize away redundant operations, even on constants.

I made a simple benchmark to test math.cos inside and outside a loop.

In this first case, I make Python calculate cos(0) 10 million times:

from datetime import datetime
from math import cos

ini = datetime.now()
for a in range(10000000):
    x = cos(0)
fim = datetime.now()
print(fim - ini)

The result is 0:00:01.360958.

Now, just replacing x = cos(0) with x = 1 (or assigning cos(0) to a variable outside the loop), the run is about 3 times faster: 0:00:00.429995
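For reference, the hoisted variant described above would look something like this (same timing approach, with cos(0) computed a single time before the loop):

from datetime import datetime
from math import cos

ini = datetime.now()
c = cos(0)  # computed once, outside the loop
for a in range(10000000):
    x = c
fim = datetime.now()
print(fim - ini)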

Well, if the operation cos(0) always produces the same constant (1), why doesn’t Python optimize the execution with a cache, thus avoiding the constant repetition of the same cos operation?

One of Python’s philosophies is to make life easier for the programmer. But in this case what is happening is exactly the opposite.

  • 4

In fact, Python always strives to make it simple for the developer to solve the problem; performance was never the focus. If you need something faster, consider switching languages. Note that even calling a function that returns a constant takes practically the same time: this is simply how long Python takes to resolve 10,000,000 calls. It has no direct relation to the cosine calculation.

  • 3

To be clear: although I commented that the question is based on a wrong premise, I still think it is a fair question.

3 answers

6

One of Python’s philosophies is to make life easier for the programmer

That answers it. This is the opposite of what you are concerned about; perhaps it is a matter of how you interpret the text. At the very least, I cannot imagine how the lack of an optimization makes life harder for the programmer. I understand the desire, but it does not fit Python’s philosophy. Everyone can interpret the text as they like, but most people know that ease and optimization are different things.

Anyone who wants performance, efficiency, zero-cost abstraction: those things are C++. C and Rust too, of course. Python has never used these things as a basis for deciding what to do. If performance is what you want, choose another language. See how easy that is.

However, did you test this in C++? And did you test it correctly? Benchmarking is tricky. I don’t guarantee it would be optimized there; it is possible, even likely, but without testing I make no guarantees. And if it does optimize, the cost should drop to near zero, because the whole loop is useless: the compiler could just do x = 1 and nothing else, killing the entire loop, or not even that, since the variable is never used. Actually, this code doesn’t even make sense. Even the way of measuring is not the most appropriate.

But in the end it is simply that no one decided to implement this optimization, perhaps because it could make compilation slower, or for some other reason. Someone may still do it; it may be that you open an issue and someone does it; or it may be that you implement it yourself, submit it for review, and the improvement gets applied.

6


"Making the life of the programmer easier" is not exactly an objective thing. Despite this, it is what language does yes - but "making the life of the programmer easier" is not always and in all cases "run all existing code as optimally as possible".

Indeed, it is well known, and stated in almost all introductory Python material, that when using Python you are trading performance for ease of writing code.

And this performance difference is tied to the same main reason the interpreter didn’t try to short-circuit the call to cos: Python is a dynamic language - which means things are defined at runtime.

The example you gave doesn’t even look that bad. The Python virtual machine runs the opcodes for the for snippet, the call to cos and the return of its value, explicitly, 10,000,000 times. All of this fits in the processor’s L1 cache - if cos were a function that, even though deterministic, used more memory than fits in L1, that time would be much, much worse.
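To see that per-iteration work, you can disassemble the loop with the dis module - a quick sketch (the exact opcodes vary between Python versions):

import dis

def bench():
    from math import cos
    for a in range(10000000):
        x = cos(0)

dis.dis(bench)
# Every iteration executes a load of the name cos, a call opcode and a
# store into x - the lookup and the call are repeated on every pass.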

The main reason is that, when making the call to cos, the language has no way of knowing that the function cos on pass 1,000,002 of the loop is the same one that was there on pass 1,000,001.

Note that your code could just as well be like this:

from datetime import datetime
from math import cos, sin
import random

ini = datetime.now()
for a in range(10000000):
    x = cos(0)
    if random.randint(0, 1_000_000) == 0:
        cos = sin  # from here on, the name cos points to a different function

And then, at some point, the function being called becomes another one - since everything is checked at runtime, the language will simply always call the correct function on the line x = cos(0). Code of this kind is not common, but even so, it is a premise of the language that this kind of code has to work.

I’ve followed many mailing-list discussions among Python developers about optimizations that cannot be implemented in CPython - the language’s reference implementation - because they cannot rule out a possible change to objects between two stages of execution. And I’m talking about optimizations far less naive than the one your question assumed should exist.

Returning to this specific optimization: in C, C++ (and other static languages) it is possible to change which function a name calls, but this is not common and involves a different syntax - something like "call the function at the address held by such-and-such function pointer". So it is much simpler to optimize there - but even in C, I don’t think this would be optimized by default, because the compiler can’t simply assume that the called function is deterministic (that is, that it always returns the same value for the same input value).

In the case of cos in C, the only way the compiler "knows" it can exchange cos(0) for a constant is that this is explicitly marked at some point - either in the compiler, or in the ecosystem of the system library that culminates in the file math.h. In Python, cos, sin and other functions of the kind are not used enough [*] to have this special status, but the compiler does perform similar optimizations for static expressions placed in the code - if you write a = 2 ** 64 on a Python line, the value of 2 to the 64th is calculated at compile time, and that line will load a constant that is inserted into the compiled .pyc file. My test here with C did the same for the output of cos.
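You can check that folding with the dis module - a sketch (the exact bytecode listing varies between CPython versions):

import dis

dis.dis("a = 2 ** 64")
# The listing shows a LOAD_CONST of 18446744073709551616:
# the power was evaluated at compile time, no power opcode remains.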

Any compiler that did this for an arbitrary function would make the language impractical to use in practice (suppose it is the code that monitors whether a vault door is open, and it calls the sensor function - the vault spends almost all of its time closed, but if the compiler had assumed that the check would always return "closed", this program would be impossible to write).

That being said - Python has mechanisms that allow the programmer to explicitly turn a function into something with a "cache" - and it does this by creating a new callable object, dynamically, at runtime, with no special rules. In this case, as pointed out in Fernando Savio’s answer, the function functools.lru_cache can be used to decorate a function - and by doing so the programmer tells the language: "look, for this function here, for the same input values, you can always reuse the same output values".
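A minimal sketch of that explicit opt-in - wrapping math.cos with functools.lru_cache (cached_cos is just an illustrative name):

from functools import lru_cache
import math

@lru_cache(maxsize=None)
def cached_cos(x):
    # Results are memoized: calling cached_cos(0) ten million times
    # computes math.cos(0) only once and reuses the stored value.
    return math.cos(x)

print(cached_cos(0))  # computed
print(cached_cos(0))  # served from the cache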

Now, finally: yes, if you want this kind of automatic optimization, and you want to code in Python, you can still have it all - using PyPy. PyPy is another implementation of Python, one that performs "just in time" (JIT) compilation - so yes, when the PyPy runtime notices that it is in a loop calling the same function several times, it compiles all the code inside the loop to native code, which runs much more quickly - and if at any moment within that same loop execution takes a different course (say, the sensor detected the open vault door, or the variable name cos now points to another function), it falls back to normal execution.

And the reason this is not the "standard" in the reference implementation of Python is simply that it hasn’t been done - optimization at this level of complexity costs a lot of money (in terms of man-hours), and even with companies like Google, Red Hat, IBM and Intel sponsoring, or willing to sponsor, a few million dollars of optimization work on standard Python, it has not yet been done, for lack of people who know Python deeply enough and can interact with the community to do it. But yes, it is possible that in several years, because of such efforts, we will have JIT execution in the reference implementation as well.

[*] - So, if you really need the "full" performance of the CPU for sin and cos calculations, and you are programming in Python, the recommendation is that the numbers you will do the math on be encapsulated in special objects that perform this kind of operation in native code - for example, the NumPy library, or some OpenGL or 3D graphics library if you are rendering images. Python’s native cos has to "wrap" each result in a float object - which has all the dynamism Python needs - and that operation is hundreds of times more expensive than simply storing in memory the 8-byte floating-point number produced by the native operation.
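As a sketch of that vectorized route (assuming NumPy is installed):

import numpy as np

angles = np.zeros(10_000_000)  # ten million inputs in a single native array
values = np.cos(angles)        # one call; the loop runs in native code
# Each element stays a raw 8-byte float inside the array - there is no
# per-call boxing into Python float objects.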

2

The philosophies of Python (the Zen of Python) number 19, and none of them says anything about making life easier for the programmer.

One of them says:

Explicit is better than implicit.

That said, you, as a programmer who noticed that the interpreter does not cache the latest results of a cosine, could use the lru_cache decorator from the functools module, which serves precisely this purpose - LRU stands for Least Recently Used, which is just what you want it to do. Coming back to the mantra:

Explicit is better than implicit.

  • 1

    In fact, "making life easier for the programmer" has nothing to do with the non-issue that was posed.

Sorry @jsbueno, I didn’t understand your point.

  • is that "being easier for programmers" is quite subjective - but it does not refer in any way to 'guess things to be as fast as possible'. Your answer is fine, but you treat this question as if it were true. Why, actually, the zen phrases of Python are kind of anecdotal - of course one of the language’s premises is to make things easier for programmers. It’s just that "making up instant caches just because they do" has nothing to do with it.

Yeah, I was also wondering whether I was being rude... In the end I don’t know if the answer adds anything for the community beyond mentioning lru_cache.

In fact, this question would really call for a discussion of the nature of the language, which is dynamic and therefore doesn’t cache anything - then of CPython being the reference implementation - and then, yes, pointing out that PyPy has a JIT and does what the asker wants.

True... I’ll delete the answer since it doesn’t help.

  • 1

It doesn’t cover everything there is to cover, but it helps - it brings one of the arguments for why this doubt is not very pertinent. A complete and comprehensive answer to this question would be almost a book - so, if everyone who passes through here adds one or two arguments, everyone has contributed.

