Checking the return value of "in" changes the result in Python

It is well known that, to check whether a particular item does not belong to a list, it is enough to use the in operator (here in its not in form):

values = [1, 2, 3, 4, 5]

if 9 not in values:
    print("9 does not belong to the list.")
else:
    print("9 belongs to the list.")

# 9 does not belong to the list.

Since this operator returns True when the item belongs to the list and False otherwise, its result can be assigned to a variable:

condition = 9 in values

print(condition) # False

In this case, you can use the variable as the condition:

if condition == False:
    print("9 does not belong to the list.")
else:
    print("9 belongs to the list.")

# 9 does not belong to the list.

However, if you use the in expression directly in this check, the result changes:

if 9 in values == False:
    print("9 does not belong to the list.")
else:
    print("9 belongs to the list.")

# 9 belongs to the list.

With this last form, the output says that 9 belongs to the list, even though it does not.

Why does this happen? Is this the expected behavior in Python? If so, how does it parse this last expression?

  • I created the question because I thought this was, at the very least, a curious situation that could confuse beginners, especially those coming from other languages, so the answers are expected to be as complete as possible.

  • The only thing that comes to mind is order of precedence. If you wrap 9 in values in parentheses, does it give the same result?

  • @Joãovictor With parentheses it produces the expected result.

  • @Joãovictor the answers would be no and yes, respectively. The problem is not the order of precedence, but with the parentheses you do get the expected result.

  • The problem is the order of precedence: the == operator is always executed before the in operator. It is like adding numbers and then dividing without wrapping the sum in parentheses (the division would happen first). The code in question checks whether values is false before executing "9 in values", which causes the clause to return true.

  • @Joãovictor that was already posted as an answer, but the answer was deleted because that is not the problem. If you want, join the chat and I can explain better why not (so as not to extend the discussion here).

  • The "==" operator does not have the same precedence as the "in" operator... That was my first justification. The second, about the mistake of trying to iterate over a boolean, I still cannot explain. But I will look into it and answer here.

  • @Joãovictor See the official documentation with the table of precedence.

  • Ah, OK. Was Python 3 used? Sorry for my ignorance, I tested in Python 2.7.

  • @Joãovictor same thing.

  • Actually, the documentation does give them the same precedence, but if you keep testing these expressions with print you do not reach that conclusion. I have not read the documentation in full, but I imagine there is some exception. I cannot explain more than that. If someone answers, I will come back here to read it.

2 answers

13


What happens here is a bit of confusion, perhaps from habits carried over from other languages and from the "unclear" form of this expression.

A little about comparisons

In Python, unlike most languages I know, it is possible to have expressions in the format a < b < c. This is very common in mathematics and the interpretation of this expression in Python takes place in the same way.

For example:

a = 2
b = 3
c = 4

print(a < b < c) 
# True

The above expression will be evaluated as a < b and b < c.

An excerpt from the documentation that covers this:

(...) Also unlike C, expressions like a < b < c have the interpretation that is conventional in mathematics (...)

(...) Comparisons can be chained arbitrarily, e.g., x < y <= z is equivalent to x < y and y <= z (...)

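One detail the documentation adds to this equivalence is that the middle operand is evaluated only once in the chained form. A minimal sketch of that difference, assuming a hypothetical helper middle() (not part of the question) that records each call:

```python
calls = []

def middle():
    # Hypothetical helper: records every call so evaluations can be counted.
    calls.append("called")
    return 3

chained = 2 < middle() < 4                    # middle() runs once
expanded = (2 < middle()) and (middle() < 4)  # middle() runs twice

print(chained, expanded)  # True True
print(len(calls))         # 3
```

For plain names such as a, b and c the two forms are indistinguishable; the difference only shows up when the middle operand has side effects or is expensive to compute.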

What happens in your code

The first thing that comes to mind when you see the expression 9 in values == False is that the interpreter will evaluate 9 in values and then compare the result with False.

In fact, the expression falls into the case above and is evaluated as 9 in values and values == False.

That, in turn, comes down to False and False, and therefore to False. In practice the second comparison is not even executed, because and stops as soon as the first operand is false.

The same chaining applies to the in operator itself. For example:

a = 'ana'
b = 'nana'
c = 'banana'

print(a in b in c) 
# True

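To see the chaining at work on the expression from the question, here is a small sketch that compares three spellings: the chained form, its expansion with and, and the form with parentheses. The names probe, chained, expanded, grouped and rows are only illustrative.

```python
values = [1, 2, 3, 4, 5]

rows = []
for probe in (9, 3):
    chained = probe in values == False                  # chained comparison
    expanded = (probe in values) and (values == False)  # what the chain means
    grouped = (probe in values) == False                # what was intended
    rows.append((probe, chained, expanded, grouped))

for row in rows:
    print(*row)

# 9 False False True
# 3 False False False
```

Note that the chained form is False for both probes: a list is never equal to False, so the chain can never be true, regardless of membership.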

10

Yes, this is the expected behavior in Python, precisely because of the way it parses the expression, but this behavior has nothing to do with operator precedence.

What happens is that Python has syntactic sugar for chained comparisons, that is, expressions that use two comparison operators in a row, as in the question. The expression Python actually evaluates applies the two operators independently, repeating the middle operand, and joins the results with a logical and. In other words, for an expression like A <op1> B <op2> C, where A, B and C are the operands and <op1> and <op2> are comparison operators, the expression evaluated is (A <op1> B) and (B <op2> C). So, for:

if 9 in values == False:
    print("9 does not belong to the list.")
else:
    print("9 belongs to the list.")

What is actually evaluated is:

if (9 in values) and (values == False):
    print("9 does not belong to the list.")
else:
    print("9 belongs to the list.")

Here 9 in values returns False, and values == False would also return False; therefore the final result is False and the else block runs.

In this particular case the result does look quite strange, but this syntactic sugar is especially useful, for example, to check whether a value falls within a range:

if 0 < x < 10:
    print("x is between 0 and 10")
else:
    print("x is less than 1 or greater than 9")

This code is equivalent to:

if 0 < x and x < 10:
    print("x is between 0 and 10")
else:
    print("x is less than 1 or greater than 9")

Or equivalent to the traditional form:

if x > 0 and x < 10:
    print("x is between 0 and 10")
else:
    print("x is less than 1 or greater than 9")
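As a quick check that the three spellings agree, the sketch below wraps each one in a function (the function names and the sample values are assumptions made for illustration) and compares them over a few integers around the boundaries:

```python
def chained(x):
    return 0 < x < 10

def with_and(x):
    return 0 < x and x < 10

def traditional(x):
    return x > 0 and x < 10

samples = [-1, 0, 1, 5, 9, 10, 11]
inside = [x for x in samples if chained(x)]

# All three forms must give the same answer for every sample.
agree = all(chained(x) == with_and(x) == traditional(x) for x in samples)

print(inside)  # [1, 5, 9]
print(agree)   # True
```

The agreement check is itself a chained comparison: it is evaluated as chained(x) == with_and(x) and with_and(x) == traditional(x).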

This behavior explains why the first form shown in the question is the ideal one in Python (the Pythonic way):

if 9 not in values:
    print("9 does not belong to the list.")
else:
    print("9 belongs to the list.")

TL;DR

One way to verify this behavior is to analyze the bytecode generated by CPython. For this, we can use the standard library module dis. To keep things simple, we will consider a somewhat simpler expression that reproduces the same behavior as the problem in question:

a < b < c

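We will show that the expression actually evaluated is (a < b) and (b < c).

We can disassemble this expression with:

import dis

# dis.dis prints the disassembly itself and returns None,
# so it should not be wrapped in print().
dis.dis("a < b < c")

The result (on the CPython version used for this answer; the exact instructions vary between versions) is: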

  1           0 LOAD_NAME                0 (a)
              2 LOAD_NAME                1 (b)
              4 DUP_TOP
              6 ROT_THREE
              8 COMPARE_OP               0 (<)
             10 JUMP_IF_FALSE_OR_POP    18
             12 LOAD_NAME                2 (c)
             14 COMPARE_OP               0 (<)
             16 RETURN_VALUE
        >>   18 ROT_TWO
             20 POP_TOP
             22 RETURN_VALUE

The documentation itself gives the details of each operation:

  1. LOAD_NAME pushes the value associated with the name a onto the stack;
  2. LOAD_NAME pushes the value associated with the name b onto the stack;
  3. DUP_TOP duplicates the reference at the top of the stack;
  4. ROT_THREE lifts the second and third stack values up one position and moves the top down to position 3;
  5. COMPARE_OP performs the < operation between the two values at the top of the stack, removing them and pushing the result;
  6. JUMP_IF_FALSE_OR_POP performs a conditional jump based on the value at the top of the stack: if it is false, execution jumps to the line marked with >> (item 10 of this list) and the value stays on the stack; otherwise the value is removed from the stack and execution continues;
  7. LOAD_NAME pushes the value associated with the name c onto the stack;
  8. COMPARE_OP performs the < operation between the two values at the top of the stack, removing them and pushing the result;
  9. RETURN_VALUE returns the value at the top of the stack;
  10. ROT_TWO swaps the two values at the top of the stack;
  11. POP_TOP discards the top of the stack;
  12. RETURN_VALUE returns the value at the top of the stack.

The jump that can occur between items 6 and 10 is what we call the short-circuiting of a logical expression.
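Opcode names change between CPython releases, so a listing like the one above may not match what a given interpreter prints. A version-tolerant sketch, assuming only that conditional jumps keep the word JUMP in their opcode name, is to check whether the chained expression compiles to some jump instruction while a single comparison does not. The helper opnames is hypothetical.

```python
import dis

def opnames(source):
    # dis.get_instructions accepts a source string and compiles it itself.
    return [instr.opname for instr in dis.get_instructions(source)]

chained_ops = opnames("a < b < c")
single_ops = opnames("a < b")

chained_has_jump = any("JUMP" in name for name in chained_ops)
single_has_jump = any("JUMP" in name for name in single_ops)

print(chained_ops)
print(chained_has_jump, single_has_jump)
```

The presence of a jump only in the chained form is the short circuit described above, independent of the exact instruction names.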

Analyzing the bytecode of the question

For the case of the question, let us consider the expression:

9 in values == False

The bytecode is almost identical to the previous one (which shows that the previous expression actually reproduces the same behavior):

  1           0 LOAD_CONST               0 (9)
              2 LOAD_NAME                0 (values)
              4 DUP_TOP
              6 ROT_THREE
              8 COMPARE_OP               6 (in)
             10 JUMP_IF_FALSE_OR_POP    18
             12 LOAD_CONST               1 (False)
             14 COMPARE_OP               2 (==)
             16 RETURN_VALUE
        >>   18 ROT_TWO
             20 POP_TOP
             22 RETURN_VALUE

The only differences, in fact, are that the values 9 and False are loaded with LOAD_CONST, because they are constants, instead of LOAD_NAME, which is used for variables, and that the comparison operators are in and == instead of <.

Stepping through the execution, we have:

  1. Pushes the constant value 9 onto the stack;

      | Stack
    --+--------
    1 | 9
    --+--------
    2 |
    --+--------
    3 |
    
  2. Pushes the value of values onto the stack;

      | Stack
    --+--------
    1 | 9
    --+--------
    2 | values
    --+--------
    3 |
    
  3. Duplicates the top value of the stack;

      | Stack
    --+--------
    1 | 9
    --+--------
    2 | values
    --+--------
    3 | values
    
  4. Lifts the second and third values up one position, moving the top down to the third position;

      | Stack
    --+--------
    1 | values
    --+--------
    2 | 9
    --+--------
    3 | values
    
  5. Applies the in operator to the top two values (9 in values), pushing the result;

      | Stack
    --+--------
    1 | values
    --+--------
    2 | False
    --+--------
    3 | 
    
  6. Since the top of the stack is false, execution jumps ahead to the instruction marked with >> (the stack remains unchanged);

  7. Swaps the top two values of the stack;

      | Stack
    --+--------
    1 | False
    --+--------
    2 | values
    --+--------
    3 | 
    
  8. Discards the value at the top of the stack;

      | Stack
    --+--------
    1 | False
    --+--------
    2 | 
    --+--------
    3 | 
    
  9. Returns the value at the top of the stack, False.
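The steps above can be replayed with an ordinary Python list standing in for the interpreter's stack (end of the list = top of the stack). This is only a hand-written simulation of the listed instructions, not how CPython is implemented:

```python
values = [1, 2, 3, 4, 5]
stack = []

stack.append(9)            # LOAD_CONST 9
stack.append(values)       # LOAD_NAME values
stack.append(stack[-1])    # DUP_TOP

# ROT_THREE: the top moves down to the third position.
top = stack.pop()
stack.insert(len(stack) - 2, top)

# COMPARE_OP (in): pops two values, pushes the result.
right = stack.pop()
left = stack.pop()
stack.append(left in right)

if not stack[-1]:          # JUMP_IF_FALSE_OR_POP: false, so jump
    stack[-1], stack[-2] = stack[-2], stack[-1]  # ROT_TWO
    stack.pop()            # POP_TOP
    result = stack.pop()   # RETURN_VALUE
else:                      # true: pop the result and keep going
    stack.pop()
    stack.append(False)    # LOAD_CONST False
    right = stack.pop()
    left = stack.pop()
    result = left == right # COMPARE_OP (==), RETURN_VALUE

print(result)  # False
```

With 9 absent from the list, only the first branch runs, so the == comparison in the second branch is never reached.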

This makes it clear why the else block is executed in the question, and that the second operator, ==, is never even evaluated because of the short circuit in the logical expression.
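The short circuit can also be observed without reading bytecode. The sketch below uses a hypothetical list subclass, LoudList, that counts how many times its == comparison is invoked:

```python
class LoudList(list):
    # Hypothetical subclass: counts every == comparison made on the list.
    eq_calls = 0

    def __eq__(self, other):
        LoudList.eq_calls += 1
        return super().__eq__(other)

values = LoudList([1, 2, 3, 4, 5])

result_missing = 9 in values == False
calls_after_missing = LoudList.eq_calls   # == was never reached

result_present = 3 in values == False
calls_after_present = LoudList.eq_calls   # == ran once this time

print(result_missing, calls_after_missing)  # False 0
print(result_present, calls_after_present)  # False 1
```

When the item is missing, the membership test is already false and the == comparison is skipped; when the item is present, the comparison runs and values == False yields False.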

It is worth noting that, even though no explicit and operator is executed, the behavior "return the first operand if it is false, otherwise return the second" is the natural behavior of the and operator, which is why the expression is said to be evaluated as if there were an and between the comparisons.
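That behavior of and is easy to confirm directly, since it returns one of its operands rather than a strict boolean. A minimal illustration (the variable names are arbitrary):

```python
first_falsy = 0 and "second"    # first operand is false: it is returned
empty_list = [] and "second"    # same for an empty list
first_truthy = 1 and "second"   # first operand is true: second is returned

print(repr(first_falsy))   # 0
print(repr(empty_list))    # []
print(repr(first_truthy))  # 'second'
```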
