What is the logic behind this function?

Asked

Viewed 160 times

2

Why the result of L1 is: [2, 3, 4] instead of [3, 4]?

def remove_dups(L1, L2):
    for e in L1:
        if e in L2:
            L1.remove(e)

L1 = [1, 2, 3, 4]
L2 = [1, 2, 5, 6]
remove_dups(L1, L2)
print(L1)

I tested in Python Tutor and for some reason the second step of for is worth 3. This doesn’t make sense to me.

  • Another alternative is to use set subtraction with: L1 = list(set(L1) - set(L2))

  • When removing elements from the L1 array you are breaking the index. This should generate an explicit error, but I don’t know how python treats it. The correct is to use while, that is, while L1 occurs in L2

3 answers

5


When you create a for uses an iterator, which controls the progress of the collection. When you remove the first element, the second becomes the first, the third becomes the second and so on. So after the first passage of the loop the list looks like this:

[2, 3, 4]

And the iterator that was 0 passes to 1 in the second passage, then the check is the number 3, which is not in L2, nothing is removed, and then the same as 4.

You cannot change the object of a collection during an interaction with impunity. The most correct would be to make a copy or use another algorithm.

What you’re doing is (what’s behind the algorithm of for):

def remove_dups(L1, L2):
    i = 0
    while i < len(L1):
        if L1[i] in L2:
            L1.remove(L1[i])
        i += 1

L1 = [1, 2, 3, 4]
L2 = [1, 2, 5, 6]
remove_dups(L1, L2)
print(L1)

And the correct thing would be something like this, although there is still risk if there is some maintenance in the code:

def remove_dups(L1, L2):
    i = 0
    while i < len(L1):
        if L1[i] in L2:
            L1.remove(L1[i])
        else:
            i += 1

L1 = [1, 2, 3, 4]
L2 = [1, 2, 5, 6]
remove_dups(L1, L2)
print(L1)

Behold working in the ideone. And in the repl it.. Also put on the Github for future reference.

2

To maniero’s response explained clearly that the problem in the implemented logic was to change the structure being iterated and its examples could not be more didactic, so I would just add a summarized form of solutions:

def remove_duplicados(l1, l2):
    return [i for i in l1 if i not in l2]

Thus, the function will return a new list with the elements of l1 which do not belong to the l2.

l1 = [1, 2, 3, 4]
l2 = [1, 2, 5, 6]

print(remove_duplicados(l1, l2))  # [3, 4]

See working on Repl.it | Ideone

Some advantages in this solution are:

  1. When replacing the brackets with parentheses on return, the function will return a generator, which can save memory;
  2. No need to create copies of entry lists;
  3. The entry lists remain intact (may be useful in some cases);
  4. Readable solution;

Commentary:

When it comes to the memory saving mentioned in item 1, there are two observations: a) the economy is due to the fact that, by replacing the brackets with parentheses, the return of the function ceases to be a list and becomes a generator; thus, the final list is not stored in memory, as the generator calculates each item according to its use (Lazy calculation or call-by-need). b) references to input objects remain the same within of the generator, not creating a copy of them; that is, even after the generator is defined, any change in the input lists will affect the generator.

With brackets, the return is a new list:

x = [1, 2, 3, 4, 5]
y = [2, 4]

def remove_duplicados(l1, l2):
    return [i for i in l1 if i not in l2]

z = remove_duplicados(x, y)

print(type(z))  # <class 'list'>

# Apenas para demonstrar a saída:
print(list(z))  # [1, 3, 5]

See working on Repl.it | Ideone | Github GIST

With parentheses, the return is a generator:

x = [1, 2, 3, 4, 5]
y = [2, 4]

def remove_duplicados(l1, l2):
    return (i for i in l1 if i not in l2)

z = remove_duplicados(x, y)

print(type(z))  # <class 'generator'>

# Apenas para demonstrar a saída:
print(list(z))  # [1, 3, 5]

See working on Repl.it | Ideone | Github GIST

As the reference of the lists remain the same, any change in them will be reflected in the result of the generator:

x = [1, 2, 3, 4, 5]
y = [2, 4]

def remove_duplicados(l1, l2):
    return (i for i in l1 if i not in l2)

z = remove_duplicados(x, y)

# Alteração em y:
y.append(3)

print(type(z))  # <class 'generator'>

# Saída de z modificada devido alteração em y:
print(list(z))  # [1, 5]

See working on Repl.it | Ideone | Github GIST

Items 2 and 3 are complementary, because if the problem demands that the lists of entries should remain intact after the execution of the function, using the comprehensilist on or the generator will not need to create a memory copy of the list to modify it; the return itself will be a new list only with the desired items.

Already item 4 is quite subjective, but I, particularly, think the solution is much more readable:

def remove_duplicates(source, reference):
    return (item for item in source if item not in reference)
  • I didn’t understand the first Bullet. The second you mean you avoid having to do it manually, but not the copy itself?

  • @Maniero The first referred me to the fact that, returning the generator, you will effectively have two lists in memory: l1 and l2; not three, considering the list that is returned by the function. Regarding the second, yes, manually, considering that it is not desired that the entry lists are changed by the function.

  • That is, create the copy, there will be a third list, just not explicit?

  • @Maniero Regarding the first item? If it is, there is no third list. Values will only be generated when the object (generator) is iterated.

0

The Python for keeps a counter where it is in the list, if the list size is reduced, then it will mess up the counter, this code would work well if the list size did not change. The best way to do that is to clone the list.

That’s a book problem Introduction to Computation and Programming Using Python: With Application to Understanding Data

Browser other questions tagged

You are not signed in. Login or sign up in order to post.