Remove items from one list that are in another

Viewed 516 times

I have two lists of values, and when I check whether an item from one list exists in the other, the test fails even though I know the item is present in the list being searched.


proc = [1,2,3,4]
fat = [3,4,5,6]

result = []
for p in proc:
  if proc[0] not in fat:
    result.append(proc[0])

print(result)

[]

These lists come from a SELECT on a database and contain the values of a table field.

What could I be doing wrong?

  • I’m not sure I understand, but if you want the elements of proc that are not in fat, just do result = [ p for p in proc if p not in fat ]. And if the order of the elements is not important, another alternative is result = list(set(proc) - set(fat)) (a set is optimized and does not allow duplicates, but it does not preserve the original order of the list)

  • That’s exactly it, it worked perfectly. I’m used to traversing lists by hand; I forget how good Python is at manipulating lists, tuples and dictionaries.

  • See if it fits; I’ve set it up this way on ideone: https://ideone.com/0tMlhG

2 answers


If the idea (confirmed in the comments) is to get the elements of proc that are not in fat, then one option is:

proc = [1, 2, 3, 4]
fat = [3, 4, 5, 6]

result = []
for p in proc:
  if p not in fat:
    result.append(p)

print(result) # [1, 2]

That is, for each element of proc, check whether it is not in fat (and if it is not, add it to result).

The error in your code is that you were checking proc[0] (the first element of proc) in every iteration, instead of the current element p.
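To make the bug concrete, here is what that loop actually does with the sample lists from the question (a minimal reconstruction, since the snippet in the question did not initialize result):

```python
proc = [1, 2, 3, 4]
fat = [3, 4, 5, 6]

result = []
for p in proc:
    # Bug: this always tests proc[0] (which is 1), never the loop
    # variable p, so the same element is checked on every iteration.
    if proc[0] not in fat:
        result.append(proc[0])

print(result)  # [1, 1, 1, 1] - the first element repeated, not the difference
```

With the data coming from your database, the first element of proc happened to be in fat, which is why you always got an empty result.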


Another alternative to the code above is to use a list comprehension, which is much more succinct and Pythonic:

result = [ p for p in proc if p not in fat ]

But there is a detail here. For every element of proc, we have to check whether it is in fat. And the operator in has linear complexity: it traverses fat from the beginning until it finds the element (or walks through the whole list if the element is not there). That is, this algorithm traverses the same list over and over again (which makes it a strong candidate for a Shlemiel the Painter’s Algorithm).

If the order of the elements is not important, and there are no duplicate elements (or there are, but duplicates may be dropped from the result), one option is to use set, a structure optimized for certain operations, such as checking whether an element exists, among others.

So the idea is to turn each list into a set, subtract them (producing another set containing the elements of one that are not in the other), and at the end convert back to a list:

result = list(set(proc) - set(fat))

Keep in mind, though, that:

  • a set does not guarantee the order of the elements, so the resulting list will not necessarily have the numbers in the same order as the original list (it may or may not, so do not count on it)
  • a set eliminates duplicate values, so if the list proc is, for example, [1, 2, 3, 4, 1], the final result will be only [1, 2]
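Both caveats can be checked quickly (a small illustration, not part of the original answer):

```python
proc = [1, 2, 3, 4, 1]  # note the duplicated 1
fat = [3, 4, 5, 6]

result = list(set(proc) - set(fat))

# The duplicate 1 is gone, and the order is whatever the set
# iteration happens to give; sort only to compare deterministically.
print(sorted(result))  # [1, 2]
```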

But if keeping the original order of the elements and/or not eliminating duplicate values are requirements of the problem, there is no way around it: you have to use the for loop.
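One nuance worth noting (an addition of mine, not from the original answer): you can keep the loop, and therefore preserve the order and duplicates of proc, while still getting fast membership tests, by converting only fat to a set once:

```python
proc = [1, 2, 3, 4, 1]
fat = [3, 4, 5, 6]

fat_set = set(fat)  # built once, O(len(fat))

# Membership tests against a set are average O(1),
# instead of O(len(fat)) for a list.
result = [p for p in proc if p not in fat_set]

print(result)  # [1, 2, 1] - order and duplicates of proc preserved
```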


Just to compare the performance of each solution, I ran a little test with the timeit module:

import random
import timeit

# create lists with a thousand random numbers
nums = range(10000)  # possible numbers for the lists
proc = random.choices(nums, k=1000)
fat = random.choices(nums, k=1000)

n = 100  # run each test 100 times
r = 3  # repeat 3 times (so repeat returns a list with 3 timings)
print(timeit.repeat("result = []\nfor p in proc:\n  if p not in fat:\n    result.append(p)", repeat=r, number=n, globals=globals()))
print(timeit.repeat('[ p for p in proc if p not in fat ]', repeat=r, number=n, globals=globals()))
print(timeit.repeat('list(set(proc) - set(fat))', repeat=r, number=n, globals=globals()))

Of course, times can vary depending on the hardware and other factors, and for small lists - like the ones you used - the difference will be insignificant, but in general, using set proved much faster. On my machine the times were (in seconds):

[1.4215344, 1.4107937, 1.4132282999999997]
[1.4147277999999996, 1.4120507999999994, 1.3960000999999993]
[0.015077500000000299, 0.014714099999999064, 0.015252100000001434]

That is, using set was roughly 100 times faster overall. On Repl.it the results were similar.


From what I understand of your statement, you need to find the values that are common to both lists.

If what matters is the value of each element, without taking into account the number of occurrences, we can convert the two lists to sets, calculate the intersection of the sets and then turn the result into a new list. For this we can use the following algorithm:

proc = [1, 2, 3, 4]
fat = [3, 4, 5, 6]
proc = set(proc)
fat = set(fat)

inter = list(proc & fat)
print(inter)


Now, if you want to display the elements in ascending order, you can sort the values in the output by applying the sorted function. To do this, just use the following algorithm:

proc = [1, 2, 3, 4]
fat = [3, 4, 5, 6]
proc = set(proc)
fat = set(fat)

inter = list(sorted(proc & fat))
print(inter)
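A small side note (mine, not from the answer above): sorted already returns a list, so the list() wrapper is redundant and the last line can be shortened:

```python
proc = {1, 2, 3, 4}
fat = {3, 4, 5, 6}

# sorted() accepts any iterable (including a set) and returns a list.
inter = sorted(proc & fat)
print(inter)  # [3, 4]
```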

