Doubt about Python list

Asked

Viewed 105 times

0

The problem is to make a function that receives a list, and returns True if any of the elements are repeated, or False otherwise, without modifying the original list. I made an algorithm, but it is always giving the same result:

def verificar_lista(lista):
    itens = lista
    for i in range(len(itens)):
        if itens[i] not in itens:
            return False
        else:
            return True
    
        

print(verificar_lista([1, 9, 2, 4, 3, 3, 2]))
print(verificar_lista([1, 2, 3, 4, 5, 6, 7]))
  • 1

    You’re going through the list values and literally checking if they’re on the list... they always will be, so the feedback is always True.

3 answers

2

There’s a conceptual error in your algorithm:

1: def verificar_lista(lista):
2:     itens = lista
3:     for i in range(len(itens)):
4:         if itens[i] not in itens:
5:             return False
6:         else:
7:             return True

See, in line 3, that you are going through each element of the list and for each one of them (line 4 now) you check whether the item is present in the list.

But think. Now, if I’m iterating on an item of the list, it will find itself in it. Then this condition does not make any sense (it will always be false, since the item will always be in the list). Therefore, the function in question will always return True.


Since the statement does not ask for the specific return of the repeating (m) element(s), you can use the set to solve this problem in a simple way. Since it is a data structure that does not accept repeated values, what you can do is, from the original list, create a new set and check if the number of elements are equal.

If they are equal, you can state that there are no duplicate elements in the list. Otherwise, one (or more) of the elements has been removed automatically by being duplicated. There you know that there are repeated elements.

Behold:

def verificar_lista(lista):
    elementos = set(lista)

    if (len(elementos) == len(lista)):
        return False  # Não há elementos repetidos

    return True  # Há elementos repetido.


print(verificar_lista([1, 9, 2, 4, 3, 3, 2]))  # True (3 se repete)
print(verificar_lista([1, 2, 3, 4, 5, 6, 7]))  # False

Removing the if (unnecessary in the above code since the function is the result of a simple comparison evaluation):

def verificar_lista(lista):
    return len(set(lista)) != len(lista)


print(verificar_lista([1, 9, 2, 4, 3, 3, 2]))  # True (3 se repete)
print(verificar_lista([1, 2, 3, 4, 5, 6, 7]))  # False

The advantage of using the set for this type of situation it is much more efficient to use the operator in. Behold here that the complexity of in is linear and by nesting it inside another for, you end up creating algorithm close to quadratic complexity.

If the statement were a little more complex, you could use a Counter also. But in this case I believe to be half Overkill.

  • In fact wear Counter is an exaggeration: https://answall.com/a/486653/112052. (and it was worse than I imagined)

  • 1

    @hkotsubo, great response, as always, right? How are we going to compete with that? : D

2

def verificar_lista(lista):
    itens = lista
    for i in range(len(itens)):
        if itens[i] not in itens:
            return False
        else:
            return True

Doing the table test is easily shown that what your code does is:

  • If the first element of the list is not in the list, it returns false;
  • Otherwise, it returns true.

The point is that always the first element of the list will be in the list and thus always the return will be True.

As the idea is to check duplicates, you need to check that number is also in the rest of the list, ignored itself. In Python you can do this using Slice:

def verificar_lista(lista):
    for posicao, numero in enumerate(lista):
        if numero in lista[posicao+1:]:
            return True
    else:
        return False

2

The other answers already explained the problem of your algorithm, but just to give a few more solution options, an alternative is to use a Counter:

from collections import Counter

def verificar_lista(lista):
    return len(Counter(lista)) != len(lista)

The Counter is a dictionary that contains, for each element of the list, the amount of times it occurs. So if his number of keys equals the size of the list, it’s because no element repeats. But in this particular case I confess that I find it an exaggeration to use Counter (we’ll see why in the end), and would only be recommended if you actually need the amounts of times each element occurs.


Another option is to use set and gradually add elements to it, until you find one that is already:

def verificar_lista(lista):
    s = set()
    return any(x in s or s.add(x) for x in lista)

The idea is to check if the element is already in the set, and if not, add. If you find an element that is already, any returns True, indicating that the element is repeated.


Now let’s compare the performance of these solutions with the other answers. By doing a quick test with the module timeit:

from collections import Counter

def set1(lista): # solução de outra resposta, com set
    return len(set(lista)) != len(lista)

def set2(lista): # set com any
    s = set()
    return any(x in s or s.add(x) for x in lista)

def counter(lista): # collections.Counter
    return len(Counter(lista)) != len(lista)

def sublista(lista): # solução de outra resposta, com sub-listas
    for posicao, numero in enumerate(lista):
        if numero in lista[posicao+1:]:
            return True
    else:
        return False

from timeit import timeit

listas = [ [1, 9, 2, 4, 3, 3, 2], [1, 2, 3, 4, 5, 6, 7] ]
funcoes = ['set1', 'set2', 'counter', 'sublista']
# executa 1 milhão de vezes cada teste
for func in funcoes:
    print(f'{func:>8}', timeit(f'for lista in listas:\n\t{func}(lista)', number=1000000, globals=globals()))

The times may vary from one machine to another, but in mine the result was (times in seconds):

    set1 1.2527604
    set2 3.2695244
 counter 5.8885616
sublista 2.962209399999999

That is, the solution with set of another answer was faster, followed by the sub-list solution. Afterward, set with any was the third and Counter was the worst. In Ideone.com and in the Repl.it times were different, but the result was similar (Obs: no Ideone.com had to decrease the amount of times because it was breaking the time limit).


Of course, for a few small lists the difference will be insignificant. And the result also depends on the lists. For example, I did another test with this list:

[1, 1] + list(range(1000))

That is, two elements repeated at the beginning and a thousand more numbers (from 0 to 999). The result was:

    set1 18.3008393
    set2 0.9740259000000009
 counter 45.1233362
sublista 2.3400793999999934

In this case, the alternative solution with set (the one you use any) was faster, because it detects right in the first iterations that the element is repeated (the any interrupts the loop as soon as it finds a repeat, then it only iterates through the first 2 numbers).

The second fastest solution was with sub-lists, also because it closes the for as soon as it finds the repeated number (the delay in relation to the first solution may be due to having to build the sub-list with 999 elements).

Already the solution with set of the other answer took longer because it needs to build all the set first, only then to catch the size.

And the solution with Counter was the worst of all (much worse, by the way), because she had to count the amount of all 1000 elements (and this confirms that using Counter in this case is really an exaggeration).

Browser other questions tagged

You are not signed in. Login or sign up in order to post.