Count recurrences in a list

Question

Count recurrences in a list

Asked 4 years, 10 months ago

Viewed 148 times

2

I made the following code to count the recurrences in a list:

n = int(input())
n <= 100 and n >= 1
lista = []
for c in range(n):
     lista.append(int(input()))
for v in lista: 
    repete = lista.count(v)
    print(v,repete)

But the output I receive is this (Input: 56,70,67,56,90)(EXAMPLE):

The expected output was that(it is necessary to maintain the order of the input for the amount of N values in the list):

Hi, it doesn’t solve... That’s because Reverse only keeps order in a list with 3 and 2. However, my list can aggregate N values, so Reverse does not solve... Even so, thank you for your attention! @lmonferrari

– Gabriel Schrader

2020/10/20 at 18:37
And if the list is [1,2,3,2,2,1,1,3,2,2,3,1], must show in which order?

– hkotsubo

2020/10/20 at 18:42
In this case it would be 1 2 3, but I speak of cases where the list is [35,231,12,543,12,35,12,76] in the output I would like to maintain the order of entry and remove the repeated that comes after the first appearance...

– Gabriel Schrader

2020/10/20 at 18:45

3 answers

1

First, don’t use count, because each call needs to go through the entire list to count how many times the element occurs. That is, even if the list has no repeated element, for example, [1, 2, 3], yet it will be traveled unnecessarily 3 times (one to get the count of the 1, another to the 2, another to the 3).

When you go through a list several times without need, you are creating a variation of the call Shlemiel the Painter’s Algorithm (known anedotically for being a "bad algorithm").

So to count the elements, use a Counter, as suggested in another answer, because he walks through the list only once and already keeps the amount of times that all the elements occur.

To maintain the order of the elements, it is no use turning the list into a set, for a set does not guarantee the order of the elements: the documentation says he’s a "unordered Collection" ("cluttered collection"), so if you want to keep order, you have to do it another way.

What you can do is use one set to store the elements that have already been printed, and only print if it is the first occurrence. Something like this:

from collections import Counter

lista =  [35, 231, 12, 543, 12, 35, 12, 76]
contagem = Counter(lista)
visitados = set()
for n in lista:
    if n not in visitados: # se o número ainda não foi impresso
        print(f'{n:<4} {contagem[n]}')
        visitados.add(n) # marco que o número já foi impresso

This way, when the number first appears, it is still not in visitados. Next time it will be and will not be printed again. So I print the elements in the order in which their first occurrence appears in the list.

It’s kind of the same idea that one of the answers made, the difference is that there was used a list instead of set. But I prefer to use set which is more optimized for searches (constant time versus linear time of a list, see here), and how each iteration will be done a search in visitados, this can make a difference as the original list increases (and also if it has many repeatable elements) - see below a comparison between the two solutions.

Finally, I also used f-string (available from Python 3.6) to better format the output, which in the case of the above example will be:

But of course, once you have the algorithm, you can format the output the way you think best.

Doing a quick test with the module timeit, to compare solutions (count() vs Counter):

from collections import Counter

def counter(lista):
    contagem = Counter(lista)
    visitados = set()
    for n in lista:
        if n not in visitados: # se o número ainda não foi impresso
            x = contagem[n] # usando o valor em vez de imprimir
            visitados.add(n)

def list_count(lista):
    filtro = []
    for i in lista:
        if i not in filtro:
            filtro.append(i)
            
    for v in filtro:
        lista.count(v) # só pegando a contagem em vez de imprimir

from random import choices
# lista com mil números (de 0 a 99 para ter vários repetidos)
lista = choices(range(100), k=1000)

from timeit import timeit

# testando mil vezes, com uma expressão válida e outra inválida
params = { 'number' : 1000, 'globals': globals() }
print(timeit('counter(lista)', **params))
print(timeit('list_count(lista)', **params))

I removed the print because I/O can impact the tests and mask the results, so I just took the value of the count and nothing else.

Remembering that times can vary from one machine to another, but anyway, in mine the result was (in seconds):

0.0981796
2.2945525

That is, using count() and a list to store the visited was about 20 times slower than using the Counter and set.

Of course for small lists the difference will be inconspicuous after all, for few data, everything is fast...

Browser other questions tagged python

You are not signed in. Login or sign up in order to post.

by Paulo Marques • **3,739** points · Answer 1 · 2020-10-20T17:00:13+00:00

Let’s use what’s good about Python... :)

See the example below:

>>> from collections import Counter

>>> lista = [1, 1, 2, 3, 4, 4, 2, 2]

>>> c = Counter(lista)

>>> print(c)
Counter({2: 3, 1: 2, 4: 2, 3: 1})

>>> lista_sem_repeticao = list(set(lista))

>>> print(lista_sem_repeticao)
[1, 2, 3, 4]

See that the method Counter that exists in the package collections (Python default, no need to install) returns a dictionary with the chave as found in the list and the valor the amount.

The method set returns the type set which, by definition, is a "list" without repetitions. I put list between quotation marks because nay is the list type.

UPDATE: If maintaining the order of the initial list is important, the set nay is the solution. Instead, make a comprehension list with a cat jump

>>> lista = [1, 1, 2, 3, 4, 4, 2, 2]
>>> aux = set()
>>> lista_sem_repeticao = [x for x in a if not (x in aux or aux.add(x))]
>>> lista_sem_repeticao
[1, 2, 3]

The Counter also works with Strings

>>> lista = ["banana", "goiaba", "laranja", "banana", "banana", "goiaba"]

>>> qtd_frutas = Counter(lista)

>>> qtd_frutas
Counter({'banana': 3, 'goiaba': 2, 'laranja': 1})

I hope it helps

by lmonferrari • **3,550** points · Answer 2 · 2020-10-20T17:35:14+00:00

1

Using your code/reasoning, you can use a "filter" (new list):

n = int(input())
lista = []

for c in range(n):
     lista.append(int(input()))

filtro = []
for i in lista:
    if i not in filtro:
        filtro.append(i)
        
for v in filtro:
    print(v, lista.count(v))

Entree:

Exit:

3 3
2 2

The filter will contain the first occurrence of the numbers in the order they appear.

Hi, this set is reversing the order of the 2 with the 3, has like it does not make it?

– Gabriel Schrader

2020/10/20 at 17:58
@Gabrielschrader Um set does not guarantee the order of the elements, the very documentation says he’s a "unordered Collection" ("cluttered collection"). The only guarantee is that it has no repeatable elements. Hence the suggestion to use reversed is a gambiarra, which will "work" sometimes by coincidence. For example, if the list is [3, 3, 2, 2, 4, 4], the result of list(set(lista)) is [2, 3, 4], use reversed will be [4, 3, 2] (that is, none of the alternatives preserves the original order).

– hkotsubo

2020/10/20 at 18:39
It is worth remembering that this resulting order is internal detail of implementation: as a set does not guarantee the order, it may well change its implementation from one version to another and the result is different (it just needs to ensure that there will be no repeat, whatever the order). Make any algorithm that depends on the order that the set returns will be subject to error.

– hkotsubo

2020/10/20 at 18:40
Hi, thanks for the info! I’m new in the world of programming and EAD classes are hard to learn... Can you tell me if there’s another way to remove the repeaters and keep order without the set? @hkotsubo Thank you so much!!

– Gabriel Schrader

2020/10/20 at 18:41
The original question did not contain the requirement for such ordination. The reversed was gambiarra itself.

– lmonferrari

2020/10/20 at 18:41
@Gabrielschrader, see my reply update to keep order

– Paulo Marques

2020/10/20 at 19:02
@Paulomarques was perfect! Thanks!

– Gabriel Schrader

2020/10/20 at 19:37

Show 2 more comments