According to the documentation, a Counter is a subclass of dict, so it is also a dictionary. That means you can check whether an element exists and retrieve its value, just as you would with any dictionary.
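A small sketch of what this means in practice (the sample string is made up for illustration): a Counter passes an isinstance check against dict, supports the in operator, and returns 0 for a missing key instead of raising KeyError.

```python
from collections import Counter

# Hypothetical sample text, used only for illustration
counter = Counter('banana')

print(isinstance(counter, dict))  # True: Counter is a dict subclass
print('a' in counter)             # True: 'a' is a key
print('z' in counter)             # False: 'z' was never counted
print(counter['a'])               # 3
print(counter['z'])               # 0: a missing key returns 0, no KeyError
print('z' in counter)             # still False: the lookup did not add the key
```

Note that looking up a missing key does not insert it, so the membership test stays reliable even after such lookups.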
To get the letters that are not in the Counter, just check whether they are dictionary keys. One option is to use the in operator:
from collections import Counter
counter = Counter()
with open('hino.txt', 'r', encoding='utf8') as f:
    for linha in f:  # read one line at a time
        counter.update(linha)  # count every character of the line
from string import ascii_letters
acentos = 'áéíóúãõâêîôûç'
todas_letras = ascii_letters + acentos + acentos.upper()
for letra in todas_letras:
    if letra not in counter:  # letter never appeared in the file
        print(f'{letra}: 0')
I changed the way the file is read a little. Using read() also works, of course, but that method loads the entire file contents into memory. For a small file it makes no difference, but for larger files it can be worth reading line by line so as not to waste memory. That is what for linha in f does: it reads the file one line at a time and then discards each line, instead of loading the whole file at once.
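To illustrate that both ways of reading give the same counts, here is a minimal sketch. The file content and the variable names caminho, tudo_de_uma_vez and linha_por_linha are made up for this example; a temporary file stands in for hino.txt.

```python
import os
import tempfile
from collections import Counter

# Hypothetical sample content, written to a temporary file for illustration
with tempfile.NamedTemporaryFile('w', encoding='utf8', suffix='.txt',
                                 delete=False) as tmp:
    tmp.write('Ouviram do Ipiranga\nas margens plácidas\n')
    caminho = tmp.name

try:
    # Option 1: load the whole file into memory at once
    with open(caminho, 'r', encoding='utf8') as f:
        tudo_de_uma_vez = Counter(f.read())

    # Option 2: update the Counter one line at a time
    linha_por_linha = Counter()
    with open(caminho, 'r', encoding='utf8') as f:
        for linha in f:
            linha_por_linha.update(linha)
finally:
    os.remove(caminho)  # clean up the temporary file

print(tudo_de_uma_vez == linha_por_linha)  # True: same counts either way
```

The result is identical; the difference is only in how much of the file is held in memory at any given moment.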
Then I created a string containing all the letters. Since your Counter distinguishes uppercase, lowercase and accented letters, I am assuming that A, a, á, Á, ã, Ã, etc. are different letters. If you want a different definition, just change the string todas_letras so that it contains only what you need.
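As one possible "different definition", here is a sketch of a case-insensitive variant. This is an assumption about what you might want, not something from the question: the text is lowercased before counting, and todas_letras contains only lowercase letters. The sample text is made up.

```python
from collections import Counter
from string import ascii_lowercase

texto = 'Água e AR'  # hypothetical sample text

# Lowercase everything before counting, so 'A' and 'a' share a single key
counter = Counter(texto.lower())

acentos = 'áéíóúãõâêîôûç'
todas_letras = ascii_lowercase + acentos  # lowercase letters only

print(counter['a'])    # 2: the final 'a' of 'Água' plus the 'A' of 'AR'
print(counter['á'])    # 1: accented letters are still counted separately
print('A' in counter)  # False: no uppercase keys remain

# Letters of this alphabet that never occurred in the text
faltando = [letra for letra in todas_letras if letra not in counter]
```

With a real file you would apply lower() to each line before calling counter.update.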
Then I loop over the letters and print only those that are not in the Counter.
Another option is to use set:
counter = ... # create the Counter
from string import ascii_letters
acentos = 'áéíóúãõâêîôûç'
todas_letras = set(ascii_letters + acentos + acentos.upper())
# set difference: letters that are not keys of the Counter
for letra in sorted(todas_letras - set(counter.keys())):
    print(f'{letra}: 0')
First I create a set with all the letters and subtract from it another set containing only the keys of the Counter. The result is another set, containing the letters that are not in the Counter. I use sorted to return the letters in order, since a set does not guarantee the order of its elements. Keep in mind that sorted compares code points, so the accented letters end up at the end, but once you have the letters you can display them however you like.
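A tiny sketch of the set difference and of the resulting order. The sample text and the deliberately small alphabet are made up for illustration.

```python
from collections import Counter

counter = Counter('abc')       # hypothetical sample text
todas_letras = set('abcdzá')   # deliberately tiny alphabet for the example

# Letters in the alphabet that are not keys of the Counter
faltando = sorted(todas_letras - set(counter))

print(faltando)  # ['d', 'z', 'á']: 'á' comes after 'z' in code point order
```

Iterating over a Counter yields its keys, so set(counter) builds the same set as set(counter.keys()).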
If you want to show how many times a certain character occurs, just use it as a key. For example, for space:
# print how many times the space character occurred
print(counter[' '])
For the most frequent letter, get the elements ordered from most to least frequent with most_common, and go through them until you find a letter:
# find the most frequent letter
for c, qtd in counter.most_common():
    if c.isalpha():  # found a letter
        print(f'Letter {c} occurs {qtd} times')
        break  # already found one, so stop the loop
I did it this way because I don't know what your text looks like, and it may be that the most frequent characters are not letters (they can be spaces, punctuation marks, digits, etc.). So I prefer to go through the most common characters until I find a letter. When I do, I interrupt the loop with break.
But this shows only one letter. What if there is a tie? In that case, you can take the count of the most frequent letter and print every letter that has the same count:
# keep only the (character, count) pairs whose character is a letter
letras = list(filter(lambda c: c[0].isalpha(), counter.most_common()))
maior_qtd = letras[0][1]  # count of the most frequent letter
print(f'Most common letters, occurring {maior_qtd} times')
for c, qtd in letras:
    if qtd != maior_qtd:
        break  # not among the most frequent, stop the loop
    print(c)
First I use filter to generate a list containing only the letters of the Counter (and their respective counts). Then I take the count of the first element (i.e., the most frequent letter).
Then I go through the list and check whether each letter occurs as many times as the most frequent one. As soon as I find a different count, I interrupt the loop.
Thus, if two or more letters are tied as the most frequent, they will all be shown.
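The same idea can be wrapped in a function. This is only a sketch: the name letras_mais_frequentes is made up, and it adds one thing the code above does not handle, namely a text with no letters at all (where letras[0] would raise IndexError).

```python
from collections import Counter

def letras_mais_frequentes(counter):
    """Return (letters, count) for the most frequent letters.

    Returns ([], 0) when the Counter contains no letters.
    """
    # Keep only letters, already ordered from most to least frequent
    letras = [(c, qtd) for c, qtd in counter.most_common() if c.isalpha()]
    if not letras:
        return [], 0
    maior_qtd = letras[0][1]
    # Collect every letter tied with the highest count
    return [c for c, qtd in letras if qtd == maior_qtd], maior_qtd

# Hypothetical sample texts
print(letras_mais_frequentes(Counter('aabb c!!!')))  # (['a', 'b'], 2)
print(letras_mais_frequentes(Counter('123 !!!')))    # ([], 0)
```

In the first call, '!' is the most frequent character overall, but it is skipped because it is not a letter.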