Print letters that do not appear in a collections.Counter

-1

I wrote some code that counts how many times each character occurs in a file.

from collections import Counter

with open('hino.txt', 'r', encoding='utf8') as f:
    conteudo = f.read()
    counter = Counter(conteudo)
    print(counter)

And this was its output:

Counter({' ': 138, 'e': 97, 'a': 87, 'o': 80, 'r': 65, 's': 54, 'd': 42, '\n': 38, 'i': 31, 't': 31, 'n': 28, 'm': 28, 'l': 27, 'u': 27, ',': 20, 'p': 15, 'b': 13, 'v': 10, 'c': 9, 'f': 8, 'E': 8, 'í': 6, 'z': 6, '!': 6, 'B': 6, 'ç': 5, 'S': 4, 'g': 4, 'P': 4, 'R': 4, 'q': 4, 'j': 4, 'Q': 4, 'D': 4, 'ã': 3, 'A': 3, '.': 3, 'C': 3, 'h': 3, 'á': 2, 'N': 2, 'T': 1, 'à': 1, 'é': 1, ';': 1, 'J': 1})

How would I also print the letters that do not appear in the Counter result?

For example:

Y:0

Another question: how could I print the number of spaces and the letter that occurs most often?

In this case, that would be ' ': 138 and 'e': 97

2 answers

1

According to the documentation, a Counter is a subclass of dict, so it is also a dictionary. That means you can check whether an element exists and retrieve it, just as you would with any dictionary.
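As a small illustration of that dictionary behaviour (the sample text here is made up, it is not the content of hino.txt), note that looking up a missing key in a Counter gives 0 instead of raising KeyError:

```python
from collections import Counter

# Sample text chosen only for illustration
counter = Counter('banana')

print(isinstance(counter, dict))  # True: Counter is a dict subclass
print('a' in counter)             # True: 'a' is a key
print('y' in counter)             # False: 'y' never occurred
print(counter['a'])               # 3
print(counter['y'])               # 0: a missing key yields 0 rather than KeyError
print('y' in counter)             # False: the lookup above did not add the key
```

So if you only need the number itself, counter[letra] already gives 0 for letters that never occurred.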


To get the letters that are not in the Counter, just check if they are dictionary keys. One option is to use in:

from collections import Counter

counter = Counter()
with open('hino.txt', 'r', encoding='utf8') as f:
    for linha in f:
        counter.update(linha)

from string import ascii_letters

acentos = 'áéíóúãõâêîôûç'
todas_letras = ascii_letters + acentos + acentos.upper()

for letra in todas_letras:
    if letra not in counter:
        print(f'{letra}: 0')

I changed how the file is read slightly. Using read() also works, of course, but it loads the entire file contents into memory. For a small file it makes no difference, but for larger files it can be worthwhile to read line by line so as not to use memory needlessly (for linha in f does this: it reads the file one line at a time and discards each line afterwards, instead of loading the whole file at once).
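To check that both ways of reading give the same counts, here is a minimal sketch. It writes a made-up sample to a temporary file so that it does not depend on hino.txt; the names sample, whole and by_line are only for this illustration:

```python
import os
import tempfile
from collections import Counter

# Hypothetical sample content, written to a temporary file so the sketch is self-contained
sample = 'Ouviram do Ipiranga\nas margens plácidas\n'
with tempfile.NamedTemporaryFile('w', encoding='utf8', suffix='.txt', delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

# Approach 1: load the whole file into memory at once
with open(path, 'r', encoding='utf8') as f:
    whole = Counter(f.read())

# Approach 2: feed the Counter one line at a time
by_line = Counter()
with open(path, 'r', encoding='utf8') as f:
    for linha in f:
        by_line.update(linha)

os.remove(path)  # clean up the temporary file
print(whole == by_line)  # True
```

The line breaks are counted in both cases, because iterating over the file keeps the '\n' at the end of each line.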

Then I created a string containing all the letters. Since your Counter distinguishes uppercase, lowercase, and accented letters, I am assuming that A, a, á, Á, ã, Ã, etc. are different letters. If you want a different definition, just change the string todas_letras to contain exactly what you need.

Then I loop over the letters and print only those that are not in the Counter.

Another option is to use set:

counter = ... # create the Counter

from string import ascii_letters

acentos = 'áéíóúãõâêîôûç'
todas_letras = set(ascii_letters + acentos + acentos.upper())

for letra in sorted(todas_letras - set(counter.keys())):
    print(f'{letra}: 0')

First I create a set with all the letters and subtract from it another set containing only the keys of the Counter. The result is a new set containing the letters that are not in the Counter. I use sorted to return the letters in alphabetical order, since a set does not guarantee the order of its elements (note that the accented letters will come at the end, but once you have the letters, you can display them however you want).
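If you would rather have the accented letters next to their base letters, one possible approach (only a sketch, not a real locale-aware collation) is to sort with a key based on Unicode decomposition. The helper chave_alfabetica and the set faltando below are made up for this example:

```python
import unicodedata

def chave_alfabetica(letra):
    # Hypothetical helper: NFD splits 'á' into 'a' plus a combining accent,
    # casefold() makes the comparison case-insensitive, and the original
    # letter is used only to break ties (e.g. between 'Á' and 'á').
    return (unicodedata.normalize('NFD', letra).casefold(), letra)

faltando = {'z', 'á', 'b', 'Á', 'a', 'ç'}  # made-up set of missing letters

print(sorted(faltando))                        # ['a', 'b', 'z', 'Á', 'á', 'ç']
print(sorted(faltando, key=chave_alfabetica))  # ['a', 'Á', 'á', 'b', 'ç', 'z']
```

The first line shows the default ordering by code point, with the accented letters at the end; the second groups each accented letter with its base letter.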


If you want to show how many times a certain character occurs, just use it as a key. For example, for space:

# print how many times the space occurred
print(counter[' '])

For the most frequent letter, you have to get the most frequent elements with most_common, and go through them until you find a letter:

# look for the most frequent letter
for c, qtd in counter.most_common():
    if c.isalpha(): # found a letter
        print(f'Letter {c} occurs {qtd} times')
        break # already found one, so stop the loop

I did it this way because I do not know what your text looks like, and the most frequent characters may well not be letters (they can be spaces, punctuation marks, digits, etc.). So I prefer to go through the most common characters until I find a letter, and when I do, I stop the loop with break.

But this only shows one letter. What if there is a tie? In that case, you can take the count of the most frequent letter and print every letter that has the same count:

letras = list(filter(lambda c: c[0].isalpha(), counter.most_common()))
maior_qtd = letras[0][1]
print(f'Most common letters, occurring {maior_qtd} times')
for c, qtd in letras:
    if qtd != maior_qtd:
        break # not one of the most frequent, so stop the loop
    print(c)

First I use filter to build a list containing only the letters in the Counter (with their respective counts). Then I take the count of the first element (i.e., the most frequent letter).

Then I go through the list and check whether each letter occurs as many times as the most frequent one. As soon as the count differs, I stop the loop.

Thus, if two or more letters are the most frequent, they will all be shown.
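The same idea can be wrapped in a function that also copes with a text containing no letters at all. This is only a sketch: the name letras_mais_frequentes is made up, and it uses max over the letter counts instead of most_common:

```python
from collections import Counter

def letras_mais_frequentes(counter):
    # Hypothetical helper: returns (list of tied letters, count),
    # or ([], 0) when the Counter contains no letters.
    letras = {c: qtd for c, qtd in counter.items() if c.isalpha()}
    if not letras:
        return [], 0
    maior_qtd = max(letras.values())
    # collect every letter whose count equals the highest count
    return sorted(c for c, qtd in letras.items() if qtd == maior_qtd), maior_qtd

print(letras_mais_frequentes(Counter('a b  a b c')))  # (['a', 'b'], 2)
print(letras_mais_frequentes(Counter('123 !!')))      # ([], 0)
```

In the first call the space is the most frequent character overall, but it is ignored because it is not a letter, and the tie between a and b is reported.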

0

import string
from collections import Counter

Getting the ASCII letters

asciiLetters = string.ascii_letters

Creating a dictionary with each letter as a key and 0 as its value

dicionario = {key: 0 for key in asciiLetters}

Opening the file and counting occurrences of letters

with open('hino.txt', 'r', encoding='utf8') as f:
    counter = Counter(f.read())

Updating the dictionary with the values that the Counter counted

dicionario.update(dict(counter))
dicionario

This shows the three most frequent characters

counter.most_common()[:3]

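Full code

import string
from collections import Counter

# all ASCII letters, lowercase and uppercase
asciiLetters = string.ascii_letters
# start every letter at 0
dicionario = {key: 0 for key in asciiLetters}

# count every character in the file
with open('hino.txt', 'r', encoding='utf8') as f:
    counter = Counter(f.read())

# overwrite the zeros with the real counts (this also adds non-letter characters)
dicionario.update(dict(counter))
print(dicionario)

# the three most frequent characters
print(counter.most_common()[:3])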
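A possible variant, assuming only the ASCII letters are of interest: since a Counter gives 0 for missing keys, the dictionary can be built directly from it, and most_common accepts the number of items to return. The sample text and the name so_letras are made up for this sketch:

```python
from collections import Counter
from string import ascii_letters

counter = Counter('aaaabbbccd')  # made-up text, not the content of hino.txt

# one entry per ASCII letter; letters that never occurred get 0
so_letras = {letra: counter[letra] for letra in ascii_letters}

print(so_letras['a'])          # 4
print(so_letras['Y'])          # 0
print(len(so_letras))          # 52
print(counter.most_common(3))  # [('a', 4), ('b', 3), ('c', 2)]
```

Unlike the update approach above, this dictionary holds only letters, so spaces and punctuation are left out.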
