Is it possible to shorten my code?


I was doing the following exercise:

Write a program that reads a file and prints the letters in descending order of frequency. Your program should convert all input to lowercase and count only the letters from a to z. Do not count spaces, digits, punctuation, or anything other than a letter of the alphabet. Find simple texts in several different languages and see how letter frequency varies between languages.

I came up with this code and would like to know how I can simplify it.

arquivo= input("Insira o endereço de arquivo: ").strip('"')
texto= open(arquivo)
frase=[]
palavra= []
ordem= []
l= []
letras= {}

# Reads each line of the text
for line in texto:
    line.rstrip()
    frase= line.strip()
    
# Reads each word of the line
    for i in range(len(frase)):
        palavra= frase[i].lower()
        
# Reads each letter of the word
        for j in range(len(palavra)):
            if palavra[j] == ' ' or palavra[j] == '\n' : continue
            letras[palavra[j]]= letras.get(palavra[j], 0)+1

# Collects everything into a list of tuples
for k,v in letras.items():
    l.append((v,k))
    
l.sort(reverse=True)
for k,v in l:
    print(k,v)

1 answer

First, you could open the file inside a with block, which ensures that the file is closed at the end.

Next, you have an unnecessary loop. The for line in texto iterates over every line in the file, and line is a string. Since you want to count letters, just iterate over that string with another for and check whether each character is a letter. There is no need to split the line into words: you are counting letters, not words, so it does not matter which word a letter belongs to.

To sort the keys by count, just pass the key parameter to sorted; there is no need to build another structure for this. It would look like this:

arquivo = input("Insira o endereço de arquivo: ").strip('"')  # read the file name
letras = {}
with open(arquivo) as arq:
    for line in arq:  # for each line of the file
        for char in line.lower():  # for each character of the line
            if 'a' <= char <= 'z':  # if it is a letter from a to z
                letras[char] = letras.get(char, 0) + 1

# sort the dictionary keys by their counts, highest first
for letra in sorted(letras, key=letras.get, reverse=True):
    print(letra, letras[letra])

The sort uses key=letras.get, which means each key is ordered by its value in the dictionary (the count) rather than by the key itself.
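To see what key=letras.get does in isolation, here is a minimal sketch on a small hand-written dictionary. The sample counts and the name ordenadas are made up for illustration and do not come from any real file:

```python
# Hypothetical sample counts, only for illustration
letras = {'a': 3, 'b': 1, 'c': 2}

# sorted() iterates over the dictionary's keys; key=letras.get makes
# each key be compared by its associated count instead of by the key itself
ordenadas = sorted(letras, key=letras.get, reverse=True)

for letra in ordenadas:
    print(letra, letras[letra])
```

Without the key argument, sorted(letras) would simply order the letters alphabetically.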


Another option is to use a Counter, which is designed precisely for counting occurrences of elements:

from collections import Counter

letras = Counter()
with open(arquivo) as arq:
    for line in arq:
        for char in line.lower():
            if 'a' <= char <= 'z':
                letras.update(char)

for letra, qtd in letras.most_common():
    print(letra, qtd)

To sort, just call most_common(), which returns a list of tuples, each containing a letter and its count, already ordered from most common to least common.
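Since a Counter can also be built directly from any iterable, the counting loop can be shortened further. The following is only a sketch: the helper name contar_letras and the sample string are assumptions made for the example, and it works on a string in memory rather than on a file:

```python
from collections import Counter

def contar_letras(texto):
    """Count only the letters a-z in texto, ignoring case (hypothetical helper)."""
    # The generator yields each lowercase character that is between 'a' and 'z';
    # Counter tallies whatever the generator produces
    return Counter(c for c in texto.lower() if 'a' <= c <= 'z')

contagem = contar_letras("Banana, 123!")

# most_common() returns (letter, count) pairs, highest count first
for letra, qtd in contagem.most_common():
    print(letra, qtd)
```

To use it with a file, you could pass the result of arq.read() to the helper, at the cost of loading the whole file into memory at once. most_common() also accepts an optional number, so contagem.most_common(1) returns only the most frequent letter.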
