Average frequency using dictionary

I want to count how often each word appears in a book (a .txt file) and divide that count by the number of chapters found, which gives the average number of appearances per chapter.

I want this result returned in a dictionary. Example of the expected return value:

{'casa': 4.9000090}
{'homem': 2.256535434}

I have already managed to compute the expected result without returning a dictionary, but I would like to assemble the information so that it looks like the example above.

Can someone point out the best way to do this?

My code is like this:

# Contents of my helper file auxiliares.py, which holds the text-reading functions
import re

def abre_texto():
    return open('livroExemplo.txt', 'r')

def consulta_novo_capitulo(line):
    # True when the line starts with a chapter heading such as "Capítulo 01"
    return bool(re.search(r'^Capítulo\s\d\d', line))


# Main function
import auxiliares as aux

def media_palavras(palavra):
    thisdict = {}
    newChapter = 0
    for line in aux.abre_texto():
        # Count chapter headings as they appear
        if aux.consulta_novo_capitulo(line):
            newChapter += 1
        # Count every whitespace-separated word
        for word in line.split():
            if word not in thisdict:
                thisdict[word] = 1
            else:
                thisdict[word] += 1
    print("Existem",newChapter,"capítulos.")
    print("A palavra consultada {",palavra,"} apareceu",thisdict[palavra],"vezes no livro.")
    print("Média de aparição da palavra consultada:",(thisdict[palavra])/newChapter)

media_palavras('casa')

1 answer

I am not sure I understood the question correctly, but to count the words of a text I would use collections.Counter and then build the dictionary from its result.

Here is an example. It randomly generates a list of 800 words drawn from a list of only 11. Let's assume the book has 3 chapters and that the generated list of 800 words is what you would get by extracting the text from the book:

from collections import Counter
import numpy as np

# Words used to generate the list that simulates the text extracted from the book
palavras = ['madri', 'abrigo', 'fila', 'menino', 'soprano', 'nó', 'engolir',
            'dentro','caverna', 'percepção', 'flash']

# List simulating the words read from the text
livro  = np.random.choice(palavras, 800, replace=True)

# Count the frequency of each word in the list
counter = Counter(livro)

# Empty dictionary for the resulting averages
medias = {}

# Compute the averages (assuming 3 chapters)
for palavra, total in counter.items():
    medias[palavra] = round(total/3, 2)

# Show the result
print(medias)

Output:

{'abrigo': 26.67,
 'menino': 24.33,
 'engolir': 24.67,
 'soprano': 30.0,
 'nó': 24.67,
 'flash': 26.0,
 'dentro': 20.0,
 'percepção': 20.67,
 'fila': 24.67,
 'caverna': 25.0,
 'madri': 20.0}
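
If only one word is of interest, as in the question's call media_palavras('casa'), the same idea can return a one-entry dictionary in the format the question asks for. This is a minimal sketch, not the asker's actual code: the function name media_palavra, its parameters, and the sample sentence are all made up for illustration. It relies on the fact that a Counter returns 0 for a missing key, so a word that never appears does not raise KeyError.

```python
from collections import Counter

def media_palavra(palavra, words, chapters):
    """Return {palavra: average occurrences per chapter} (hypothetical helper)."""
    counter = Counter(words)
    # Counter yields 0 for words it has never seen, so no KeyError here
    return {palavra: counter[palavra] / chapters}

# Made-up sample text standing in for the words read from the book
words = "a casa do homem fica perto da casa azul".split()

resultado_casa = media_palavra('casa', words, 2)
resultado_gato = media_palavra('gato', words, 2)
print(resultado_casa)
print(resultado_gato)
```

Here 'casa' appears twice across 2 assumed chapters, and 'gato' does not appear at all.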

If you want, you can build a more complete dictionary that holds a sub-dictionary for each word, showing both the total frequency and the average frequency:

# Complete result
completo = {}
for palavra, total in counter.items():
    completo[palavra] = {'Total': total, 'Media': round(total/3, 2)}

print(completo)

Output:

{'abrigo': {'Total': 80, 'Media': 26.67},
 'menino': {'Total': 73, 'Media': 24.33},
 'engolir': {'Total': 74, 'Media': 24.67},
 'soprano': {'Total': 90, 'Media': 30.0},
 'nó': {'Total': 74, 'Media': 24.67},
 'flash': {'Total': 78, 'Media': 26.0},
 'dentro': {'Total': 60, 'Media': 20.0},
 'percepção': {'Total': 62, 'Media': 20.67},
 'fila': {'Total': 74, 'Media': 24.67},
 'caverna': {'Total': 75, 'Media': 25.0},
 'madri': {'Total': 60, 'Media': 20.0}}

Now just adapt to your context and it should work.
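
As a rough idea of what that adaptation could look like, the sketch below combines the chapter regex from the question with Counter and returns the averages as a dictionary. The function name medias_por_capitulo and the sample lines are assumptions made for illustration. Unlike the question's loop, it skips the heading lines instead of counting their words, which is also an assumption about what is wanted.

```python
import re
from collections import Counter

# Same pattern as consulta_novo_capitulo in the question
CHAPTER_RE = re.compile(r'^Capítulo\s\d\d')

def medias_por_capitulo(lines):
    """Return {word: occurrences / number of chapters} for an iterable of lines."""
    counter = Counter()
    chapters = 0
    for line in lines:
        if CHAPTER_RE.search(line):
            chapters += 1
            continue  # assumption: heading lines are not part of the counted text
        counter.update(line.split())
    if chapters == 0:
        raise ValueError("no chapter headings found")
    # Dictionary comprehension: one average per distinct word
    return {word: total / chapters for word, total in counter.items()}

# Hypothetical sample standing in for the lines of livroExemplo.txt
sample = [
    "Capítulo 01",
    "a casa do homem",
    "Capítulo 02",
    "a casa azul",
]

medias_livro = medias_por_capitulo(sample)
print(medias_livro)
```

With the real book, the call would be something like medias_por_capitulo(open('livroExemplo.txt', 'r')), since a file object iterates over its lines. Note that line.split() keeps punctuation attached to words, so 'casa' and 'casa,' are counted separately unless the text is cleaned first.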

See it working on repl.it.
