Average frequency using dictionary


I want to count how often each word appears in a book (a .txt file) and divide that count by the number of chapters found, giving the average number of appearances of the word per chapter.

The point is that I want to return this result in a dictionary. Example of the expected return value:

{'casa': 4.9000090}
{'homem': 2.256535434}

I already get the expected numbers, but my function only prints them instead of returning a dictionary. I would like to combine the information so that it comes out in the format above.

Could someone point out the best way to do this?

My code looks like this:

# Code from my auxiliares.py file, which contains the text-reading functions
import re

def abre_texto():
    return open('livroExemplo.txt', 'r')

def consulta_novo_capitulo(line):
    return bool(re.search(r'^Capítulo\s\d\d', line))

# Main function
import auxiliares as aux

def media_palavras(palavra):
    thisdict = {}
    newChapter = 0
    with aux.abre_texto() as texto:
        for line in texto:
            if aux.consulta_novo_capitulo(line):
                newChapter += 1
            for word in line.split():
                if word not in thisdict:
                    thisdict[word] = 0
                thisdict[word] += 1
    print("The queried word {", palavra, "} appeared", thisdict[palavra], "times in the book.")
    print("Average appearances of the queried word:", thisdict[palavra] / newChapter)
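The chapter count depends entirely on the pattern in consulta_novo_capitulo, so it is worth checking which lines it actually accepts. The sample lines below are made up for illustration; they are not taken from the real book.

```python
import re

# Same pattern as consulta_novo_capitulo: "Capítulo", one whitespace
# character, then two digits, anchored at the start of the line.
PADRAO_CAPITULO = re.compile(r'^Capítulo\s\d\d')

def consulta_novo_capitulo(line):
    return bool(PADRAO_CAPITULO.search(line))

# Hypothetical sample lines
amostras = [
    'Capítulo 01\n',
    'Capítulo 12 - O fim\n',
    'Capítulo 1\n',
    '  Capítulo 02\n',
    'No capítulo 03 ele volta.\n',
]
for linha in amostras:
    print(repr(linha), consulta_novo_capitulo(linha))
```

Only the first two samples match: the third fails because \d\d needs two digits, the fourth because ^ anchors the match at the very start of the line, and the fifth because the match is case-sensitive and not at the start. If the book numbers its chapters without zero-padding, a pattern such as ^Capítulo\s\d+ would probably be needed; that is an assumption about the book's format.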


I'm not sure I understood it correctly, but if I had to count the words of a text I would use Counter from the collections module and build the result dictionary at the end. Here is an example that randomly generates a list of 800 words from another list of only 11. Let's assume the book has 3 chapters and that the generated list of 800 words is the result of extracting the text from the book:

from collections import Counter
import numpy as np

# Words from which the list simulating the extracted text will be generated
palavras = ['madri', 'abrigo', 'fila', 'menino', 'soprano', 'nó', 'engolir',
            'dentro', 'caverna', 'percepção', 'flash']

# List simulating the words read from the text
# (.tolist() turns the NumPy array into plain Python strings)
livro = np.random.choice(palavras, 800, replace=True).tolist()

# Count the frequency of each word in the list
counter = Counter(livro)

# Empty dictionary for the resulting averages
medias = {}

# Compute the averages (assuming 3 chapters)
for palavra, total in counter.items():
    medias[palavra] = round(total / 3, 2)

# Show the result
print(medias)

{'abrigo': 26.67,
 'menino': 24.33,
 'engolir': 24.67,
 'soprano': 30.0,
 'nó': 24.67,
 'flash': 26.0,
 'dentro': 20.0,
 'percepção': 20.67,
 'fila': 24.67,
 'caverna': 25.0,
 'madri': 20.0}
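The loop that fills medias can also be written as a dictionary comprehension that unpacks each (word, total) pair. This sketch uses a small fixed list instead of np.random.choice so that the result is reproducible; the word list is made up, and the 3 chapters are the same assumption as above.

```python
from collections import Counter

# Fixed, made-up word list so the output is reproducible
livro = ['fila', 'nó', 'fila', 'caverna', 'fila', 'nó']
num_capitulos = 3  # assumed number of chapters, as in the example above

counter = Counter(livro)
medias = {palavra: round(total / num_capitulos, 2)
          for palavra, total in counter.items()}
print(medias)  # {'fila': 1.0, 'nó': 0.67, 'caverna': 0.33}
```

Counter keeps words in the order they were first seen, so the dictionary lists them in that order.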

If you want, you can build a more complete dictionary that holds a sub-dictionary for each word, showing both the total frequency and the average frequency:

# Complete result
completo = {}
for palavra, total in counter.items():
    completo[palavra] = {'Total': total, 'Media': round(total / 3, 2)}

print(completo)

{'abrigo': {'Total': 80, 'Media': 26.67},
 'menino': {'Total': 73, 'Media': 24.33},
 'engolir': {'Total': 74, 'Media': 24.67},
 'soprano': {'Total': 90, 'Media': 30.0},
 'nó': {'Total': 74, 'Media': 24.67},
 'flash': {'Total': 78, 'Media': 26.0},
 'dentro': {'Total': 60, 'Media': 20.0},
 'percepção': {'Total': 62, 'Media': 20.67},
 'fila': {'Total': 74, 'Media': 24.67},
 'caverna': {'Total': 75, 'Media': 25.0},
 'madri': {'Total': 60, 'Media': 20.0}}
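With the nested dictionary in place, a single word's figures can be read back, which covers what media_palavras(palavra) printed in the question. Using dict.get avoids the KeyError that thisdict[palavra] raises when the word never appears. The word list, chapter count and the helper name consulta are all hypothetical.

```python
from collections import Counter

# Made-up word list and chapter count, for illustration only
livro = ['casa', 'homem', 'casa', 'casa', 'rua']
num_capitulos = 2

counter = Counter(livro)
completo = {palavra: {'Total': total, 'Media': round(total / num_capitulos, 2)}
            for palavra, total in counter.items()}

def consulta(palavra):
    # Hypothetical helper: returns the {'Total': ..., 'Media': ...} entry,
    # or None if the word does not appear at all
    return completo.get(palavra)

print(consulta('casa'))   # {'Total': 3, 'Media': 1.5}
print(consulta('carro'))  # None
```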

Now just adapt to your context and it should work.
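One possible adaptation to the question's setting is sketched below: count chapter headings with the same regular expression as consulta_novo_capitulo, count words with Counter.update, and return the averages as a dictionary. The function name medias_por_capitulo, skipping the heading lines, and normalizing words with lower() and \w+ (so that "Casa," and "casa" are counted together) are assumptions, not part of the original code. The example reads an in-memory text through io.StringIO instead of livroExemplo.txt.

```python
import io
import re
from collections import Counter

# Same chapter pattern as consulta_novo_capitulo in the question
PADRAO_CAPITULO = re.compile(r'^Capítulo\s\d\d')

def medias_por_capitulo(linhas):
    """Hypothetical helper: returns {word: average occurrences per chapter}.

    `linhas` is any iterable of text lines, e.g. an open file object.
    """
    contagem = Counter()
    capitulos = 0
    for linha in linhas:
        if PADRAO_CAPITULO.search(linha):
            capitulos += 1
            continue  # assumption: heading lines are not counted as text
        # Assumption: lowercase and keep only word characters
        contagem.update(re.findall(r'\w+', linha.lower()))
    if capitulos == 0:
        raise ValueError('no chapter headings found')
    return {palavra: total / capitulos for palavra, total in contagem.items()}

# Made-up two-chapter text standing in for the book
texto = io.StringIO(
    'Capítulo 01\n'
    'A casa era grande. O homem chegou.\n'
    'Capítulo 02\n'
    'Casa, casa e mais casa.\n'
)
medias = medias_por_capitulo(texto)
print(medias['casa'])   # 2.0
print(medias['homem'])  # 0.5
```

With the real file, the call would look like `with open('livroExemplo.txt', encoding='utf-8') as f: medias = medias_por_capitulo(f)`; the UTF-8 encoding is an assumption about how the book is saved. Rounding is left to the caller, for example with round(valor, 2) when displaying.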

See it working on repl.it.

