Sort a dictionary and add values in Python

I have a text file that has already been preprocessed (capitalization normalized), and that part works correctly:

olá meu nome é meu nome pois eu olá
é meu nome walt não disney
olá

Then I have the function below, which calculates the frequency of each word (and does so correctly). Next, it should sort the list dataFreq and compute the probability of each word appearing in the text, that is: frequenciaPalavra/totalPalavras (word frequency divided by the total number of words).
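Written as a formula, where c(w) is the number of occurrences of word w and N is the total number of words in the text (c and N are names introduced here only for illustration):

$$P(w) = \frac{c(w)}{N}, \qquad N = \sum_{w} c(w)$$

For the sample text, N = 16 and "olá" appears 3 times, so P(olá) = 3/16 = 0.1875.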

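def countWordExact(dataClean):
    count = {}      # word -> number of occurrences
    dataFreq = []
    total = 0       # total number of words (local, so repeated calls start from zero)

    for line in dataClean.splitlines():
        for word in line.split():    # split() also skips repeated spaces and empty lines
            if word in count:
                count[word] += 1
            else:
                count[word] = 1
            total += 1

    dataFreq.append(count)

    freq = []

    # visit words from least to most frequent and compute count/total
    for indice in sorted(count, key=count.get):
        #print(count[indice])
        freq.append(count[indice] / total)
    #print(freq)

    return dataFreq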
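In the function above, freq holds only the numbers, so the link between each word and its probability is lost. A minimal sketch of one way to keep them together is to fill a dictionary inside the same sorted loop; it assumes Python 3.7+ (where dicts keep insertion order), and the name countWordProb is hypothetical:

```python
def countWordProb(dataClean):
    # Count occurrences of each word; split() ignores repeated spaces and blank lines
    count = {}
    total = 0
    for line in dataClean.splitlines():
        for word in line.split():
            count[word] = count.get(word, 0) + 1
            total += 1

    # Insert words from least to most frequent; since dicts keep insertion
    # order (Python 3.7+), the resulting dict stays sorted by frequency.
    probs = {}
    for word in sorted(count, key=count.get):
        probs[word] = count[word] / total

    # Wrapped in a list only to match the format shown in the question
    return [probs]


texto = """
olá meu nome é meu nome pois eu olá
é meu nome walt não disney
olá
"""
print(countWordProb(texto))
# [{'pois': 0.0625, 'eu': 0.0625, 'walt': 0.0625, 'não': 0.0625, 'disney': 0.0625, 'é': 0.125, 'olá': 0.1875, 'meu': 0.1875, 'nome': 0.1875}]
```

Words with the same count keep the order in which they first appeared, because sorted() is stable.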

My question is: how do I sort the dictionary (and consequently the list) and store in it the values resulting from the frequency calculation described above? For example:

[{'olá': 0.12, 'meu': 0.12, 'nome': 0.132, 'é': 0.12321, 'pois': 0.56, 'eu': 0.65, 'walt': 0.7, 'não': 0.7, 'disney': 0.5}]

(the values above are only placeholders, not the correct results)

1 answer

All the frequency-counting logic is already implemented natively in Python in collections.Counter; the only thing you need to do is divide the number of times each word appears in the text by the total number of words:

from collections import Counter

texto = """
olá meu nome é meu nome pois eu olá
é meu nome walt não disney
olá
"""

palavras = texto.split()
frequencias = Counter(palavras)
# Counter({'olá': 3, 'meu': 3, 'nome': 3, 'é': 2, 'pois': 1, 'eu': 1, 'walt': 1, 'não': 1, 'disney': 1})
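Iterating over a Counter (or calling .items()) follows the order in which words were first seen, not their frequency. Counter.most_common() returns (word, count) pairs sorted from most to least frequent, which covers the sorting half of the question. A short sketch reusing the same sample text:

```python
from collections import Counter

texto = """
olá meu nome é meu nome pois eu olá
é meu nome walt não disney
olá
"""

palavras = texto.split()
frequencias = Counter(palavras)

# most_common() yields (word, count) pairs from most to least frequent;
# words with equal counts keep the order in which they were first encountered.
for palavra, frequencia in frequencias.most_common():
    print(palavra, frequencia)

# Passing n limits the result to the n most frequent words
print(frequencias.most_common(3))  # [('olá', 3), ('meu', 3), ('nome', 3)]
```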

To calculate the probability (relative frequency) of each word:

total = len(palavras)
probabilidades = {}

for palavra, frequencia in frequencias.items():
    probabilidades[palavra] = frequencia/total

print(probabilidades)

Resulting in:

{'olá': 0.1875, 'meu': 0.1875, 'nome': 0.1875, 'é': 0.125, 'pois': 0.0625, 'eu': 0.0625, 'walt': 0.0625, 'não': 0.0625, 'disney': 0.0625}

Or, more concisely, with a dict comprehension:

probabilidades = {palavra: frequencia/total for palavra, frequencia in frequencias.items()}
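If the probability dictionary itself should be ordered by value (the question's example wraps it in a list), one option is to sort its items with sorted() and rebuild a dict, relying on dicts preserving insertion order (guaranteed since Python 3.7). This is a sketch; the names sorted_probs and resultado are illustrative:

```python
from collections import Counter

texto = """
olá meu nome é meu nome pois eu olá
é meu nome walt não disney
olá
"""

palavras = texto.split()
total = len(palavras)
frequencias = Counter(palavras)
probabilidades = {palavra: frequencia / total for palavra, frequencia in frequencias.items()}

# Sort the (word, probability) pairs by probability, highest first, and rebuild a dict.
# sorted() is stable, so words with equal probability keep their original order.
sorted_probs = dict(sorted(probabilidades.items(), key=lambda item: item[1], reverse=True))

# Wrapped in a list only to match the format shown in the question
resultado = [sorted_probs]
print(resultado)
```

Use reverse=False (the default) for ascending order instead.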
  • Thanks, that optimized my code. But how do I store the result of that last print?

  • @Walt057 I edited the answer to build a dictionary of probabilities.
