How to get the 10 most frequent words from an array?

5

I need to know how to get the ten most frequent words.

This code goes through all the words of a text and stores how many times each one occurs.

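# palavra is the current word; conjunto is a dict mapping word -> count
if len(palavra) > 0:
    if palavra in conjunto:
        # Word seen before: increment its count
        qtd = conjunto[palavra]
        qtd += 1
        conjunto[palavra] = qtd
    else:
        # First occurrence: start the count at 1
        conjunto[palavra] = 1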

How do I return only the 10 most frequent occurrences?

1 answer

6

(TL;DR)

Use collections.Counter:

import collections

# List of words
words = ['Banana', 'Maçã', 'Laranja', 'Melão', 'Uva', 'Abacaxi',
         'Abacate', 'Pimenta', 'Banana', 'Maçã', 'Banana', 'Melão',
         'Banana', 'Uva', 'Abacaxi', 'Fake', 'Fake']

# Counter with the number of occurrences of each word
c = collections.Counter(words)

print(c)
# Counter({'Banana': 4, 'Maçã': 2, 'Melão': 2, 'Uva': 2, 'Abacaxi': 2, 'Fake': 2,
# 'Laranja': 1, 'Abacate': 1, 'Pimenta': 1})

# The 3 most frequent words; pass 10 instead of 3 for the top ten.
# On Python 3.7+, words with equal counts keep the order in which they were first seen.
print(c.most_common(3))
# [('Banana', 4), ('Maçã', 2), ('Melão', 2)]
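
To go from a raw text to the ten most frequent words, as the question asks, the counting loop can be replaced by a Counter built directly from the split text. This is only a minimal sketch: the `top_words` helper and the sample `text` are hypothetical, and it assumes words are separated by whitespace and should be compared case-insensitively.

```python
import collections

def top_words(text, n=10):
    """Return the n most frequent words in text as (word, count) pairs."""
    # Lowercase and split on whitespace; split() never yields empty strings,
    # so no len(word) > 0 check is needed
    words = text.lower().split()
    return collections.Counter(words).most_common(n)

# Hypothetical sample text with 12 distinct words
text = "a b a c b a d e f g h i j k l a b"
for word, count in top_words(text):
    print(word, count)
```

If the text has fewer than n distinct words, most_common(n) simply returns all of them.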

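
If the counts already live in a plain dict such as `conjunto` from the question, the top entries can also be taken without Counter by sorting the items by count. A minimal sketch under that assumption: the `top_n` helper and the sample counts are hypothetical.

```python
# Hypothetical counts, shaped like the conjunto dict built in the question
conjunto = {'Banana': 4, 'Maçã': 2, 'Laranja': 1, 'Melão': 2}

def top_n(counts, n=10):
    """Return the n (word, count) pairs with the highest counts."""
    # Sort the pairs by count, highest first, then keep the first n
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]

print(top_n(conjunto, 3))
```

Sorting the whole dict is fine for small vocabularies; for very large ones, heapq.nlargest(n, counts.items(), key=...) avoids a full sort.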
