Count the most popular words

I’m trying to find the number of occurrences of a list of words in a text:

from collections import Counter

def popularidade (texto, palavras):

    texto = texto.lower().split()
    palavras = palavras.lower().split()

    lista = []

    for p in palavras:
        for t in texto:
            if p == t:
                lista.append(t)
                return Counter(lista)

print(popularidade("Ao nos resolver a esta tarefa, preste nos atenção nos seguintes a nos pontos:", "nos, a, preste"))

Output:

Counter({'preste': 1})

Desired result:

{'nos': 4, 'a': 2, 'preste': 1}

  • You always return from the function at the first word found. Try moving the return out of the loop. To better understand what your code does, do a desk check (trace it by hand, line by line).
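A minimal desk-check sketch: it is the question's function unchanged except for a print on every comparison (the name popularidade_rastreada is hypothetical), so you can watch the loop stop at the first match.

```python
from collections import Counter

def popularidade_rastreada(texto, palavras):
    # Same logic as the question's code, plus a print to trace each comparison
    texto = texto.lower().split()
    palavras = palavras.lower().split()
    lista = []
    for p in palavras:
        for t in texto:
            print(f"comparing {p!r} with {t!r}")
            if p == t:
                lista.append(t)
                return Counter(lista)  # exits on the first match, as in the original

resultado = popularidade_rastreada(
    "Ao nos resolver a esta tarefa, preste nos atenção nos seguintes a nos pontos:",
    "nos, a, preste",
)
print(resultado)  # Counter({'preste': 1})
```

The trace shows 'nos,' and 'a,' never matching anything, and the function leaving as soon as 'preste' is found.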

3 answers

There are actually two things you're missing.

The return statement makes the function exit right after that line, that is, on the very first match. The other detail escaping you is the commas in the word list: split() keeps them attached to the words, so the comparison never succeeds, e.g. 'nos' == 'nos,' is False.
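A quick check of what split() does with the word list, using only built-in str methods (the variable names are illustrative):

```python
palavras = "nos, a, preste"

# split() only breaks on whitespace, so each comma stays glued to its word
com_virgulas = palavras.split()
print(com_virgulas)     # ['nos,', 'a,', 'preste']
print('nos' == 'nos,')  # False

# Two ways to get clean words
sem_virgulas = palavras.replace(',', '').split()
por_virgula = [p.strip() for p in palavras.split(',')]
print(sem_virgulas)     # ['nos', 'a', 'preste']
print(por_virgula)      # ['nos', 'a', 'preste']
```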

Here is your code, corrected:

from collections import Counter

def popularidade (texto, palavras):

    texto = texto.lower().split()
    palavras = palavras.lower().replace(',', '').split() # remove the commas

    lista = []

    for p in palavras:
        for t in texto:
            if p == t:
                lista.append(t)
    return Counter(lista) # return only after every word has been checked

print(popularidade("Ao nos resolver a esta tarefa, preste nos atenção nos seguintes a nos pontos:", "nos a preste"))

Truth be told, you don't need all that machinery, or even Counter():

palavras = "nos, a, preste"
texto = "Ao nos resolver a esta tarefa, preste nos atenção nos seguintes a nos pontos:"
palavras_spl = palavras.lower().replace(',', '').split()
text_spl = texto.lower().split()
count = {p: text_spl.count(p) for p in palavras_spl if p in text_spl}
print(count) # {'nos': 4, 'a': 2, 'preste': 1}

If you want, you can strip all the punctuation from both strings, so that only words are left:

import string

palavras = "nos, a, preste"
texto = "Ao nos resolver a esta tarefa, preste nos atenção nos seguintes a nos pontos:"

palavras_spl = palavras.translate(str.maketrans('', '', string.punctuation)).lower().split()
text_spl = texto.translate(str.maketrans('', '', string.punctuation)).lower().split()
count = {p: text_spl.count(p) for p in palavras_spl if p in text_spl}
print(count) # {'nos': 4, 'a': 2, 'preste': 1}
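To see why cleaning the text side matters, here is a small sketch comparing the tokens with and without the translate step (sujo, limpo and tabela are illustrative names). Note that string.punctuation only covers ASCII punctuation, so accented letters such as ç and ã are kept.

```python
import string

texto = "Ao nos resolver a esta tarefa, preste nos atenção nos seguintes a nos pontos:"
tabela = str.maketrans('', '', string.punctuation)  # maps each ASCII punctuation char to None

sujo = texto.lower().split()                    # punctuation still attached
limpo = texto.translate(tabela).lower().split()  # punctuation removed first

# 'tarefa' and 'pontos' are only found once the punctuation is gone
print('tarefa' in sujo, 'tarefa' in limpo)          # False True
print(sujo.count('pontos'), limpo.count('pontos'))  # 0 1
```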

  • Although the OP's example does not run into this, punctuation in the original text may also affect the result (e.g., tarefa, vs tarefa). I recommend stripping it: texto_limpo = texto.translate(str.maketrans('', '', string.punctuation)) (in Python 2 this was texto.translate(None, string.punctuation)).

  • True, @Anthonyaccioly, good tip, I hadn't thought of that. Thanks, I'll add this to the answer.

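def frequencia(texto):
    # texto is a list of words; map each word to how many times it occurs
    frequencia_por_palavra = [texto.count(p) for p in texto]
    return dict(zip(texto, frequencia_por_palavra))

def popularidade(texto, palavras):
    dFrequencia = frequencia(texto)
    # keep only the requested words that actually occur in the text
    return dict((k, dFrequencia[k]) for k in palavras if k in dFrequencia)

# the with block closes the file afterwards; UTF-8 is an assumption about how texto.txt was saved
with open('texto.txt', encoding='utf-8') as arquivo:
    print(popularidade(arquivo.read().split(), ['filhos', 'amada']))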

The file texto.txt contains the Brazilian national anthem.

Output:

{'amada': 4, 'filhos': 2}
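frequencia() calls texto.count(p) once per word, so each call rescans the whole list. A sketch of a single-pass alternative, assuming the input is already a list of words: collections.Counter builds the same mapping (frequencia_counter is a hypothetical name, and the sample text stands in for texto.txt).

```python
from collections import Counter

def frequencia(texto):
    # The answer's version: one list.count() scan per word
    frequencia_por_palavra = [texto.count(p) for p in texto]
    return dict(zip(texto, frequencia_por_palavra))

def frequencia_counter(texto):
    # Single pass over the list; Counter is a dict subclass
    return Counter(texto)

def popularidade(texto, palavras):
    dFrequencia = frequencia_counter(texto)
    return dict((k, dFrequencia[k]) for k in palavras if k in dFrequencia)

palavras_texto = "ao nos resolver a esta tarefa preste nos atenção nos seguintes a nos pontos".split()
print(frequencia(palavras_texto) == frequencia_counter(palavras_texto))  # True
print(popularidade(palavras_texto, ['nos', 'a', 'preste']))  # {'nos': 4, 'a': 2, 'preste': 1}
```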

Option 2

If you like regular expressions, you can also do it this way:

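import re

def popularidade(texto, palavra):
    # \b is a word boundary, so 'a' does not match inside 'tarefa' or 'esta';
    # re.escape protects words that contain regex metacharacters
    padrao = r'\b%s\b' % re.escape(palavra)
    return sum(1 for _ in re.finditer(padrao, texto))

palavras = "nos, a, preste"
texto = "Ao nos resolver a esta tarefa, preste nos atenção nos seguintes a nos pontos:"
# strip() removes the space left after each comma (' a' -> 'a')
lista = [p.strip() for p in palavras.split(",")]
d = dict((v, popularidade(texto, v)) for v in lista)
print(d)  # {'nos': 4, 'a': 2, 'preste': 1}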
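The \b anchors are what make this work. A short comparison against plain substring counting on the same text:

```python
import re

texto = "Ao nos resolver a esta tarefa, preste nos atenção nos seguintes a nos pontos:"

# str.count counts substrings, so every 'a' inside 'esta', 'tarefa', 'atenção' is included
print(texto.count('a'))  # 6

# With \b on both sides only the standalone word 'a' matches
print(len(re.findall(r'\ba\b', texto)))  # 2
```

Both versions are case-sensitive, so the capitalized 'Ao' is never counted as 'a'.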

You've got two problems there, buddy.

  1. The indentation of your return line is wrong. It should be aligned with the first for, not nested inside the if, so that the function returns only after the whole list has been built.

  2. Since the commas are attached to the words, split() keeps them together (palavras = ['nos,', 'a,', 'preste']), so those words are never found in the text.

The correct code in this case would be:

from collections import Counter

def popularidade (texto, palavras):

    texto = texto.lower().split()
    palavras = palavras.lower().split()

    lista = []

    for p in palavras:
        for t in texto:
            if p == t:
                lista.append(t)

    return Counter(lista)

print(popularidade("Ao nos resolver a esta tarefa, preste nos atenção nos seguintes a nos pontos:", "nos a preste"))    

Result: Counter({'nos': 4, 'a': 2, 'preste': 1})
