Count words in a string and return sorted tuples in Python


Viewed 991 times

3

The algorithm should receive a string, count how many times each word occurs, and return a list of tuples with the most frequent words and their counts. The problem is that when one word is the beginning of another, it is counted too many times. For example, with "but" and "butter" it counts "but" 3 times (the two occurrences of "butter" are included) and "butter" 2 times: "betty bought a bit of butter but the butter was bitter"

I would also like to sort first by the words that appear most often and, when two words appear the same number of times, alphabetically by word. For example, "falling" and "down" both appear 4 times, so the output should list "down" first and then "falling": "london bridge is falling down falling down falling down london bridge is falling down my fair lady"

def count_words(s, n):
    top_n = []
    itens = n
    words = s.split()
    pref = words
    for p in pref:
        cont = 0
        for w in words:
            if w.startswith(p):
                cont += 1
        if (p, cont) not in top_n:
            top_n.append((p, cont))
    top_n.sort(key = lambda t:t[1], reverse = True)
    #from operator import itemgetter
    #sorted(top_n, key = itemgetter(1), reverse = True)
    while len(top_n) > itens:
        del top_n[len(top_n)-1]
    return top_n

def test_run():
    print(count_words("cat bat mat cat bat cat", 3))
    print(count_words("betty bought a bit of butter but the butter was bitter", 3))
    print(count_words("london bridge is falling down falling down falling down london bridge is falling down my fair lady", 5))

if __name__ == '__main__':
    test_run()

2 answers

2

def count_words(s, n):
    words = s.split()
    top_n = {}  # maps each word to its number of occurrences
    for w in words:
        if w not in top_n:
            top_n[w] = 0
        top_n[w] += 1
    top_n = list(top_n.items())
    # sort by count descending (negated count), then by word ascending
    top_n.sort(key = lambda t:(-t[1],t[0]))
    return top_n[:n]

def test_run():
    print(count_words("cat bat mat cat bat cat", 3))
    print(count_words("betty bought a bit of butter but the butter was bitter", 3))
    print(count_words("london bridge is falling down falling down falling down london bridge is falling down my fair lady", 5))


if __name__ == '__main__':
    test_run()

I think this is clearer. The idea is to use the words as the keys of a dictionary and count with it. The difficulty with the ordering is that you want descending order by count but ascending order by word. What I did was negate the count, -t[1], so that a single ascending sort orders first by the negated count and then by the word.
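To see the negated-count key in isolation, here is a minimal sketch on a small hand-written list. The `pairs` data and the `ordered` name are made up for illustration and are not part of the code above.

```python
# Hypothetical sample data: (word, count) pairs in arbitrary order
pairs = [("london", 2), ("falling", 4), ("my", 1), ("down", 4), ("bridge", 2)]

# Negating the count makes larger counts come first in an ascending sort;
# the word itself then breaks ties in alphabetical order.
ordered = sorted(pairs, key=lambda t: (-t[1], t[0]))
print(ordered)
# [('down', 4), ('falling', 4), ('bridge', 2), ('london', 2), ('my', 1)]
```

This trick only works because the count is a number. Strings cannot be negated, so a descending sort by word would need a different approach.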

  • Also take a look at collections.Counter.
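Following up on that comment, a possible version with collections.Counter could look like the sketch below. The function name `count_words_counter` is made up here to keep it apart from the answer's function. As far as I know, Counter.most_common does not order ties alphabetically, so the explicit sort key is kept.

```python
from collections import Counter

def count_words_counter(s, n):
    # Counter builds the word -> occurrences mapping in one step
    counts = Counter(s.split())
    # Same ordering as above: count descending, then word ascending
    return sorted(counts.items(), key=lambda t: (-t[1], t[0]))[:n]

print(count_words_counter("cat bat mat cat bat cat", 3))
# [('cat', 3), ('bat', 2), ('mat', 1)]
```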

2

from operator import itemgetter
import re

sentence = 'london bridge is falling down falling down falling down london bridge is falling down my fair lady'

def count_words(text):
    words = re.findall(r'\w+', text)    
    wordsCount = [(words.count(word), word) for word in set(words)]        
    wordsCount.sort(key=itemgetter(1)) #order by word
    wordsCount.sort(key=itemgetter(0), reverse=True) #order by wordcount   
    return wordsCount

print(count_words(sentence))

Result: [(4, 'down'), (4, 'falling'), (2, 'bridge'), (2, 'is'), (2, 'london'), (1, 'fair'), (1, 'lady'), (1, 'my')]

The function above uses re.findall to extract the words and then counts them, building the wordsCount list: a list of tuples, each holding the number of occurrences and the corresponding word. We then call the list's sort method twice, first alphabetically and then by number of occurrences. Because Python's sort is stable, the second sort preserves the alphabetical order from the first step among words with the same count.
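The effect of the two passes can be checked on a small list. This is only an illustrative sketch; the `data` list is invented and uses the same (count, word) layout as wordsCount.

```python
from operator import itemgetter

# Hypothetical (count, word) tuples in arbitrary order
data = [(2, "london"), (4, "falling"), (4, "down"), (2, "bridge")]

# First pass: alphabetical order by word
data.sort(key=itemgetter(1))
# Second pass: by count, descending; ties keep their alphabetical order
# because list.sort is stable, even with reverse=True
data.sort(key=itemgetter(0), reverse=True)
print(data)
# [(4, 'down'), (4, 'falling'), (2, 'bridge'), (2, 'london')]
```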

The solution is quite readable and takes only a few lines.
