Display the most frequent word per line and count the words on each line

I have the following text, and for each line I need to display the word that appears most frequently and count the number of words on that line:

This is a really really really cool experiment really
Cute little experiment
Will it work maybe it will work do you think it will it will

The text above is in a file called teste.txt. I read the file line by line, split each line into an array of words, and then build a hash of word counts from that array:

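File.foreach("teste.txt") do |linha|
  a = linha.split  # array of the words on this line
  p a
  # hash mapping each word to the number of times it occurs on this line
  b = Hash[a.group_by(&:itself).map { |word, words| [word, words.size] }]
  puts b
end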

This gives the following output (each line's word array followed by its hash):

["This", "is", "a", "really", "really", "really", "cool", "experiment", "really"]
{"This"=>1, "is"=>1, "a"=>1, "really"=>4, "cool"=>1, "experiment"=>1}
["Cute", "little", "experiment"]
{"Cute"=>1, "little"=>1, "experiment"=>1}
["Will", "it", "work", "maybe", "it", "will", "work", "do", "you", "think", "it", "will", "it", "will"]
{"Will"=>1, "it"=>4, "work"=>2, "maybe"=>1, "will"=>3, "do"=>1, "you"=>1, "think"=>1}

At this point I'm stuck because I can't work out how to iterate over the hash. For the first line, for example, I can see the pair "really"=>4, but I can't manage to print the most frequent word for each line separately (everything is printed at once). I also don't know whether this is the best way to approach the problem, but it is what I came up with.

1 answer


You are on the right track! First you need to count how many times each word occurs; for this I will use Enumerable#each_with_object.

frase = 'hustle hustle talent'
# split is needed to get ['hustle', 'hustle', 'talent']
frequencias = frase.split.each_with_object(Hash.new(0)) { |palavra, hash| hash[palavra] += 1 }
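As a side note, assuming Ruby 2.7 or newer, Enumerable#tally builds the same kind of hash in a single call. A minimal sketch; word_frequencies is a hypothetical helper name used only for illustration:

```ruby
# Hypothetical helper: maps each word in the text to its number of occurrences.
# Relies on Enumerable#tally, which is available from Ruby 2.7 onwards.
def word_frequencies(text)
  text.split.tally
end

frase = 'hustle hustle talent'
frequencias = word_frequencies(frase)
p frequencias
```

On older Ruby versions the each_with_object version above is the way to go.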

Up to this point the result is the same as what your code already produces. Finding the word with the highest frequency is not difficult:

frequencias.max_by { |key, value| value }
#=> ["hustle", 2]

To print it, store the value returned by Enumerable#max_by. It is a two-element array holding the key and the value.

maior_ocorrencia = frequencias.max_by { |key, value| value }
puts "The most frequent word is: #{maior_ocorrencia[0]} (#{maior_ocorrencia[1]})"
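One thing to keep in mind: max_by returns a single pair, so when several words share the highest count (as in "Cute little experiment", where every word appears once) only one of them is reported. A minimal sketch, assuming you would rather list every tied word; most_frequent_words is a hypothetical helper name, not part of Ruby:

```ruby
# Hypothetical helper: returns every word whose count equals the highest count.
# Takes a hash of word => count and returns an array of words.
def most_frequent_words(frequencies)
  highest = frequencies.values.max
  frequencies.select { |_word, count| count == highest }.keys
end

words = 'hustle talent hustle talent luck'.split
frequencies = words.each_with_object(Hash.new(0)) { |word, counts| counts[word] += 1 }
p most_frequent_words(frequencies)
#=> ["hustle", "talent"]
```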

Then just iterate over the lines of the file and apply the same logic to each one.
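Putting the pieces together might look roughly like the sketch below. It also prints the number of words on each line, which the question asked for. The helper name line_summary is hypothetical, the sample text is inlined in a heredoc instead of being read from teste.txt, and words are compared case-sensitively, as in the question ("Will" and "will" count separately):

```ruby
# Hypothetical helper: returns [most_frequent_word, its_count, total_words]
# for one line of text, or nil when the line contains no words.
def line_summary(line)
  words = line.split
  return nil if words.empty?

  # Count the occurrences of each word on this line.
  frequencies = words.each_with_object(Hash.new(0)) { |word, counts| counts[word] += 1 }
  # max_by yields [word, count] pairs and returns the pair with the largest count.
  word, count = frequencies.max_by { |_word, occurrences| occurrences }
  [word, count, words.size]
end

# Sample data from the question, used here in place of the file.
text = <<~TEXT
  This is a really really really cool experiment really
  Cute little experiment
  Will it work maybe it will work do you think it will it will
TEXT

text.each_line.with_index(1) do |line, number|
  summary = line_summary(line)
  next if summary.nil? # skip blank lines

  word, count, total = summary
  puts "Line #{number}: #{total} words, most frequent: #{word} (#{count})"
end
```

To read from the file instead, replacing text.each_line with File.foreach("teste.txt") should work the same way, since File.foreach returns an enumerator when called without a block.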

See it working on repl.it. I hope that makes things clearer; if anything is still unclear, leave a comment.
