- Create a HashMap of String to Integer and scan the text, adding each word to the map if it is not already there.
The advantage of the map is precisely that it is a structure that works with keys (in this case, the words) and values (the integers).
That way, if the word is already in the map, you just increment its value.
- Check, among all the map entries, which ones have the highest frequency.
Sample code:
import java.util.HashMap;
import java.util.Map;
public class FreqPalavra {
private static final String LOREM_IPSUM = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec a diam lectus. Sed sit amet ipsum mauris. Maecenas congue ligula ac quam viverra nec consectetur ante hendrerit. Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aenean ut gravida lorem. Ut turpis felis, pulvinar a semper sed, adipiscing id dolor. Pellentesque auctor nisi id magna consequat sagittis. Curabitur dapibus enim sit amet elit pharetra tincidunt feugiat nisl imperdiet. Ut convallis libero in urna ultrices accumsan. Donec sed odio eros. Donec viverra mi quis quam pulvinar at malesuada arcu rhoncus. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. In rutrum accumsan ultricies. Mauris vitae nisi at sem facilisis semper ac in est."
+ "Vivamus fermentum semper porta. Nunc diam velit, adipiscing ut tristique vitae, sagittis vel odio. Maecenas convallis ullamcorper ultricies. Curabitur ornare, ligula semper consectetur sagittis, nisi diam iaculis velit, id fringilla sem nunc vel mi. Nam dictum, odio nec pretium volutpat, arcu ante placerat erat, non tristique elit urna et turpis. Quisque mi metus, ornare sit amet fermentum et, tincidunt et orci. Fusce eget orci a orci congue vestibulum. Ut dolor diam, elementum et vestibulum eu, porttitor vel elit. Curabitur venenatis pulvinar tellus gravida ornare. Sed et erat faucibus nunc euismod ultricies ut id justo. Nullam cursus suscipit nisi, et ultrices justo sodales nec. Fusce venenatis facilisis lectus ac semper. Aliquam at massa ipsum. Quisque bibendum purus convallis nulla ultrices ultricies. Nullam aliquam, mi eu aliquam tincidunt, purus velit laoreet tortor, viverra pretium nisi quam vitae mi. Fusce vel volutpat elit. Nam sagittis nisi dui."
+ "Suspendisse lectus leo, consectetur in tempor sit amet, placerat quis neque. Etiam luctus porttitor lorem, sed suscipit est rutrum non. Curabitur lobortis nisl a enim congue semper. Aenean commodo ultrices imperdiet. Vestibulum ut justo vel sapien venenatis tincidunt. Phasellus eget dolor sit amet ipsum dapibus condimentum vitae quis lectus. Aliquam ut massa in turpis dapibus convallis. Praesent elit lacus, vestibulum at malesuada et, ornare et est. Ut augue nunc, sodales ut euismod non, adipiscing vitae orci. Mauris ut placerat justo. Mauris in ultricies enim. Quisque nec est eleifend nulla ultrices egestas quis ut quam. Donec sollicitudin lectus a mauris pulvinar id aliquam urna cursus. Cras quis ligula sem, vel elementum mi. Phasellus non ullamcorper urna."
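            // Replace punctuation with a space so that words at the string joins ("est." + "Vivamus") stay separate
            .replaceAll("[.,]", " ");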
    public static void main(String[] args) {
        Map<String, Integer> mapaFreq = new HashMap<>();
        // Build the frequency map
        for (String palavra : LOREM_IPSUM.split("\\s+")) {
            if (!mapaFreq.containsKey(palavra)) {
                mapaFreq.put(palavra, 1);
            } else {
                mapaFreq.put(palavra, 1 + mapaFreq.get(palavra));
            }
        }
        // Arrays that store the 3 most frequent values.
        String[] palavrasMaisFrequentes = new String[3];
        int[] freqPalavras = new int[3];
        // Walk through every entry of the map
        for (Map.Entry<String, Integer> entrada : mapaFreq.entrySet()) {
            // If it finds something at least as frequent as the first position
            if (entrada.getValue() >= freqPalavras[0]) {
                freqPalavras[0] = entrada.getValue();
                palavrasMaisFrequentes[0] = entrada.getKey();
            } else {
                if (entrada.getValue() >= freqPalavras[1]) {
                    freqPalavras[1] = entrada.getValue();
                    palavrasMaisFrequentes[1] = entrada.getKey();
                } else if (entrada.getValue() >= freqPalavras[2]) {
                    freqPalavras[2] = entrada.getValue();
                    palavrasMaisFrequentes[2] = entrada.getKey();
                }
            }
            // System.out.println(entrada.getKey() + "/" + entrada.getValue()); // prints the whole map
        }
        for (int i = 0; i < freqPalavras.length; i++) {
            System.out.println(i + 1 + " palavra: " + palavrasMaisFrequentes[i]
                    + " \nFrequência: " + freqPalavras[i]
                    + "\n------------------------\n");
        }
    }
}
You can see this example running on ideone.
Note
To keep things simple, the code above does not shift values down to the other positions of the arrays. This is a logic error: when a more frequent term is found, the update should cascade through the arrays. For example, given:
freqPalavras[] | palavrasMaisFrequentes[]
13 | "estouro"
11 | "da"
09 | "pilha"
If the word "batman" shows up with frequency 14, the new order should be:
freqPalavras[] | palavrasMaisFrequentes[]
14 | "batman"
13 | "estouro"
11 | "da"
But in the above program it would be:
freqPalavras[] | palavrasMaisFrequentes[]
14 | "batman"
11 | "da"
09 | "pilha"
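The cascading update described above could be sketched roughly as follows. This is only an illustration, not the answer's original code: the class `Top3Cascade` and the helper `insertRanked` are hypothetical names, and the sketch assumes the two parallel arrays are kept sorted from most to least frequent.

```java
import java.util.Arrays;

class Top3Cascade {
    // Hypothetical helper: inserts (word, freq) into two parallel arrays
    // that are kept sorted from most to least frequent.
    static void insertRanked(String[] words, int[] freqs, String word, int freq) {
        for (int i = 0; i < freqs.length; i++) {
            if (freq > freqs[i]) {
                // Shift positions i..end one slot down; the last entry falls off.
                for (int j = freqs.length - 1; j > i; j--) {
                    freqs[j] = freqs[j - 1];
                    words[j] = words[j - 1];
                }
                freqs[i] = freq;
                words[i] = word;
                return;
            }
        }
        // Not frequent enough to enter the ranking: nothing changes.
    }

    public static void main(String[] args) {
        String[] palavrasMaisFrequentes = {"estouro", "da", "pilha"};
        int[] freqPalavras = {13, 11, 9};
        insertRanked(palavrasMaisFrequentes, freqPalavras, "batman", 14);
        System.out.println(Arrays.toString(freqPalavras));           // [14, 13, 11]
        System.out.println(Arrays.toString(palavrasMaisFrequentes)); // [batman, estouro, da]
    }
}
```

Calling such a helper once per map entry, instead of the nested `if`/`else` chain, would keep the three positions consistent with each other.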
I don't quite understand how it looks up how many times a word has appeared: int x = quantas.get("foo"); // can I pass any word inside the quotes and get its value back? quantas.put("foo", x+1); // if it is 0, will it do 0+1?
– Pacíficão
@Pacíficão That's right. And it doesn't need to be a literal, it can be a variable too: String palavra = "bar"; int x = quantas.get(palavra);. You only have to be careful if the word does not exist in the dictionary yet: the result of get will be null (and it will cause a NullPointerException when it is converted to int). So it may be a good idea to test first whether the word is already there (if ( quantas.containsKey(palavra) ) { ... }) and, if not, put it in. The second part is also right: whatever the value of x, you can do an operation with it and save the result back into the Map.
– mgibsonbr
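As an illustration of the null check discussed in this comment, here is a minimal sketch. It assumes Java 8 or later, where `Map.getOrDefault` and `Map.merge` are available; the class `SafeCount` and the method `count` are hypothetical names, not part of the original example.

```java
import java.util.HashMap;
import java.util.Map;

class SafeCount {
    // Hypothetical helper: counts occurrences without ever unboxing a null.
    static Map<String, Integer> count(String[] words) {
        Map<String, Integer> quantas = new HashMap<>();
        for (String palavra : words) {
            // getOrDefault returns 0 (instead of null) when the key is missing,
            // so the assignment to int cannot throw a NullPointerException.
            int x = quantas.getOrDefault(palavra, 0);
            quantas.put(palavra, x + 1);
        }
        return quantas;
    }

    public static void main(String[] args) {
        Map<String, Integer> quantas = count(new String[] {"foo", "bar", "foo"});
        System.out.println(quantas.get("foo"));         // 2
        System.out.println(quantas.get("bar"));         // 1
        System.out.println(quantas.containsKey("baz")); // false

        // Equivalent one-liner for a single increment:
        quantas.merge("baz", 1, Integer::sum);
        System.out.println(quantas.get("baz"));         // 1
    }
}
```

The explicit `containsKey` test from the comment works just as well; `getOrDefault` and `merge` only make it shorter.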
Are you sure quantas.containsKey(palavra) works? Because it always returns null for me.
– Pacíficão
@Pacíficão Yes, it works... See an example on ideone
– mgibsonbr
I managed to make the Map<> work the way you said... but I didn't understand how to combine what I did with the priority queue. Can you explain it to me better? :)
– Pacíficão
If you have filled the Map, then you have a collection of Map.Entry (i.e. pairs of String and Integer). To get the 3 words that appear most often, you could simply sort a list or array of Map.Entry in ascending order by the Integer and take the last three elements, but that is rather wasteful, because sorting is n*log(n) and you will throw away most of the elements. The idea of the priority queue is precisely to eliminate that waste. You add 3 elements to it, and then for every new element you add, you remove the smallest. Example on ideone
– mgibsonbr
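The ideone example itself is not reproduced here, so the following is only a sketch of the idea described in the comment, assuming Java 8 or later for `Map.Entry.comparingByValue()`. The class `Top3Queue` and the method `top3` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

class Top3Queue {
    // Hypothetical helper: keeps only the 3 entries with the largest values.
    static List<Map.Entry<String, Integer>> top3(Map<String, Integer> freq) {
        // Orders entries by value, smallest first, so poll() removes the smallest.
        Comparator<Map.Entry<String, Integer>> comp = Map.Entry.comparingByValue();
        PriorityQueue<Map.Entry<String, Integer>> queue = new PriorityQueue<>(4, comp);
        for (Map.Entry<String, Integer> entrada : freq.entrySet()) {
            queue.add(entrada);
            if (queue.size() > 3) {
                queue.poll(); // discard the least frequent of the 4
            }
        }
        // Note: the queue's iteration order is heap order, not sorted order.
        return new ArrayList<>(queue);
    }

    public static void main(String[] args) {
        Map<String, Integer> freq = new HashMap<>();
        freq.put("a", 10);
        freq.put("b", 20);
        freq.put("c", 30);
        freq.put("d", 5);
        freq.put("e", 25);
        // Contains the entries for b, c and e, in no particular order.
        System.out.println(top3(freq));
    }
}
```

Because the queue never holds more than 4 entries, each insertion or removal costs a small constant amount of work, rather than the n*log(n) of sorting everything.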
I understood... Now another question: how can I sort the values? In the ideone example it returns [b=20, c=30, e=25]. How can I make it return [c=30, e=25, b=20]?
– Pacíficão
@Pacíficão Put them in a list and sort it. Or, if you want to reuse the comparator that is already written, sort in ascending order and then reverse the list. I updated the example. If you have any further questions, I suggest you open a new question before this one turns into a "chameleon question"... :P
– mgibsonbr
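A minimal sketch of the "sort ascending, then reverse" suggestion, again assuming Java 8 or later; the class `SortResult` and the method `descending` are hypothetical names, and the input values are the ones quoted in the question above.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class SortResult {
    // Hypothetical helper: copies the entries into a list ordered from
    // highest to lowest value.
    static List<Map.Entry<String, Integer>> descending(
            Collection<Map.Entry<String, Integer>> entries) {
        Comparator<Map.Entry<String, Integer>> comp = Map.Entry.comparingByValue();
        List<Map.Entry<String, Integer>> list = new ArrayList<>(entries);
        list.sort(comp);           // ascending, reusing the same comparator
        Collections.reverse(list); // invert to get descending order
        return list;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>();
        entries.add(new AbstractMap.SimpleEntry<>("b", 20));
        entries.add(new AbstractMap.SimpleEntry<>("c", 30));
        entries.add(new AbstractMap.SimpleEntry<>("e", 25));
        System.out.println(descending(entries)); // [c=30, e=25, b=20]
    }
}
```

Sorting only the 3 surviving entries is cheap, so this step does not bring back the cost that the priority queue avoided.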
Just one last question: why the number 4 here: PriorityQueue<Map.Entry<String,Integer>> queue = new PriorityQueue<Map.Entry<String,Integer>>(4, comp);
– Pacíficão
@Pacíficão This number is not essential; it just sets the initial capacity of the queue. Since we know the queue will hold at most 4 elements (the 3 you want plus the 1 you are testing against the others), passing this value ensures there is enough "space" for them [without the data structure having to resize to accommodate more] and wastes no memory, since only the space that is really needed gets allocated. It's a micro-optimization, and normally it wouldn't even be necessary, but since the constructor that takes a Comparator requires this parameter, I passed the most suitable value.
– mgibsonbr