0
I have a problem where I need to compare Strings and define their proximity.
The problem is that I have a list with 21,000 records and I need to compare them all, with each other, which would give a quadratic complexity and in total would be 21,000*21,000 comparisons, which doesn’t take long. I need to do this faster and more practical.
I am inserting these records into a database for easy manipulation but I make some changes and filters in it through Java programming, before inserting. Below is an excerpt from the code:
for (int i = 0; i < limite; i++) {
for (int j = 0; j < limite; j++) {
if (searchForItem(listaoComp, listao.get(j).id, listao.get(i).id)) {
naoInsere++;
} else {
insere++;
listaoComp.add(new Comparacao(listao.get(i).id, listao.get(j).id));
}
total++;
}
}
Basically, I’m doing the quadratic check. As this happens, for example, I am comparing the records (0.1) and (1.0) and entering them in the list. In my case, records like this mean the same thing and do not need to be both in the list so searchForItem checks before entering (1,0) if already exists (0,1) in the list. The searchForItem does this. This method gets worse throughout the execution of the code as the list increases considerably.
To contextualize them, I have a list of active drug principles (id and name) that form the Principle class, which the list is a list of this object.
List<Principio> listao = new ArrayList<Principio>();
The Comp list is a Comparison list.
In the Comparison class, I keep the ids of the compared active principles and their proximity, for example
new Comparacao(1, 2, 98.5);
Ah, and the variable boundary within For’s indicate the size of the list.
int limite = listao.size()
As said earlier, I’m comparing all ids twice, for example 0 - 0 / 0 - 1 / 0 - 2 / 1 - 0 / 1 - 1 / 1 - 2 / 2 - 0 / 2 - 1 / 2 - 2
searchForIt looks in the Comp list for the record 0 - 1 before entering 1 - 0, you understand?
I was trying to use the Java 8 Stream Parallels but I don’t know how to use the right tool and I couldn’t assemble this logic in the Filters. (I don’t even know if that’s where I should set the terms.)
My question is if anyone has ever been through something like this and has some non-quadratic algorithm to do it and if they could point me in a direction.
I didn’t get that
searchForItem
nor this classComparacao
. The type oflistao
isList<String>
? What islistaoComp
?– Victor Stafusa
It would not be simpler to create a class that represents its object, to implement the methods in it
hashcode
andequals
with its definition of "equal" and then store them in aSet
?– Felipe Marinho
How many elements do you want to have in
listaoComp
as a result? The way you describe your problem, it’s unclear if it’s 21,000, if it’s 441,000,000, if it’s 210,000,000, or if it’s something else.– Victor Stafusa
After understanding your problem, the solution should be simple, and unless you really need a quadratic number of results in your
listaoComp
, there must be a simple solution O(n log n) or O(n), probably using aSet
or aMap
. The problem is that it is difficult to understand what it is you are trying to do exactly. Put in your question the code ofsearchForItem
and ofComparacao
that should get easier. And describe the type oflistao
(given that.id
, then it’s notString
, but then I don’t know what it is).– Victor Stafusa
I highlighted in the text that I wrote what searchForItem does. The Comparison class stores the ids of the objects compared and the proximity value between them. In this case the class has three attributes: id1, id2 and name. These ids refer to the positions of the list of objects I am checking. The list holds an object that will be compared. I can not say the exact number of elements but 441 million would be all possible comparisons, I believe the correct number would be around 50~52%.
– Lucas Soares
OK I will change the question and await feedback.
– Lucas Soares