Efficient method for comparing lists

Asked

Viewed 748 times

0

I have a problem where I need to compare Strings and define their proximity.

The problem is that I have a list with 21,000 records and I need to compare them all, with each other, which would give a quadratic complexity and in total would be 21,000*21,000 comparisons, which doesn’t take long. I need to do this faster and more practical.

I am inserting these records into a database for easy manipulation but I make some changes and filters in it through Java programming, before inserting. Below is an excerpt from the code:

for (int i = 0; i < limite; i++) {
            for (int j = 0; j < limite; j++) {

               if (searchForItem(listaoComp, listao.get(j).id, listao.get(i).id)) {

                    naoInsere++;
                } else {

                    insere++;

                   listaoComp.add(new Comparacao(listao.get(i).id, listao.get(j).id));
               }
                total++;
            }

        }

Basically, I’m doing the quadratic check. As this happens, for example, I am comparing the records (0.1) and (1.0) and entering them in the list. In my case, records like this mean the same thing and do not need to be both in the list so searchForItem checks before entering (1,0) if already exists (0,1) in the list. The searchForItem does this. This method gets worse throughout the execution of the code as the list increases considerably.

To contextualize them, I have a list of active drug principles (id and name) that form the Principle class, which the list is a list of this object. List<Principio> listao = new ArrayList<Principio>();

The Comp list is a Comparison list.

In the Comparison class, I keep the ids of the compared active principles and their proximity, for example

new Comparacao(1, 2, 98.5);

Ah, and the variable boundary within For’s indicate the size of the list.

int limite = listao.size()

As said earlier, I’m comparing all ids twice, for example 0 - 0 / 0 - 1 / 0 - 2 / 1 - 0 / 1 - 1 / 1 - 2 / 2 - 0 / 2 - 1 / 2 - 2

searchForIt looks in the Comp list for the record 0 - 1 before entering 1 - 0, you understand?

I was trying to use the Java 8 Stream Parallels but I don’t know how to use the right tool and I couldn’t assemble this logic in the Filters. (I don’t even know if that’s where I should set the terms.)

My question is if anyone has ever been through something like this and has some non-quadratic algorithm to do it and if they could point me in a direction.

  • I didn’t get that searchForItem nor this class Comparacao. The type of listao is List<String>? What is listaoComp?

  • It would not be simpler to create a class that represents its object, to implement the methods in it hashcode and equals with its definition of "equal" and then store them in a Set?

  • How many elements do you want to have in listaoComp as a result? The way you describe your problem, it’s unclear if it’s 21,000, if it’s 441,000,000, if it’s 210,000,000, or if it’s something else.

  • After understanding your problem, the solution should be simple, and unless you really need a quadratic number of results in your listaoComp, there must be a simple solution O(n log n) or O(n), probably using a Set or a Map. The problem is that it is difficult to understand what it is you are trying to do exactly. Put in your question the code of searchForItem and of Comparacao that should get easier. And describe the type of listao (given that .id, then it’s not String, but then I don’t know what it is).

  • I highlighted in the text that I wrote what searchForItem does. The Comparison class stores the ids of the objects compared and the proximity value between them. In this case the class has three attributes: id1, id2 and name. These ids refer to the positions of the list of objects I am checking. The list holds an object that will be compared. I can not say the exact number of elements but 441 million would be all possible comparisons, I believe the correct number would be around 50~52%.

  • OK I will change the question and await feedback.

Show 1 more comment

1 answer

0


Not intended to be the most performative solution to your case. But if you don’t want to compare the same items (1 : 1) or their inverses (1 : 0 and 0 : 1) you could simply change your internal loop to get in position i + 1

for(int i = 0; i < lista.size(); i++) {
    for(int j = (i + 1) ; j < lista.size(); j++) {
      // Como aqui você já ignora os indices iguais ou inversos..
      // seria simplesmente adicionar as comparações.
      listaComp.add(new Comparacao(...));
    }
}

This way you will compare the indices.. [0.1; 0.2; ...1.2; 1.3... 2.3 ...n, n+1...]

Analyzing an example of 10 items, a quadratic execution time algorithm (n²) like yours would have 100 iterations. Already, with this simple change, it would fall to 45 iterations. So, for your case of 21,000 records, of 441,000,000, it goes to 198.450.000 220,489,500 iterations.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.