Percentage of Equality of Texts?

Is there a way to find out how similar two texts (Strings) are, as a percentage, in Java?

A language-agnostic algorithm would also do.

1 answer

The Levenshtein Distance is one of the best known algorithms for this:

Java:

public class Levenshtein {

    public static int distance(String a, String b) {
        // Compare case-insensitively
        a = a.toLowerCase();
        b = b.toLowerCase();
        // i == 0: costs[j] = lev(0, j) = j
        int [] costs = new int [b.length() + 1];
        for (int j = 0; j < costs.length; j++)
            costs[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            // j == 0; nw = lev(i - 1, j), the diagonal neighbour of the next cell
            costs[0] = i;
            int nw = i - 1;
            for (int j = 1; j <= b.length(); j++) {
                // 1 + min(deletion, insertion), or the diagonal plus 0 (match) / 1 (substitution)
                int cj = Math.min(1 + Math.min(costs[j], costs[j - 1]), a.charAt(i - 1) == b.charAt(j - 1) ? nw : nw + 1);
                nw = costs[j];
                costs[j] = cj;
            }
        }
        return costs[b.length()];
    }

    public static void main(String [] args) {
        String [] data = { "kitten", "sitting", "saturday", "sunday", "rosettacode", "raisethysword" };
        for (int i = 0; i < data.length; i += 2)
            System.out.println("distance(" + data[i] + ", " + data[i+1] + ") = " + distance(data[i], data[i+1]));
    }
}

Output:

distance(kitten, sitting) = 3
distance(saturday, sunday) = 3
distance(rosettacode, raisethysword) = 8

If you want the difference as a percentage, divide the distance by the length of the longer string and multiply by 100. Subtract that value from 100 to get the similarity.
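As a rough sketch of that percentage calculation in Java: the class name TextSimilarity and the method similarityPercent are illustrative names, not part of the Rosetta Code listing, and treating two empty strings as 100% similar is an assumption made here to avoid dividing by zero.

```java
import java.util.Locale;

// Illustrative sketch: similarity percentage derived from the Levenshtein distance.
// TextSimilarity and similarityPercent are hypothetical names chosen for this example.
final class TextSimilarity {

    // Same single-row dynamic-programming approach as the Levenshtein class above,
    // including the case-insensitive comparison.
    static int distance(String a, String b) {
        a = a.toLowerCase();
        b = b.toLowerCase();
        int[] costs = new int[b.length() + 1];
        for (int j = 0; j < costs.length; j++) {
            costs[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            costs[0] = i;
            int nw = i - 1;
            for (int j = 1; j <= b.length(); j++) {
                int cj = Math.min(1 + Math.min(costs[j], costs[j - 1]),
                        a.charAt(i - 1) == b.charAt(j - 1) ? nw : nw + 1);
                nw = costs[j];
                costs[j] = cj;
            }
        }
        return costs[b.length()];
    }

    // Similarity in the range 0..100, normalized by the longer string.
    static double similarityPercent(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) {
            return 100.0; // assumption: two empty strings count as identical
        }
        return 100.0 * (longest - distance(a, b)) / longest;
    }

    public static void main(String[] args) {
        // distance = 3, longer length = 8, so this prints 62.5%
        System.out.println(String.format(Locale.ROOT, "%.1f%%",
                similarityPercent("saturday", "sunday")));
    }
}
```

Dividing by the longer string keeps the result between 0 and 100, because the distance can never exceed that length.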

Source:

http://rosettacode.org/wiki/Levenshtein_distance#Java


JavaScript demo:

function levenshtein(str1, str2) {
  var m = str1.length,
      n = str2.length,
      d = [],
      i, j;

  if (!m) return n;
  if (!n) return m;

  for (i = 0; i <= m; i++) d[i] = [i];
  for (j = 0; j <= n; j++) d[0][j] = j;

  for (j = 1; j <= n; j++) {
    for (i = 1; i <= m; i++) {
      if (str1[i-1] == str2[j-1]) d[i][j] = d[i - 1][j - 1];
      else d[i][j] = Math.min(d[i-1][j], d[i][j-1], d[i-1][j-1]) + 1;
    }
  }
  return d[m][n];
}


function calc() {
  var t1 = document.getElementById('t1');
  var t2 = document.getElementById('t2');
  var r1 = document.getElementById('r1');
  var r2 = document.getElementById('r2');
  var d  = levenshtein( t1.value, t2.value );
  // Normalize by the longer input; two empty inputs count as 100% similar
  var max = Math.max( t1.value.length, t2.value.length );
  r1.innerHTML = d;
  r2.innerHTML = ( max ? 100 - Math.floor( 100 * d / max ) : 100 ) + '%';
}
Word 1:<br>
<input id="t1" type="text" onKeyUp="calc()"><br>
Word 2:<br>
<input id="t2" type="text" onKeyUp="calc()"><br>
Distance:<br>
<div id="r1">?</div>
Similarity:<br>
<div id="r2">?</div>

  • Thank you. Just to enrich the answer, I found this: http://en.wikipedia.org/wiki/Category:String_similarity_measures

  • @Cold That's a good reference. Write up an answer along those lines; I think it would be an interesting alternative.

  • I can't do it right now, @Bacco (in my view, it would be great to enrich your answer with other references :D). Just to round it off, could you also show how to get the percentage? :)

  • @Cold I've put the % calculation in the JS version for reference :)

  • I was still here racking my brain over the logic for getting the percentage. How and why did you do it this way?

  • First, I divided the distance returned by the algorithm by the length of the longer string (max), because the distance is measured in characters. Then I multiplied by 100 to get the percentage difference. Since we wanted the similarity, I subtracted that difference from 100% (hence the 100 - (...)). The Math.floor is only there so the output isn't "ugly", with fractional values like 33.3333333.
