7
Is there any method to know the percentage of equality of two texts (Strings) in Java?
An abstract algorithm would also serve.
9
The Levenshtein Distance is one of the best known algorithms for this:
public class Levenshtein {
    public static int distance(String a, String b) {
        a = a.toLowerCase();
        b = b.toLowerCase();
        // i == 0
        int[] costs = new int[b.length() + 1];
        for (int j = 0; j < costs.length; j++)
            costs[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            // j == 0; nw = lev(i - 1, j)
            costs[0] = i;
            int nw = i - 1;
            for (int j = 1; j <= b.length(); j++) {
                int cj = Math.min(1 + Math.min(costs[j], costs[j - 1]),
                        a.charAt(i - 1) == b.charAt(j - 1) ? nw : nw + 1);
                nw = costs[j];
                costs[j] = cj;
            }
        }
        return costs[b.length()];
    }

    public static void main(String[] args) {
        String[] data = { "kitten", "sitting", "saturday", "sunday", "rosettacode", "raisethysword" };
        for (int i = 0; i < data.length; i += 2)
            System.out.println("distance(" + data[i] + ", " + data[i + 1] + ") = " + distance(data[i], data[i + 1]));
    }
}
Output:
distance(kitten, sitting) = 3
distance(saturday, sunday) = 3
distance(rosettacode, raisethysword) = 8
If you want the difference as a percentage, divide the distance by the length of the longer string and multiply by 100. Subtract that result from 100 to get the similarity.
Source:
http://rosettacode.org/wiki/Levenshtein_distance#Java
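The percentage calculation described above can be sketched in Java as follows. This is only an illustration, not part of the Rosetta Code source: the class name `Similarity` and the method `percent` are made up for this example, and treating two empty strings as 100% similar is an assumption.

```java
public class Similarity {
    // Same single-array Levenshtein distance as in the answer (case-insensitive).
    static int distance(String a, String b) {
        a = a.toLowerCase();
        b = b.toLowerCase();
        int[] costs = new int[b.length() + 1];
        for (int j = 0; j < costs.length; j++)
            costs[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            costs[0] = i;
            int nw = i - 1; // value of the cell diagonally up-left
            for (int j = 1; j <= b.length(); j++) {
                int cj = Math.min(1 + Math.min(costs[j], costs[j - 1]),
                        a.charAt(i - 1) == b.charAt(j - 1) ? nw : nw + 1);
                nw = costs[j];
                costs[j] = cj;
            }
        }
        return costs[b.length()];
    }

    // Hypothetical helper: similarity in percent, relative to the longer string.
    static double percent(String a, String b) {
        int max = Math.max(a.length(), b.length());
        if (max == 0)
            return 100.0; // assumption: two empty strings count as identical
        return 100.0 * (max - distance(a, b)) / max;
    }

    public static void main(String[] args) {
        System.out.println(percent("kitten", "sitting"));
        System.out.println(percent("saturday", "sunday"));
    }
}
```

For "kitten" and "sitting" the distance is 3 and the longer string has 7 characters, so the similarity is roughly 57.1%. For "saturday" and "sunday" it is 100 * (8 - 3) / 8 = 62.5%.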
function levenshtein(str1, str2) {
    var m = str1.length,
        n = str2.length,
        d = [],
        i, j;
    if (!m) return n;
    if (!n) return m;
    for (i = 0; i <= m; i++) d[i] = [i];
    for (j = 0; j <= n; j++) d[0][j] = j;
    for (j = 1; j <= n; j++) {
        for (i = 1; i <= m; i++) {
            if (str1[i-1] == str2[j-1]) d[i][j] = d[i - 1][j - 1];
            else d[i][j] = Math.min(d[i-1][j], d[i][j-1], d[i-1][j-1]) + 1;
        }
    }
    return d[m][n];
}
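function calc() {
    var t1 = document.getElementById('t1');
    var t2 = document.getElementById('t2');
    var r1 = document.getElementById('r1');
    var r2 = document.getElementById('r2'); // look it up explicitly instead of relying on the implicit global
    var d = levenshtein(t1.value, t2.value);
    var max = Math.max(t1.value.length, t2.value.length);
    r1.innerHTML = d;
    // Two empty inputs count as identical; this also avoids 0 / 0 = NaN
    r2.innerHTML = (max ? 100 - Math.floor(100 * d / max) : 100) + '%';
}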
Word 1:<br>
<input id="t1" type="text" onKeyUp="calc()"><br>
Word 2:<br>
<input id="t2" type="text" onKeyUp="calc()"><br>
Distance:<br>
<div id="r1">?</div>
Similarity:<br>
<div id="r2">?</div>
Thank you. Just to add to this, I found http://en.wikipedia.org/wiki/Category:String_similarity_measures.
– Cold
@Cold that's a good reference. Consider writing an answer along those lines; I think it would be an interesting alternative.
– Bacco
I can't do it right now @Bacco (in my view, it would be great to enrich your answer with other references :D). Just to round things off, could you finish it up and show us the percentage? :)
– Cold
@Cold I put the % calculation in the JS version as a reference :)
– Bacco
I was still here racking my brain over the logic for getting the percentage. How and why did you do it like this?
– Cold
First, I divided the distance returned by the formula by the length of the longer string (max), since the distance is measured in characters. Then I multiplied by 100 to get the percentage difference. Since we wanted the similarity, I subtracted that difference from 100% (hence the 100 - (...)). The Math.floor is only there so the output isn't "ugly", with fractional values such as 33.3333333.
– Bacco
Let's continue this discussion in chat.
– Cold