Text comparison


I have a question about comparing strings.

I receive a value in a string variable and I need to compare it to another string.

For example:

$var1 = "M. D. AQUI";
$var2 = "MD AQUI"; // WITH OR WITHOUT PUNCTUATION. WITH OR WITHOUT SPACES.

I tried doing a replace on the variable, replacing the dots with nothing, but the spaces are still a problem. I can't just remove the spaces, because then the text runs all together.

$result = str_replace(". ", "", $var1); // result: MDAQUI / with this I can't do the similarity comparison.

Could someone help with the code or point me to something I can study?

  • In str_replace(". ", "", $var1);, remove the space and keep only the dot, like this: str_replace(".", "", $var1);

  • I've done it that way, but there's still a problem. For example, if I remove the dots from M. D. AQUI the result is M D AQUI, and comparing that to MD AQUI gives false.

  • Faca (knife) is different from faça (do), but if you remove the cedilla they look the same. What is the actual rule for transforming the words? What kind of comparison do you need to make? Do accents matter?

2 answers


What you need to do is remove all spaces and dots from BOTH strings:

$var1 = "M. D. AQUI";
$var2 = "MD AQUI";

$var1 = str_replace(".", "", $var1);
$var1 = str_replace(" ", "", $var1);
$var2 = str_replace(".", "", $var2);
$var2 = str_replace(" ", "", $var2);

var_dump($var1 == $var2); // bool(true)
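The two str_replace calls per string can probably be merged, since str_replace also accepts an array of search values. A minimal sketch; the helper name normalizeName is hypothetical and only the normalized copies are compared, so the original strings stay untouched:

```php
<?php
// Hypothetical helper: strips every dot and space, so "M. D. AQUI"
// and "MD AQUI" both become "MDAQUI". str_replace accepts an array of search strings.
function normalizeName(string $text): string
{
    return str_replace(['.', ' '], '', $text);
}

$var1 = "M. D. AQUI";
$var2 = "MD AQUI";

var_dump(normalizeName($var1) === normalizeName($var2)); // bool(true)
```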

If you want to measure similarity, as you mentioned in your code comment, you can use the similar_text function:

$var1 = strtoupper("M. D. AQUI");
$var2 = strtoupper("MD AQUI");

similar_text($var1, $var2, $percentagemDeSemelhanca);
echo $percentagemDeSemelhanca;

// result => 82.3529411765

This gives you the similarity percentage of the two strings. I used strtoupper so that differences in letter case don't lower the similarity if the strings are not already in capitals.
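similar_text() also returns the number of matching characters, and the percentage it fills in is that count doubled, divided by the combined length of both strings. A small sketch that reproduces the figure above by hand:

```php
<?php
$var1 = "M. D. AQUI"; // 10 characters
$var2 = "MD AQUI";    // 7 characters

// Return value: number of matching characters (" AQUI", "M" and "D" = 7 here)
$common = similar_text($var1, $var2, $percent);

// Percentage: 2 * common / (len1 + len2) * 100, i.e. 14 / 17 * 100
$manual = $common * 2 * 100 / (strlen($var1) + strlen($var2));

printf("%d common chars, %.2f%% (manual: %.2f%%)\n", $common, $percent, $manual);
// 7 common chars, 82.35% (manual: 82.35%)
```

Because the longer string's extra dots and spaces count in the denominator, stripping them first (as above) pushes the percentage up.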


  • Thank you very much, my friend, you helped a lot. I will use the similar_text function. Thank you.

  • If this or another answer solved your problem, don't forget to accept it. See How and why to accept answers

  • @Jorgeb., strtoupper would have trouble with accents; I suggest the mb_* functions (such as mb_strtoupper) to avoid problems in the comparison.
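A minimal sketch of the mb_* suggestion, assuming UTF-8 strings and that the mbstring extension is installed: with the default locale, strtoupper() only converts ASCII letters, so characters such as ç are left in lowercase, while mb_strtoupper() converts them.

```php
<?php
$word = "faça";

// strtoupper() works byte by byte; with the default locale only ASCII letters change
$ascii = strtoupper($word);                 // "FAçA"

// mb_strtoupper() is multibyte-aware, so "ç" becomes "Ç"
$multibyte = mb_strtoupper($word, 'UTF-8'); // "FAÇA"

var_dump($ascii === "FAÇA");     // bool(false)
var_dump($multibyte === "FAÇA"); // bool(true)
```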


A slightly different approach, which reduces the dependence on similar_text() while still allowing its use, is to remove the dots and spaces conditionally with regular expressions.

For this approach, preg_replace_callback() would be the ideal tool, but two consecutive preg_replace() calls with regular expressions are cleaner:

$var1 = "M. D. AQUI";
$var2 = "MD AQUI";

$var1 = preg_replace( '/(\w)\.\s+(?!\w{2,})/', '$1', $var1 ); // MD. AQUI

$var1 = preg_replace( '/(\w)\.\s+(?=\w{2,})/', '$1 ', $var1 ); // MD AQUI

if( $var1 != $var2 ) {

    similar_text( $var1, $var2, $percentual );

    if( $percentual > 70 ) {

        // Similar strings, do something
    }

} else {

    // Equal strings
}

The first substitution removes the dot and the following whitespace after a single character when it is not followed by a word of two or more characters.

The second does the opposite: if the letter and dot are followed by a word of two or more characters, it removes the dot but puts back a single space, so the text doesn't end up "all stuck together".
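The two substitutions can be wrapped in a small function to see how they treat other sets of initials. A sketch; the function name normalizeInitials and the second sample name are illustrative only:

```php
<?php
// Hypothetical wrapper around the two substitutions from the answer.
function normalizeInitials(string $text): string
{
    // 1) "X. " NOT followed by a 2+ character word: drop the dot and the whitespace
    $text = preg_replace('/(\w)\.\s+(?!\w{2,})/', '$1', $text);
    // 2) "X. " followed by a 2+ character word: drop the dot, keep one space
    return preg_replace('/(\w)\.\s+(?=\w{2,})/', '$1 ', $text);
}

echo normalizeInitials("M. D. AQUI"), "\n";       // MD AQUI
echo normalizeInitials("J. R. R. TOLKIEN"), "\n"; // JRR TOLKIEN
```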

This approach has the following advantages:

  • It handles only one of the strings, which is useful if the second one comes from a fixed source that you cannot or should not change.
  • It does not require similar_text() because, at least in the scenario above, the strings become equal. If they are not, and you want to rely on similar_text() as a fallback, it reduces the chance of the percentage giving a false result with a very low score.
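Putting the pieces together, a sketch of the "normalize first, similar_text() only as a fallback" idea. The function name matchesName, the boolean return value and the 70% default threshold (taken from the example above) are assumptions:

```php
<?php
// Hypothetical helper: normalizes only $input (the reference is treated as
// coming from a fixed source) and falls back to similar_text() if they differ.
function matchesName(string $input, string $reference, float $threshold = 70.0): bool
{
    $input = preg_replace('/(\w)\.\s+(?!\w{2,})/', '$1', $input);
    $input = preg_replace('/(\w)\.\s+(?=\w{2,})/', '$1 ', $input);

    if ($input === $reference) {
        return true; // identical after normalization
    }

    similar_text($input, $reference, $percent);
    return $percent > $threshold;
}

var_dump(matchesName("M. D. AQUI", "MD AQUI")); // bool(true)
var_dump(matchesName("M. D. AQUI", "XYZ"));     // bool(false)
```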
