Comparing similarity in two arrays!

Good evening, everyone. I'm stuck on a problem and hope someone can help me.

$array1 = array();
$array2 = array();
foreach ($paper as $p)
{
    $array1[] = $p->title;
    $array2[] = $p->title;
}

$nbarray1 = count($array1);
$stringSimilarity = 0;

foreach ($array1 as $word1)
{
    $max = null;
    $similarity = null;
    foreach ($array2 as $word2)
    {
        similar_text($word1, $word2, $similarity);
        if ($similarity > $max)
        {
            $max = $similarity;
        }
    }
    $stringSimilarity += $max;
    $resultado = $stringSimilarity / $nbarray1;

    if ($resultado > 90)
    {
        echo '<b>Título 1:</b> ' . $word1 . ' <br><b>Título 2:</b> ' . $word2 . ' <b><br>Resultado: POSSIVELMENTE DUPLICADO - Porcentagem = ' . number_format((float)$resultado, 0, '.', '') . '%<br></b>';
    }
    else
    {
        echo '<b>Título 1:</b> ' . $word1 . ' <br><b>Título 2:</b> ' . $word2 . ' <b><br>Resultado: NÃO DUPLICADO - Porcentagem = ' . number_format((float)$resultado, 0, '.', '') . '%<br></b>';
    }

}

This code produces the following output:

Título 1: A new method for SSD black-box performance test 
Título 2: Novel Solution for the Built-in Gate Oxide Stress Test of LDMOS in Integrated Circuits for Automotive Applications 
Resultado: NÃO DUPLICADO - Porcentagem = 25%
Título 1: Structural Health Monitoring of a rotor blade during statical load test 
Título 2: Novel Solution for the Built-in Gate Oxide Stress Test of LDMOS in Integrated Circuits for Automotive Applications 
Resultado: NÃO DUPLICADO - Porcentagem = 50%
Título 1: Using TTCN-3 in Performance Test for Service Application 
Título 2: Novel Solution for the Built-in Gate Oxide Stress Test of LDMOS in Integrated Circuits for Automotive Applications 
Resultado: NÃO DUPLICADO - Porcentagem = 75%
Título 1: Novel Solution for the Built-in Gate Oxide Stress Test of LDMOS in Integrated Circuits for Automotive Applications 
Título 2: Novel Solution for the Built-in Gate Oxide Stress Test of LDMOS in Integrated Circuits for Automotive Applications 
Resultado: POSSIVELMENTE DUPLICADO - Porcentagem = 100%

  1. Note that I only have 4 registered titles. How can I make sure that a title is not tested for similarity against itself?
  2. Note that only one title was tested against all the others. The correct behaviour would be to restart the loop with the next title and test again, and so on until every title has been tested against every other.
  3. $paper is an array of this type:

    [0]=> Object(stdClass)#97 (26) { ["paper_id"]=> string(1) "1" ["title"]=> string(47) "A new method for SSD black-box performance test" ["Author"]=> string(6) "Q. Xie"

  4. After checking for duplicates, would it be possible to update the array of paper objects? Each object has a ["status"]=> field, and I would like to set it to duplicate when the paper was flagged in the previous checks.

If someone patient can help me think through the logic, I'll already be happy. I'm struggling, but I'm trying to learn :D

1 answer

You can prevent a title from being compared with itself by placing an if() inside the second foreach():

foreach ($array2 as $word2) {
    // Check that word1 differs from word2; if they are equal, skip the comparison.
    if ($word1 != $word2) {
        similar_text($word1, $word2, $similarity);
        if ($similarity > $max) {
            $max = $similarity;
        }
    }
}
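
Comparing by value does skip a title against itself, but it also skips two different papers whose titles are exactly identical, which are the clearest duplicates. A minimal sketch of an alternative, comparing by position instead: the inner loop starts at $i + 1, so no title meets itself and every pair is tested once. The function name findSimilarPairs and the $threshold parameter are hypothetical, not from the original code, and the 90 default simply mirrors the limit used in the question.

```php
<?php
// Hypothetical helper (not part of the original code): returns every pair of
// positions whose titles are more than $threshold percent similar.
function findSimilarPairs(array $titles, float $threshold = 90.0): array
{
    $titles = array_values($titles); // make sure the indexes run 0..n-1
    $count  = count($titles);
    $pairs  = [];

    for ($i = 0; $i < $count; $i++) {
        // Starting at $i + 1 skips the self-comparison and already-tested pairs.
        for ($j = $i + 1; $j < $count; $j++) {
            similar_text($titles[$i], $titles[$j], $percent);
            if ($percent > $threshold) {
                $pairs[] = [$i, $j, $percent];
            }
        }
    }

    return $pairs;
}
```

Each entry of the result holds the two positions and the percentage, so it could be called as findSimilarPairs($array1) and echoed in a separate loop. Note that similar_text() can give slightly different results when its two arguments are swapped, so testing each pair only once is a simplification.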

You can also write this initial setup in other, less repetitive ways. For example:

$array1 = array();
$array2 = array();
foreach ($paper as $p) {
   $array1[] = $p->title;
   $array2[] = $p->title;
}

Or just like this:

foreach ($paper as $p) {
   $array1[] = $p->title;
   $array2[] = $p->title;
}
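
Since $array1 and $array2 end up holding the same titles, a second array may not be needed at all: the objects in $paper can be compared with each other directly. The sketch below assumes each object has the public title and status properties shown in the question; the function name markDuplicates and the $threshold parameter are hypothetical. It sets status to 'duplicate' on the later paper of each similar pair, which is one possible reading of point 4 of the question.

```php
<?php
// Hypothetical helper (not part of the original code): flags the later paper
// of every pair whose titles are more than $threshold percent similar.
function markDuplicates(array $papers, float $threshold = 90.0): array
{
    $papers = array_values($papers); // make sure the indexes run 0..n-1
    $count  = count($papers);

    for ($i = 0; $i < $count; $i++) {
        for ($j = $i + 1; $j < $count; $j++) {
            similar_text($papers[$i]->title, $papers[$j]->title, $percent);
            if ($percent > $threshold) {
                // Objects are passed by handle, so this also updates the
                // object held in the caller's array.
                $papers[$j]->status = 'duplicate';
            }
        }
    }

    return $papers;
}
```

Calling markDuplicates($paper) would leave the first paper of each pair untouched, so nothing is deleted and only the repeated entries are flagged.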
  • I'll edit my question to make it clearer, friend Wees

  • @Gutog do you want to change the array when this function flags a duplicate title?

  • Yes, in addition to the changes I mentioned: how to test every title against every other one and prevent a title from being tested against itself

  • The thing is, you are comparing two identical arrays, so they will certainly contain equal titles, and in the end you would delete all of them from the array
