Good evening, everyone. I'm stuck on a problem; maybe somebody can help me.
$array1 = array();
$array2 = array();
foreach ($paper as $p)
{
    $array1[] = $p->title;
    $array2[] = $p->title;
}
$nbarray1 = count($array1);
$stringSimilarity = 0;
foreach ($array1 as $word1)
{
    $max = null;
    $similarity = null;
    foreach ($array2 as $word2)
    {
        similar_text($word1, $word2, $similarity);
        if ($similarity > $max)
        {
            $max = $similarity;
        }
    }
    $stringSimilarity += $max;
    $resultado = $stringSimilarity / $nbarray1;
    if ($resultado > 90)
    {
        echo '<b>Título 1:</b> ' . $word1 . ' <br><b>Título 2:</b> ' . $word2 . ' <b><br>Resultado: POSSIVELMENTE DUPLICADO - Porcentagem = ' . number_format((float)$resultado, 0, '.', '') . '%<br></b>';
    }
    else
    {
        echo '<b>Título 1:</b> ' . $word1 . ' <br><b>Título 2:</b> ' . $word2 . ' <b><br>Resultado: NÃO DUPLICADO - Porcentagem = ' . number_format((float)$resultado, 0, '.', '') . '%<br></b>';
    }
}
This code produces the following output:
Título 1: A new method for SSD black-box performance test
Título 2: Novel Solution for the Built-in Gate Oxide Stress Test of LDMOS in Integrated Circuits for Automotive Applications
Resultado: NÃO DUPLICADO - Porcentagem = 25%
Título 1: Structural Health Monitoring of a rotor blade during statical load test
Título 2: Novel Solution for the Built-in Gate Oxide Stress Test of LDMOS in Integrated Circuits for Automotive Applications
Resultado: NÃO DUPLICADO - Porcentagem = 50%
Título 1: Using TTCN-3 in Performance Test for Service Application
Título 2: Novel Solution for the Built-in Gate Oxide Stress Test of LDMOS in Integrated Circuits for Automotive Applications
Resultado: NÃO DUPLICADO - Porcentagem = 75%
Título 1: Novel Solution for the Built-in Gate Oxide Stress Test of LDMOS in Integrated Circuits for Automotive Applications
Título 2: Novel Solution for the Built-in Gate Oxide Stress Test of LDMOS in Integrated Circuits for Automotive Applications
Resultado: POSSIVELMENTE DUPLICADO - Porcentagem = 100%
- Note that I only have 4 registered titles. How can I make sure that a title is not tested for similarity against itself?
- Note that only one title appears to be tested against all the others. The correct behavior would be to restart the loop with the next title and test again, and so on until every title has been tested against every other.
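The two points above (no self-comparison, every title against every other) could be sketched roughly as follows. This is only a sketch, not a definitive solution: the function name findSimilarPairs and its $threshold parameter are hypothetical, and the default of 90 is taken from the check in the code above. It compares by index, starting the inner loop at $i + 1, so a title never meets itself and each pair is tested once.

```php
<?php
// Hypothetical helper (not part of the original code): returns every pair
// of titles whose similar_text() percentage is above $threshold.
function findSimilarPairs(array $titles, float $threshold = 90.0): array
{
    $titles = array_values($titles); // make sure keys are 0..n-1
    $n = count($titles);
    $pairs = [];

    for ($i = 0; $i < $n - 1; $i++) {
        // Starting at $i + 1 skips the self-comparison ($i == $j)
        // and avoids testing the same pair twice in reverse order.
        for ($j = $i + 1; $j < $n; $j++) {
            $percent = 0.0;
            similar_text($titles[$i], $titles[$j], $percent);
            if ($percent > $threshold) {
                // Keep both indexes and the percentage for later use.
                $pairs[] = [$i, $j, $percent];
            }
        }
    }

    return $pairs;
}
```

Each returned entry holds the two indexes and the similarity percentage, so the caller can print both titles of the pair that was actually compared instead of whatever value the inner loop variable held last.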
$paper is an array of objects of this form:
[0]=> Object(stdClass)#97 (26) { ["paper_id"]=> string(1) "1" ["title"]=> string(47) "A new method for SSD black-box performance test" ["Author"]=> string(6) "Q. Xie"
After checking for duplicates, would it be possible to update the array of paper objects? Each object has a ["status"] field, and I would like to set it to duplicate when the paper was found to be a duplicate in the previous checks.
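One possible way to update the objects, sketched under some assumptions: every element of the array is an object with title and status properties, the later paper of a similar pair is the one to flag, and the status value is the string 'duplicate'. The function name markDuplicates, the $threshold parameter, and the sample data (including the 'new' status and the second paper_id) are hypothetical and only serve the illustration.

```php
<?php
// Hypothetical helper: flags a paper as duplicate when its title is very
// similar to the title of an earlier paper in the list.
function markDuplicates(array $papers, float $threshold = 90.0): void
{
    $papers = array_values($papers); // make sure keys are 0..n-1
    $n = count($papers);

    for ($i = 0; $i < $n - 1; $i++) {
        for ($j = $i + 1; $j < $n; $j++) {
            $percent = 0.0;
            similar_text($papers[$i]->title, $papers[$j]->title, $percent);
            if ($percent > $threshold) {
                // PHP passes object handles, so the caller's object is
                // modified even though the array itself is a copy.
                $papers[$j]->status = 'duplicate';
            }
        }
    }
}

// Made-up sample data with two identical titles.
$papers = [
    (object) ['paper_id' => '1', 'title' => 'A new method for SSD black-box performance test', 'status' => 'new'],
    (object) ['paper_id' => '2', 'title' => 'A new method for SSD black-box performance test', 'status' => 'new'],
];

markDuplicates($papers);
echo $papers[1]->status; // prints "duplicate"
```

Only the later paper of each pair is flagged here, so the first occurrence keeps its original status. Flagging both would be a one-line change if that fits the data better.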
If someone with a lot of patience can help me think through the logic, I'd already be happy. I'm struggling, but I'm trying to improve :D
I'll edit my question to make it clearer, Wees.
– Guto G
@Gutog do you want to change the array when this function finds a duplicate title?
– Wees Smith
Yes, and in addition to the changes I mentioned: how to test every title against every other, and how to prevent a title from being tested against itself.
– Guto G
The thing is, you are comparing two identical arrays, so they will certainly contain equal titles, and in the end you would delete all of them from the array.
– Wees Smith