Identifying common snippets in two PHP strings

Question

Asked 10 years, 10 months ago

Viewed 2,028 times

7

I need to compare non-standard strings in PHP. I have 2 strings as below:

$primeira = 'asdasdasdTESTEasdasdasdasd';

$segunda = 'lkijlikjTESTEilkjik';

How can I find out dynamically whether the first and second variables contain the same sequence of characters? In this case, that sequence is the string "TESTE".

Maybe this question on SOen helps.

– rray

2014/09/26 at 11:25

4 answers

7

I created a function that compares segments of the two strings and returns the substrings they have in common in an array:

function palavras_iguais($string1, $string2, $minlen = 5) {
    $strlen1 = strlen($string1);
    $palavras = array();
    // Only start where a full window of $minlen characters still fits.
    for ($i = 0; $i <= $strlen1 - $minlen; $i++) {
        $palavra = substr($string1, $i, $minlen);
        if (strpos($string2, $palavra) !== false) {
            $j = $minlen;
            // Grow the match one character at a time while it still occurs in $string2.
            while ($i + $j < $strlen1 && strpos($string2, substr($string1, $i, $j + 1)) !== false) {
                $j++;
            }
            $palavra = substr($string1, $i, $j);
            // Skip past the match; the for loop adds the final +1.
            $i += $j - 1;
            $palavras[] = $palavra;
        }
    }
    return $palavras;
}

Test 1:

$primeira = 'asdasdasdTESTEasdasdasdasd';
$segunda = 'lkijlikjTESTEilkjik';

print_r( palavras_iguais($primeira, $segunda) );

// Output:

Array
(
    [0] => TESTE
)

Test 2:

$primeira = 'asdFINALasdasdTESTEaTESTE2sdasdasdasdTESTENOFINAL';
$segunda = 'lkiTESTE2jlikjTESTEilkjTESikTESTENOFINALjhfdgkFINAL';

print_r( palavras_iguais($primeira, $segunda) );

// Output:

Array
(
    [0] => FINAL
    [1] => TESTE
    [2] => TESTE2
    [3] => TESTENOFINAL
)

Test 3:

$primeira = 'asdaTSCsdasdTESTEasdasdasdasd';
$segunda = 'lkijlikjTESTEilkjTSCik';

print_r( palavras_iguais($primeira, $segunda, 3) );

// Output:

Array
(
    [0] => TSC
    [1] => TESTE
)

by bfavaretto • **64,705** points · Answer 1 · 2014-09-27T23:32:11+00:00

I thought of an approach slightly different from the others. I wanted to avoid nested loops, though I have not tested whether that improves performance. It works like this:

It builds an array of character groups from the first string. For example, with $minlen=2, the string "abcde" is split into ["ab", "bc", "cd", "de"].
It then checks each group against the second string. Groups found in sequence are joined into a single word (for example, if the second string contains "abc", the first two groups are found in sequence and give "abc").

I think it’s easier to understand in code form:

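function matchingSubstrings($str1, $str2, $minlen=2) {
    // Every window of exactly $minlen characters from the first string.
    $grupos = [];
    for ($i = 0; $i + $minlen <= strlen($str1); $i++) {
        $grupos[] = substr($str1, $i, $minlen);
    }

    $palavras = [];
    $temp = '';

    foreach ($grupos as $grupo) {
        // Start a new word with the whole group, or extend the current one by one character.
        $candidato = $temp === '' ? $grupo : $temp . substr($grupo, -1);
        if (strpos($str2, $candidato) !== false) {
            $temp = $candidato;
        } else {
            if ($temp !== '') $palavras[] = $temp;
            // The group that ended the word may itself begin the next one.
            $temp = strpos($str2, $grupo) !== false ? $grupo : '';
        }
    }
    // Keep a word that runs to the end of the first string.
    if ($temp !== '') $palavras[] = $temp;

    return $palavras;
}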

A test with repetitions:

matchingSubstrings('nnnabcnnnabcnnn', 'kkkabckkkabc');

Return:

Array
(
    [0] => abc
    [1] => abc
)

If repeated entries are not wanted in the result, just replace the last line of the function with return array_unique($palavras);.
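One detail worth keeping in mind: array_unique preserves the original keys, so the result can have gaps in its numbering. A minimal sketch, assuming a re-indexed list is wanted, wraps it in array_values. The $palavras array here is a hand-written stand-in for what the function would return, not real output.

```php
// Stand-in for the array the function would return (assumed values).
$palavras = ['abc', 'abc', 'xyz'];

// array_unique() drops the repeat but keeps the keys: [0 => 'abc', 2 => 'xyz'].
$semRepeticao = array_unique($palavras);

// array_values() re-indexes from 0: [0 => 'abc', 1 => 'xyz'].
$unicas = array_values($semRepeticao);

print_r($unicas);
```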

This function also worked with the tests from jader’s answer (the output was identical).

Demo on ideone

by gmsantos • **17,221** points · Answer 2 · 2014-09-25T18:43:02+00:00

I don’t think there is a native PHP function that does this.

I found a solution on Google that does what you need.

function longest_common_substring($words)
{
    $words = array_map('strtolower', array_map('trim', $words));
    // Comparison callback: shorter strings first, ties broken alphabetically.
    $sort_by_strlen = function ($a, $b) {
        if (strlen($a) == strlen($b)) {
            return strcmp($a, $b);
        }
        return (strlen($a) < strlen($b)) ? -1 : 1;
    };
    usort($words, $sort_by_strlen);

    // We have to assume that each string has something in common with the first
    // string (post sort), we just need to figure out what the longest common
    // string is. If any string DOES NOT have something in common with the first
    // string, return false.
    $longest_common_substring = array();
    $shortest_string = str_split(array_shift($words));
    while (sizeof($shortest_string)) {
        array_unshift($longest_common_substring, '');
        foreach ($shortest_string as $ci => $char) {
            foreach ($words as $wi => $word) {
                if (strpos($word, $longest_common_substring[0] . $char) === false) {
                    // No match
                    break 2;
                }
            }

            // we found the current char in each word, so add it to the first longest_common_substring element,
            // then start checking again using the next char as well
            $longest_common_substring[0].= $char;
        }

        // We've finished looping through the entire shortest_string.
        // Remove the first char and start all over. Do this until there are no more
        // chars to search on.
        array_shift($shortest_string);
    }

    // If we made it here then we've run through everything
    usort($longest_common_substring, $sort_by_strlen);
    return array_pop($longest_common_substring);
}

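This solution returns the longest substring shared by all the strings in an array. Note that it lowercases the input first, so the comparison is case-insensitive and the result comes back in lowercase.

Using it is very simple: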

$primeira = 'asdasdasdTEStEasdasdasdasd';
$segunda = 'lkijlikjTESTEilkjik';

echo longest_common_substring([$primeira, $segunda]);
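For the two-string case in the question, the same result can also be obtained with the classic dynamic-programming formulation of the longest common substring problem. The sketch below is not the function above: lcs_two_strings is a hypothetical name, and it assumes a case-sensitive, byte-based comparison (plain strlen/substr, so multibyte text would need extra care).

```php
// Hypothetical helper (not from the answer above): dynamic-programming
// longest common substring for exactly two strings, case-sensitive.
function lcs_two_strings(string $a, string $b): string
{
    $lenA = strlen($a);
    $lenB = strlen($b);
    $prev = array_fill(0, $lenB + 1, 0); // match lengths for the previous row
    $best = 0; // length of the longest match found so far
    $endA = 0; // position in $a just past the end of that match

    for ($i = 1; $i <= $lenA; $i++) {
        $curr = array_fill(0, $lenB + 1, 0);
        for ($j = 1; $j <= $lenB; $j++) {
            if ($a[$i - 1] === $b[$j - 1]) {
                // Extend the common run that ended at the previous characters.
                $curr[$j] = $prev[$j - 1] + 1;
                if ($curr[$j] > $best) {
                    $best = $curr[$j];
                    $endA = $i;
                }
            }
        }
        $prev = $curr;
    }

    // Empty string when the inputs share no characters at all.
    return substr($a, $endA - $best, $best);
}

$primeira = 'asdasdasdTESTEasdasdasdasd';
$segunda = 'lkijlikjTESTEilkjik';

echo lcs_two_strings($primeira, $segunda); // prints TESTE
```

It returns only one substring (the first longest one found in the first string), so it suits the "is there a common part, and what is it" question rather than listing every shared snippet.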

by Vitor Vezani • **491** points · Answer 3 · 2014-09-25T18:45:31+00:00

2

You can use the strpos function; see its documentation for more details.

if (strpos($primeira, $segunda) !== false)
    echo 'true';

1

Reading the question more carefully, I don’t think this solves the problem.

– gmsantos

2014/09/25 at 18:48
1

This function does not meet my need, because I have to check whether the two variables share an equal part, represented here by "TESTE", regardless of the characters that come before and after it.

– Gustavo Piucco

2014/09/25 at 18:49
1

This is not a criticism of the answer, but rather of those who voted without reading the question: I am surprised that two people voted in favour, since it does not go anywhere close to answering what was asked.

– Bacco

2014/09/27 at 20:27