Most common words among rows


4

Is there a function in MySQL that returns the 3 most common words in a TEXT column of a table?

Example

Maria Joaquina
Maria Antonienta
Maria B.

Among these rows it would return Maria, because it is the most frequently used term.

4 answers

7

One way is to put the words in an array and count the repeated items using array_count_values. The output looks like this:

Array
(
    [Maria] => 3
    [Joaquina] => 1
    [Antonienta] => 1
    [B] => 1
)

I worked with a string and used explode, but you can adapt it to the MySQL result. I made an example on Ideone for you to see.

$str = 'Maria Joaquina Maria Antonienta Maria B';
$words = explode(' ', $str);
$counts = array_count_values($words);
print_r($counts); // prints the output above, with the total count per word
arsort($counts); // sort by count, highest first, so the result does not depend on word order
print_r(key($counts)); // first key after sorting, i.e. the most frequent word: `Maria`
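
To adapt this to several rows and to the "3 most common words" asked in the question, something along these lines might work. It is only a sketch: the `$rows` array is a hypothetical stand-in for the values fetched from the TEXT column, and words that tie on the same count may come out in any order depending on the PHP version.

```php
<?php
// Hypothetical rows, standing in for the values fetched from the TEXT column.
$rows = array('Maria Joaquina', 'Maria Antonienta', 'Maria B.');

// Join all rows into one string, then split on runs of whitespace.
$words = preg_split('/\s+/', implode(' ', $rows), -1, PREG_SPLIT_NO_EMPTY);

// Count the occurrences of each word and sort by count, highest first.
$counts = array_count_values($words);
arsort($counts);

// Keep only the 3 most common words (the words are the array keys).
$top3 = array_slice($counts, 0, 3, true);
print_r($top3);
```

Note that punctuation is not stripped here, so `B.` is counted as its own word.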
  • 1

    Good example with array_count_values +1

  • 1

    @Guilhermenascimento I had to use this function recently and just remembered it. It saves a few lines of loop code.

5

There is no built-in function that does this.

I know it can be done in SQL, but it's more complicated. Since my familiarity with SQL is more limited, especially with MySQL, and since the question also carries the tag for it, I'll answer in PHP.

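// $conexao is assumed to be an open mysqli connection (e.g. from mysqli_connect()).
$resultado = mysqli_query($conexao, "SELECT texto FROM tabela");
$contagem = array();
while ($linha = mysqli_fetch_assoc($resultado)) {
    // Split this row's text into an array of words.
    $palavras = str_word_count($linha['texto'], 1);
    foreach ($palavras as $palavra) {
        // Initialize the counter the first time a word is seen.
        if (!isset($contagem[$palavra])) {
            $contagem[$palavra] = 0;
        }
        $contagem[$palavra]++;
    }
}
arsort($contagem); // sort by count, highest first
$i = 0;
foreach ($contagem as $key => $value) {
    echo $value . " => " . $key . "\n";
    $i++;
    if ($i >= 3) break; // stop after the 3 most common words
}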

I put it on GitHub for future reference.

5


First you need to split the delimited values into rows, one word per row. This can be done with a function or with a solution found on SOen (Stack Overflow in English).

SELECT 
  dados.id, 
  SUBSTRING_INDEX(SUBSTRING_INDEX(dados.descricao, ' ', quantidade.n), ' ', -1) as descricao
FROM 
    (SELECT 1 n UNION ALL SELECT 2
     UNION ALL SELECT 3 UNION ALL SELECT 4) as quantidade
INNER JOIN dados
     ON CHAR_LENGTH(dados.descricao)-CHAR_LENGTH(REPLACE(dados.descricao, ' ',''))>=quantidade.n-1

Sqlfiddle
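
To make the nested SUBSTRING_INDEX call easier to follow, here is a small PHP sketch of the same idea. The `substringIndex` helper is hypothetical and only mimics MySQL's SUBSTRING_INDEX for illustration: the inner call keeps the first n words, and the outer call with -1 keeps the last of those, which is the n-th word. The join condition in the query (number of spaces >= n - 1) plays the role of the loop bound here, so a row only produces as many words as it actually has.

```php
<?php
// Hypothetical helper that mimics MySQL's SUBSTRING_INDEX(str, delim, count).
function substringIndex($str, $delim, $count) {
    $parts = explode($delim, $str);
    if ($count >= 0) {
        // Positive count: everything to the left of the count-th delimiter.
        $parts = array_slice($parts, 0, $count);
    } else {
        // Negative count: everything to the right of the count-th delimiter, counting from the end.
        $parts = array_slice($parts, $count);
    }
    return implode($delim, $parts);
}

$descricao = 'Maria Joaquina';
for ($n = 1; $n <= 2; $n++) {
    // Inner call keeps the first $n words; outer call keeps the last of those.
    echo $n, ': ', substringIndex(substringIndex($descricao, ' ', $n), ' ', -1), "\n";
}
// prints:
// 1: Maria
// 2: Joaquina
```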

Once we have the data with one word per row, just group and sort by the most used.

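SELECT descricao, count(descricao) as quantidade
FROM ( ... ) AS palavras -- MySQL requires an alias on a derived table; "palavras" is just an illustrative name
GROUP BY descricao
ORDER BY quantidade DESC
LIMIT 1 -- use LIMIT 3 for the three most common words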

Sqlfiddle

  • How can I add this condition, WHERE CONCAT(',',hash.tag,',') LIKE '%,tag,%', to the code? That is, filter the table with WHERE CONCAT(',',table.tag,',') LIKE '%,tag,%' and then run your code.

  • You have answered your own question: just include the tag column in the FROM and add the WHERE. http://sqlfiddle.com/#!2/c28c09/1/0 @user3163662

4

This can be done in PHP with preg_split and an array. (explode would also work, but it only splits on one fixed delimiter, and for runs of whitespace a regex is the better choice.) Preferably, wrap it in a function:

function buscarPalavraMaisRecorrente($data, $last = 1) {
    $itens = array();

    // Split on runs of whitespace, skipping empty pieces; lowercase so "Maria" and "maria" match.
    $list = preg_split('/\s+/', strtolower($data), -1, PREG_SPLIT_NO_EMPTY);
    $j = count($list);

    for ($i = 0; $i < $j; ++$i) {
        $key = $list[$i];
        if (false === isset($itens[$key])) {
            $itens[$key] = 1; // First occurrence: create the item, e.g. maria
        } else {
            $itens[$key] += 1; // Increment the count
        }
    }

    $list = null;

    if (count($itens) === 0) {
        return array();
    }

    $found = array();

    // Walk the counts from the highest down to 1, collecting the words that have each count.
    for ($i = max($itens); $i >= 1; --$i) {
        $found = array_merge($found, array_keys($itens, $i, true));
    }

    // Keep only the $last most frequent words.
    return array_slice($found, 0, $last);
}

Get the 3 most commonly used words:

$exemplo = 'Maria Joaquina Maria Antonienta Maria B.';
print_r(buscarPalavraMaisRecorrente($exemplo, 3));

Returns the most commonly used word:

$exemplo = 'Maria Joaquina Maria Antonienta Maria B.';
print_r(buscarPalavraMaisRecorrente($exemplo));

Using it inside your while loop:

while ($linha = $consulta->fetch(PDO::FETCH_ASSOC)) {
    echo 'Most repeated words: ',
          implode(',', buscarPalavraMaisRecorrente($linha['coluna_text'], 3));
}
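
On the point about explode versus preg_split made at the start of this answer, a minimal sketch of the difference follows. The sample text and the variable names are made up for illustration.

```php
<?php
// Sample text with a double space, a tab, and a newline between words.
$texto = "Maria  Joaquina\tMaria\nAntonienta";

// explode splits on exactly one space: the double space yields an empty piece,
// and the tab and the newline are not treated as separators at all.
$comExplode = explode(' ', $texto);

// preg_split with \s+ treats any run of whitespace as a single separator.
$comPregSplit = preg_split('/\s+/', $texto, -1, PREG_SPLIT_NO_EMPTY);

var_dump(count($comExplode));   // int(3)
var_dump(count($comPregSplit)); // int(4)
```

With explode, the second piece is an empty string and the third still contains the tab and the newline, so the word counts would be wrong.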
