How to know the most used word in all records

Asked

Viewed 156 times

4

I’m doing a top trendig system, but I’m very confused: How do I select to pull only the 10 most commonly used words on all records.

However, the words are in a column and separated by a comma.

NOTE: I am using PHP 5.3

Tabela posts
Estrutura: 
id
post_id
post
author_id
author
hashtags
date
  • Put in question the table structure and examples of data that would be stored. Have you tried anything? At least describe how you think it can be done.

  • Is there any possibility of you playing these words in an auxiliary table, or working with them in json?

  • It is not clear what the structure is like, making an objective answer impossible. For now, see if this can be useful: https://stackoverflow.com/questions/748276

  • Gabriel, in this case the column is the hashtags? Have you ever considered creating a many-to-many relationship between the table posts and a table hashtags? Probably the solution would be more semantic and it would be possible (I believe) to solve the problem only with SQL.

  • @Andersoncarloswoss However, I already have a system that separates the tags from within the text of post and add in column hashtags, when the post is inserted in the posts table.

3 answers

6

You need a dictionary.

Interestingly, in PHP arrays can work as dictionaries. I discovered this now by searching to answer the question. So let’s get started:

$dicionario = array();

The logic is as follows: make a dictionary where the keys are the words, and the value is the number of occurrences.

Since each record has several words separated by commas, let’s start by extracting the words from each record. In most languages we call it dividing (split), but since PHP is a terrorist thing in it we use the function explode:

$palavras = explode(",", $registro);

And then we include the words in the dictionary as follows: if the key does not exist, it is created with zero value. Then, regardless of whether it exists or not, we increase its value.

foreach ($palavras as $chave) {
    if (!$dicionario[$chave]) {
        $dicionario[$chave] = 0;
    }
    $dicionario[$chave]++;
}

Note that we have to do the explosion and the addition to the dictionary once for each record.

Finally, we need to get the records at the top ten. The algorithm below removes the top ten values from the dictionary while neatly including them in another dictionary.

$dezMaiores = array();
for ($i = 1; $i <= 10; $i++) {
    $maiorValor = 0;
    $maiorChave = "";
    foreach ($dicionario as $chave => $valor) {
        if ($valor > $maiorValor) {
            $maiorValor = $valor;
            $maiorChave = $chave;
        }
    }
    $dezMaiores[$i] = $maiorChave;
    unset($dicionario[$maiorChave]);
}

Now you can use the ten most commonly used expressions on your system :)

3

Another solution running all the logic in PHP is to use the native functions array_merge and array_map to create the word list, the function array_count_values to calculate the frequency of each word and the function arsort to order them downwards.

<?php

$rows = [
    "abacate,banana,caqui",
    "banana,melão,goiaba",
    "laranja,banana,abacate"
];

$palavras = array_merge(
    ...array_map(
        function ($value) { 
            return explode(',', $value); 
        }, 

        $rows
    )
);

$palavras = array_count_values($palavras);

arsort($palavras);

print_r($palavras);

The above solution uses the splat Operator, so it only works in PHP 5.6 versions+.

The output produced is:

Array
(
    [banana] => 3
    [abacate] => 2
    [caqui] => 1
    [melão] => 1
    [goiaba] => 1
    [laranja] => 1
)
  • I need a solution that works in php 5.3.

  • 1

    @Gabrielhenrique then put this in the question.

0

$link = new mysqli ("localhost", "USUARIO", "SENHA", "NOME_BANCO");

$query = "SELECT Nome_Coluna FROM Nome_Tabela";

$results = mysqli_query($link,$query);

while($row = mysqli_fetch_assoc($results)) {

  $words .= $row["Nome_Coluna"].",";

}

$palavras = explode(',', $words);

//numero total de palavras
//echo count($palavras); 

$ocorrencias = array();

for($i = 0; $i<count($palavras); $i++){
    $palavra = $palavras[$i];

      // Evitar erros Undefined index vide OBS no final da resposta
      if (!isset($ocorrencias[$palavra])) {
          $ocorrencias[$palavra] = 0;
      }

    $ocorrencias[$palavra]++;
}

arsort($ocorrencias);

//mostra todo o array estruturado
//var_dump($ocorrencias);

    //mostra os dez primeiros valores e respectivas chaves
    $i=0;
     foreach($ocorrencias as $chave => $valor){
            if($i < 10){  
                echo $chave.' - '.$valor.'<br>';
            }
            $i=$i+1;
        }

Note: In the first word, $ocorrencias is an empty list and when you do $ocorrencias[$palavra] you will be accessing a non-existent key. To avoid the error, just check if the key exists and, if it does not exist, create it with zero.
Source

  • When I put db info to words do not appear.

  • @Gabrielhenrique in the connection?

  • It is not in connection. The most commonly used words are not appearing.

  • @Leocaracciolo Instead of leaving a link with the clarification, why don’t you write one with your words? -Review.

  • 1

    The answer is not given by link, part of the answer is that refers to a more detailed clarification, alias, this part was a question of my own and answered by Anderson Carlos Woss. It is normal that in the answers links are placed for more details in some parts of the answer..

  • 1

    If we think that some day the linked pages may no longer exist, then you should copy the content of these links Datetime, Datetime.Parse, Tostring, Writeline and Readline.dado in your reply and paste in this post https://en.stackoverflow.com/questions/223736/fun%C3%A7%C3%A3o-datetime-c/223740#223740

Show 1 more comment

Browser other questions tagged

You are not signed in. Login or sign up in order to post.