Check words that appear in foreach

I have the following code:

    <?php
    $texto1 = file_get_contents('cot.txt');
    // store each matched text in its own array position
    preg_match_all('|texto.\d+(.+?)<\/body>|is', $texto1, $resultado);
    $textos = $resultado[1];
    $arrayCot = explode(" ", file_get_contents('PalavrasCot.txt'));
    $arra = $textos[$n];
    foreach ($arrayCot as $valor) {
        if (strpos($arra, $valor) !== false) {
            $contCot++;
            $ArzCot[$n] = $contCot;
        }
    }
    ?>

This code reads a text file, splits its content into separate texts, and checks whether the words listed in a second file (loaded into $arrayCot) occur in a text. My question is: how can I show how many times each word in $arrayCot appeared? The content of PalavrasCot.txt is:

parque. parque, parque brincadeiras. brincadeiras, brincadeiras mães mães, mães. filho, filho. filho acidente. acidente, acidente venda venda, venda. família família natureza, natureza. natureza carro. carro, carro crianças, crianças. crianças escola, escola. escola

1 answer

Let’s assume that the original text is in $texto and the list of words is in $palavras, just to make the code easier to read.

A relatively simple algorithm is this:

    $aTexto = explode( ' ', $texto );
    $aPalavras = explode( ' ', $palavras );
    $contagem = array();

    foreach( $aTexto as $pTexto ) {
        if( in_array( $pTexto, $aPalavras ) ) {
            $contagem[$pTexto] = isset( $contagem[$pTexto] ) ? $contagem[$pTexto] + 1 : 1;
        }
    }

See it working on IDEONE.

To check whether each word of the text is in the $aPalavras array, we use in_array:

http://php.net/manual/en/function.in-array.php
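
As a quick illustration of what the loop produces, here is the same algorithm run on a small input. The sample strings in $texto and $palavras are made up for this sketch and are not taken from the question's files.

```php
<?php
// Hypothetical sample inputs, invented for illustration only.
$texto = 'o parque fica perto da escola e o parque, novo tem carro';
$palavras = 'parque parque, escola carro';

$aTexto = explode(' ', $texto);
$aPalavras = explode(' ', $palavras);
$contagem = array();

foreach ($aTexto as $pTexto) {
    // Count the token only if it appears in the word list exactly as written.
    if (in_array($pTexto, $aPalavras)) {
        $contagem[$pTexto] = isset($contagem[$pTexto]) ? $contagem[$pTexto] + 1 : 1;
    }
}

// Print one "word: count" line per word that was found.
foreach ($contagem as $palavra => $total) {
    echo $palavra, ': ', $total, PHP_EOL;
}
```

With this input, "parque" and "parque," end up as two separate keys with a count of 1 each, because the comparison is made on the exact token, punctuation included.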


Considerations

This goes slightly beyond the question, but a few points are worth noting. For real-world use, the code would need some improvements.

  • Spaces and line breaks are not handled. Before the explode, it would probably help to normalize double spaces, tabs and line breaks into single spaces.

  • Your word list relies on repeating each word with commas and periods, which creates problems; one is that the count is split across the variants of the same word. In a real situation, the list would probably contain only the bare words, and the algorithm would strip the period and comma (and any other characters that need removing) before searching. This change would suffice:

    if( in_array( rtrim( $pTexto, '.,;!?' ), $aPalavras ) ) {
    

    Thus, you eliminate the need to have the words repeated in the listing.

  • Uppercase and lowercase differences are not handled in your original code. One solution is to store all the words in the search dictionary in lowercase, and use this function to normalize the words from the text:

    if( in_array( mb_strtolower( $pTexto ), $aPalavras ) ) {
    

    Note that in this case PHP's character encoding must be configured to match the encoding of the file, otherwise you will have problems with accented characters.

  • Finally, in a real application you would normally not load the whole text into memory as you do today. You could read the text in blocks and count each word as soon as you reach a space. That way you avoid holding duplicate data in memory (the array and the original text would otherwise both be kept, unnecessarily, until the result is obtained).
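
The points above can be combined into one rough sketch. The function names normalizarPalavra and contarPalavras are hypothetical, and the code rests on several assumptions: the text is UTF-8, the mbstring extension is available, the dictionary holds bare lowercase words, and PHP 7 or later is used (for the ?? operator). It reads line by line with fgets as a simple stand-in for reading in blocks.

```php
<?php
// Hypothetical helper: strip trailing punctuation and lowercase a token.
// Assumes UTF-8 input and that the mbstring extension is loaded.
function normalizarPalavra(string $palavra): string
{
    return mb_strtolower(rtrim($palavra, '.,;!?'), 'UTF-8');
}

// Hypothetical helper: count dictionary words found in an open stream.
// Reading one line at a time avoids keeping the whole text in memory.
function contarPalavras($handle, array $aPalavras): array
{
    $contagem = array();

    while (($linha = fgets($handle)) !== false) {
        // Split on any run of whitespace (spaces, tabs, line breaks);
        // fall back to an empty list if the line is not valid UTF-8.
        $tokens = preg_split('/\s+/u', $linha, -1, PREG_SPLIT_NO_EMPTY) ?: array();

        foreach ($tokens as $token) {
            $palavra = normalizarPalavra($token);
            // Strict comparison: the dictionary is assumed to be lowercase.
            if (in_array($palavra, $aPalavras, true)) {
                $contagem[$palavra] = ($contagem[$palavra] ?? 0) + 1;
            }
        }
    }

    return $contagem;
}
```

A possible call would be contarPalavras(fopen('cot.txt', 'r'), array('parque', 'carro', 'escola')). Because each token is normalized before the lookup, "Parque," and "parque." are both counted under the single key "parque".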
