Searching for names inside a text

I have a text in which I need to search for names. The text follows no fixed pattern, and the names come in an array. My search has worked so far, but I ran into a case where a name appears in the text as a variant and is not found, because I am doing a literal search. Example:

$nomes = array("João da Silva","Antônio de Souza Santos","Mário Faria de Oliveira");

As I said, my search works. But if the name "Antônio de Souza Santos" appears in the text as "Antônio de Souza" or "Antônio Santos", I cannot find it.

I tried to apply the solution from "Filter word in text with php", but without much success. Does anyone have suggestions for solving this?

An example of the text I need to search:

Date of Publication..: 02/07/2015 1st CIVIL COURT Expediente 30/06/2015 JUDGE(A) HOLDER: Nelson Marques da Silva JUDGE(A) SUBSTITUTE(A): Adriani Freire Diniz Garcia Denise Lucio Tavela Paulo Cássio Moreira JUDGE(A) ON DUTY: Flávio Branquinho da Costa Dias João Batista Mendes Filho Marcos Irany Rodrigues da Conceição ESCRIVÃO(Ã) : Alan Menezes Sidney COMMON PROCEDURE 00119 - 0055452.27.2011.8.13.0016 Author: Carlos Roberto Bertholucci; Defendant: Banco Bradesco Financiamento S.A. => Vista ao réu. Deadline of 0015 day(s). It is the defendant summoned to collect the amount of R $ 290,03, as costs, Judicial Fee, criminal fine and other procedural expenses due to the State, within 15 (fifteen) days, under penalty of registration of debt, plus a fine of 10% (ten percent)in debt and registration in the Informative Register of Delinquency in relation to the Public Administration of the State of Minas Gerais CADIN-MG and the extrajudicial protest of the Active Debt Certificate, by the Attorney General of the State AGE. Adv - Clovis Roberto Czegelski, Graciela Camargo Teixeira Rios, Matheus Siqueira de Alvarenga, Marta Aparecida de Castro Martiniano, Carlos Roberto de Carvalho Junior, Luciana Pereira, Francine Lopes Carvalho, Sebiana Vitale Cruz, Thaisse Christiane Schreier, Guilherme Octavio Santos Rodrigues, Marina Guimaraes Ribeiro, Fabiano Toledo Reis Souza, Leonardo Alves Bechara.

Note: I'm using PHP.

  • Aren't you using a database? If you are, it is simpler to run this search directly in the database.

  • No. This text comes from a file.

  • When someone searches for Antônio de Souza Santos, do you also want to find Antônio de Souza, or is it the other way around?

  • It can be either case. The name comes from the database, and the search should find its possible variations in the text.

  • Let me spell it out: the "text" comes from a file. The names come from the database. I need to find the names (which come from the database) in the text (which comes from the file). I hope that is clear. If needed, I can explain further.

  • I think you could use strpos to find the first name (Antônio) and, if it is found, look for the combinations (Souza or Silva). I don't know whether a regex would work for this case.

  • I've tried both strpos and regex, and neither worked. Either they don't fit my case, or I did it wrong. Do you have a more practical suggestion using one of the two?

  • @Danilomiguel, done. See whether what I posted is what you need.

  • @Felipedouradinho apparently this is exactly what I need. I will test it in my system and report back with the result.

  • @Danilomiguel, great! If it is the right answer, please accept it!

  • @Lucky your script is excellent! I just can't adapt it to my situation. In fact, the text I posted occurs several times (I loop over the contents of the source file), and in each iteration I search for the names. I even found a solution, but it didn't work out because the same name can be repeated within one text. The problem here is not your script but my requirements. I will study the case further and work out the best solution.

  • But do you need to search in every iteration? Why not concatenate everything into one final string and search that?

  • @Felipedouradinho [continuing] Also because, after locating the name, I need to save that text (the one where the name occurs) and send it to a report once I have gone through all the content. Anyway, I repeat, your script is perfectly valid. I will accept your answer because it solves what was asked. Thank you!
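
The requirement described in the last comments (search each block of the file separately and keep the blocks where a name occurs) could be sketched roughly as follows. This is only an illustration under assumptions: the function name `coletaBlocosPorNome` is hypothetical, the file is assumed to be already split into an array of UTF-8 blocks, and the gap pattern `.*?` between name parts mirrors the idea used in the answer.

```php
<?php

/**
 * Returns, for each name, the text blocks in which it occurs.
 * (Hypothetical helper; how the file is split into blocks is an assumption.)
 */
function coletaBlocosPorNome(array $nomes, array $blocos): array
{
    $relatorio = [];

    foreach ($blocos as $indice => $bloco) {
        foreach ($nomes as $nome) {
            // Escape regex metacharacters, then allow any text between the name parts.
            $padrao = '/' . str_replace(' ', '.*?', preg_quote($nome, '/')) . '/iu';

            if (preg_match($padrao, $bloco) === 1) {
                // Keyed by block index, so each block is stored at most once per name.
                $relatorio[$nome][$indice] = $bloco;
            }
        }
    }

    return $relatorio;
}

$blocos = [
    'JUIZ(A) TITULAR: Nelson Marques da Silva',
    'Adv - Luciana Pereira, Francine Lopes Carvalho',
    'Autor: Nelson da Silva; Réu: Banco Bradesco',
];

var_export(coletaBlocosPorNome(['Nelson Silva', 'Luciana Pereira'], $blocos));
```

Because `preg_match` only reports whether a block contains the name, a name repeated inside one block still yields a single entry for that block, and the collected blocks can be sent to the report after the loop ends.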


1 answer


I wrote a function that goes through the list of names and runs preg_match for each one, with the spaces in $nome replaced by the pattern (.*?).

procura_nome.php

<?php

    /**
     * Searches for names in a text, receiving $nomes as an array
     *
     * @param   array   $nomes
     * @param   string  $texto
     * @return  array
     */
    function pesquisaNomes($nomes, $texto)
    {

        $todos_resultados = [];

        if(is_array($nomes) && !empty($nomes))
        {
            foreach ($nomes as $key => $nome)
            {
                $resultado = [];
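                // preg_quote() makes regex metacharacters in a name match literally;
                // the "u" modifier lets "i" handle accented UTF-8 letters correctly.
                preg_match("/".str_replace(" ", '(.*?)', preg_quote($nome, '/'))."/iu", $texto, $resultado);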
                $resultado = array_filter(array_map('trim', $resultado));

                if(!empty($resultado))
                {
                    $todos_resultados[$nome] = $resultado[0];
                }
            }

        }
        return $todos_resultados;
    }


    $nomes = array (
        'João da Silva',
        'Antônio de Souza Santos',
        'Antônio de Souza',
        'Antônio Santos',
        'Mário Faria de Oliveira',
        'Nelson Marques',
        'Nelson da Silva',
    );

    $texto = 'Data de Publicação..: 02/07/2015 1ª VARA CÍVEL Expediente de 30/06/2015. Procurar por Antônio de Souza Santos. JUIZ(A) TITULAR: Nelson Marques da Silva JUIZ(A) SUBSTITUTO(A): Adriani Freire Diniz Garcia Denise Lucio Tavela Paulo Cássio Moreira JUIZ(A) PLANTONISTA: Flávio Branquinho da Costa Dias João Batista Mendes Filho Marcos Irany Rodrigues da Conceição ESCRIVÃO(Ã) : Alan Menezes Sidney PROCEDIMENTO ORDINÁRIO 00119 - 0055452.27.2011.8.13.0016 Autor: Carlos Roberto Bertholucci; Réu: Banco Bradesco Financiamentos S.A. => Vista ao réu. Prazo de 0015 dia(s) . Fica a parte ré intimada para o recolhimento da importância de R$ 290,03, a título de custas, de Taxa Judiciária, de multa penal e de outras despesas processuais devidas ao Estado, no prazo de 15 (quinze) dias, sob pena de inscrição do débito, acrescida de multa de 10% (dez por cento), em dívida ativa e de registro no Cadastro Informativo de Inadimplência em relação à Administração Pública do Estado de Minas Gerais CADIN-MG e do protesto extrajudicial da Certidão de Dívida Ativa, pela Advocacia-Geral do Estado AGE. Adv - Clovis Roberto Czegelski, Graciela Camargo Teixeira Rios, Matheus Siqueira de Alvarenga, Marta Aparecida de Castro Martiniano, Carlos Roberto de Carvalho Junior, Luciana Pereira, Francine Lopes Carvalho, Sebiana Vitale Cruz, Thaisse Christiane Schreier, Guilherme Octavio Santos Rodrigues, Marina Guimaraes Ribeiro, Fabiano Toledo Reis Souza, Leonardo Alves Bechara';

    var_export(pesquisaNomes($nomes, $texto));

Result

array (
  'Antônio de Souza Santos' => 'Antônio de Souza Santos',
  'Antônio de Souza' => 'Antônio de Souza',
  'Antônio Santos' => 'Antônio de Souza Santos',
  'Nelson Marques' => 'Nelson Marques',
  'Nelson da Silva' => 'Nelson Marques da Silva',
)
  • The idea is interesting but has problems: "Carlos Pereira" would match "Carlos Roberto de Carvalho Junior, Luciana Pereira". I would suggest at least excluding punctuation and the like from the gap: '([^.,;!]*?)'
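
A minimal sketch of the suggestion in that comment, assuming a single name never spans punctuation in the text: the gap between the parts of a name is restricted to `[^.,;!]*?`. The function name `pesquisaNomesRestrita` is hypothetical, and the use of `preg_quote` and the `u` modifier is an addition that is not part of the original answer.

```php
<?php

/**
 * Variant of the name search in which the gap between the parts of a name
 * may not contain punctuation, so a match cannot run across two people.
 * (Hypothetical helper; the name and the gap pattern are assumptions.)
 */
function pesquisaNomesRestrita(array $nomes, string $texto): array
{
    $encontrados = [];

    foreach ($nomes as $nome) {
        // Escape regex metacharacters, then allow only non-punctuation between words.
        $padrao = str_replace(' ', '[^.,;!]*?', preg_quote($nome, '/'));

        if (preg_match('/' . $padrao . '/iu', $texto, $resultado) === 1) {
            $encontrados[$nome] = $resultado[0];
        }
    }

    return $encontrados;
}

$texto = 'Adv - Carlos Roberto de Carvalho Junior, Luciana Pereira, Francine Lopes Carvalho.';

var_export(pesquisaNomesRestrita(['Carlos Pereira', 'Carlos Carvalho'], $texto));
```

With this sample text, "Carlos Pereira" is no longer found because a comma separates the two words, while "Carlos Carvalho" still matches "Carlos Roberto de Carvalho". The trade-off is that a name written with punctuation inside it (for example an abbreviated middle name such as "A.") would be missed.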
