Capture data by pattern up to the last space

I am trying to scrape a website and would like to capture the values that follow the pattern below:

Advogado: XXXXX Número do Processo: XXXXXX OutroCampo: XXXXX

The fields are usually separated by a space, so what should be captured is, for example, Advogado: Bill Gates (a space/tab would follow here).

Pattern:

FIELD_NAME:(optional space)Value to be captured(final space)

I started with this regex, but it only captures the field name at the beginning and not the value "in between":

regex: \w+:\s{1}

2 answers

2


See if that’s what you need:

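<?php

$string = 'Advogado: XXX XX Número do Processo: XX XXXX OutroCampo: XXX XX';

// Odd-numbered groups capture the field labels; even-numbered groups capture the values between them.
preg_match_all('/(Advogado:)(.+?)(Número\sdo\sProcesso:)(.+?)(OutroCampo:)(.+?)$/', $string, $matches);

// The captured values keep their surrounding spaces; wrap them in trim() if that matters.
echo 'MATCHES: <br>';
echo 'Advogado: '.$matches[2][0].'<br>';
echo 'Processo: '.$matches[4][0].'<br>';
echo 'Outro campo: '.$matches[6][0].'<br>';

// Dump every capture group for inspection.
echo '<pre>';
print_r($matches);
echo '</pre>';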

Output:

MATCHES: 
Advogado: XXX XX 
Processo: XX XXXX 
Outro campo: XXX XX
Array
(
    [0] => Array
        (
            [0] => Advogado: XXX XX Número do Processo: XX XXXX OutroCampo: XXX XX
        )

    [1] => Array
        (
            [0] => Advogado:
        )

    [2] => Array
        (
            [0] =>  XXX XX 
        )

    [3] => Array
        (
            [0] => Número do Processo:
        )

    [4] => Array
        (
            [0] =>  XX XXXX 
        )

    [5] => Array
        (
            [0] => OutroCampo:
        )

    [6] => Array
        (
            [0] =>  XXX XX
        )

)

Example in Regex101.com

2

Using spaces as the field delimiter while also allowing spaces inside the field values can become a problem. It's unlikely, but if the lawyer's name were "Número do Processo: XXXXXX OutroCampo: XXXXX", it would be hard to validate.

Anyway, I imagine what you want is something as simple as '/^Advogado:\s?(.*?)\s?Número do Processo/'. There's no point in matching a sequence of letters with \w+, since the first word will always be Advogado. The lazy (.*?) keeps the trailing space out of the captured group.

Example:

$entrada = "Advogado: Bill Gates Número do Processo: XXXXXX OutroCampo: XXXXX";
preg_match('/^Advogado:\s?(.*?)\s?Número do Processo/', $entrada, $match);

// the whole match: 'Advogado: Bill Gates Número do Processo'
echo $match[0];

echo '<br>';

// only the parenthesized group: 'Bill Gates'
echo $match[1];
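
The ambiguity mentioned at the start of this answer can be reproduced with a small sketch. The input string below is made up for illustration: the name itself contains the text of the next label. A lazy quantifier stops at the first occurrence of the label and a greedy one at the last, and nothing in the data says which of the two is right.

```php
<?php

// Hypothetical input: the lawyer's name contains the text "Número do Processo:".
$entrada = 'Advogado: Ana Número do Processo: Lima Número do Processo: 123 OutroCampo: abc';

// Lazy (.*?): the value ends at the FIRST occurrence of the next label.
preg_match('/^Advogado:\s?(.*?)\s?Número do Processo:/u', $entrada, $lazy);

// Greedy (.*): the value ends at the LAST occurrence of the next label.
preg_match('/^Advogado:\s?(.*)\s?Número do Processo:/u', $entrada, $greedy);

$nomeLazy = trim($lazy[1]);
$nomeGreedy = trim($greedy[1]);

echo $nomeLazy, PHP_EOL;   // Ana
echo $nomeGreedy, PHP_EOL; // Ana Número do Processo: Lima
```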
  • That's because the field name (Advogado) may vary, which is why I used \w+

  • and "Número do Processo" may vary as well

  • 1

    If the data may vary, I suggest you edit the question and add a few more cases. But I'll say up front that if both the field value (first name and surnames) and what comes after it ("Número do Processo") are separated by spaces, it is hard to know where one ends and the other begins.
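
If the field names vary but the full set of possible names is known in advance, one option is to build the pattern from that list. The following is only a sketch under that assumption: the helper name `extrairCampos` and the label list are hypothetical, and it still cannot resolve the case where a value contains the text of a label followed by a colon.

```php
<?php

/**
 * Splits a line into label => value pairs, given the known labels.
 * Hypothetical helper, for illustration only.
 */
function extrairCampos(string $linha, array $rotulos): array
{
    // Escape each label so it is matched literally, then join as alternatives.
    $alternativas = implode('|', array_map(function ($rotulo) {
        return preg_quote($rotulo, '/');
    }, $rotulos));

    // A label, a colon, then a lazy value that ends right before the next
    // known label (or at the end of the line).
    $padrao = sprintf('/(%1$s):\s*(.*?)\s*(?=(?:%1$s):|$)/u', $alternativas);

    preg_match_all($padrao, $linha, $grupos, PREG_SET_ORDER);

    $campos = [];
    foreach ($grupos as $grupo) {
        $campos[$grupo[1]] = $grupo[2];
    }

    return $campos;
}

$campos = extrairCampos(
    'Advogado: Bill Gates Número do Processo: XXXXXX OutroCampo: XXXXX',
    ['Advogado', 'Número do Processo', 'OutroCampo']
);

print_r($campos);
```

The lookahead `(?=...)` checks for the next label without consuming it, so each label is still available as the start of the following match. If one label is a prefix of another, list the longer one first so the alternation prefers it.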
