Capture data by pattern up to the last space

Asked 6 years, 7 months ago

I am trying to scrape a website and would like to capture the values that follow the pattern below:

Advogado: XXXXX Número do Processo: XXXXXX OutroCampo: XXXXX

The fields are usually separated by a space, so for example the captured text would be "Advogado: Bill Gates", followed by a space or tab.

Pattern:

FIELD_NAME:(optional space)Value to be captured(trailing space)

I started with this regex, but it only captures the field name at the beginning and not the value that follows it:

regex: \w+:\s{1}

2 answers

See if this is what you need:

<?php

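$string = 'Advogado: XXX XX Número do Processo: XX XXXX OutroCampo: XXX XX';

// Every label and every value gets its own capture group.
// Each lazy (.+?) stops as soon as the next label (or the end of the string) can match.
preg_match_all('/(Advogado\:)(.+?)(Número\sdo\sProcesso\:)(.+?)(OutroCampo\:)(.+?)$/', $string, $matches);

// Groups 2, 4 and 6 hold the values, including the spaces around them.
echo 'MATCHES: <br>';
echo 'Advogado: '.$matches[2][0].'<br>';
echo 'Processo: '.$matches[4][0].'<br>';
echo 'Outro campo: '.$matches[6][0].'<br>';

// Dump all capture groups for inspection.
echo '<pre>';
print_r($matches);
echo '</pre>';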

Output:

MATCHES: 
Advogado: XXX XX 
Processo: XX XXXX 
Outro campo: XXX XX
Array
(
    [0] => Array
        (
            [0] => Advogado: XXX XX Número do Processo: XX XXXX OutroCampo: XXX XX
        )

    [1] => Array
        (
            [0] => Advogado:
        )

    [2] => Array
        (
            [0] =>  XXX XX 
        )

    [3] => Array
        (
            [0] => Número do Processo:
        )

    [4] => Array
        (
            [0] =>  XX XXXX 
        )

    [5] => Array
        (
            [0] => OutroCampo:
        )

    [6] => Array
        (
            [0] =>  XXX XX
        )

)

Example in Regex101.com
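
The captures above keep the spaces around each value (see groups 2, 4 and 6 in the dump). As a rough sketch, and not part of the original answer, the same three labels can be matched with the whitespace placed outside the groups. The group names advogado, processo and outro are my own choice, and the u modifier is added on the assumption that the input is UTF-8.

```php
<?php
// Sketch only: same input and labels as the answer above.
$string = 'Advogado: XXX XX Número do Processo: XX XXXX OutroCampo: XXX XX';

// \s* and \s+ sit outside the named groups, so the captured values carry no
// leading or trailing whitespace. The "u" modifier treats pattern and subject as UTF-8.
$pattern = '/Advogado:\s*(?<advogado>.+?)\s+'
         . 'Número\sdo\sProcesso:\s*(?<processo>.+?)\s+'
         . 'OutroCampo:\s*(?<outro>.+?)\s*$/u';

if (preg_match($pattern, $string, $m) === 1) {
    // Brackets make any stray whitespace visible in the output.
    echo 'Advogado: [' . $m['advogado'] . "]\n";
    echo 'Processo: [' . $m['processo'] . "]\n";
    echo 'Outro campo: [' . $m['outro'] . "]\n";
} else {
    echo "No match\n";
}
```

Since only one match is expected per string, preg_match is enough here; preg_match_all would also work but adds an extra array level.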

by user135358 • 21 points · Answer 1 · 2019-01-04T00:43:28+00:00

Using spaces as the field delimiter while also allowing spaces inside the field values can become a problem. It is unlikely, but if the lawyer's name were "Número do Processo: XXXXXX OutroCampo: XXXXX", the input would be hard to validate.

Anyway, I imagine what you want is something as simple as '/^Advogado:\s*(.*?)\s*Número do Processo/'. There is no point in matching the field name with \w+, since the first word will always be Advogado. The lazy (.*?) followed by \s* keeps the space before the next label out of the captured value.

Example:

$entrada = "Advogado: Bill Gates Número do Processo: XXXXXX OutroCampo: XXXXX";
preg_match('/^Advogado:\s*(.*?)\s*Número do Processo/', $entrada, $match);

// the whole match: 'Advogado: Bill Gates Número do Processo'
echo $match[0];

echo '<br>';

// only the parenthesized group: 'Bill Gates'
echo $match[1];
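
The ambiguity mentioned at the start of this answer (spaces both separate fields and appear inside values) becomes easier to manage when the set of field labels is known in advance. The following is a minimal sketch under that assumption. The helper name extractFields and the label list are hypothetical and not taken from the question. It still cannot tell a label apart from a value that happens to contain the same text.

```php
<?php
// Hypothetical helper: extracts "label: value" pairs for a known list of labels.
function extractFields(string $input, array $labels): array
{
    // Escape each label and join them into an alternation,
    // e.g. Advogado|Número do Processo|OutroCampo
    $alt = implode('|', array_map(function ($label) {
        return preg_quote($label, '/');
    }, $labels));

    // A label, a colon, optional spaces, then a lazy value that stops
    // right before the next "label:" or at the end of the string.
    $pattern = '/(' . $alt . '):\s*(.*?)\s*(?=(?:' . $alt . '):|$)/u';

    preg_match_all($pattern, $input, $matches, PREG_SET_ORDER);

    $fields = [];
    foreach ($matches as $m) {
        $fields[$m[1]] = $m[2]; // label => value without surrounding spaces
    }
    return $fields;
}

$entrada = 'Advogado: Bill Gates Número do Processo: XXXXXX OutroCampo: XXXXX';
$fields = extractFields($entrada, ['Advogado', 'Número do Processo', 'OutroCampo']);
print_r($fields);
```

Building the pattern from a list means a new field only requires a new entry in the array, and the lookahead leaves the next label unconsumed so that it can start the following match.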