Take a specific part of a string

I need to extract a specific part of a log file. Its structure is something like this:

##################################################
----------------------------------------
Nome: nome_user,
Email: [email protected],
-------------------------------
,
----------------------------------------
##################################################

Let’s say I need to get only the value of the Email field. How can I do this?

  • Does it always have Email: before and a comma after?

  • Yes, that’s the structure.

  • It also has more fields before and after; I only included these two (name, email) to show the structure of the file.

  • Is there a single Email field in the whole file, or are there several?

  • There is only one field written "Email". The others hold other information; I mentioned them only to confirm that the file is larger.

1 answer

You can do it in a very simple way:

$start = 'Email:';
$end = ',';

$pos1 = strpos( $log, $start );
$pos2 = strpos( $log, $end, $pos1 );
$block = substr(
   $log, $pos1 + strlen( $start ),
   $pos2 - $pos1 - strlen( $start )
);

See it working on IDEONE.

Of course, for this specific case it can be written much more concisely:

$pos1 = strpos( $log, 'Email:' );
$pos2 = strpos( $log, ',', $pos1 );
$block = substr( $log, $pos1 + 6, $pos2 - $pos1 - 6 );

If you need to handle cases where there is no Email: field in the log:

$pos1 = strpos( $log, $start );
if( $pos1 === false ) die( 'Field not found' ); // or return ''; if used inside a function.
...
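
Putting the pieces above together, a self-contained sketch might look like the following. The sample $log content and the address user@example.com are made up for illustration, and the extra check on $pos2 (no comma after the field) and the trim() call are additions that are not in the snippets above.

```php
<?php
// Hypothetical sample log, following the structure shown in the question.
$log = "----------------------------------------\n"
     . "Nome: nome_user,\n"
     . "Email: user@example.com,\n"
     . "----------------------------------------\n";

$start = 'Email:';
$end   = ',';

$email = '';                                 // default when the field is missing
$pos1  = strpos( $log, $start );
if( $pos1 !== false ) {
    $pos1 += strlen( $start );               // skip past the label itself
    $pos2  = strpos( $log, $end, $pos1 );    // first comma after the label
    if( $pos2 !== false ) {
        $email = trim( substr( $log, $pos1, $pos2 - $pos1 ) );
    }
}

echo $email, "\n";                           // prints "user@example.com"
```

Falling back to an empty string instead of calling die() is only one option; it keeps the script running when the field is absent.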


Creating a function:

In general, you can write a function to extract the data you want. There are a thousand ways to do it; this is one of them:

function my_extract( $text, $start, $end ) {
    $pos1 = strpos( $text, $start );
    if( false === $pos1 ) return 'Not found';
    $pos1 += strlen( $start );                // move past the start marker
    $pos2 = strpos( $text, $end, $pos1 );
    if( false === $pos2 ) return 'Not found'; // no end marker after the field
    return trim( substr( $text, $pos1, $pos2 - $pos1 ) );
}

Usage:

$nome  = my_extract( $log, 'Nome:' , ',' );
$email = my_extract( $log, 'Email:', ',' );

See the demonstration on IDEONE.


Using a cannon to kill a dove:

Since someone would inevitably end up posting one, here is a solution with regex:

if( preg_match( '/Email:\s*(.*)\s*,/', $log, $matches ) ) {
    $email = $matches[1];
} else {
    $email = ''; // Not found
}
// this could be a ternary operator, but that is not the focus of the question,
// it does not help readability, and it does not help performance.

Again, see it working on IDEONE.

If you only need to find one occurrence of a string, I do not recommend it. It looks simple, but internally the function does a lot more than the proposed problem requires.

Description of the regular expression:

 /                 /    delimiters
  Email:                string being searched for
        \s*    \s*      whitespace
           (.*)         group we want to return (any characters)
                  ,     end marker
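
To make the breakdown concrete, here is a small sketch that runs the pattern against a made-up line (the address and the trailing "extra" value are placeholders). It also shows a variant, which is not part of the answer above, that uses [^,]* instead of .* so that the capture stops at the first comma when a line contains more than one.

```php
<?php
// Hypothetical log whose Email line has two commas, to show how greedy .* behaves.
$log = "Nome: nome_user,\nEmail: user@example.com, extra,\n";

// Pattern from the answer: .* is greedy, so it runs up to the LAST comma on the line.
preg_match( '/Email:\s*(.*)\s*,/', $log, $m1 );

// Variant (an assumption, not the author's code): stop at the FIRST comma.
preg_match( '/Email:\s*([^,]*),/', $log, $m2 );

$greedy = $m1[1];          // "user@example.com, extra"
$first  = trim( $m2[1] );  // "user@example.com"
```

With the structure shown in the question (one value and one comma per line) both patterns return the same result; the difference only matters if a value is followed by more commas on the same line.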


Extra considerations:

  • If you want to use accented strings in the future, such as "Profissão:", and your text uses a multibyte encoding (UTF-8, for example), use mb_strpos instead of strpos and configure PHP for the correct encoding.

  • As mentioned by @lvcs, if you ever need to match Email as well as EMAIL or eMaIl, you can replace strpos with stripos, or mb_strpos with mb_stripos.

  • For a case-insensitive search with regex, add the i flag at the end of the expression (an i after the closing slash).

  • If you really want to ensure that the string Email: is not confused with something in the middle of a line, you can include the line break in the marker ("\nEmail:" in a double-quoted string) or, with regex, anchor the pattern as ^Email: and add the m flag for multiline matching.
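
The variations listed above can be sketched as follows. The sample log and its field values are invented for illustration; the mb_* calls assume the mbstring extension is enabled and that both the script file and the log text are UTF-8.

```php
<?php
// Hypothetical log: a lowercase "email" label and an accented field name.
$log = "Nome: nome_user,\nProfissão: editor,\nemail: user@example.com,\n";

// Case-insensitive search: stripos instead of strpos.
$pos = stripos( $log, 'Email:' );

// Multibyte-aware search; mb_substr counts characters, so use mb_* throughout.
$start = 'Profissão:';
$p1  = mb_strpos( $log, $start, 0, 'UTF-8' ) + mb_strlen( $start, 'UTF-8' );
$p2  = mb_strpos( $log, ',', $p1, 'UTF-8' );
$job = trim( mb_substr( $log, $p1, $p2 - $p1, 'UTF-8' ) );

// Regex: i = case-insensitive, m = ^ matches at the start of each line.
preg_match( '/^Email:\s*(.*)\s*,/im', $log, $m );
$email = $m[1];
```

Mixing byte-based functions (strpos, substr) with character-based ones (mb_strpos, mb_substr) on the same offsets gives wrong slices once accented characters appear, which is why the sketch stays within one family per extraction.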

  • Thank you, I’ll test.

  • Just a tip: use stripos to ignore case, because there may be a field written email rather than Email.

  • @lvcs I imagine that, since the log is generated by a script and not typed by hand, this should never happen, but it is worth noting in case the code is used in another context. It would also make sense to use mb_str(i)pos if you are working with UTF-8 and accented strings. Maybe I’ll edit the answer later and add those two remarks.

  • @lvcs I edited and added some remarks at the end, including your recommendation.
