Picking text between two words with regex

Question


I'd appreciate some guidance on a problem. My goal is to capture the list below, splitting it into blocks that run from one LOREM to the next, but without capturing the free text that follows the end of the list. Example:

LOREM : 10505050
IPSUM : 1050051051084
DOLOR : 2620620620652
AMETI : 54084840540540
LOREM : 10505050
IPSUM : 1050051051084
DOLOR : 2620620620652
AMETI : 54084840540540
LOREM : 10505050
IPSUM : 1050051051084
DOLOR : 2620620620652
AMETI : 54084840540540
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

I'm using this regex: /(?=LOREM :).+?(?:(?=LOREM :))/s With it I can select every block except the last one.

For a better understanding, see this example: https://regex101.com/r/gM2fF1/1

Do you want to capture every occurrence of LOREM : number, and also the Lorem followed by anything (the last block, which you can't capture)?

– rray

2014/12/10 at 18:08
@rray Exactly, I can't capture the last block. The only catch is that some values are sometimes not just numbers.

– Diego Henrique

2014/12/10 at 19:05
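The last block is missed because the lazy .+? in the original pattern only stops where the lookahead (?=LOREM :) succeeds, and no LOREM : follows the third block. Below is a minimal sketch of an alternative pattern (not from this thread), assuming each block is a LOREM : line followed by KEY : value lines and that the free text never starts a line with "word : ". The sample $string is a hypothetical, abridged copy of the input. Values are matched with .* so they need not be purely numeric.

```php
<?php
// Hypothetical sample input: an abridged copy of the text from the question
$block = "LOREM : 10505050\nIPSUM : 1050051051084\nDOLOR : 2620620620652\nAMETI : 54084840540540\n";
$string = $block . $block . $block
        . "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod\n"
        . "tempor incididunt ut labore et dolore magna aliqua.\n";

// ^LOREM :.*   a line starting with "LOREM :" (any value, not only digits)
// (?:\R(?!LOREM :)\w+ : .*)*
//              then any following "KEY : value" lines, stopping before the
//              next "LOREM :" line or before a line of any other shape
// The /m flag makes ^ match at the start of every line.
$regex = '/^LOREM :.*(?:\R(?!LOREM :)\w+ : .*)*/m';

preg_match_all($regex, $string, $matches);

// $matches[0] holds one string per block, including the last one
foreach ($matches[0] as $i => $found) {
    echo "Block $i:\n$found\n\n";
}
```

Because each block ends at the first line that does not look like KEY : value, the last block no longer depends on a following LOREM :.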

2 answers


I suggest two approaches:

#1: In steps

You could split the text into the "interesting part" and discard the rest, for example with ([^\.]+[\d]+).

Then you only need to handle the key : value pattern, and a simpler match gives you an array with one entry per line. Something like this:

// () serve as the pattern delimiters; group 1 is the key, group 2 the value
$regex = '(([\w]+) : ([\d]+))';
preg_match_all($regex, $string, $matches);
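A runnable sketch of approach #1 on a hypothetical, abridged copy of the input. The step 1 pattern ([^\.]+[\d]+) only works under an assumption that holds for this sample: the list contains no periods and the free text has no digits before its first period. [^\.]+ runs up to the first ".", then backtracks until [\d]+ can match, so the match ends at the last value of the list.

```php
<?php
// Hypothetical sample input: an abridged copy of the text from the question
$block = "LOREM : 10505050\nIPSUM : 1050051051084\nDOLOR : 2620620620652\nAMETI : 54084840540540\n";
$string = $block . $block . $block
        . "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod\n"
        . "tempor incididunt ut labore et dolore magna aliqua.\n";

// Step 1: keep only the "interesting part" (everything up to the last digit
// before the first period)
preg_match('/([^\.]+[\d]+)/', $string, $part);
$list = $part[1];

// Step 2: match every "key : value" line; () are the delimiters,
// group 1 is the key and group 2 the value
$regex = '(([\w]+) : ([\d]+))';
preg_match_all($regex, $list, $matches);

foreach ($matches[1] as $i => $key) {
    echo $key, ' => ', $matches[2][$i], "\n";
}
```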

#2: Capturing the groups directly

You could use a regex that captures each group directly, which assumes every group follows the same pattern. For example:

// Exactly four "key : value" lines, each followed by a whitespace character
$regex = '(([\w]+ : [\d]+[\s\n\r]){4})';
preg_match_all($regex, $string, $matches);
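A sketch of approach #2 on a hypothetical sample. The {4} quantifier hard-codes four lines per group, so this only works when every group has exactly four lines. [\s\n\r] is equivalent to \s, which already includes \n and \r, and every line, including the last one of the list, must be followed by a whitespace character.

```php
<?php
// Hypothetical sample input: an abridged copy of the text from the question
$block = "LOREM : 10505050\nIPSUM : 1050051051084\nDOLOR : 2620620620652\nAMETI : 54084840540540\n";
$string = $block . $block . $block
        . "Lorem ipsum dolor sit amet, consectetur adipisicing elit.\n";

// The answer's pattern: () are the delimiters, {4} repeats one line four times
$regex = '(([\w]+ : [\d]+[\s\n\r]){4})';
preg_match_all($regex, $string, $matches);

// $matches[0] has one entry per four-line group
foreach ($matches[0] as $group) {
    echo $group, "---\n";
}
```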

Sergio, the second approach wouldn't work for me, because I don't know exactly how many lines each group has; it varies between 4 and 5. But I used the first approach and then split the groups on LOREM. That did it. Thank you very much!

– Diego Henrique

2014/12/11 at 11:33
@DiegoHenrique OK, glad I could help!

– Sergio

2014/12/11 at 11:36
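Diego's final solution (the first approach, then splitting the groups on LOREM) might look roughly like this. The split pattern (?=^LOREM :) with the /m flag is an assumption, not his posted code: it is a zero-width lookahead, so each chunk keeps its own LOREM line and groups of any length are handled. The sample input, with one four-line and one five-line group (the SIT line is made up), is hypothetical.

```php
<?php
// Hypothetical sample input: one four-line group and one five-line group
$string = "LOREM : 10505050\nIPSUM : 1050051051084\nDOLOR : 2620620620652\nAMETI : 54084840540540\n"
        . "LOREM : 10505050\nIPSUM : 1050051051084\nDOLOR : 2620620620652\nAMETI : 54084840540540\nSIT : 77\n"
        . "Lorem ipsum dolor sit amet, consectetur adipisicing elit.\n";

// Step 1 (from the answer): keep only the list, dropping the trailing prose
preg_match('/([^\.]+[\d]+)/', $string, $part);

// Split before every line that starts with "LOREM :". The lookahead is
// zero-width, so each chunk keeps its LOREM line; empty chunks are dropped.
$groups = preg_split('/(?=^LOREM :)/m', $part[1], -1, PREG_SPLIT_NO_EMPTY);

foreach ($groups as $i => $group) {
    // One key => value pair per line of the group
    preg_match_all('(([\w]+) : ([\d]+))', $group, $m);
    $pairs = array_combine($m[1], $m[2]);
    echo "Group $i: ", json_encode($pairs), "\n";
}
```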


by Lollipop • 4,918 points · Answer 1 · 2014-12-10T18:09:29+00:00

1 - Use strlen

strlen returns the length of the string passed as its argument. Example of using strlen:

<?php
    /* int strlen(string $string): returns the length of $string in bytes */
    $qtd_char = strlen("Linha");
    echo $qtd_char;
?>

The value displayed will be 5, because the string "Linha" contains five characters.
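strlen counts bytes, not characters, which matters for accented text such as the "eu não sou besta" string used in the substr example. A short comparison, assuming the file is saved as UTF-8 and the mbstring extension (which provides mb_strlen) is enabled:

```php
<?php
// Assumes this file is saved as UTF-8 and the mbstring extension is enabled
$word = "não";

echo strlen("Linha"), "\n";           // 5: every character is one byte
echo strlen($word), "\n";             // 4: "ã" takes two bytes in UTF-8
echo mb_strlen($word, 'UTF-8'), "\n"; // 3: counts characters, not bytes
```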

2 - With the character count obtained, use substr

substr returns a piece of a string. It takes the string itself, the start index, and (optionally) the number of characters to return.

A negative start index is also allowed, in which case PHP counts that many characters from the end of the string instead of from the beginning. Here is an example:

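<?php

$texto = "eu não sou besta pra tirar onda de herói";

// strlen() and substr() both count bytes, so measuring the prefix with
// strlen() keeps the two-byte "ã" intact in the result
$qtd_char = strlen("eu não sou besta");

echo substr($texto, 0, $qtd_char);  // eu não sou besta

?>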
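A few more substr calls on a plain ASCII string (a hypothetical example), including the negative start index described above:

```php
<?php
$word = "Linha";

echo substr($word, 0, 3), "\n";  // Lin: 3 characters from index 0
echo substr($word, 1, 3), "\n";  // inh: 3 characters from index 1
echo substr($word, -3), "\n";    // nha: start 3 characters from the end
echo substr($word, -3, 2), "\n"; // nh: 2 characters, starting 3 from the end
```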

substr() can also be combined with strpos(), which returns the position of the first occurrence of a substring within a string, or false if it is not found.
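Applied to the question, strpos and substr can cut the input where the free text begins. A minimal sketch, assuming the prose always starts with "Lorem ipsum" (strpos is case-sensitive, so the uppercase LOREM lines do not match) and on a hypothetical, abridged sample:

```php
<?php
// Hypothetical sample input: an abridged copy of the text from the question
$string = "LOREM : 10505050\nIPSUM : 1050051051084\n"
        . "Lorem ipsum dolor sit amet, consectetur adipisicing elit.\n";

// Byte offset where the free text starts, or false if it is absent
$pos = strpos($string, "Lorem ipsum");

// Keep everything before that offset (or the whole string if not found)
$list = ($pos === false) ? $string : substr($string, 0, $pos);

echo $list;
```

Checking for false with === matters: a match at offset 0 would otherwise be treated as "not found".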