Get content between tags [x] and [/x] with Regular Expression

Question

Asked 8 years, 6 months ago

Viewed 111 times

1

I have the content below, which comes from a database table, and I would like to use a regular expression (or something better, if there is one) to extract only the content between the bracketed tags.

[pt-br]

What is Lorem Ipsum?

Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It survived not only five centuries, but also the leap to electronic typesetting, remaining essentially unchanged. It was popularized in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with publishing software such as Aldus PageMaker, including versions of Lorem Ipsum.

[/pt-br]

[en-us]

What is Lorem Ipsum?

Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.

[/en-us]

http://answall.com/questions/121285/como-pegar-as-strings-que-estao-entre-colchetes

– user60252

2017/02/01 at 23:36
I believe that this "inside the brackets" is the content between the tags [pt-br] and [/pt-br], but the idea remains the same.

– Woss

2017/02/01 at 23:44
Right, the content looks like [en] content here [/en], and so on.

– Osvaldo Queta

2017/02/01 at 23:48

4 answers

1

<?php

$texto = "[pt-br]

Qual é Lorem Ipsum?

Lorem Ipsum é simplesmente texto manequim da impressão e composição 
indústria. Lorem Ipsum tem sido o texto padrão do manequim da indústria
desde os anos 1500, quando uma impressora desconhecida tomou uma galera 
de tipo e mexidos-lo para fazer um livro tipo espécime. Ele sobreviveu 
não apenas cinco séculos, mas também o salto para composição eletrônica,
permanecendo essencialmente inalterado. Foi popularizado na década de 1960
com o lançamento de folhas Letraset contendo Lorem Ipsum passagens, e mais 
recentemente com software de editoração como Aldus PageMaker, incluindo
versões de Lorem Ipsum.

[/pt-br]";

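// Match every bracketed tag, e.g. [pt-br] and [/pt-br]
$output = array();
preg_match_all("/\[(.*?)\]/", $texto, $output);
// Strip the matched tags, leaving only the text between them
$texto = str_replace($output[0], '', $texto);
echo $texto;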

Examples:

Separating into one array:

$output = array();
preg_match_all("/\[(.*?)\]/", $texto, $output);
$result = array();
// Tags come in pairs: $output[0][$i] opens and $output[0][$i + 1] closes
for ($i = 0; $i + 1 < count($output[0]); $i += 2)
{
    $ini = strripos($texto, $output[0][$i]);
    $end = strripos($texto, $output[0][$i + 1]);
    // Key: tag name without brackets; value: the text between the pair
    $result[str_replace(['[', ']'], '', $output[0][$i])] =
        str_replace($output[0], '', substr($texto, $ini, $end - $ini));
}

var_dump($result);

Example:

Ideone 3

Will that regex match the [/en] too? It did not match it here.

– Asura Khan

2017/02/02 at 00:44
@Asurakhan Yes, it matches. Please look at the examples!

– novic

2017/02/02 at 00:45
1

Ah, now it does.

– Asura Khan

2017/02/02 at 00:46
@Osvaldoqueta I made an edit, please check.

– novic

2017/02/02 at 01:43
What error, @Osvaldoqueta?

– novic

2017/02/02 at 02:41
@Osvaldoqueta There is no problem; the code is the same, and the <p> will not cause any trouble. I just tested it! Look at the link.

– novic

2017/02/02 at 02:45
What’s wrong @Osvaldoqueta?

– novic

2017/02/02 at 02:53
@Osvaldoqueta How are you printing the value? If you are using var_dump, that is the wrong way to do it.

– novic

2017/02/02 at 03:02
Dude, you are programming it wrong. It has to be with echo. See!!! @Osvaldoqueta

– novic

2017/02/02 at 03:08
@Osvaldoqueta The text is not the same as in the question; without a pattern, this can happen.

– novic

2017/02/02 at 03:15
@Osvaldoqueta Paste the saved text somewhere so I can see it, for example in the HTML pane at https://jsfiddle.net/, then save it and send me the link so I can see the current pattern.

– novic

2017/02/02 at 03:18
OK, I'll do it! The text is saved in the database from a WYSIWYG editor, and maybe that is why!

– Osvaldo Queta

2017/02/02 at 03:25
jsfiddle link: https://jsfiddle.net/8mn6eeq8/

– Osvaldo Queta

2017/02/02 at 03:27
Thank you already, @Virgilio Novic. You have helped me a lot and given me a good idea of how to proceed.

– Osvaldo Queta

2017/02/02 at 03:28
@Osvaldoqueta That is it: the cause is the lack of a pattern, and the doubt in your question is resolved. Since you wrote the question with a pattern, the answers have to follow that same pattern. OK? The text in the link has no pattern.

– novic

2017/02/02 at 03:31
@Osvaldogueta Then mark it as the answer to your question and, if you can, delete the comments.

– novic

2017/02/02 at 11:45

by Lucas Kauer • **717** points · Answer 1 · 2017-02-02T00:53:39+00:00

0

Try this regex with the s modifier, so that . also matches line breaks: (?<=\[\w{2}-\w{2}\])(.*?)(?=\[\/\w{2}-\w{2}\])
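A minimal sketch of how a lookaround pattern like this could be used with preg_match_all. It assumes the capture is written as a lazy (.*?) rather than (\.*), that the s modifier is set, and that every tag has the two-letter, hyphen, two-letter shape; $text is a made-up sample, not the data from the question.

```php
<?php
// Hypothetical sample; the real text would come from the database.
$text = "[pt-br]\nOla\n[/pt-br]\n[en-us]\nHello\n[/en-us]";

// The lookbehind and lookahead keep the tags out of the match itself.
// Assumption: tags always look like xx-xx, so the lookbehind has a fixed length.
$pattern = '/(?<=\[\w{2}-\w{2}\])(.*?)(?=\[\/\w{2}-\w{2}\])/s';

preg_match_all($pattern, $text, $matches);

// $matches[1] holds the captured contents; trim the surrounding line breaks.
$contents = array_map('trim', $matches[1]);
print_r($contents);
```

Because the tag names are hard-coded as \w{2}-\w{2}, a tag such as [en] would not match with this approach.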

by Alison Silva • 1 point · Answer 2 · 2017-02-02T00:58:03+00:00

You can use the following code, which works for [en], for [en-us], or for any other tag name between square brackets.

$re = '/\[[^]]+\]([^[]+)\[\/[^]]+\]/is';
$str = '[pt-br]

Qual é Lorem Ipsum?

Lorem Ipsum é simplesmente texto manequim da impressão e composição indústria. Lorem Ipsum tem sido o texto padrão do manequim da indústria desde os anos 1500, quando uma impressora desconhecida tomou uma galera de tipo e mexidos-lo para fazer um livro tipo espécime. Ele sobreviveu não apenas cinco séculos, mas também o salto para composição eletrônica, permanecendo essencialmente inalterado. Foi popularizado na década de 1960 com o lançamento de folhas Letraset contendo Lorem Ipsum passagens, e mais recentemente com software de editoração como Aldus PageMaker, incluindo versões de Lorem Ipsum.

[/pt-br]';

preg_match($re, $str, $matches);

// Print the values that were found
print_r($matches);

In your case $matches will have two entries: the full match, and the captured group, which contains only the text. That is, you must read the second position of the array ($matches[1]).
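When the text holds several language blocks, as in the question, preg_match stops at the first one. The following is a sketch, not part of the answer, that applies the same character-class idea with preg_match_all; the extra group for the tag name and the sample string are assumptions added for illustration.

```php
<?php
// Hypothetical sample with two blocks, shaped like the text in the question.
$str = "[pt-br]\nOla mundo\n[/pt-br]\n\n[en-us]\nHello world\n[/en-us]";

// Group 1: tag name (added here for illustration).
// Group 2: the text, which must not contain a "[" character.
$re = '/\[([^]\/]+)\]([^[]+)\[\/[^]]+\]/';

// PREG_SET_ORDER yields one array per block: [full match, tag, text].
preg_match_all($re, $str, $sets, PREG_SET_ORDER);

$byTag = [];
foreach ($sets as $set) {
    $byTag[$set[1]] = trim($set[2]);
}
print_r($byTag);
```

Note the limitation inherited from [^[]+: a block whose text contains a literal [ will not match.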

by Guilherme Lautert • **15,097** points · Answer 3 · 2017-02-02T15:39:21+00:00

When capturing the content of a markup tag, the ideal is to capture the tag name in a group.

Example

For the markup [pt-br][/pt-br], the pattern is pt-br.
For the markup [en-us][/en-us], the pattern is en-us.

Why put it in a group? Because you can then use the captured value itself to identify the end of the markup.

Solution

~\[([^]]+?)\](.*)\[/\1\]~s

Explanation

\[([^]]+?)\] - The beginning of the markup, which must start with [ and end with ]; this is where the group rule mentioned above is applied.
(.*) - Captures anything; because the s modifier is set, this includes \n.
\[/\1\] - This is what guarantees the match stops at the closing tag, because it must match [/ + the tag name already captured in group 1 (\1) + ].
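The pieces explained above can be sketched in PHP as follows. This is an illustration under assumptions: the sample text is made up, and the capture is written as a lazy (.*?) instead of (.*) so that a block ends at the first closing tag with the same name.

```php
<?php
// Hypothetical sample; the "[note]" inside the text shows why the backreference helps.
$text = "[pt-br]\nOla [note] mundo\n[/pt-br]\n[en-us]\nHello\n[/en-us]";

// \1 forces the closing tag to repeat whatever name group 1 captured.
// Lazy (.*?) is a small change from the answer: stop at the first matching close.
$pattern = '~\[([^]]+?)\](.*?)\[/\1\]~s';

// PREG_SET_ORDER yields one array per block: [full match, tag, text].
preg_match_all($pattern, $text, $sets, PREG_SET_ORDER);

$result = [];
foreach ($sets as $set) {
    $result[$set[1]] = trim($set[2]);
}
var_dump($result);
```

Unlike a pattern that forbids [ in the content, this one tolerates stray brackets inside a block, since only [/ followed by the captured name ends it.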

Problems

This follows the same idea as HTML, where the ideal tool for these cases is a parser, not a regex.
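As a rough idea of what a non-regex approach might look like, here is a small hand-written scanner. It is only a sketch: extractBlocks is a hypothetical helper, not an existing library function, and it assumes blocks are not nested and that each tag name appears once.

```php
<?php
// Hypothetical helper: walks the text tag by tag instead of using one regex.
function extractBlocks(string $text): array
{
    $blocks = [];
    $offset = 0;
    while (($open = strpos($text, '[', $offset)) !== false) {
        $close = strpos($text, ']', $open);
        if ($close === false) {
            break; // unterminated tag: stop scanning
        }
        $name = substr($text, $open + 1, $close - $open - 1);
        $endTag = '[/' . $name . ']';
        $end = strpos($text, $endTag, $close + 1);
        if ($name === '' || $name[0] === '/' || $end === false) {
            $offset = $close + 1; // not an opening tag with a matching close: skip it
            continue;
        }
        // Everything between the opening tag and its closing tag is the content.
        $blocks[$name] = trim(substr($text, $close + 1, $end - $close - 1));
        $offset = $end + strlen($endTag);
    }
    return $blocks;
}

// Made-up sample shaped like the text in the question.
$text = "[pt-br]\nOla\n[/pt-br]\n[en-us]\nHello\n[/en-us]";
print_r(extractBlocks($text));
```

A real parser would also need to decide how to treat nested or repeated tags, which this sketch deliberately leaves out.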

See on REGEX101