Get content between tags [x] and [/x] with Regular Expression

1

I have the content below, which comes from a table in a database, and I would like to use a regular expression (or something better, if there is one) to extract only the content between the bracketed tags.

[pt-br]

Qual is Lorem Ipsum?

Lorem Ipsum is simply print dummy text and composition industry. Lorem Ipsum has been the standard text of the manikin industry since the 1500s, when an unknown printer took a type galley and scrambled it to make a specimen type book. It survived not only five centuries, but also the leap to electronic composition, remaining essentially unchanged. It was popularized in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and most recently with software publishing as Aldus Pagemaker, including versions of Lorem Ipsum.

[/pt-br]


[en-us]

What is Lorem Ipsum?

Lorem Ipsum is Simply dummy text of the Printing and typesetting Industry. Lorem Ipsum has been the Industry’s standard dummy text Ever Since the 1500s, when an Unknown Printer Took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the Leap into Electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset Sheets containing Lorem Ipsum Passages, and more recently with desktop Publishing software like Aldus Pagemaker including versions of Lorem Ipsum.

[/en-us]

  • http://answall.com/questions/121285/como-pegar-as-strings-que-estao-entre-colchetes

  • I believe that "inside the brackets" here means the content between the tags [pt-br] and [/pt-br], but the idea remains the same.

  • Right, the content looks like [en] content here [/en], and so on.

4 answers

1


<?php

$texto = "[pt-br]

Qual é Lorem Ipsum?

Lorem Ipsum é simplesmente texto manequim da impressão e composição 
indústria. Lorem Ipsum tem sido o texto padrão do manequim da indústria
desde os anos 1500, quando uma impressora desconhecida tomou uma galera 
de tipo e mexidos-lo para fazer um livro tipo espécime. Ele sobreviveu 
não apenas cinco séculos, mas também o salto para composição eletrônica,
permanecendo essencialmente inalterado. Foi popularizado na década de 1960
com o lançamento de folhas Letraset contendo Lorem Ipsum passagens, e mais 
recentemente com software de editoração como Aldus PageMaker, incluindo
versões de Lorem Ipsum.

[/pt-br]";

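$output = array();
// Collect every bracketed tag, e.g. [pt-br] and [/pt-br], into $output[0].
preg_match_all("/\[(.*?)\]/", $texto, $output);
// Remove the tags so that only the content between them remains.
$texto = str_replace($output[0],'', $texto);
echo $texto;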


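Separating into an array:

$output = array();
// $output[0] lists the tags in order: opening, closing, opening, closing, ...
preg_match_all("/\[(.*?)\]/", $texto, $output);
$result = array();
// Walk the tags in pairs; the guard avoids reading past an unpaired last tag.
for($i = 0; $i + 1 < count($output[0]); $i = $i + 2)
{
    // Start right after the opening tag and stop at its closing tag.
    $ini = strpos($texto, $output[0][$i]) + strlen($output[0][$i]);
    $end = strpos($texto, $output[0][$i+1], $ini);
    // Key: tag name without brackets; value: the text between the two tags.
    $result[str_replace(['[',']'],'',$output[0][$i])] =
        substr($texto, $ini, $end-$ini);
}

var_dump($result);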

  • Will that regex catch the [/en] too? Because here it did not.

  • @Asurakhan yes, it does. Please look at the examples!

  • Ahh, now it works.

  • @Osvaldoqueta I made an edit, please check.

  • What error, @Osvaldoqueta?

  • @Osvaldoqueta there is no problem, the code is the same one; the <p> will not cause any problems, I just tested it! Look at the link.

  • What’s wrong @Osvaldoqueta?

  • @Osvaldoqueta how are you printing the value? If you are using var_dump, that is the wrong way to do it.

  • Dude, you are programming it wrong. It’s with echo. See?! @Osvaldoqueta

  • @Osvaldoqueta the text is not like the one in the question; with no pattern, this can happen.

  • Paste the text that is saved there so I can see it, @Osvaldoqueta: go to https://jsfiddle.net/, put it in the HTML part, save it, and send me the link so I can see the actual pattern.

  • OK, I’ll do it! The text is saved in the db from a WYSIWYG editor, and maybe that’s why!

  • jsfiddle link: https://jsfiddle.net/8mn6eeq8/

  • Thank you in advance, @Virgilio Novic, you have helped me a lot and gave me a good idea of how to proceed.

  • @Osvaldoqueta that is it: the lack of a pattern. The doubt in your question is resolved. Since you wrote the question with a pattern, the answers have to follow the same pattern, OK? The text in the link has no pattern.

  • @Osvaldogueta then mark it as the answer to your question and, if you can, delete the comments.

0

Try this regex, used with the s modifier so that . also matches line breaks: (?<=\[\w{2}-\w{2}\])(.*?)(?=\[\/\w{2}-\w{2}\])
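A minimal sketch of how a lookaround pattern of this kind could be used from PHP. The sample text and the variable names ($texto, $re, $blocos) are assumptions made for illustration, not part of the original answer. Note that the pattern only recognises tags shaped like xx-xx and does not check that the closing tag has the same name as the opening one.

```php
<?php
// Sample input (an assumption for this sketch): two blocks with xx-xx tags.
$texto = "[pt-br]\nQual é Lorem Ipsum?\n[/pt-br]\n\n[en-us]\nWhat is Lorem Ipsum?\n[/en-us]";

// Lookbehind: the match must start right after an opening tag such as [pt-br].
// (.*?) with the s modifier: the shortest run of characters, line breaks included.
// Lookahead: the match must end right before a closing tag such as [/pt-br].
$re = '/(?<=\[\w{2}-\w{2}\])(.*?)(?=\[\/\w{2}-\w{2}\])/s';

preg_match_all($re, $texto, $matches);

// The tags are not part of the match, so only surrounding whitespace is trimmed.
$blocos = array_map('trim', $matches[1]);
print_r($blocos);
```

Because the tags sit inside lookarounds, they are never consumed, which is why no str_replace step is needed afterwards.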

0

You can use the following code, which works for [en], for [en-us], or for any other tag name between square brackets.

$re = '/\[[^]]+\]([^[]+)\[\/[^]]+\]/is';
$str = '[pt-br]

Qual é Lorem Ipsum?

Lorem Ipsum é simplesmente texto manequim da impressão e composição indústria. Lorem Ipsum tem sido o texto padrão do manequim da indústria desde os anos 1500, quando uma impressora desconhecida tomou uma galera de tipo e mexidos-lo para fazer um livro tipo espécime. Ele sobreviveu não apenas cinco séculos, mas também o salto para composição eletrônica, permanecendo essencialmente inalterado. Foi popularizado na década de 1960 com o lançamento de folhas Letraset contendo Lorem Ipsum passagens, e mais recentemente com software de editoração como Aldus PageMaker, incluindo versões de Lorem Ipsum.

[/pt-br]';

preg_match($re, $str, $matches);

// Prints the matches that were found
print_r($matches);

In your case the resulting array has two entries: the first is the full match, tags included, and the second contains only the text. That is, you must read the second position of the array ($matches[1]).
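If the string holds more than one tagged block, as in the question, preg_match stops at the first one. A hedged sketch of the same pattern with preg_match_all follows; the sample text and the name $textos are assumptions for illustration. Since the pattern relies on [^[]+, it will not match a block whose text itself contains a [ character.

```php
<?php
// Sample input (an assumption for this sketch): two tagged blocks in one string.
$str = "[pt-br]\nQual é Lorem Ipsum?\n[/pt-br]\n\n[en]\nWhat is Lorem Ipsum?\n[/en]";

// Same pattern as in the answer above.
$re = '/\[[^]]+\]([^[]+)\[\/[^]]+\]/is';

// preg_match_all collects every block instead of only the first one.
preg_match_all($re, $str, $matches);

// $matches[0] holds the full matches (tags included), $matches[1] only the text.
$textos = array_map('trim', $matches[1]);
print_r($textos);
```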

0

The ideal approach when capturing markup content is to put the tag name in a capture group.

Example

  • For the markup [pt-br][/pt-br], the name is pt-br.
  • For the markup [en-us][/en-us], the name is en-us.

Why put it in a group? Because then you can use the captured value itself to identify the end of the markup.

Solution

~\[([^]]+?)\](.*)\[/\1\]~s

Explanation

  • \[([^]]+?)\] - The opening of the markup, which must start with [ and end with ], and where we apply the group rule mentioned above.
  • (.*) - Captures anything; remember that, because of the s modifier, this includes \n.
  • \[/\1\] - This is where you guarantee that the match stops at the closing markup, because it must match [/ + the name already captured in group 1 (\1) + ]
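The pieces explained above can be put together roughly as follows. This is only a sketch: the sample text and the names $texto and $porIdioma are assumptions for illustration. If the same tag can appear more than once in the text, the greedy (.*) would run from the first opening to the last closing of that tag, so a lazy (.*?) may be the safer variant in that situation.

```php
<?php
// Sample input (an assumption for this sketch), shaped like the question's data.
$texto = "[pt-br]\nQual é Lorem Ipsum?\n[/pt-br]\n\n[en-us]\nWhat is Lorem Ipsum?\n[/en-us]";

// Group 1 captures the tag name; \1 forces the closing tag to carry the same name.
$re = '~\[([^]]+?)\](.*)\[/\1\]~s';

// PREG_SET_ORDER yields one entry per block: [full match, tag name, content].
preg_match_all($re, $texto, $matches, PREG_SET_ORDER);

$porIdioma = [];
foreach ($matches as $m) {
    // Key: tag name; value: the text between the tags, without outer whitespace.
    $porIdioma[$m[1]] = trim($m[2]);
}

print_r($porIdioma);
```

Keying the result by the captured name makes it easy to pick one language, for example $porIdioma['en-us'].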

Problems

  • This follows the same idea as HTML, and the ideal way to handle such cases is a parser, not a regex.

  • Well explained, thanks! You all helped me a lot and I took the suggestions from all of you. THANKS A LOT.

  • Hello @Guilherme Lautert, you said in your post that "the ideal to treat these cases is a parser, not regex." Can you give an example? How would you handle this problem? Please show me an example.

  • Here I explain the issue of parsers, which applies to your case. What I mean is that it would have to be done in code and not with a regex alone, because a regex could not handle tags nested inside one another.
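To illustrate what "done in code" could look like, here is a minimal stack-based sketch. It is not the commenter's implementation: the function name extractBlocks and the sample string are hypothetical. A regex is used only to locate the individual tags, while a stack pairs each closing tag with the most recently opened one, which is what allows nesting.

```php
<?php
// Hypothetical helper: returns [tag name, content] pairs; inner blocks come first.
function extractBlocks(string $text): array
{
    $stack = [];  // open tags still waiting to be closed: [name, offset where content starts]
    $found = [];

    // Locate every [name] or [/name] token together with its byte offset.
    preg_match_all('~\[(/?)([^\]\[/]+)\]~', $text, $tokens, PREG_SET_ORDER | PREG_OFFSET_CAPTURE);

    foreach ($tokens as $t) {
        $isClosing = $t[1][0] === '/';
        $name      = $t[2][0];
        $start     = $t[0][1];
        $end       = $start + strlen($t[0][0]);

        if (!$isClosing) {
            $stack[] = [$name, $end];
            continue;
        }

        // Accept a closing tag only if it matches the most recently opened tag;
        // anything else is ignored in this sketch.
        if ($stack && end($stack)[0] === $name) {
            [, $contentStart] = array_pop($stack);
            $found[] = [$name, substr($text, $contentStart, $start - $contentStart)];
        }
    }

    return $found;
}

// Sample input (an assumption): a [b] block nested inside a [pt-br] block.
$text = "[pt-br]Olá [b]mundo[/b][/pt-br]";
print_r(extractBlocks($text));
```

Unmatched or badly ordered tags are simply skipped here; a real parser would have to decide how to report them.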
