This answer may be MUCH more than you need, but in my opinion a regular expression alone, or a solution based on the positions of certain characters, is not enough (especially since such a solution requires perfectly normalized input).
Therefore, I propose an object-oriented solution in which a Parser applies as many replacement strategies as you register:
First, the file structure:
|-\BBCode
|  |-\BBCode\Parser.php
|  \-\BBCode\Parsers
|     |-\BBCode\Parsers\Code.php
|     |-\BBCode\Parsers\Emphasis.php
|     |-\BBCode\Parsers\Parser.php
|     \-\BBCode\Parsers\Strong.php
\-\index.php
BBCode\Parser.php is our entry point to the different parsing and substitution strategies:
<?php
namespace BBCode;
class Parser {
/**
* Available Parsers
*
* @var array $parsers
*/
private $parsers = array();
/**
* Input Text (with BBCodes)
*
* @var string $text
*/
protected $text;
/**
* Output Text (parsed)
*
* @var string $output
*/
protected $output;
/**
* Parser Constructor
* Prepares the text to be parsed
*/
public function __construct( $text ) {
// Preparing text
$text = $this -> prepare( $text );
$this -> text = $this -> output = $text;
}
/**
* Add new BBCode Parser to be used
*
* @param Parsers\Parser $parser
* BBCode Parser
*
* @return BBCode\Parser
* Parser Object (Fluent Interface)
*/
public function addParser( Parsers\Parser $parser ) {
$this -> parsers[] = $parser;
return $this;
}
/**
* Parses BBCodes
*
* @return BBCode\Parser
* Parser Object (Fluent Interface)
*/
public function parse() {
foreach( $this -> parsers as $parser ) {
$this -> output = $parser -> parse( $this -> output );
}
return $this;
}
// Accessors
/**
* Get output (parsed) text
*
* @return string
* Parsed text
*/
public function getText() {
return $this -> output;
}
// Auxiliary Methods
/**
* Applies some routines over input text
* allowing easier parsing
*
* @param string $text
* Text to cleanup
*
* @return string
* Cleaned text
*/
private function prepare( $text ) {
// Removing leading and trailing whitespace
$text = trim( $text );
// Collapsing runs of two or more whitespace characters into one space
$text = preg_replace( '/\s{2,}/', ' ', $text );
return $text;
}
}
It looks like a lot only because of the comments, but it is really very simple. Besides the properties, it has:
- A constructor that receives the input text, normalizes it and stores it for the Parsers to work on;
- A method, Parser::addParser(), for adding new parsing strategies, enforced through an interface and polymorphism via type hinting;
- A method that iterates over the collection of Parsers and applies them one after another, each one receiving the previous one's output;
- A getter that returns the text with the BBCodes replaced by the appropriate HTML tags.
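To illustrate what the type hint in Parser::addParser() buys you, here is a minimal, self-contained sketch. It assumes PHP 7 or later (where a failed class type hint throws a catchable TypeError), and StrategyInterface, UpperStrategy, NotAStrategy, Pipeline and rejects() are hypothetical stand-ins, not the files above. An object that merely has a parse() method, without implementing the interface, is rejected at registration time instead of failing later.

```php
<?php
// Hypothetical stand-in for BBCode\Parsers\Parser
interface StrategyInterface {
    public function parse( $text );
}

class UpperStrategy implements StrategyInterface {
    public function parse( $text ) {
        return strtoupper( $text );
    }
}

// Has a parse() method, but does NOT implement the interface
class NotAStrategy {
    public function parse( $text ) {
        return $text;
    }
}

// Hypothetical stand-in for BBCode\Parser
class Pipeline {
    private $strategies = array();

    // The interface type hint is what enforces the contract
    public function addStrategy( StrategyInterface $strategy ) {
        $this -> strategies[] = $strategy;
        return $this;
    }

    // Each strategy receives the previous strategy's output
    public function run( $text ) {
        foreach ( $this -> strategies as $strategy ) {
            $text = $strategy -> parse( $text );
        }
        return $text;
    }
}

// Returns true if addStrategy() rejected the object with a TypeError
function rejects( $object ) {
    try {
        ( new Pipeline ) -> addStrategy( $object );
        return false;
    } catch ( TypeError $e ) {
        return true;
    }
}

var_dump( rejects( new UpperStrategy ) ); // bool(false)
var_dump( rejects( new NotAStrategy ) );  // bool(true)
```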
There is also a private method that normalizes the input so the regular expressions in the parsing strategies can stay simple. I added only two routines: one that trims whitespace from both ends of the string and one that collapses runs of whitespace into a single space.
These two routines mean that, for example, we don't need word boundaries (\b), anchors (^ and $) or the whitespace class (\s) in the patterns.
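A quick way to see what these two routines do is to run them on messy input. The sketch below copies the two statements from Parser::prepare() into a standalone function (the name prepare_text() is hypothetical). Note that \s{2,} only collapses runs of two or more whitespace characters, so a single newline survives, and patterns using . without the s modifier still won't match across it.

```php
<?php
// Hypothetical standalone copy of the two routines in Parser::prepare()
function prepare_text( $text ) {
    // Remove whitespace at both ends of the string
    $text = trim( $text );
    // Collapse runs of two or more whitespace characters into one space
    return preg_replace( '/\s{2,}/', ' ', $text );
}

var_dump( prepare_text( "  [b]bold[/b]\t\t text \n\n end  " ) );
// string(20) "[b]bold[/b] text end"

// A single "\n" is not a run of two, so it is kept as-is
var_dump( prepare_text( "line one\nline two" ) );
```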
Then we have the classes responsible for the parsing strategies:
Strong.php
<?php
namespace BBCode\Parsers;
class Strong implements Parser {
/**
* Parses found BBCodes
*
* @param string $text
* Input text to parse
*/
public function parse( $text ) {
$text = $this -> applyParsingRestrictions( $text );
return preg_replace_callback(
'/\[b\](.*?)\[\/b\]/',
function( $matches ) {
return sprintf( '<strong>%s</strong>', $matches[ 1 ] );
},
$text
);
}
// Auxiliary methods
/**
* Apply parsing restrictions against nested BBCodes
*
* @param string $text
* Input Text to analyze
*
* @return string
* Input text with nested BBCodes striped
*/
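private function applyParsingRestrictions( $text ) {
// Strip every [b]...[/b] sitting directly inside [code]...[/code], keeping its content.
// The lazy (.*?) keeps each match inside a single code block, and preg_replace()
// only touches those occurrences (str_replace() would also hit identical bold text outside code)
return preg_replace(
'/(?<=\[code\])\[b\](.*?)\[\/b\](?=\[\/code\])/', '$1', $text
);
}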
}
Emphasis.php
<?php
namespace BBCode\Parsers;
class Emphasis implements Parser {
/**
* Parses found BBCodes
*
* @param string $text
* Input text to parse
*/
public function parse( $text ) {
return preg_replace_callback(
'/\[i\](.*?)\[\/i\]/',
function( $matches ) {
return sprintf( '<em>%s</em>', $matches[ 1 ] );
},
$text
);
}
}
Code.php
<?php
namespace BBCode\Parsers;
class Code implements Parser {
/**
* Parses found BBCodes
*
* @param string $text
* Input text to parse
*/
public function parse( $text ) {
return preg_replace_callback(
'/\[code\](.*?)\[\/code\]/',
function( $matches ) {
return sprintf( '<code>%s</code>', $matches[ 1 ] );
},
$text
);
}
}
You can create as many strategies as you need, as long as they all implement the method defined in the BBCode\Parsers\Parser interface (Parsers\Parser.php):
<?php
namespace BBCode\Parsers;
interface Parser {
/**
* Parses found BBCodes
*
* @param string $text
* Input text to parse
*/
public function parse( $text );
}
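As an example of adding a strategy, here is a sketch of a BBCode with an attribute. The Url class and the [url=ADDRESS]label[/url] syntax are hypothetical additions, not part of the answer, and the interface is copied into the sketch (without the namespace) so it runs on its own. The address is escaped with htmlspecialchars() because it ends up inside an HTML attribute.

```php
<?php
// Copy of the interface above, so this sketch runs on its own
interface Parser {
    public function parse( $text );
}

// Hypothetical strategy: [url=ADDRESS]label[/url] -> <a href="ADDRESS">label</a>
class Url implements Parser {
    public function parse( $text ) {
        return preg_replace_callback(
            '/\[url=([^\]]+)\](.*?)\[\/url\]/',
            function( $matches ) {
                // Escape the address, since it goes inside an HTML attribute
                return sprintf(
                    '<a href="%s">%s</a>',
                    htmlspecialchars( $matches[ 1 ], ENT_QUOTES ),
                    $matches[ 2 ]
                );
            },
            $text
        );
    }
}

$url = new Url;
echo $url -> parse( 'See [url=https://example.com]this page[/url].' ), PHP_EOL;
// See <a href="https://example.com">this page</a>.
```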
The substitution routines are almost self-explanatory: each one is a simple regular-expression substitution. I opted for preg_replace_callback() because I find it more readable.
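For a fixed replacement like these, preg_replace() with a $1 back-reference gives the same result; preg_replace_callback() pays off once the replacement needs logic (escaping, lookups, counting). A small comparison sketch using the Emphasis pattern from above (the variable names are illustrative):

```php
<?php
$pattern = '/\[i\](.*?)\[\/i\]/';
$input   = 'my [i]first[/i] and [i]second[/i] words';

// Back-reference version: $1 is the first captured group
$withReplace = preg_replace( $pattern, '<em>$1</em>', $input );

// Callback version, as used in Emphasis::parse()
$withCallback = preg_replace_callback(
    $pattern,
    function( $matches ) {
        return sprintf( '<em>%s</em>', $matches[ 1 ] );
    },
    $input
);

var_dump( $withReplace === $withCallback ); // bool(true)
echo $withCallback, PHP_EOL;
// my <em>first</em> and <em>second</em> words
```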
The trick that (finally) ties this answer to the question is shown only in the Strong class, in the method Strong::applyParsingRestrictions().
Before the BBCodes [b] and [/b] are replaced by their counterparts <strong> and </strong>, the text is searched for other BBCodes that may enclose the bold ones.
I defined only one such search, for the [code] BBCode. If a bold BBCode is found inside a code BBCode, instead of replacing it with HTML tags we strip the bold BBCode from the input text, keeping its content.
The idea is basically the one William Lautert posted, using lookbehinds and lookaheads: we look behind for the opening code BBCode and ahead for the closing one, and if both are found, we remove the bold BBCodes inside.
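Isolated from the class, the lookbehind/lookahead idea looks roughly like the sketch below (strip_bold_inside_code() is a hypothetical name). It uses preg_replace() with a lazy (.*?), so every occurrence is handled and a match cannot run from one code block into the next. Because the lookbehind only checks the characters immediately before [b], it catches [code][b]...[/b][/code] but not bold text that has other content around it inside the code block.

```php
<?php
// Hypothetical helper: strip [b]...[/b] sitting directly inside [code]...[/code]
function strip_bold_inside_code( $text ) {
    return preg_replace(
        // (?<=\[code\]) looks behind for the opening code tag,
        // (?=\[\/code\]) looks ahead for the closing one
        '/(?<=\[code\])\[b\](.*?)\[\/b\](?=\[\/code\])/',
        '$1',
        $text
    );
}

echo strip_bold_inside_code( '[code][b]This[/b][/code] my [b]text[/b]' ), PHP_EOL;
// [code]This[/code] my [b]text[/b]

// Not stripped: "x " sits between [code] and [b], so the lookbehind fails
echo strip_bold_inside_code( '[code]x [b]y[/b][/code]' ), PHP_EOL;
// [code]x [b]y[/b][/code]
```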
Back in the interface method Parsers\Parser::parse(): if no other occurrence of the bold BBCode remains, preg_replace_callback() finds nothing to replace, never calls the callback and returns the text unchanged, and the flow moves on to the next Parser in the collection.
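The optional fifth argument of preg_replace_callback(), $count, makes that behaviour visible. In this sketch the $calls counter is purely illustrative: it records how often the callback runs, which stays at zero when the subject has no bold BBCode left.

```php
<?php
$calls = 0;

$result = preg_replace_callback(
    '/\[b\](.*?)\[\/b\]/',
    function( $matches ) use ( &$calls ) {
        $calls++; // counts how many times the callback is invoked
        return sprintf( '<strong>%s</strong>', $matches[ 1 ] );
    },
    '[code]This[/code] my text !',
    -1,      // $limit: no limit
    $count   // receives the number of replacements made
);

// The subject comes back unchanged; $count and $calls are both 0
var_dump( $result, $count, $calls );
```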
To use all of this, index.php looks like this:
<?php
// Autoloading
spl_autoload_register( function( $classname ) {
$classname = stream_resolve_include_path(
str_replace( '\\', DIRECTORY_SEPARATOR, $classname ) . '.php'
);
if( $classname !== FALSE ) {
include $classname;
}
});
$parser = new BBCode\Parser(
'[code][b]This[/b][/code] [code][i]is[/i][/code] my [b]text[/b] !'
);
$parser -> addParser( new BBCode\Parsers\Strong )
-> addParser( new BBCode\Parsers\Emphasis )
-> addParser( new BBCode\Parsers\Code );
echo $parser -> parse() -> getText();
?>
And the output is:
<code>This</code> <code><em>is</em></code> my <strong>text</strong> !
Here you can see the restriction in action. Our input string has a bold BBCode inside a code BBCode; because of the restriction, the bold BBCode is removed and only the code one remains.
This does not affect the bold BBCode that appears later in the string, which is converted normally.
But look at what happened to the italic (emphasis) BBCode. Since no restriction rule was defined for it, the resulting string has an <em> inside a <code>.
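One way to close that gap, sketched under the assumption that you want the same rule for several tags, is to generalize applyParsingRestrictions() into a helper that takes the tag name and escapes it with preg_quote(). The function strip_tag_inside_code() is hypothetical; each strategy could call it with its own tag before substituting.

```php
<?php
// Hypothetical helper: remove [$tag]...[/$tag] sitting directly inside [code]...[/code]
function strip_tag_inside_code( $text, $tag ) {
    // Escape the tag name so regex metacharacters cannot break the pattern
    $t = preg_quote( $tag, '/' );
    $pattern = '/(?<=\[code\])\[' . $t . '\](.*?)\[\/' . $t . '\](?=\[\/code\])/';
    return preg_replace( $pattern, '$1', $text );
}

$input = '[code][b]This[/b][/code] [code][i]is[/i][/code] my [b]text[/b] !';
$text  = strip_tag_inside_code( $input, 'b' );
$text  = strip_tag_inside_code( $text, 'i' );
echo $text, PHP_EOL;
// [code]This[/code] [code]is[/code] my [b]text[/b] !
```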
I wrote this code just now, in a hurry, and left out further abstraction so as not to complicate things more than I already have. In a real project it would be worth abstracting it so the same system could accept both BBCodes and, who knows, Markdown.
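As a rough illustration of that abstraction, a Markdown rule can be written as just another strategy behind the same interface. The sketch below is hypothetical and handles only **bold**, nothing like a full Markdown parser; the interface is copied in so it runs on its own.

```php
<?php
// Copy of the strategy interface so this sketch is self-contained
interface Parser {
    public function parse( $text );
}

// Hypothetical strategy: Markdown **bold** -> <strong>bold</strong>
class MarkdownStrong implements Parser {
    public function parse( $text ) {
        return preg_replace_callback(
            '/\*\*(.+?)\*\*/',
            function( $matches ) {
                return sprintf( '<strong>%s</strong>', $matches[ 1 ] );
            },
            $text
        );
    }
}

$markdown = new MarkdownStrong;
echo $markdown -> parse( 'my **text** !' ), PHP_EOL;
// my <strong>text</strong> !
```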
Daniel, edit your question and leave only the code that is relevant to what you are asking.
– Mansueli
Do you want to delete all text blocks in the following format:
[code]***code***[/code]
– CIRCLE
@Daniel Lemes didn't my example work for you? If not, can you explain in more detail?
– William Borba
@CIRCLE Yes, I want to take what is inside [code], save it in another variable and remove it from the main string.
– Daniel Lemes
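For what is described in that last comment (keeping the content of each [code] block in a separate variable and removing the blocks from the main string), a minimal sketch with preg_match_all() and preg_replace() might look like this. The variable names are illustrative, and nested [code] blocks are not handled.

```php
<?php
$input   = 'Intro [code]echo 1;[/code] middle [code]echo 2;[/code] end';
$pattern = '/\[code\](.*?)\[\/code\]/s'; // "s" lets a block span several lines

// 1. Keep the content of every [code] block
preg_match_all( $pattern, $input, $matches );
$codeBlocks = $matches[ 1 ];

// 2. Remove the blocks from the main string, then tidy the leftover spaces
$remaining = preg_replace( $pattern, '', $input );
$remaining = trim( preg_replace( '/\s{2,}/', ' ', $remaining ) );

print_r( $codeBlocks );
echo $remaining, PHP_EOL;
// Intro middle end
```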