BBCode parser: ignore what is inside [code]

Question


Asked 11 years ago

Viewed 261 times

2

I wrote a BBCode parser, based on a few existing ones, to meet my needs, but I have some problems.

Whatever is inside [code] should be left unparsed, but I don't know how to do that, since it can contain all the other tags that the parser handles.

I tried it like this, but it does not fully solve the problem:

$pos = strpos($text, '[code]');
$code = "";

if ($pos !== false) {
    $code = substr($text, $pos, strpos($text, '[/code]') - $pos);
    $text = str_replace($code . '[/code]', '', $text);
    $code = substr($code, 6);
}

Daniel, edit your question and leave only the code relevant to what is being asked.

– Mansueli

2014/08/11 at 16:07
Do you want to delete all text blocks in the following format: [code]***code***[/code]?

– CIRCLE

2014/08/13 at 16:12
@Daniel Lemes did my example not work for you? If not, can you explain in more detail?

– William Borba

2014/08/13 at 17:57
@CIRCLE yes, I want to take what is inside [code] and save it in another variable, removing it from the main string.

– Daniel Lemes

2014/08/13 at 18:45

6 answers

4

Perhaps this answer is MUCH more than you need, but in my opinion a regular expression alone, or a solution based on the positions of certain characters, is not enough (even more so because the latter requires perfectly normalized input).

Therefore, I propose an object-oriented solution in which a Parser applies as many replacement strategies as you have:

First the file structure:

|-\BBCode
| |-\BBCode\Parser.php
| \-\BBCode\Parsers
|   |-\BBCode\Parsers\Code.php
|   |-\BBCode\Parsers\Emphasis.php
|   |-\BBCode\Parsers\Parser.php
|   \-\BBCode\Parsers\Strong.php
\-\index.php

BBCode\Parser.php is our entry-point class for the different parsing and replacement strategies:

<?php

namespace BBCode;

class Parser {

    /**
     * Available Parsers
     *
     * @var array parsers
     */
    private $parsers = array();

    /**
     * Input Text (with BBCodes)
     *
     * @var string $text;
     */
    protected $text;

    /**
     * Output Text (parsed)
     *
     * @var string $output;
     */
    protected $output;

    /**
     * Parser Constructor
     * Prepares the text to be parsed
     */
    public function __construct( $text ) {

        // Preparing text

        $text = $this -> prepare( $text );

        $this -> text = $this -> output = $text;
    }

    /**
     * Add new BBCode Parser to be used
     *
     * @param Parsers\Parser $parser
     *  BBCode Parser
     *
     * @return BBCode\Parser
     *  Parser Object (Fluent Interface)
     */
    public function addParser( Parsers\Parser $parser ) {

        $this -> parsers[] = $parser;

        return $this;
    }

    /**
     * Parses BBCodes
     *
     * @return BBCode\Parser
     *  Parser Object (Fluent Interface)
     */
    public function parse() {

        foreach( $this -> parsers as $parser ) {

            $this -> output = $parser -> parse( $this -> output );
        }

        return $this;
    }

    // Accessors

    /**
     * Get output (parsed) text
     *
     * @return string
     *  Parsed text
     */
    public function getText() {
        return $this -> output;
    }

    // Auxiliary Methods

    /**
     * Applies some routines over input text
     * allowing easier parsing
     *
     * @param string $text
     *  Text to cleanup
     *
     * @return string
     *  Cleaned text
     */
    private function prepare( $text ) {

        // Trimming leading and trailing whitespace

        $text = trim( $text );

        // Collapsing runs of whitespace into a single space

        $text = preg_replace( '/\s{2,}/', ' ', $text );

        return $text;
    }
}

It looks like a lot only because of the comments, but it is really very simple. Besides the properties, it has:

The constructor, which receives the input that each individual Parser will work on;
A method (Parser::addParser()) through which we can add new parsing strategies, all enforced through an interface and polymorphism via type hinting;
A method that iterates over the collection of Parsers and applies them in sequence to the input text;
A getter to obtain the text with the BBCodes replaced by the appropriate tags.

We also have a private method that helps simplify the regular expressions of the parsing strategies. I added only two routines: one to remove whitespace around the string and one to collapse duplicate whitespace.

These two routines mean we do not need, for example, word boundaries (\b), anchors (^ and $) or the whitespace class (\s) in the patterns.
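To see what these two routines do to the input, here is a minimal standalone sketch. The function name normalizeWhitespace is hypothetical; its body simply mirrors Parser::prepare().

```php
<?php

// Hypothetical standalone copy of the logic in Parser::prepare().
function normalizeWhitespace( $text ) {

    // Remove leading and trailing whitespace
    $text = trim( $text );

    // Collapse any run of two or more whitespace characters into one space
    return preg_replace( '/\s{2,}/', ' ', $text );
}

$normalized = normalizeWhitespace( "  [b]a[/b]   \n  [i]b[/i]  " );

echo $normalized, PHP_EOL; // [b]a[/b] [i]b[/i]
```

Note that the same normalization also applies to whitespace inside [code] blocks, so indentation and line breaks there are not preserved.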

We then have the classes responsible for the strategies of Parsing:

Strong.php

<?php

namespace BBCode\Parsers;

class Strong implements Parser {

    /**
     * Parses found BBCodes
     *
     * @param string $text
     *  Input text to parse
     */
    public function parse( $text ) {

        $text = $this -> applyParsingRestrictions( $text );

        return preg_replace_callback(

            '/\[b\](.*?)\[\/b\]/',

            function( $matches ) {
                return sprintf( '<strong>%s</strong>', $matches[ 1 ] );
            },

            $text
        );
    }

    // Auxiliary methods

    /**
     * Apply parsing restrictions against nested BBCodes
     *
     * @param string $text
     *  Input Text to analyze
     *
     * @return string
     *  Input text with nested BBCodes stripped
     */
    private function applyParsingRestrictions( $text ) {

        if( preg_match( '/((?<=\[code\])\[b\])(.*)(\[\/b\](?=\[\/code\]))/', $text, $matches ) ) {

            $text = str_replace(

                sprintf( '[b]%s[/b]', $matches[ 2 ] ), $matches[ 2 ], $text
            );
        }

        return $text;
    }
}

Emphasis.php

<?php

namespace BBCode\Parsers;

class Emphasis implements Parser {

    /**
     * Parses found BBCodes
     *
     * @param string $text
     *  Input text to parse
     */
    public function parse( $text ) {

        return preg_replace_callback(

            '/\[i\](.*?)\[\/i\]/',

            function( $matches ) {
                return sprintf( '<em>%s</em>', $matches[ 1 ] );
            },

            $text
        );
    }
}

Code.php

<?php

namespace BBCode\Parsers;

class Code implements Parser {

    /**
     * Parses found BBCodes
     *
     * @param string $text
     *  Input text to parse
     */
    public function parse( $text ) {

        return preg_replace_callback(

            '/\[code\](.*?)\[\/code\]/',

            function( $matches ) {
                return sprintf( '<code>%s</code>', $matches[ 1 ] );
            },

            $text
        );
    }
}

You can create as many strategies as you need, as long as they all implement the method defined in the interface Parsers\Parser.php:

<?php

namespace BBCode\Parsers;

interface Parser {

    /**
     * Parses found BBCodes
     *
     * @param string $text
     *  Input text to parse
     */
    public function parse( $text );
}

The substitution routines are almost self-explanatory: each one is a simple regular-expression replacement. I chose preg_replace_callback() because I find it more readable.

The trick that (finally) ties this answer to the question is demonstrated only in the Strong class, through the method Strong::applyParsingRestrictions().

Before the BBCodes [b] and [/b] are replaced by their counterparts <strong> and </strong>, a search is made for other BBCodes that may be enclosing the bold ones.

I defined only one such search, for the [code] BBCode. If the bold BBCode is found inside a code BBCode, then instead of replacing it with HTML tags we strip it from the input text.

The idea is basically the one posted by Guilherme Lautert, using lookbehinds and lookaheads. We look behind for the opening of the code BBCode and look ahead for its closing; if both are found, the bold BBCodes inside are removed.
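The lookaround pattern can be tried in isolation. The sketch below reuses the exact pattern from Strong::applyParsingRestrictions(); the variable names are only for illustration.

```php
<?php

// Same pattern as in Strong::applyParsingRestrictions()
$pattern = '/((?<=\[code\])\[b\])(.*)(\[\/b\](?=\[\/code\]))/';

// [b] directly after [code] and [/b] directly before [/code]: matches
$inside = preg_match( $pattern, '[code][b]This[/b][/code]', $m );

// Bold outside any [code]: no match, so it will be parsed normally
$outside = preg_match( $pattern, 'my [b]text[/b] !' );

// Anything between [code] and [b] defeats the lookbehind: no match
$spaced = preg_match( $pattern, '[code] [b]This[/b] [/code]' );

var_dump( $inside, $m[ 2 ], $outside, $spaced ); // int(1) "This" int(0) int(0)
```

As the last case shows, the restriction only fires when the bold tags sit directly against the code tags, so it is best read as a demonstration of the idea rather than a complete guard.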

Back in the interface method Parsers\Parser::parse(): if no bold BBCode remains, the callback passed to preg_replace_callback() is never invoked and the text is handed unchanged to the next Parser in the collection.

To use all this we have:

<?php

// Autoloading

spl_autoload_register( function( $classname ) {

    $classname = stream_resolve_include_path(

        str_replace( '\\', DIRECTORY_SEPARATOR, $classname ) . '.php'
    );

    if( $classname !== FALSE ) {

        include $classname;
    }
});

$parser = new BBCode\Parser(

    '[code][b]This[/b][/code]       [code][i]is[/i][/code] my [b]text[/b]  !'
);

$parser -> addParser( new BBCode\Parsers\Strong )
        -> addParser( new BBCode\Parsers\Emphasis )
        -> addParser( new BBCode\Parsers\Code );

echo $parser -> parse() -> getText();

?>

And the output is:

<code>This</code> <code><em>is</em></code> my <strong>text</strong> !

See the restriction in action. Our input string has a bold BBCode inside a code one. Because of the restriction, the bold one is removed, leaving only the code one.

This does not affect the bold BBCode that appears later, which works normally.

But look at what happened to the italic BBCode (emphasis). Since no restriction was defined for it, the resulting string has an <em> inside a <code>.
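One way to cover every tag without writing a restriction per parser is to take the [code] blocks out of the text before the other replacements run and put them back afterwards. This is a different technique from the lookaround restriction above, shown only as a sketch; the function names protectCodeBlocks and restoreCodeBlocks and the @@CODEn@@ placeholder format are hypothetical.

```php
<?php

// Hypothetical helper: replaces each [code]...[/code] block with a
// placeholder and stores the raw contents in $blocks.
function protectCodeBlocks( $text, array &$blocks ) {

    return preg_replace_callback(
        '/\[code\](.*?)\[\/code\]/s',
        function( $matches ) use ( &$blocks ) {
            $blocks[] = $matches[ 1 ];
            // Assumed placeholder format; it must not occur in normal input
            return sprintf( '@@CODE%d@@', count( $blocks ) - 1 );
        },
        $text
    );
}

// Hypothetical helper: puts the stored blocks back, escaped, inside <code>.
function restoreCodeBlocks( $text, array $blocks ) {

    return preg_replace_callback(
        '/@@CODE(\d+)@@/',
        function( $matches ) use ( $blocks ) {
            $raw = $blocks[ (int) $matches[ 1 ] ];
            return sprintf( '<code>%s</code>', htmlspecialchars( $raw ) );
        },
        $text
    );
}

$blocks = array();
$text   = protectCodeBlocks( '[code][i]is[/i][/code] my [b]text[/b]', $blocks );

// The other tags are parsed while the code blocks are out of reach
$text = preg_replace( '/\[b\](.*?)\[\/b\]/', '<strong>$1</strong>', $text );
$text = preg_replace( '/\[i\](.*?)\[\/i\]/', '<em>$1</em>', $text );

$result = restoreCodeBlocks( $text, $blocks );

echo $result, PHP_EOL; // <code>[i]is[/i]</code> my <strong>text</strong>
```

In the class structure above, the two helpers would run before the first and after the last Parser; where exactly to hook them in is left open here.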

I wrote this code just now, very quickly. I left abstraction out so as not to complicate things more than I already have. In a real case it would be worthwhile to abstract further, so that the same system accepts both BBCode and, who knows, Markdown.


by Bacco • **93,720** points · Answer 1 · 2014-08-13T19:35:37+00:00

Possible solution:

<?php

   $code = '
   teste[code]123[/code]bla 
   teste[code]456[/code]bla 
   teste[code]789[/code]bla 
   teste[code]xyz[/code]bla
';

   while ( $pos = stripos( ' '.$code, '[code]') ) {
      $left = substr( $code, 0, $pos - 1 );
      $code = substr( $code, $pos + 5 );
      $right = substr( $code, stripos( $code, '[/code]' ) + 7 );

      // If you want to do something with the removed code, do it on this line:
      echo htmlentities( 'Removido: '.substr( $code, 0, stripos( $code, '[/code]' ) ) ).'<br>';

      $code = $left.$right;
   }

   echo 'Resultado: '.nl2br( htmlentities( $code ) );

?>

This loop removes everything between [code] and [/code] from the original string, including the tags themselves. Some considerations:

If you only want to match [code] in lower case, replace stripos with strpos;
if you want to do something with the removed code, use the line below the comment;
depending on how you process the data, it may be better to skip the code only when presenting it, rather than actually removing it from the original string;
the code above does not account for unclosed tags; it is up to you to decide whether an unclosed tag extends to the end of the line or is left as is.
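Since the question asks to keep the removed code in another variable, the same loop idea can be wrapped in a function that collects each block. This is only a sketch: the name extractCodeBlocks is hypothetical, and leaving an unclosed [code] untouched is just one of the possible choices mentioned above.

```php
<?php

// Hypothetical helper: removes every [code]...[/code] block from $text
// (passed by reference) and returns the removed contents as an array.
function extractCodeBlocks( &$text ) {

    $blocks = array();
    $offset = 0;

    while ( ( $open = stripos( $text, '[code]', $offset ) ) !== false ) {

        $close = stripos( $text, '[/code]', $open + 6 );

        // Assumption: an unclosed [code] is left in the text as is
        if ( $close === false ) {
            break;
        }

        // Contents between the tags (6 is the length of "[code]")
        $blocks[] = substr( $text, $open + 6, $close - $open - 6 );

        // Cut the whole block, tags included (7 is the length of "[/code]")
        $text = substr( $text, 0, $open ) . substr( $text, $close + 7 );

        // Continue searching from where the block used to be
        $offset = $open;
    }

    return $blocks;
}

$text   = 'teste[code]123[/code]bla teste[CODE]xyz[/code]bla';
$blocks = extractCodeBlocks( $text );

print_r( $blocks ); // contains "123" and "xyz"
echo $text, PHP_EOL; // testebla testebla
```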

Result:

Removido: 123
Removido: 456
Removido: 789
Removido: xyz
Resultado: 
 testebla 
 testebla 
 testebla 
 testebla

by Jader A. Wagner • **4,921** points · Answer 2 · 2014-08-14T15:21:25+00:00

I think this is about it; just adapt the function passed to preg_replace_callback to what you need:

$text = "Teste de string com code: [code]<p>teste</p>[/code] e continuação de teste com outro code: [code]<p>teste 2</p>[/code] com mais texto.";

$text = preg_replace_callback('/\[code\](.*?)\[\/code\]/i',
        function ($matches) {
            return ($matches[1] ? '<div class="code">' . htmlspecialchars($matches[1], ENT_COMPAT,'ISO-8859-1') . '</div>' : '');
        }, $text);

echo $text;

// output

// Teste de string com code: <div class="code">&lt;p&gt;teste&lt;/p&gt;</div> e continuação de teste com outro code: <div class="code">&lt;p&gt;teste 2&lt;/p&gt;</div> com mais texto.

In this example, the function wraps the contents of each [code] tag in a div and escapes the HTML characters so they are displayed literally. The class .code can be used to style the div with a background, border, font, etc.

If an empty [code][/code] is found, it is removed without creating an empty div.

by Guilherme Lautert • **15,097** points · Answer 3 · 2014-08-13T19:02:15+00:00

Daniel, I am not sure I understood exactly what you need, but try:

    $str = "texte 1 [code] texte code [/code] texte 2";     

    preg_match('/(?<=\[code\]).*(?=\[\/code\])/', $str, $match);

    $strCode = $match[0];

This returns everything inside "[code][/code]".
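Note that .* is greedy, so with more than one [code] block in the string the match runs from the first opening tag to the last closing tag. Below is a sketch of a lazy variant with preg_match_all(), assuming you want each block separately; the /s modifier is an addition so that blocks spanning several lines also match.

```php
<?php

$str = "text 1 [code]first[/code] text 2 [code]second\nline[/code] text 3";

// Greedy, as in the answer (plus /s): one match spanning both blocks
preg_match( '/(?<=\[code\]).*(?=\[\/code\])/s', $str, $greedy );

// Lazy: one match per block
preg_match_all( '/(?<=\[code\]).*?(?=\[\/code\])/s', $str, $lazy );

var_dump( $greedy[ 0 ] ); // "first[/code] text 2 [code]second\nline"
var_dump( $lazy[ 0 ] );   // array with "first" and "second\nline"
```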

by Marcos • **1,407** points · Answer 4 · 2014-08-15T13:22:33+00:00

You can do it with a regular expression like this:

<?php 
function removeBB($texto) { 
    $regex = '|[[\/\!]*?[^\[\]]*?]|si';  
    return preg_replace($regex, '', $texto); 
} 
$s = "[url=http://google.com]Google[/url]"; 
echo removeBB($s); 
?>

by William Borba • **460** points · Answer 5 · 2014-08-13T16:18:14+00:00

This regex can solve the problem.

<?php

$string = 'texto texto la la la [code]<html><iframe><p>codigo html.</p></iframe></html>[/code] continuacao do texto e bla bla bla';
$pattern = '%(\[code\].*\[/code\])%';
// Single quotes: inside double quotes, "${1}" is PHP variable interpolation, not a backreference
$replacement = '[code][/code]';
$response = preg_replace($pattern, $replacement, $string);

print $response;

?>

Output: texto texto la la la [code][/code] continuacao do texto e bla bla bla