PHP syntax highlighter: DOM object does not update the HTML

I’ve been writing a syntax highlighter, but the code fails to update the HTML.

The code is long, but unfortunately all of it is needed to reproduce the problem.

I have a main file called highlight.php:

<?php

final class highlight {

    private static $langs = array();
    private static $exts = array();
    private static $default_replace = array(
            'tag'=>'span',
            'text'=>'$1'
        );

    static function highlight_string($lang, $code) {
        if(isset(self::$langs[$lang]))
        {
            $lang_defs = &self::$langs[$lang];

            $dom = new DOMDocument('1.0', 'utf-8');

            $element = $dom->createElement('code', $code);

            $element->setAttribute('class','highlight '.$lang);

            $dom->appendChild($element);

            foreach($lang_defs as $k=>&$lang_def)
            {
                $html = '';

                // a plain assignment is enough here; no reference is needed
                while($child = $element->firstChild)
                {

                    if($child->nodeType === XML_TEXT_NODE)
                    {

                        if(!isset($lang_def['replace']))
                        {
                            $lang_def['replace'] = self::$default_replace;
                        }

                        switch(gettype($lang_def['replace']))
                        {
                            case 'string':
                                $html .= preg_replace(
                                        $lang_def['match'],
                                        $lang_def['replace'],
                                        $child->nodeValue
                                    );
                                break;
                            case 'array':
                                $html .= preg_replace(
                                        $lang_def['match'],
                                        '<' . $lang_def['replace']['tag'] .
                                        ' class="' . $lang_def['class'] . '">' .
                                            $lang_def['replace']['text'] .
                                        '</' . $lang_def['replace']['tag'] . '>',
                                        $child->nodeValue
                                    );
                                break;
                            case 'object':
                                $html .= preg_replace_callback(
                                        $lang_def['match'],
                                        $lang_def['replace'],
                                        $child->nodeValue
                                    );
                                break;

                        }
                    }
                    else
                    {
                        $html .= $child->nodeValue;
                    }

                    $element->removeChild($child);

                }

                if($html)
                {
                    $fragment = $dom->createDocumentFragment();

                    $fragment->appendXML(isset($lang_def['patch'])?$lang_def['patch']($html):$html);

                    $element->appendChild($fragment);

                    if($element->nodeValue == $code)
                    {
                        trigger_error('Syntax highlight failed on the rule no. '.$k.', for the language '.$lang,E_USER_WARNING);

                        return false;
                    }

                    echo '<br>',htmlentities($dom->saveXML()),'<br>',var_dump($element);
                }
            }

            //removes the xml declaration
            return trim(
                    str_replace(
                        '<?xml version="1.0" encoding="utf-8"?>',
                        '',
                        $dom->saveXML()
                    ),
                    "\r\n"
                );

        }
        else
        {
            return false;
        }
    }

    static function highlight_file($file) {

        if(@is_file($file))
        {
            // map the file extension to a registered language name via self::$exts
            if(preg_match('@(?P<file>.*)\.(?P<ext>[^\.]*)$@', $file, $name) && isset(self::$exts[$name['ext']]))
            {
                return self::highlight_string(self::$exts[$name['ext']], file_get_contents($file));
            }
        }

        return false;
    }

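    static function add_lang($lang, $defs){
        switch(gettype($defs)){
            case 'string':
                // the definitions file must return an array;
                // include() returns false if the file cannot be loaded
                $defs = include($defs);
                if( !is_array($defs) )
                {
                    return false;
                }
                // intentional fall-through to the 'array' case
            case 'array':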

                self::$langs[$lang] = $defs['lang'];

                foreach( $defs['exts'] as $ext)
                {
                    self::$exts[$ext] = $lang; 
                }

                break;
            default:
                return false;
        }
        return true;
    }

    static function lang_loaded($lang) {
        return isset(self::$langs[$lang]);
    }

};

This is the file where the fault is: it always ends up calling trigger_error().
That call is made when the element's content is still the same as the original code, which is supposed to mean that the highlighting failed.
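A minimal standalone sketch, separate from highlight.php, of why that check cannot distinguish success from failure: in PHP's DOM extension, `nodeValue` on an element returns its text content, and wrapping text in `<span>` elements does not change the text. So `$element->nodeValue == $code` stays true after any rule that only adds markup. It uses only the DOM extension.

```php
<?php
// Standalone reproduction: highlighting adds markup but not text.
$code = 'select 1,"2";';

$dom = new DOMDocument('1.0', 'utf-8');
$element = $dom->createElement('code');
$element->appendChild($dom->createTextNode($code));
$dom->appendChild($element);

// Replace the plain text node with highlighted markup, as one rule would.
$element->removeChild($element->firstChild);
$fragment = $dom->createDocumentFragment();
$fragment->appendXML('<span class="keyword">select</span> 1,"2";');
$element->appendChild($fragment);

echo $dom->saveXML($element), "\n";      // the markup has changed...
var_dump($element->nodeValue == $code);  // ...but the text has not: bool(true)
```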

The language definitions are loaded from separate files; for example, sql.php:

<?php
    return array(
        'exts'=>array('sql'),
        'lang'=>array(

            array(
                'class'=>'string',
                'match'=>'/([bn]?"(?:[^"]|[\\"]")*"|[bn]?\'(?:[^\']|[\\\']\')*\')(?=[\b\s\(\),;\$#\+\-\*\/]|$)/'
            ),
            array(
                'class'=>'comment',
                'match'=>'/((?:\/\/|\-\-\s|#)[^\r\n]*|\/\*(?:[^*]|\*[^\/])*(?:\*\/|$))/',
                'patch'=>function($html){
                    //step one: try to fix the spans
                    $html = preg_replace(
                            '/((?:\/\/|\-\-\s|#)[^\r\n]*|\/\*(?:[^*]|\*[^\/])*(?:\*\/|$))/',
                            '$1</span>',
                            $html
                        );

                    //step 2: fix single-line comments (-- and #)
                    $html = preg_replace_callback(
                            '/<span class="comment">((?:#|-- |\/\/)(?:.|<\/span><span class="[^"]+">([^<])<\/span>)*)([\r\n]|$)/',
                            function($matches){
                                return '<span class="comment">'.
                                    //cleans up all spans
                                    preg_replace(
                                            '/<\/?span(?: class="[^"]+")?>/',
                                            '',
                                            $matches[1] // already includes $matches[2]
                                        ).'</span>'.$matches[3];
                            },
                            $html
                        );

                    //step 3: fix multi-line comments
                    return preg_replace_callback(
                            '/<span class="comment">(\/\*(?:[^*]|\*[^\/])+(?:\*\/(?:<\/span>)?|$))/',
                            function($matches){
                                return '<span class="comment">'.
                                    //cleans up all spans
                                    preg_replace(
                                        '/<\/?span(?: class="[^"]+")?>/',
                                        '',
                                        $matches[1]
                                    ).'</span>';
                            },
                            $html
                        );
                }
            ),
            array(
                /*
                 * numbers aren't that 'regular' and many edge-cases were left behind    
                 * with the help of @MLM (http://stackoverflow.com/users/796832/mlm),    
                 * we were able to make this work.    
                 * he took over the regex and patched it all up, I did the replace string    
                 */
                'match'=>'/((?:^|\b|\(|\s|,))(?![a-z_]+)([+\-]?\d+(?:\.\d+)?(?:[eE]-?\d+)?)((?=$|\b|\s|\(|\)|,|;))/',
                'replace'=>'$1<span class="number">$2</span>$3'
            ),
            array(
                'class'=>'name',
                'match'=>'/(`[^`]+`)/'
            ),
            array(
                'class'=>'var',
                'match'=>'/(@@?[a-z_][a-z_\d]*)/'
            ),
            array(
                'class'=>'keyword',
                //the keyword replace must have an additional check (`(?!\()` after the name), due to the function replace()
                'match'=>'/\b(accessible|add|all|alter|analyze|and|as|asc|asensitive|before|between|bigint|binary|blob|both|by|call|cascade|case|change|char|character|check|collate|column|condition|constraint|continue|convert|create|cross|current_date|current_time|current_timestamp|current_user|cursor|database|databases|day_hour|day_microsecond|day_minute|day_second|dec|decimal|declare|default|delayed|delete|desc|describe|deterministic|distinct|distinctrow|div|double|drop|dual|each|else|elseif|enclosed|escaped|exists|exit|explain|false|fetch|float|float4|float8|for|force|foreign|from|fulltext|generated|get|grant|group|having|high_priority|hour_microsecond|hour_minute|hour_second|if|ignore|in|index|infile|inner|inout|insensitive|insert|int|int1|int2|int3|int4|int8|integer|interval|into|io_after_gtids|io_before_gtids|is|iterate|join|key|keys|kill|leading|leave|left|like|limit|linear|lines|load|localtime|localtimestamp|lock|long|longblob|longtext|loop|low_priority|master_bind|master_ssl_verify_server_cert|match|maxvalue|mediumblob|mediumint|mediumtext|middleint|minute_microsecond|minute_second|mod|modifies|natural|nonblocking|not|no_write_to_binlog|null|numeric|on|optimize|optimizer_costs|option|optionally|or|order|out|outer|outfile|parse_gcol_expr|partition|precision|primary|procedure|purge|range|read|reads|read_write|real|references|regexp|release|rename|repeat|replace(?!\()|require|resignal|restrict|return|revoke|right|rlike|schema|schemas|second_microsecond|select|sensitive|separator|set|show|signal|smallint|spatial|specific|sql|sqlexception|sqlstate|sqlwarning|sql_big_result|sql_calc_found_rows|sql_small_result|ssl|starting|stored|straight_join|table|terminated|then|tinyblob|tinyint|tinytext|to|trailing|trigger|true|undo|union|unique|unlock|unsigned|update|usage|use|using|utc_date|utc_time|utc_timestamp|values|varbinary|varchar|varcharacter|varying|virtual|when|where|while|with|write|xor|year_month|zerofill)\b/i'
            ),
            array(
                'class'=>'func',
                'match'=>'/\b([a-z_][a-z_\d]*)\b(?=\()/i'
            ),
            array(
                'class'=>'name',
                'match'=>'/\b([a-z\_][a-z_\d]*)\b/i'
            )
        )
    );
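For reference, a standalone sketch of what the `'array'` branch of highlight_string() builds for a rule that only sets `class` and `match`, assuming it falls back to `$default_replace` (a `span` tag and `$1`). The keyword list is shortened here for the example; it is not the full sql.php pattern.

```php
<?php
// Mirrors the 'array' case of highlight_string() for a rule without 'replace'.
$default_replace = array('tag' => 'span', 'text' => '$1');

$rule = array(
    'class' => 'keyword',
    'match' => '/\b(select|from|where)\b/i', // shortened keyword list for the example
);

// Build '<span class="keyword">$1</span>' from the rule and the default settings.
$replacement = '<' . $default_replace['tag'] . ' class="' . $rule['class'] . '">'
             . $default_replace['text']
             . '</' . $default_replace['tag'] . '>';

$html = preg_replace($rule['match'], $replacement, 'select 1,"2";');
echo $html, "\n"; // <span class="keyword">select</span> 1,"2";
```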

Everything is called from index.php:

<style>
.highlight, .highlight *{
    background:black;
    color:white;
    font-family:'Consolas',monospace;
    font-size:16px;
    word-wrap: break-word;
    /*forces whitespace to stay there*/
    white-space: pre;
}

.highlight.sql .keyword{color:teal;}
.highlight.sql .string{color:red;}
.highlight.sql .func{color:purple;}
.highlight.sql .number{color:#0F0;}
.highlight.sql .name{color:olive;}
.highlight.sql .var{color:green;}
.highlight.sql .comment{color:gray;}
</style>

Testing highlight of a string:

<?php

    include('highlight.php');

    highlight::add_lang('sql','lang/sql.php');

    echo highlight::highlight_string('sql','select 1,"2";');

?>

This should echo the following HTML:

<code class="highlight sql"><span class="keyword">select</span> <span class="number">1</span>,<span class="string">"2"</span>;</code>

The final file structure is as follows:

/
|--- index.php
|--- highlight.php
|--- /lang/
     |--- sql.php
     |--- ...

I’ve tried a lot of experiments, and none of them fixed the problem.

One of them was the following standalone test:

<?php

$dom = new DOMDocument('1.0', 'utf-8');

$element = $dom->createElement('code', 'select 1,"2","test"; #test');

$element->setAttribute('class','highlight');

$dom->appendChild($element);

while($element->childNodes->length){
    $element->removeChild($element->firstChild);
}

$fragment = $dom->createDocumentFragment();

$fragment->appendXML('<span class="dest">select</span> 1,"2","test"; ');

$element->appendChild($fragment);

echo '<br>',htmlentities($dom->saveXML());

That works, and it’s quite similar to the code in highlight.php!
There, the markup is generated correctly, but the HTML inside the <code> tag is never updated.

What am I doing wrong in the code?
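A likely cause, offered as a hedged sketch rather than a confirmed fix: on every rule after the first, the loop in highlight_string() copies element children with `$child->nodeValue`, which keeps only their text, so the `<span>` tags added by earlier rules are discarded. The hypothetical helper `apply_rule()` below mirrors one iteration of that loop but serializes element children with `$dom->saveXML($child)` instead, and runs three simplified rules (not the full sql.php patterns) over the test string in the same order as sql.php.

```php
<?php
// Hypothetical helper, not part of highlight.php: applies one regex rule to the
// text-node children of $element and copies element children through unchanged.
function apply_rule(DOMDocument $dom, DOMElement $element, $pattern, $replacement)
{
    $html = '';
    while ($child = $element->firstChild) {
        if ($child->nodeType === XML_TEXT_NODE) {
            // Only plain text is highlighted; this assumes it has no & or < characters.
            $html .= preg_replace($pattern, $replacement, $child->nodeValue);
        } else {
            // saveXML($child) keeps the <span> markup; nodeValue would keep only its text.
            $html .= $dom->saveXML($child);
        }
        $element->removeChild($child);
    }
    if ($html !== '') {
        $fragment = $dom->createDocumentFragment();
        $fragment->appendXML($html);
        $element->appendChild($fragment);
    }
}

$code = 'select 1,"2";';

$dom = new DOMDocument('1.0', 'utf-8');
$element = $dom->createElement('code');
$element->setAttribute('class', 'highlight sql');
$element->appendChild($dom->createTextNode($code));
$dom->appendChild($element);

// Three simplified rules: strings, then numbers, then keywords.
apply_rule($dom, $element, '/("[^"]*")/', '<span class="string">$1</span>');
apply_rule($dom, $element, '/\b(\d+)\b/', '<span class="number">$1</span>');
apply_rule($dom, $element, '/\b(select)\b/i', '<span class="keyword">$1</span>');

echo $dom->saveXML($element), "\n";
// Highlighting only adds tags, so the text content still equals the input.
var_dump($element->textContent === $code);
```

With these simplified rules the output matches the HTML expected above. Text containing & or < would still need escaping (for example with htmlspecialchars()) before it reaches appendXML(), and the `$element->nodeValue == $code` check would need to be dropped or inverted, since the text content is expected to stay equal to `$code`.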
