Return all CSS classes with a regular expression

I need to return all the classes found in a CSS string, so that when the expression is run against:

div.classe1{/*...*/}
.classe2 div a{/*...*/}
.classe3.classe4{/*...*/}
.classe5{/*...*/}

it returns them in an array (with or without the dot before each class name, either is fine):

["classe1","classe2","classe3","classe4","classe5"]

What I’ve tried so far is this code:

\.(-?[_a-zA-Z]+[_a-zA-Z0-9-]*)(?![^\{]*\})

But apparently it didn’t work very well...

Here is a fiddle for better understanding.

  • Does it have to be a regular expression? A single comment such as /* the .azul class makes the text #123 */ in the code is enough to make many of them fail.

  • How will this regular expression be used? With a single match call as in the example fiddle, or can the code be changed? Regex is not the ideal tool in this case, but depending on the constraints something can be worked out...

  • @mgibsonbr Any JavaScript solution is valid, no matter how it is implemented.
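To see the failure described in the first comment, here is a minimal check of the question's regex (with the g flag added) against a made-up CSS string whose comment sits outside any { ... } block:

```javascript
// The regex from the question, with g so that match() returns every hit.
const questionPattern = /\.(-?[_a-zA-Z]+[_a-zA-Z0-9-]*)(?![^\{]*\})/g;

// Made-up CSS: the comment is outside every block and mentions a class.
const cssWithComment = "/* the .azul class makes the text #123 */ .classe1{color:#123}";

const found = cssWithComment.match(questionPattern);
console.log(found); // [ '.azul', '.classe1' ]  ('.azul' only exists inside the comment)
```

The lookahead only rejects a dot that is followed by a } before the next {, so it cannot tell that .azul is inside a comment.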

5 answers

My suggestion is to use a complete CSS parser, such as JSCSSP, and extract the classes from the individual selectors (instead of from the entire CSS text).

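function extrairClasses(css) {
    var classes = [];

    // JSCSSP parses the whole stylesheet, so comments and strings are handled properly
    var parser = new CSSParser();
    var sheet = parser.parse(css, false, true);
    for ( var i = 0 ; i < sheet.cssRules.length ; i++ ) {
        // skip rules that carry no selector text
        var seletor = sheet.cssRules[i].mSelectorText;
        if ( seletor ) {
            // match() returns null when the selector has no class, so fall back to [];
            // \w+ would stop at hyphens, hence the explicit class-name pattern
            classes = classes.concat(seletor.match(/\.-?[_a-zA-Z][_a-zA-Z0-9-]*/g) || []);
        }
    }

    return classes;
}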

Example on jsFiddle. Note that it works even in "degenerate" cases, such as a comment containing " or a string containing /* (both also containing .classe).

Here is an implementation in pure JS:

var texto = "div.classe1{background:red}.classe2 div a{background:#00f}.classe3.classe4{background:green}.classe5{background:#ff0}";

var retorno = texto.match(/\.(-?[_a-zA-Z]+[_a-zA-Z0-9-]*)(?![^\{]*\})/ig); //[".classe1", ".classe2", ".classe3", ".classe4", ".classe5"]

Note the /ig flags at the end: g finds every occurrence in the string (not just the first one), and i makes the match case-insensitive.
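A quick sketch of what the g flag changes, using the same string and pattern as the answer. The variable names are illustrative; matchAll is standard ES2020 and keeps the capture group, which gives the names without the dot:

```javascript
const texto = "div.classe1{background:red}.classe2 div a{background:#00f}.classe3.classe4{background:green}.classe5{background:#ff0}";
const source = /\.(-?[_a-zA-Z]+[_a-zA-Z0-9-]*)(?![^\{]*\})/.source;

// Without g: match() returns only the first match, plus its capture group.
const first = texto.match(new RegExp(source));
console.log(first[0], first[1]); // .classe1 classe1

// With g: match() returns every full match, dots included (groups are dropped).
const withDots = texto.match(new RegExp(source, "g"));
console.log(withDots); // [ '.classe1', '.classe2', '.classe3', '.classe4', '.classe5' ]

// matchAll (requires g) keeps the groups, so group 1 is the name without the dot.
const withoutDots = Array.from(texto.matchAll(new RegExp(source, "g")), (m) => m[1]);
console.log(withoutDots); // [ 'classe1', 'classe2', 'classe3', 'classe4', 'classe5' ]
```

Since the character classes already list both a-z and A-Z, the i flag does not change the result here; g is the flag that matters.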

  • You can replace [_a-zA-Z] with [_A-z], which makes it case-insensitive and shortens the regexp. :)

  • @Gabrielgartz A-z? This is news to me. Can you show it working?

  • @Bacco I tested it on http://regexpal.com/ and it really works: \.(-?[_A-z]+[_A-Z0-9-]*)(?![^{]*})

  • @Gabrielgartz thanks for the tip!

  • @Tiagocésaroliveira beware that the tip is wrong.

  • @Bacco true, but in this case I didn't need the i flag. I don't know whether that is an advantage (I suppose it is, since there may be cases where case-insensitivity is not wanted).

  • @Tiagocésaroliveira I don't think you understood: A-z is wrong because it matches clas[se1 { ... } and a lot of junk, since between capital Z and lowercase a there are special characters, and A-z includes all of them. It only "works" in the sense that it picks up what you want in your test, but it picks up unwanted stuff as well: backslash, brackets, etc.

  • @Bacco true! I just tested it.

  • To be more exact, these are the characters between Z and a: [ \ ] ^ _ ` (see an ASCII table).
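The point about A-z can be checked directly. This small sketch lists the characters between 'Z' and 'a' by char code and shows that a class pattern built on [_A-z] runs straight through a '[' (the test string is made up):

```javascript
// Characters whose char codes lie strictly between 'Z' (90) and 'a' (97).
const between = [];
for (let code = "Z".charCodeAt(0) + 1; code < "a".charCodeAt(0); code++) {
  between.push(String.fromCharCode(code));
}
console.log(between.join(" ")); // [ \ ] ^ _ `

// [A-z] accepts every one of them; [A-Za-z] accepts none.
const allInAz = between.every((ch) => /[A-z]/.test(ch));
const anyInAzExplicit = between.some((ch) => /[A-Za-z]/.test(ch));
console.log(allInAz, anyInAzExplicit); // true false

// A class pattern built on [_A-z] swallows the '[' in "clas[se1".
const loose = ".clas[se1{}".match(/\.-?[_A-z]+[_A-z0-9-]*/)[0];
const strict = ".clas[se1{}".match(/\.-?[_a-zA-Z]+[_a-zA-Z0-9-]*/)[0];
console.log(loose, strict); // .clas[se1 .clas
```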

To deal with comments and with hacks that start with a dot, I came up with the following function:

function parseClasses(cssSource) {

    // remove comments (lazy, so it stops at the first */)
    var semComentarios = cssSource.replace(/\/\*([\s\S]*?)\*\//g, '');

    // remove the declaration blocks { ... }
    var semBlocos = semComentarios.replace(/{\s*[^}]*\s*}/g, '');

    // collect the classes in what is left; match() returns null when nothing is found
    return semBlocos.match(/\.-?[_a-zA-Z]+[_a-zA-Z0-9-]*/g) || [];

}

Example of use:

var classes = parseClasses(str);
for (var i = 0; i < classes.length; i++) {
    console.log(classes[i])
}

Demo on jsFiddle

  • Although the proposal is interesting, nothing replaces a full parser. For example, a comment may contain quotes (/* " */) and a string (for example, in content) may contain a comment opener (" /* "). Whichever you try to strip first will break the other (unless you combine them into a single regex).

  • @mgibsonbr Good point. Regex is not the solution for everything.
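The "combine them into a single regex" idea from the comment above can be sketched as follows. The helper names are hypothetical and this is not code from either answer: strings and comments are matched by one alternation scanned left to right, so whichever construct starts first wins. Nested blocks such as @media are still not handled.

```javascript
// One alternation: group 1 = a double- or single-quoted string; otherwise a comment.
const STRING_OR_COMMENT =
  /("(?:\\[\s\S]|[^"\\])*"|'(?:\\[\s\S]|[^'\\])*')|\/\*[\s\S]*?\*\//g;

function blankStringsAndComments(css) {
  // Strings never hold class selectors: replace each with an empty "".
  // Comments (group 1 undefined) are dropped entirely.
  return css.replace(STRING_OR_COMMENT, (match, str) => (str !== undefined ? '""' : ""));
}

function parseClassesSafe(cssSource) {
  const cleaned = blankStringsAndComments(cssSource);
  // With strings emptied, a '}' can no longer hide inside a declaration value.
  const selectorsOnly = cleaned.replace(/\{[^}]*\}/g, " ");
  return selectorsOnly.match(/\.-?[_a-zA-Z]+[_a-zA-Z0-9-]*/g) || [];
}

// Made-up input with the degenerate cases from the comment above.
const tricky = '/* " */ .a{content:" /* "} .b::after{content:".fake"} /* .c */ .d{}';
const safeResult = parseClassesSafe(tricky);
console.log(safeResult); // [ '.a', '.b', '.d' ]
```

Because the comment `/* " */` starts before its quote, it is consumed as a comment; because `" /* "` starts with a quote, it is consumed as a string. Stripping comments alone first would have broken one of the two.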

We can parse the string character by character to discard classes that are inside comments (/* .classe */); in my view this is the best way to extract all CSS classes from a string.

$string = <<<EOF
div.classe1{/*comment div.classe1b*/}
div.-classe2{/*div.-classe2b*/}
div._classe3{/*div._classe3b*/}
.classe4 div a{/*.classe4b div a*/}
.classe5.classe6{/*.classe5b.classe6b*/}
.classe7{/*......classe7b......*/}
.classe8{this one has no comments, but .classe8b is not picked up either because it is inside the braces}
.cl{/*.clb 2 characters*/}
.c{/*.cb 1 character*/}
.d{/*.db 1 character*/}
EOF;

$length = strlen( $string );

$brackets = false; // inside a { ... } declaration block
$comment  = false; // inside a /* ... */ comment
$dot      = false; // the previous character was a '.' in selector context
$class    = '';
$classes  = array();

for ( $i = 0, $j = 0; $i < $length; $i++ ) {
  // look-ahead character, guarded so the last iteration does not read past the end
  $next = ( $i + 1 < $length ) ? $string[ $i + 1 ] : '';

  if ( $string[ $i ] === '/' && $next === '*' ) {             // comment opens
    $comment = true;
    continue;
  } else if ( $string[ $i ] === '{' ) {                        // block opens
    $brackets = true;
    continue;
  } else if ( $brackets === false && $comment === false && $string[ $i ] === '.' ) {
    $dot = true;                                               // a class name may start here
    continue;
  } else if ( $string[ $i ] === '*' && $next === '/' ) {       // comment closes
    $comment = false;
    continue;
  } else if ( $brackets === true && $string[ $i ] === '}' ) {  // block closes
    $brackets = false;
    continue;
  }
  if ( $dot ) {
    // the first character must be a letter, '-' or '_' (not a digit)
    if ( !( ctype_alpha( $string[ $i ] ) || $string[ $i ] === '-' || $string[ $i ] === '_' ) ) {
      $class = '';
      $dot = false;
      continue;
    }
    // the following characters may also be digits
    $class = $string[ $i ];
    $j = $i + 1;
    while ( $j < $length && ( ctype_alnum( $string[ $j ] ) || $string[ $j ] === '-' || $string[ $j ] === '_' ) ) {
      $class .= $string[ $j ];
      $j++;
    }
    array_push( $classes, $class );
    $class = '';
    $dot = false;
    $i = $j - 1; // resume right after the class name
  }
}

echo '<pre style="font-size: 14px; font-family: Consolas; line-height: 20px; tab-size: 4;">';
var_export( $classes );
echo '</pre>';
die();

The result printed by var_export is the array with all the classes. Since the goal is to discard classes that appear inside comments, this is achieved, and any classes that appear inside braces (for whatever reason) are discarded as well. This probably does not work well with @media rules: I added the brace handling last, assuming your code is "normal" CSS without @media. If there is an @media, simply remove the parts dealing with brackets; the original version I wrote did not have them.

I wrote the code just now after reading the question and only did quick tests, so I'm not sure it works 100%, but it extracts classes that start with -, _, a-z or A-Z, optionally followed by -, _, a-z, A-Z or 0-9.

array (
  0 => 'classe1',
  1 => '-classe2',
  2 => '_classe3',
  3 => 'classe4',
  4 => 'classe5',
  5 => 'classe6',
  6 => 'classe7',
  7 => 'classe8',
  8 => 'cl',
  9 => 'c',
  10 => 'd',
)

P.S.: I know the tag is JavaScript and the code posted is PHP, but I made a point of answering because I consider it the correct answer to your question and the one that gives the best results. PHP and JS are also similar enough in syntax and functions that only minimal adaptations are needed to "convert" it to JavaScript. I hope I've helped.
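Since the answer notes that only minimal adaptations are needed, here is a hypothetical JavaScript port of the scanner (a sketch, not the answer author's code). It checks for comments before checking for blocks, so a comment inside a block is skipped correctly. Like the PHP version, it does not handle blocks nested inside @media:

```javascript
function extractClasses(css) {
  const isStart = (ch) => /[A-Za-z_-]/.test(ch);   // first char: letter, '_' or '-'
  const isPart = (ch) => /[A-Za-z0-9_-]/.test(ch); // later chars may also be digits
  const classes = [];
  let inComment = false;
  let inBlock = false;

  for (let i = 0; i < css.length; i++) {
    const ch = css[i];
    const next = css[i + 1]; // undefined at the end of the string
    if (inComment) {
      if (ch === "*" && next === "/") { inComment = false; i++; }
      continue;
    }
    if (ch === "/" && next === "*") { inComment = true; i++; continue; }
    if (inBlock) {
      if (ch === "}") inBlock = false;
      continue;
    }
    if (ch === "{") { inBlock = true; continue; }
    if (ch === "." && next !== undefined && isStart(next)) {
      let j = i + 1;
      while (j < css.length && isPart(css[j])) j++;
      classes.push(css.slice(i + 1, j)); // name without the leading dot
      i = j - 1;                          // resume right after the name
    }
  }
  return classes;
}

// A subset of the answer's test data.
const sample = [
  "div.classe1{/*comment div.classe1b*/}",
  "div.-classe2{/*div.-classe2b*/}",
  ".classe5.classe6{/*.classe5b.classe6b*/}",
  ".classe8{no comments here, but .classe8b is inside the braces}",
].join("\n");
const scanned = extractClasses(sample);
console.log(scanned); // [ 'classe1', '-classe2', 'classe5', 'classe6', 'classe8' ]
```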

Using regular expressions, /([^{]+)\s*\{[^}]*}/ matches every selector and /\.(-?[a-z_]+[a-z0-9-_]*)/ matches every class. Example:

var style = ".class1 { text-align: left; font-weight: bold} " +
            "/* -- Comment -- */" +
            ".class2.class3 { border: 1px solid #a1a1a1 } " +
            "textarea.class4 { height: 400px } " +
            ".class5 ~ .class6 { display: none } " +
            ".class7 .class8:first-child { color: #ababab } ";
function parseCss(stylesheet) {
    var stylesheetPattern = /([^{]+)\s*\{[^}]*}/g,      // group 1: the selector before each { ... }
        selectorPattern = /\.(-?[a-z_]+[a-z0-9_-]*)/ig,  // group 1: a class name without the dot
        classes = [],
        selector, match;
    // remove comments: the lazy [\s\S]*? stops at the first */ and also spans line breaks
    stylesheet = stylesheet.replace(/\/\*[\s\S]*?\*\//g, "");
    while (match = stylesheetPattern.exec(stylesheet)) {
        while (selector = selectorPattern.exec(match[1])) {
            classes.push(selector[1]);
        }
    }
    return classes;
}
parseCss(style);

In this case, parseCss() returns [ "class1", "class2", "class3", "class4", "class5", "class6", "class7", "class8" ].
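The comment-stripping step is the easiest part to get wrong. A small check with made-up CSS strings shows how a greedy /\/\*.*\*\// pattern differs from a lazy /\/\*[\s\S]*?\*\// one:

```javascript
const oneLine = ".a{color:red} /* one */ .b{color:blue} /* two */ .c{}";
const multiLine = ".a{}\n/* line 1\n   line 2 */\n.b{}";

// Greedy .* runs from the first /* to the LAST */ on the line, eating .b{...}.
const greedyOneLine = oneLine.replace(/\/\*.*\*\//g, "");
console.log(greedyOneLine); // .a{color:red}  .c{}

// Lazy [\s\S]*? stops at the first */ after each /*.
const lazyOneLine = oneLine.replace(/\/\*[\s\S]*?\*\//g, "");
console.log(lazyOneLine); // .a{color:red}  .b{color:blue}  .c{}

// '.' does not match a line break, so the greedy pattern leaves a multi-line comment alone...
const greedyMulti = multiLine.replace(/\/\*.*\*\//g, "");
console.log(greedyMulti === multiLine); // true
// ...while [\s\S] crosses line breaks and removes it.
const lazyMulti = multiLine.replace(/\/\*[\s\S]*?\*\//g, "");
console.log(JSON.stringify(lazyMulti)); // ".a{}\n\n.b{}"
```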
