Capturing an Indented Block with a Regular Expression

Question

Asked 10 years, 11 months ago

Viewed 226 times

3

I would like help writing a regex that matches a single marker line plus all the indented lines that follow it.

Example:

aaa
abab aca
marcacao
   aaa
   abab aca
   cc
cc
bb

With the input above, the regex would return:

marcacao
   aaa
   abab aca
   cc

My code is written in JavaScript, so I use .match() with my regexes.

[edited]

I described my problem better in the comments below.

This is my real code:

DOCTYPE html
html
    head
        title gulp-gotohead
        style.
            article {border:1px solid}
        style(data-above-the-fold="true").
            body {font-size:100%}
            body{font-size:100%}
            body.main{font-size:100%}
            body, h1{font-size:100%}
            body>h1{font-size:100%}
            header {color:#333}
        script(data-above-the-fold="true").
            var head = Head();
            head();
    body (data-d="true")
        h1 gulp-gotohead
        p
            span regex

This is my marker:

style(data-above-the-fold="true")

And this is the desired result:

style(data-above-the-fold="true").
     body {font-size:100%}
     body{font-size:100%}
     body.main{font-size:100%}
     body, h1{font-size:100%}
     body>h1{font-size:100%}
     header {color:#333}

The best I have managed so far is to capture everything from my marker downward: http://regex101.com/r/mZ2xS4/1.

How are you capturing this text? From HTML? Can you post the HTML for testing? I would rather not answer without seeing the HTML, but I imagine it would be something like this: http://regex101.com/r/sH4mZ1/1

– Sergio

2014/08/30 at 07:19
1

Wouldn't using a regex here be an XY problem?

– Bacco

2014/08/30 at 13:28
JavaScript != HTML, @Sergio. You don't need HTML to use JavaScript and RegExp.

– sergiopereira

2014/08/30 at 22:29
Bacco is right, so I will try to explain my real problem. I am developing a Gulp plugin that automates placing CSS and JS code inside the HEAD tag of an HTML document. When the source is plain HTML, it is easy to write a regex that identifies my marker and swaps the old code for the new one. The problem arises when the source is written in Jade, where HTML tags can look very similar to CSS rules and JS code. I'm having trouble finding a regex that picks up only my marked block and ignores the rest of the file's source code.

– Belchior Oliveira

2014/08/30 at 22:46
@sergiopereira, yes, true, I know. Hence my question to Belchior about where the content/text comes from.

– Sergio

2014/08/31 at 06:58
@Belchioroliveira, now that you have revealed that you really want to create or consume a specific syntax, I agree with Bacco and would avoid regex in this case. I would suggest ANTLR and ANTLR-Javascript. The tool has a considerable learning curve, but it exists for exactly this purpose.

– sergiopereira

2014/08/31 at 17:39

3 answers

2

If you really want to use a regular expression, I think the following regex does what you want:

^(\s*)style\(data-above-the-fold="true"\)\.\n(\1\s+.*\n)*

http://regex101.com/r/mZ2xS4/3

The ^(\s*) captures the indentation before your initial marker. Since this is the first capture group in the regex, we can refer to it as \1. The ^ ensures that we start at the beginning of a line and avoids unnecessary backtracking.

The \1\s+ matches indentation that is longer than the indentation before your marker: the same prefix, plus at least one more whitespace character. Be careful not to mix tabs with spaces.

Finally, I added a few \n here and there, since . does not match line breaks.
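As a rough illustration, the pattern above could be applied from JavaScript like this. The sketch assumes the m flag (without it, ^ only matches at the very start of the input in JavaScript) and uses a shortened copy of the Jade source from the question; source, blockPattern, found, and block are names invented for the example.

```javascript
// Shortened copy of the Jade source from the question (illustrative only).
const source = [
  'html',
  '    head',
  '        style.',
  '            article {border:1px solid}',
  '        style(data-above-the-fold="true").',
  '            body {font-size:100%}',
  '            header {color:#333}',
  '        script(data-above-the-fold="true").',
  '            var head = Head();',
  '    body (data-d="true")',
  '',
].join('\n');

// The m flag lets ^ match at the start of every line, not only at index 0.
const blockPattern = /^(\s*)style\(data-above-the-fold="true"\)\.\n(\1\s+.*\n)*/m;

// match() returns null when the marker is absent, so guard before indexing.
const found = source.match(blockPattern);
const block = found ? found[0] : null;

// block holds the marker line plus the two more-indented lines below it.
console.log(block);
```

Here found[1] holds the marker's own indentation (the text captured by the first group), which is what \1 refers back to inside the pattern.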

A further variation to consider is replacing every \s with a literal space, \t, or [ \t], depending on your opinion on mixing tabs with spaces. This prevents line breaks from being treated as indentation.
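The difference between \s and [ \t] can be seen on a small input. The sketch below is only an illustration: the sample text (which contains a whitespace-only line at the marker's indentation) and the names loose, strict, and text are assumptions made for the example.

```javascript
// Original pattern: \s also matches line breaks.
const loose = /^(\s*)style\(data-above-the-fold="true"\)\.\n(\1\s+.*\n)*/m;
// Variation: only spaces and tabs count as indentation.
const strict = /^([ \t]*)style\(data-above-the-fold="true"\)\.\n(\1[ \t]+.*\n)*/m;

// Hypothetical input: the fourth line contains only four spaces.
const text = [
  'head',
  '    style(data-above-the-fold="true").',
  '        body {font-size:100%}',
  '    ',
  'body (data-d="true")',
  '',
].join('\n');

const looseMatch = text.match(loose)[0];
const strictMatch = text.match(strict)[0];

// Print whether each match swallowed the unindented "body" line.
console.log(looseMatch.includes('body (data-d="true")'));
console.log(strictMatch.includes('body (data-d="true")'));
```

With \s, the line break after the whitespace-only line counts as extra indentation, so the following body line is pulled into the match. With [ \t], the match stops right after the indented block.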

Boy, is that right.

– Belchior Oliveira

2014/09/01 at 03:25
I couldn't have come up with something like that; I hadn't really worked out in my head the idea of repeating the subpattern (\1\s+.*\n). That part was killer.

– Belchior Oliveira

2014/09/01 at 03:37

by sergiopereira • **2,865** points · Answer 1 · 2014-08-30T19:40:12+00:00

This works:

var resultado = seuTexto.match(/^marcacao(\s\s+.+$)+/m )[0];

Edit: This answer was written before the OP explained the whole problem, hence the simplicity of the regex. The problem as currently described above is much more complex, and this answer remains here only as an example.
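For reference, a small sketch of that one-liner run against the sample from the question. The null check is an addition for the example (match() returns null when the marker is missing, so indexing with [0] directly would throw), and encontrado is a name invented here.

```javascript
// Sample input from the question.
const seuTexto = [
  'aaa',
  'abab aca',
  'marcacao',
  '   aaa',
  '   abab aca',
  '   cc',
  'cc',
  'bb',
].join('\n');

// Same pattern as above; each repetition consumes a line break,
// at least one whitespace character, and the rest of that line.
const encontrado = seuTexto.match(/^marcacao(\s\s+.+$)+/m);

// Fall back to an empty string when the marker is not found.
const resultado = encontrado ? encontrado[0] : '';

console.log(resultado);
```

Note that this pattern only requires some indentation on the following lines; it does not compare it with the marker's own indentation, which is why it is too simple for the Jade case.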

by hugomg • **8,772** points · Answer 2 · 2014-08-31T18:16:01+00:00

Maybe you can solve your problem with a regex, but I believe the requirement that the subsequent lines be indented more than the first would make the regex very complicated. I would solve this problem by writing the code by hand:

In pseudocode

marcação = 'style(data-above-the-fold="true")';

function get_line(){
   takes the next line of input and returns two values:
   - the number of spaces at the start of the line and
   - a string with the rest of the line
}

repeat {
    root_indent, root_data = get_line();
} until(root_data == marcação )

styles = [];
loop {
    indent, style = get_line();
    if (indent <= root_indent ) { break }
    styles.push(style);
}

One advantage of this approach over a regex is that the algorithm's output is a structured list rather than one big string, and hand-written code is more flexible than a regex if you need to change the logic later.
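A runnable JavaScript sketch of the pseudocode above, under a few assumptions: the whole source is available as one string, indentation is counted in leading spaces or tabs, and the marker is compared against the full line, including the trailing dot that the Jade source has. The names splitIndent and collectBlock are invented for the example.

```javascript
// Split one line into its indentation width and the remaining text.
function splitIndent(line) {
  const indent = line.match(/^[ \t]*/)[0].length;
  return { indent, data: line.slice(indent) };
}

// Return the lines indented deeper than the marker line, or null if
// the marker is not found. A blank line has indentation 0, so it ends
// the block in this simple version.
function collectBlock(source, marker) {
  const lines = source.split('\n').map(splitIndent);
  const start = lines.findIndex((line) => line.data === marker);
  if (start === -1) return null;

  const rootIndent = lines[start].indent;
  const styles = [];
  for (let i = start + 1; i < lines.length; i++) {
    if (lines[i].indent <= rootIndent) break;
    styles.push(lines[i].data);
  }
  return styles;
}

// Shortened copy of the Jade source from the question.
const jade = [
  'html',
  '    head',
  '        style(data-above-the-fold="true").',
  '            body {font-size:100%}',
  '            header {color:#333}',
  '        script(data-above-the-fold="true").',
  '            var head = Head();',
].join('\n');

console.log(collectBlock(jade, 'style(data-above-the-fold="true").'));
```

Because the result is an array of lines with the indentation already stripped, each rule can be processed individually without re-parsing a matched string.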