How to ignore escaped elements in a regular expression rule?

I want to do the following with a regex (regular expression), for example in JavaScript:

var str = '[abc\[0123\]] [efg\[987\]h] [olá \[mundo\]!] [foo [baz]]';
str.match(/\[(.*?)\]/g);

Output: ["[abc[0123]", "[efg[987]", "[olá [mundo]", "[foo [baz]"]

Or

var str = '{abc\{0123\}} {efg\{987\}h} {olá \{mundo\}!} {foo {baz}}';
str.match(/\{(.*?)\}/g);

Output: ["{abc{0123}", "{efg{987}", "{olá {mundo}", "{foo {baz}"]

But I would like the first one to skip unescaped nesting such as [foo [baz]], taking only the [baz], and to keep the escaped brackets as part of the match, like this:

 ["[abc[0123]]", "[efg[987]h]", "[olá [mundo]!]", "[baz]"]

And I would like the second one to return:

 ["{abc{0123}}", "{efg{987}h}", "{olá {mundo}!}", "{baz}"]

Initially this is just for study, but I also intend to use it for things like a structure similar to CSS selectors, for example:

  input[name=\[0\]], input[name=foo\[baz\]\[bar\]]

It would return this:

  [0], [1]

Or a URL map I intend to create:

  /{nome}/{foo\{bar}/{baz\{foo\}}/

And return this:

 {nome}, {foo{bar}, {baz{foo}}

What I want is to ignore the escaped characters. How can I do this? The example can be in any language; the regex itself is what matters most.

  • To ignore escaped characters you can use: [^\\]+. Do you want this regex to return N groups?

  • @Gabrielgonçalves it works well, but it also accepts the unescaped ones, for example var str = '{abc{0123}}'; /\{([^\\]+)\}/.exec(str);

1 answer

You need to make the content being matched consume both the backslash and the character that follows it as if they were a single unit:

\\.|.

That is, it matches a backslash followed by any character (2 characters at once), and only if the current character is not a backslash does it match a single character.
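To see how that alternation splits a string into units, here is a minimal sketch. The variable names and the sample string are only illustrative and are not part of the original answer.

```javascript
// The literal below uses doubled backslashes, so the actual string is: a\]b\\c
var sample = "a\\]b\\\\c";

// Each token is either a backslash plus the following character, or a single character.
var tokens = sample.match(/\\.|./g);

// Print one token per line so the escaped pairs are easy to spot.
tokens.forEach(function (t) { console.log(t); });
```

The escaped pairs come out as single two-character tokens, which is why an escaped closing bracket can never be mistaken for the real delimiter.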

As for the last example (where you want only the innermost bracket), you can achieve this in this particular case by requiring that the matched content not contain an opening bracket unless it is escaped. This does not work in general, because balanced parentheses/brackets/braces do not form a regular language:

\\.|[^\[]

The full regexes would therefore be:

\[((?:\\.|[^\[])*?)\]
\{((?:\\.|[^{])*?)\}

Example:

var str = '[abc\\[0123\\]] [efg\\[987\\]h] [olá \\[mundo\\]!] [foo [baz]]';
var regex = /\[((?:\\.|[^\[])*?)\]/g;
   
document.getElementById("saida").innerHTML += "<pre>" + str.match(regex) + "</pre><br/>"

var str = '{abc\\{0123\\}} {efg\\{987\\}h} {olá \\{mundo\\}!} {foo {baz}}';
var regex = /\{((?:\\.|[^{])*?)\}/g;

document.getElementById("saida").innerHTML += "<pre>" + str.match(regex) + "</pre><br/>"
<div id="saida"></div>
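The bracket and brace patterns differ only in their delimiters, so they can be generated from one template. The following is a rough sketch under that assumption; escapeForRegex and delimitedPattern are hypothetical helper names, not something from the answer, and only single-character delimiters are considered.

```javascript
// Hypothetical helper: escape a character so it is literal inside a regex.
function escapeForRegex(ch) {
  return ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build the same pattern as above for any delimiter pair.
function delimitedPattern(open, close) {
  var o = escapeForRegex(open);
  var c = escapeForRegex(close);
  // opener, then lazily (escaped pair | anything but an opener), then closer
  return new RegExp(o + "((?:\\\\.|[^" + o + "])*?)" + c, "g");
}

// The string below is: [foo [baz]] [a\]b]
var sampleInput = "[foo [baz]] [a\\]b]";
var found = sampleInput.match(delimitedPattern("[", "]"));
found.forEach(function (m) { console.log(m); });
```

A fresh RegExp is created on every call, so the lastIndex state of the g flag is never shared between uses.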

Notes:

  1. In the example, I had to write two \ in the string literal, because otherwise JavaScript would treat the single backslash as an escape character and drop it from the string.

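  2. The output includes the backslashes; if you want to remove them, you need to post-process each match, for example with a replace (match returns an array, so the replace has to be applied to each element):

    str.match(regex).map(function (m) { return m.replace(/\\([\[\]{}])/g, "$1"); });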
    
  3. The ?: is there so that the inner parentheses do not become a capture group. If you are not using capture groups, it may be omitted.
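Notes 2 and 3 can be combined in one small sketch: the outer capture group gives the content between the delimiters, and a replace strips the escaping backslashes. The function name extractGroups and the shape of the returned objects are assumptions made for illustration; only the two regexes come from the answer.

```javascript
// Hypothetical helper: collect each match together with its unescaped inner content.
function extractGroups(str) {
  var regex = /\[((?:\\.|[^\[])*?)\]/g;
  var results = [];
  var m;
  // With the g flag, exec() advances through the string and exposes group 1.
  while ((m = regex.exec(str)) !== null) {
    results.push({
      raw: m[0],                                   // whole match, backslashes kept
      inner: m[1].replace(/\\([\[\]{}])/g, "$1")   // group 1 with escapes removed
    });
  }
  return results;
}

// The string below is: [abc\[0123\]] [foo [baz]]
var groups = extractGroups("[abc\\[0123\\]] [foo [baz]]");
groups.forEach(function (g) { console.log(g.raw + " -> " + g.inner); });
```

Because the non-capturing (?: ) is kept for the alternation, group 1 is always the full inner content rather than just the last character consumed.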

  • Is it just me, or is it also returning [foo [baz]?

  • @Guilhermenascimento See note #3. It is impossible for a regex to return [baz].

  • Ah, sorry, I understand now. I will take a look at the jQuery 1.x source (Sizzle.js), which seems to do this.

  • @Guilhermenascimento I actually take back what I said: it is possible to adapt the regex to return [baz]. Simply require that the content not contain any unescaped [. That is, the second . has to be replaced by [^\[]. I’ll update the answer.

  • I can only thank you, it worked very well. This was an old question of mine; I remember that Sizzle/jQuery (at the time we did not have document.querySelector) was the only one able to handle escapes in selectors, for example $('div[data-foo="foo \"bar\" baz"]'). In my view this answer is one of the best on Sopt; unfortunately, few people understand regex and its usefulness or advantages.
