Second backslash in metacharacters when the expression is in quotes

When a pattern is written between quotation marks, a second backslash (\) must be added to the metacharacters that contain one. For example:

/\d+/ -> "\\d+"

Code examples:

var str = "Hello 123!";

// using new RegExp()
var re = new RegExp("\\d+"); // in quotes
var re2 = new RegExp(/\d+/); // without quotes

// without new RegExp()
var re3 = "\\d+";            // in quotes (match() converts it with new RegExp)
var re4 = /\d+/;             // without quotes

console.log(str.match(re));
console.log(str.match(re2));
console.log(str.match(re3));
console.log(str.match(re4));

From what I read in this documentation, the dot (.) is the only metacharacter written without a backslash, unlike \d, \w, \s, etc.

What is the logical explanation for needing the second backslash in metacharacters (except .) when the pattern is written in quotes?

  • That basically happens because JS uses the backslash to escape certain characters: \b => backspace, \f => form feed, \u => (converts Unicode), \x => (converts hex), \t => tab, \n => new line, etc. With the regex in quotation marks, you have to escape all of these backslashes so that JS ends up with the correct expression to process.

  • More or less related: https://answall.com/q/316714/132

1 answer

Within strings, the \ is used to encode escape sequences, that is, things that would be hard to place inside the string any other way. So we have \n for a line break, \t for a tab, sequences like \u1234 to encode specific Unicode characters, and a few other cases. All of these are resolved at compile time (although JavaScript is interpreted, it performs just-in-time compilation before it starts running the code). Thus, for the string "Bom\ndia", the compiler builds a string containing Bom, a line break, and dia.
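A quick check of the claim above, in plain JavaScript (the variable names are only illustrative): each escape sequence turns into a single character when the literal is parsed, so the backslash never reaches the running program.

```javascript
// Each escape sequence is resolved when the literal is parsed:
// backslash + letter become one character in memory.
const tab = "\t";          // U+0009
const newline = "\n";      // U+000A
const hexA = "\x41";       // "A" written as a hex escape
const unicodeA = "\u0041"; // "A" written as a Unicode escape

console.log(tab.length, tab.charCodeAt(0));         // 1 9
console.log(newline.length, newline.charCodeAt(0)); // 1 10
console.log(hexA, unicodeA);                        // A A

// "Bom\ndia" is Bom + line break + dia: 3 + 1 + 3 = 7 characters
const greeting = "Bom\ndia";
console.log(greeting.length);         // 7
console.log(greeting.indexOf("\n"));  // 3
```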

However, since the \ character is used to escape other sequences, how do you put the \ character itself into a string? The answer is the escape sequence \\. That's why, inside strings, when you want to write the \ character, it has to be doubled.

Something similar happens with ' and ": because they are string terminators, inside a string they can be written as \' and \", respectively.
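The same idea applied to the backslash and the quotes, as a small sketch (names chosen only for illustration): the two-character sequences in the source each become one character in memory.

```javascript
// "\\" in the source is a single backslash character in memory
const backslash = "\\";
console.log(backslash.length);         // 1
console.log(backslash.charCodeAt(0));  // 92 (U+005C)

// Escaped quotes that would otherwise end the literal
const singleQuoted = 'It\'s';          // It's
const doubleQuoted = "Say \"hi\"";     // Say "hi"
console.log(singleQuoted.length, doubleQuoted.length); // 4 8
```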

So far we haven't talked about regular expressions, but things get complicated because regular expressions also use the \ character for escaping, and they use it to form escape sequences that are mostly different from the string ones (although, to escape \ itself, regular expressions also use \\). So if a string appears in the code as "\\d+", the JavaScript just-in-time compiler builds in memory a string with the contents \d+, and then the regular expression compiler compiles that into an object that matches one or more characters in the range '0' to '9'.
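The two steps can be observed directly, and so can what happens when the second backslash is forgotten. A sketch with illustrative names: since \d is not a string escape, the string parser simply drops the backslash, and the regex compiler receives d+.

```javascript
// Step 1: the string literal "\\d+" becomes the 3 characters \d+
const digitsText = "\\d+";
console.log(digitsText, digitsText.length);   // \d+ 3

// Step 2: the regex compiler reads \d+ as "one or more digits"
const digits = new RegExp(digitsText);
console.log(digits.test("abc123"));           // true
console.log("Hello 123!".match(digits)[0]);   // 123

// Forgetting the second backslash: "\d" is not a string escape,
// so the backslash is dropped and the pattern is just d+
const forgotten = "\d+";
console.log(forgotten);                       // d+
console.log(new RegExp(forgotten).test("123")); // false
console.log(new RegExp(forgotten).test("odd")); // true
```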

The dot (.) has nothing special inside strings, so it does not need any escape sequence. In regular expressions, however, it has a special meaning: it matches any character (except line terminators, unless the s flag is used). Thus, new RegExp("abc.def") becomes a regular expression that matches 7 characters: the first three are abc, the last three are def, and the middle one can be anything.
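A short sketch of that behavior (the variable name is illustrative): the dot passes through the string step untouched and then acts as a wildcard for exactly one character.

```javascript
// "." needs no escape in a string, so the string is 7 characters long
const anyMiddle = new RegExp("abc.def");
console.log("abc.def".length);            // 7

console.log(anyMiddle.test("abcXdef"));   // true
console.log(anyMiddle.test("abc-def"));   // true
console.log(anyMiddle.test("abcdef"));    // false: the dot needs one character
console.log(anyMiddle.test("abc\ndef"));  // false: no s flag, so . skips line terminators
```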

So how do you match a literal . character? In the regular expression itself it is written as \. (backslash, dot), but when the pattern is encoded as a string you have to write "\\.".
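Comparing the escaped and unescaped forms makes the difference visible. A sketch with illustrative names: with a single backslash, "\." is parsed as just "." before the regex compiler ever sees it.

```javascript
// "\\." reaches the regex compiler as \. and matches only a real dot
const escapedDot = new RegExp("\\.");
const literalDot = /\./;  // the same regex, written directly

console.log(escapedDot.source === literalDot.source); // true
console.log(escapedDot.test("a.b"));  // true
console.log(escapedDot.test("abc"));  // false

// A single backslash is consumed by the string step: the pattern is "."
const lostEscape = new RegExp("\.");
console.log(lostEscape.source);       // .
console.log(lostEscape.test("abc"));  // true: any character matches
```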

What happens is that when you represent regular expressions as strings, two build steps are involved: one builds the string in memory, applying the string escape sequences, and a second converts that string into a regular expression, applying the regex escape sequences. This means the programmer has to pay attention to what is being built at each step, which can be very confusing, since both steps use the same \ character for escape sequences.

However, when you use regular expressions delimited by /, as in /\d+/ or /\./, you are not building a string; you are instructing the compiler to build the regular expression directly, without the intermediate string step. That's why, in this case, \d and \. must not be written as \\d or \\. respectively.
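One way to see that both spellings produce the same regular expression is to compare their source property. The sketch below (names are illustrative) also shows why doubling inside a literal is wrong: /\\d+/ means an escaped backslash followed by one or more letters d.

```javascript
// Regex literal vs. constructor with a doubled-backslash string
const pairs = [
  [/\d+/, new RegExp("\\d+")],
  [/\./, new RegExp("\\.")],
  [/\w\s/, new RegExp("\\w\\s")],
];

for (const [literal, constructed] of pairs) {
  // prints e.g. "\d+ \d+ true"
  console.log(literal.source, constructed.source, literal.source === constructed.source);
}

// Doubling inside a literal changes the meaning
const doubled = /\\d+/;
console.log(doubled.test("123"));   // false
console.log(doubled.test("\\dd"));  // true (the text is \dd)
```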

Oh, and that's why new RegExp("\\\\") is the regular expression that matches a single \. The string built in memory is \\, which, interpreted as a regular expression, is an escaped backslash and therefore matches just one \.
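Tracing the four backslashes through both steps, as a sketch with illustrative names and test strings:

```javascript
// Step 1: four backslashes in the source become two in memory
const pattern = "\\\\";
console.log(pattern.length);           // 2

// Step 2: the regex \\ is an escaped backslash, matching exactly one \
const oneBackslash = new RegExp(pattern);
console.log(oneBackslash.source === /\\/.source);      // true
console.log(oneBackslash.test("C:\\temp"));            // true (the text is C:\temp)
console.log(oneBackslash.test("C:/temp"));             // false
console.log("C:\\temp".match(oneBackslash)[0].length); // 1
```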

  • I was reading a regex doc for Python here that talks about exactly this, but I was having trouble understanding it, even more so in English :/... your answer cleared everything up. Crystal clear and easy to read. Thanks!
