How to split a string in JavaScript using multiple separators, without removing them?

1

I have the following Javascript code:

if (v.includes("'!'"))
    var separators = ["\\\''", '\\\&', '\\\#', '\\\|'];
else
    var separators = ["\\\''", '\\\'', '\\\&', '\\\#', '\\\|'];
var variables = v.split(new RegExp(separators.join('|'), 'g'));

Here, variables should be an array holding the result of the split. I need to split using multiple separators without having them removed from the resulting array.

Ex:

"Teste1 & Teste2 # Teste3"  -> 
1:Teste1 
2:& 
3:Teste2
4:# 
5:Teste3
  • Your regex needs to be improved to accept terms: you are throwing everything into it without setting a start, an end, or a condition. For example, /(term1)(term2)(term3/$g, something along those lines.

  • I don’t quite understand what you want to do... Try to [edit] your question to add more details.

  • Doesn’t 'Teste1 & Teste2 # Teste3'.split(' ') do what you need?

2 answers

1

Use a capture group so that the text matched by the regex in the split is included in the resulting array. Just wrap the regex in parentheses ().

For example, the regex /,/ in "a,b".split(/,/) results in the array ["a", "b"].

console.log("a,b".split(/,/));

With the capture group (,), on the other hand, the comma is also included in the array, resulting in ["a", ",", "b"]:

console.log("a,b".split(/(,)/));

So just concatenate the parentheses around the RegExp pattern:

v.split(new RegExp('('+separators.join('|')+')', 'g'))
                    ↑                        ↑

See it working:

var v = "Teste1 & Teste2 # Teste3";
var separators = ["\\\''", '\\\'', '\\\&', '\\\#', '\\\|'];
var variables = v.split(new RegExp('('+separators.join('|')+')', 'g')).map(function(i){
   return i.trim();
});
console.log(variables);

Note that I added a .map() with .trim() to remove the spaces.
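The steps above can be wrapped into a small reusable function. This is only a sketch: the names escapeRegExp and splitKeepingSeparators are hypothetical, and it assumes the separators are passed as plain strings (not pre-escaped), that surrounding spaces should be trimmed, and that empty pieces should be discarded.

```javascript
// Hypothetical helper: escapes characters that have a special meaning in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: splits `text` on any of `separators`, keeping them in the result.
function splitKeepingSeparators(text, separators) {
  const pattern = separators
    .slice()                               // do not mutate the caller's array
    .sort((a, b) => b.length - a.length)   // try longer separators first ('' before ')
    .map(escapeRegExp)
    .join('|');
  return text
    .split(new RegExp('(' + pattern + ')')) // the capture group keeps the separators
    .map(part => part.trim())               // assumption: spaces are not wanted
    .filter(part => part !== '');           // drop empty pieces between adjacent separators
}

console.log(splitKeepingSeparators("Teste1 & Teste2 # Teste3", ["''", "'", "&", "#", "|"]));
```

For the string in the question this logs the five items Teste1, &, Teste2, #, Teste3. Escaping the separators in code avoids having to write the backslashes by hand in each string.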

1

One thing is not clear (as already pointed out in the comments): in your example, the spaces are removed from the final result. So much so that, to get exactly what was shown, it would be enough to split on spaces:

console.log("Teste1 & Teste2 # Teste3".split(' '));


Assuming the spaces were just a small oversight, and you actually want to split and include the separators in the final result, just use a capture group, as indicated in the other answer. This works because, according to the documentation, when the regex has capture groups, their matches are also included in the result of split.
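To make the role of the capture group concrete, here is a small sketch comparing four variations of the same split. The input string and variable names are made up for illustration.

```javascript
// No group at all: the separators are dropped.
const plain = "a,b;c".split(/,|;/);

// Non-capturing group (?:...): the separators are still dropped.
const nonCapturing = "a,b;c".split(/(?:,|;)/);

// One capture group around the whole alternation: each separator is kept.
const oneGroup = "a,b;c".split(/(,|;)/);

// One capture group per alternative: the group that did not take part
// in a given match contributes an undefined entry to the result.
const twoGroups = "a,b;c".split(/(,)|(;)/);

console.log(plain);        // 3 items: a, b, c
console.log(nonCapturing); // 3 items: a, b, c
console.log(oneGroup);     // 5 items: a , b ; c
console.log(twoGroups);    // 7 items, two of them undefined
```

This is why wrapping the whole pattern in a single pair of parentheses is the convenient choice here: each match adds exactly one item to the array.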

Now, a small improvement suggestion: if all separators are formed by only one character, you don’t have to do join('|'). Just put all the options in one character class.

For example, if separators can be &, | or #, just use the regex [&|#] (and within brackets, many characters "lose their special powers" and do not need to be escaped with \ - the character |, for example, does not need to be escaped in this case). A regex [&|#] means "the character &, or |, or #" (any of them serves).

Example:

let texto = "Teste1 & Teste2 # Teste3";
let separators = ["'", '\'', '&', '#', '|'];
let partes = texto.split(new RegExp('(['+ separators.join('') + '])'));
console.log(partes);

Notice that I joined all the separators, placed them between [] to form the character class, and wrapped all of this in parentheses to form the capture group. Another detail is that you do not need to pass the g flag to the RegExp constructor: this flag affects searching and replacing (in methods such as match, exec and replace), but it makes no difference in split.
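A quick way to check the point about the g flag is to run the same regex with and without it. The sample string below is made up for illustration.

```javascript
const sample = "a&b#c";

// split: the result is the same with or without the g flag.
const splitWithoutG = sample.split(/([&#])/);
const splitWithG = sample.split(/([&#])/g);
console.log(splitWithoutG); // a, &, b, #, c
console.log(splitWithG);    // a, &, b, #, c

// match: here the flag does change the result.
const firstOnly = sample.match(/[&#]/);    // only the first match (plus index, input, ...)
const allMatches = sample.match(/[&#]/g);  // every match: & and #
console.log(firstOnly[0], allMatches);
```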

Note that the above solution includes the spaces in the final result. It was not clear whether the spaces should be removed or not (the description in the question implies that they should not, but the given example suggests they should be). If you have to remove the spaces, just use map along with trim, as the other answer already indicated.

Note once more that [] only works when every separator is a single character. If they can have more than one character, then the way to go is to use | as you were already doing (keeping in mind that not all characters need to be escaped with \, such as & and #, for example).
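The choice between the two forms can also be made in code. The following is a sketch only: buildSeparatorRegex and its two helpers are hypothetical names, and it assumes the separators are plain strings. It also escapes the few characters that remain special inside [] (\, ], ^ and -), which matters if such characters ever become separators.

```javascript
// Hypothetical helper: escapes only what is special inside a character class.
function escapeForClass(ch) {
  return ch.replace(/[\\\]^-]/g, '\\$&');
}

// Hypothetical helper: escapes what is special in a regular pattern.
function escapeForAlternation(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Builds ([...]) when every separator has one character, (a|b|...) otherwise.
function buildSeparatorRegex(separators) {
  const allSingle = separators.every(s => s.length === 1);
  const body = allSingle
    ? '[' + separators.map(escapeForClass).join('') + ']'
    : separators
        .slice()
        .sort((a, b) => b.length - a.length) // longer separators must be tried first
        .map(escapeForAlternation)
        .join('|');
  return new RegExp('(' + body + ')');
}

console.log(buildSeparatorRegex(['&', '#', '|']).source); // a character class
console.log(buildSeparatorRegex(['&', '&&']).source);     // an alternation
console.log("a && b & c".split(buildSeparatorRegex(['&', '&&'])));
```

Sorting by length matters in the alternation case: with & listed before &&, the regex would match the single & twice and never see the double one.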

Another detail (probably no more than a micro-optimization) is that the character class is faster: compare here and here the number of steps each option takes. Obviously, for a few short strings the difference will be insignificant, but it is worth noting anyway.
