How to capture repetitions of a given group with regex?

I’m trying to capture structured data, and I need a particular group to be captured as many times as it occurs.

The data format is as follows:

foo01@chave1|valor1#chaveN|valorN

The first group is an alphanumeric value, separated from the rest by the @ character; from this first group I only want to get foo01.

The second group is the one that repeats: each attribute and its value are separated by |, and consecutive attributes are separated by #. Nothing else follows; the rest of the string is only attributes in the format "chave1|valor1#chaveN|valorN".

Below is what I started with, but I could not capture each attribute separately.

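// Group 1: the prefix up to and including "@"; group 2: one "key|value" pair
const regex = /(^[a-zA-Z0-9_]*@)([a-zA-Z0-9_]*\|[a-zA-Z0-9_]*)/g;
const str = `foo01@chave1|valor1#chaveN|valorN`;
let m;

while ((m = regex.exec(str)) !== null) {
  // Avoid an infinite loop on zero-length matches
  if (m.index === regex.lastIndex) {
    regex.lastIndex++;
  }

  // Print the full match (group 0) and every capture group
  m.forEach((match, groupIndex) => {
    console.log(`Found, group ${groupIndex}: ${match}`);
  });
}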
Note: I used JS only because it makes adding a runnable snippet easy.

I would like to know how to capture every occurrence of "chave1|valor1#chaveN|valorN" in a string, regardless of how many there are.
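For reference, one way to get each pair reported separately in JavaScript is to match the prefix once and then walk the rest of the string with a sticky (`y`) regex, consuming one `key|value` pair per `exec()` call. This is a minimal sketch, not code from the question: the helper name `parseRecord` is hypothetical, and it assumes keys and values contain only `\w` characters, as in the sample.

```javascript
// Sketch: parse "prefix@key|value#key|value..." one pair at a time.
// Assumption: keys and values contain only \w characters (letters, digits, _).
function parseRecord(str) {
  // The prefix is everything before the first "@".
  const head = /^(\w+)@/.exec(str);
  if (head === null) return null;

  // Sticky (y) regex: each exec() must match exactly at lastIndex,
  // so pairs are consumed back to back with nothing skipped.
  const pair = /(\w+)\|(\w+)(?:#|$)/y;
  pair.lastIndex = head[0].length;

  const pairs = [];
  let m;
  while (pair.lastIndex < str.length && (m = pair.exec(str)) !== null) {
    pairs.push([m[1], m[2]]);
  }

  // A failed sticky exec() resets lastIndex to 0, so stopping short of the
  // end of the string means the input did not follow the expected format.
  if (pair.lastIndex !== str.length) return null;
  return { prefix: head[1], pairs };
}

const record = parseRecord("foo01@chave1|valor1#chaveN|valorN");
console.log(record.prefix);       // foo01
console.log(record.pairs.length); // 2
```

Because each `exec()` has to start exactly where the previous one ended, malformed input makes the function return `null` instead of silently skipping text.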

  • So you need to capture each attribute separately, regardless of how many there are in the string? Did I get that right?

  • Exactly. I need the group ([a-zA-Z0-9_]*\|[a-zA-Z0-9_]*) to be captured n times, up to the end of the string. I’m concerned about performance here: I could do it by iterating, but I believe a regular expression will perform much better.

  • I’m not sure regex supports something like this. I remember once searching for a similar solution and finding nothing; I ended up solving it in plain JS anyway.

  • Good question +1

3 answers

How to capture repetitions of a given group with regex?

There are 2 alternatives:

  • Use the "match previously named capture group" feature (in PCRE, a subroutine call such as \g'name'): you give a capture group a name and then repeat its pattern as many times as needed with quantifiers (greedy, lazy, possessive, etc.).

  • Create 2 capture groups: an inner one that captures the sequence you want, and an outer one that wraps only the inner group plus a quantifier; in your case, a greedy quantifier.

Answer 1

(?'foo'\w*)@(?'Todos_Atrib_Val'(?'Atrib_Val'\w*\|\w*#{0,1})(?'Atrib_Val_Recursivo'\g'Atrib_Val')*)

I understand that such a long regex can look intimidating, but it is much easier to read once pasted into regex101, or when you isolate its named capture groups and analyze them one by one; naming the groups also makes the code easier for other programmers to maintain and read.
Here you can see this regex in action; I recommend looking at the "Match Information" panel to see how the groups are organized.
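JavaScript does not accept the PCRE syntax of Answer 1: named groups are written `(?<name>...)`, and there is no subroutine call like `\g'Atrib_Val'`. A rough JS adaptation (my own translation, not the answer's original pattern) keeps the group names and simply repeats the pair pattern inside the quantified group. As with any repeated capture group, `Atrib_Val_Recursivo` keeps only its last repetition.

```javascript
// JS adaptation of Answer 1: same group names, but (?'name') becomes
// (?<name>) and the subroutine call \g'Atrib_Val' is replaced by a copy of
// the pair pattern \w*\|\w*#? (#{0,1} written as #?).
const alt1 = /(?<foo>\w*)@(?<Todos_Atrib_Val>(?<Atrib_Val>\w*\|\w*#?)(?<Atrib_Val_Recursivo>\w*\|\w*#?)*)/;

const sample = "foo01@chave1|valor1#chaveN|valorN#chaveX|valorX";
const { groups } = sample.match(alt1);

console.log(groups.foo);                 // foo01
console.log(groups.Todos_Atrib_Val);     // chave1|valor1#chaveN|valorN#chaveX|valorX
console.log(groups.Atrib_Val);           // chave1|valor1#
console.log(groups.Atrib_Val_Recursivo); // chaveX|valorX (last repetition only)
```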

Answer 2

(\w*)@((\w*\|\w*#{0,1})*)

Here is the same line of thinking, but without naming groups and without reusing previously defined groups.

  • Group 1 captures the sequence before the @.
  • Group 2 captures, greedily, everything that group 3 matches, i.e. the whole run of attributes.
  • Group 3 matches one chave1|valor1# sequence and is repeated N times, but only the last repetition is stored.

You can check here that it behaves the same as the first example.
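Answer 2's pattern uses only syntax that JavaScript also supports, so it can be used as written. A small sketch of how one might use it: since group 3 keeps only the last pair, the code splits group 2 afterwards to get every pair separately. The variable names are illustrative.

```javascript
// Answer 2's pattern, used as-is in JavaScript.
const alt2 = /(\w*)@((\w*\|\w*#{0,1})*)/;
const input = "foo01@chave1|valor1#chaveN|valorN";

const m = input.match(alt2);
console.log(m[1]); // foo01  (group 1)
console.log(m[2]); // chave1|valor1#chaveN|valorN  (group 2: every pair)
console.log(m[3]); // chaveN|valorN  (group 3: last repetition only)

// Recover each key/value pair from group 2.
const pairs = m[2]
  .split("#")
  .filter((p) => p !== "") // ignore an empty item left by a trailing "#"
  .map((p) => p.split("|"));
console.log(pairs.length); // 2
```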

2

Do you need to capture everything in a single regular expression?

I believe this one gets close to what you need. It leaves the @ and # out of the captures:

const regex = /(^\w*)?[@#](\w*\|\w*)/g
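A global regex like this is meant to be run repeatedly. A minimal sketch using `String.prototype.matchAll()` (which requires the `g` flag) to collect the prefix and every pair; it uses the pattern written with `\w` and `[@#]`, assuming only `@` and `#` act as separators.

```javascript
// Run the global regex over the whole string with matchAll().
const re = /(^\w*)?[@#](\w*\|\w*)/g;
const str = "foo01@chave1|valor1#chaveN|valorN";

let prefix;
const pairs = [];
for (const m of str.matchAll(re)) {
  // Group 1 only participates in the first match, where ^ can succeed.
  if (m[1] !== undefined) prefix = m[1];
  pairs.push(m[2]); // one "key|value" string per match
}

console.log(prefix);           // foo01
console.log(pairs.join(", ")); // chave1|valor1, chaveN|valorN
```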

But I believe that in this case split() would make everything much easier to understand.

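const str = "foo01@chave1|valor1#chaveN|valorN";

// Everything before "@" is the prefix; the rest holds the attributes
const campos = str.split('@');
const inicio = campos[0];
// Each item of lista is one "key|value" pair
const lista = campos[1].split('#');

console.log(inicio, lista);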
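If each attribute is needed as a key/value entry, the same split() idea can go one step further. A sketch, assuming the string contains a single @ and that keys are unique (a repeated key would overwrite the earlier one); `atributos` is an illustrative name.

```javascript
const str = "foo01@chave1|valor1#chaveN|valorN";

// Prefix before "@", attribute list after it.
const [inicio, resto] = str.split("@");

// "chave1|valor1" -> ["chave1", "valor1"] -> { chave1: "valor1", ... }
const atributos = Object.fromEntries(
  resto.split("#").map((par) => par.split("|"))
);

console.log(inicio);           // foo01
console.log(atributos.chaveN); // valorN
```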

The way I know is with split():

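var arrayStr = str.split(/[@#|]/)

This creates an array whose elements are the pieces of the string between @, # and |. (If the regex had a capturing group, as in /(@|#|\|)/, split() would also insert the separators themselves into the array.)

And to access the pieces:

arrayStr[0] // "foo01"
arrayStr[1] // "chave1"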
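If the separators are excluded from the result (a character class such as /[@#|]/, with no capturing group), the array is prefix, key, value, key, value and so on, so the pairs can be rebuilt by walking it two items at a time. A sketch with illustrative names; note that split() discards which separator was found, so this does not detect malformed input.

```javascript
const str = "foo01@chave1|valor1#chaveN|valorN";
// Character class, no capturing group: separators are not kept.
const arrayStr = str.split(/[@#|]/);

const prefixo = arrayStr[0];
const pares = [];
// Index 0 is the prefix; keys sit at odd indexes, values right after them.
for (let i = 1; i + 1 < arrayStr.length; i += 2) {
  pares.push([arrayStr[i], arrayStr[i + 1]]);
}

console.log(prefixo);      // foo01
console.log(pares.length); // 2
```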
