Delete letters before or after a number


Hello, I have a string (var test) and would like to perform some operations on it. The desired match is a number preceded by the letters p, s, or ps (NOT sp) (e.g. ps1) and followed by any letters a-z, in any quantity, but without repeating the same letter (e.g. p1abd would match, BUT p1abbd would not, because the letter b is repeated). I could not get the no-repetition part to work.

It seemed ideal to use test[i].match(/\b(p|s|ps)\d[a-z]*\b/). However, as you can see in the 1st Example, the array comes back with 2 values, the second being the letter(s) before the number, which I don’t want; I just want one value, the first. I think this has something to do with the parentheses, but I couldn’t find another combination that worked. In the 2nd Example the output looks exactly as I want, but the regex is wrong: it matches either of the letters p and s, but does not match the letters ps.

In the 3rd Example I want to eliminate everything except the letters after the number, but I had problems, probably because of the array with two values. And in the 4th Example I want to delete everything except the number between the letters. For the 3rd and 4th Examples, I know there are simpler ways to do this, such as test[i].match(/\d/g).toString() to display only the number. But I would like to know, for learning purposes, how to isolate the number from the rest of the pattern to be deleted, as I did in the 3rd Example. I tried something like ...replace(/[p|s|ps][^\d][a-z]*/, ''), but it didn’t work.

var test = 'xyz p1abc xyz; xyz s3de xyz; xyz ps2fgh xyz'; // p1abc, s3de, ps2fgh
test = test.split(';');

for (var i = 0; i < test.length; i++) {
    test[i] = test[i].replace(/^\s+|\s+$/g, '');

    // Show the letters 'p', 's', or 'ps' BEFORE the number, and any letters AFTER the number, in any quantity.
    console.log(test[i].match(/\b(p|s|ps)\d[a-z]*\b/)); // 1st Example
    // Result:
    // Array [ "p1abc", "p" ] // repeats the letter p
    // Array [ "s3de", "s" ] // repeats the letter s
    // Array [ "ps2fgh", "ps" ] // repeats the letters ps

    console.log(test[i].match(/\b[p|s|ps]\d[a-z]*\b/)); // 2nd Example
    // Only fails because it does not include 'ps'. Result:
    // Array [ "p1abc" ]
    // Array [ "s3de" ]
    // null

    // Show only the letters AFTER the number. ([a-z]*) // 3rd Example
    console.log(test[i].match(/\b[p|s|ps]\d[a-z]*\b/).toString().replace(/[p|s|ps]\d[^a-z]*/, ''));
    // Again, it only fails because it does not include 'ps'. Result:
    // TypeError: test[i].match(...) is null
    // "abc"
    // "de"

    // Show only the number. (\d) // 4th Example
    console.log(test[i].match(/\b[p|s|ps]\d[a-z]*\b/).toString().replace(/[p|s|ps]\d[a-z]*/, ''));
    // I found no solution for this one.
}

1 answer

You need to understand how capture groups work:

  • The first value returned is always the whole match (there’s no changing that);
  • For each (...) in the regex, an additional value is returned, at consecutive indexes, corresponding to that part of the regex; it may be empty (e.g. (a?)b applied to b returns an empty group).
  • If you do not want a group to capture, use (?:...).

Example:

\b(p|s|ps)(\d)([a-z]*)\b

Three capture groups. Some possible results:

p1abc ==> "p1abc", "p", "1", "abc"
ps2   ==> "ps2", "ps", "2", ""

Another example:

\b(?:p|s|ps)\d[a-z]*\b

No capture group:

p1abc ==> "p1abc"
ps2   ==> "ps2"

In these three examples, I capture only one part of the string and not the others:

\b(p|s|ps)\d[a-z]*\b
    p1abc ==> "p1abc", "p"
    ps2   ==> "ps2", "ps"

\b(?:p|s|ps)(\d)[a-z]*\b
    p1abc ==> "p1abc", "1"
    ps2   ==> "ps2", "2"

\b(?:p|s|ps)\d([a-z]*)\b
    p1abc ==> "p1abc", "abc"
    ps2   ==> "ps2", ""
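
The results listed above can be reproduced from JavaScript. This is a minimal sketch, assuming String.prototype.match is called without the g flag (so the groups are returned); the helper name groupsOf and the variable names are illustrative only and not part of the question.

```javascript
// Hypothetical helper: returns the match as a plain array
// (index 0 = whole match, 1..n = capture groups), or null if nothing matched.
function groupsOf(regex, text) {
  const m = text.match(regex);
  return m ? Array.from(m) : null;
}

const all = /\b(p|s|ps)(\d)([a-z]*)\b/;        // three capture groups
const none = /\b(?:p|s|ps)\d[a-z]*\b/;         // no capture group
const digitOnly = /\b(?:p|s|ps)(\d)[a-z]*\b/;  // captures only the digit

console.log(groupsOf(all, 'xyz p1abc xyz'));      // [ 'p1abc', 'p', '1', 'abc' ]
console.log(groupsOf(none, 'xyz ps2fgh xyz'));    // [ 'ps2fgh' ]
console.log(groupsOf(digitOnly, 'xyz s3de xyz')); // [ 's3de', '3' ]
```

With the non-capturing version, the 1st Example from the question returns a single value, which is what was asked for.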

Substitutions

Once you have established capture groups in your regex, besides getting them back from the match method, you can also refer to them during a replacement with replace. You do this using $n, where n is the group index (starting at 1). For example, assuming the pattern with the three groups, say you want to replace the prefix, the digit or the suffix with "foo", keeping the rest intact:

"p1abc".replace(/\b(p|s|ps)(\d)([a-z]*)\b/, 'foo$2$3'); // foo1abc
"p1abc".replace(/\b(p|s|ps)(\d)([a-z]*)\b/, '$1foo$3'); // pfooabc
"p1abc".replace(/\b(p|s|ps)(\d)([a-z]*)\b/, '$1$2foo'); // p1foo

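The same $n references can also isolate a single part, which is what the 3rd and 4th Examples in the question were after. The sketch below is one possible way, not the only one: the leading ^.* and trailing .*$ are my own addition, so that the text around the token is consumed by the match and therefore dropped by the replacement.

```javascript
// Assumption: the input contains exactly one token such as "ps2fgh"
// and no line breaks (the dot does not match newlines by default).
const pattern = /^.*\b(p|s|ps)(\d)([a-z]*)\b.*$/;
const input = 'xyz ps2fgh xyz';

const prefix = input.replace(pattern, '$1'); // keeps only group 1
const digit = input.replace(pattern, '$2');  // keeps only group 2
const suffix = input.replace(pattern, '$3'); // keeps only group 3

console.log(prefix, digit, suffix); // ps 2 fgh
```

Note that replace returns the input unchanged when the pattern does not match, so a caller may want to check with test or match first.
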
Remarks

  • If you really can’t use capture groups, and only need to match one part of the string, read up on lookarounds. These regexes, for example, match only the prefix, only the number and only the suffix, respectively:

    \b(?:p|s|ps)(?=\d[a-z]*\b)
    (?<=\bp|\bs|\bps)\d(?=[a-z]*\b)
    (?<=\bp\d|\bs\d|\bps\d)[a-z]*\b
    

    But if you can avoid this, do: capture groups are much simpler to understand and apply, and have fewer restrictions than lookarounds. For example, when this answer was written JavaScript did not support lookbehinds (they were added in ES2018), and the vast majority of languages only accept fixed-size lookbehinds.

  • Your attempt [p|s|ps] did not work because brackets match only one character, among those listed. Your example would match p, s or |!

  • If you don’t want the letters at the end to repeat, I suggest doing that part without regex. Theoretically it is possible (it is a regular language), but in practice the number of states would be equal to the number of possible combinations (because the regex would need to "remember" the letters that have already appeared so as not to allow them to appear again). The performance would probably be very bad...

    But... it’s not impractical at all! An answer on the English Stack Overflow shows a way to match a string without repetition, combining a capture group, a backreference (a reference to a group already captured) and a negative lookahead:

    ^(?:([A-Za-z])(?!.*\1))*$
    

    Adapted to your case, with all the capture groups (I can’t get rid of the last one, at least, so I gave up and included them all...), it would look like this:

    \b(p|s|ps)(\d)((?:([a-z])(?![a-z]*\4))*)\b
    

    See an example on Rubular: the matched section is highlighted, and capture groups 1 to 3 show the prefix, digit and suffix (group 4 is useless).
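
Two of the remarks above can be checked quickly in JavaScript. This is only a sketch: the first part shows the character-class pitfall, the second compares the backreference regex with a plain-code check of the suffix. The names classOnly, noRepeat, hasUniqueLetters and parseToken are hypothetical, and using a Set is my assumption about what "doing it without regex" could look like.

```javascript
// 1) A character class matches exactly ONE character from the listed set.
const classOnly = /^[p|s|ps]$/;
console.log(classOnly.test('|'));  // true: '|' is just another listed character
console.log(classOnly.test('ps')); // false: two characters never fit one class

// 2a) Suffix without repeated letters, using the regex from the answer.
const noRepeat = /\b(p|s|ps)(\d)((?:([a-z])(?![a-z]*\4))*)\b/;
console.log(noRepeat.test('p1abd'));  // true
console.log(noRepeat.test('p1abbd')); // false: 'b' appears twice

// 2b) Alternative sketch: match the shape with a simple regex,
// then check uniqueness in plain code.
function hasUniqueLetters(s) {
  return new Set(s).size === s.length;
}

function parseToken(token) {
  const m = token.match(/^(p|s|ps)(\d)([a-z]*)$/);
  if (!m || !hasUniqueLetters(m[3])) return null;
  return { prefix: m[1], digit: m[2], suffix: m[3] };
}

console.log(parseToken('ps2fgh')); // { prefix: 'ps', digit: '2', suffix: 'fgh' }
console.log(parseToken('p1abbd')); // null
```

The second variant trades one dense pattern for two simple steps, which may be easier to maintain.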

  • Is it really necessary to use regex for that? Once you have identified the pattern and extracted its parts with capture groups, you can replace/match them any way you want by simply concatenating those parts. Anyway, I updated the answer with a note about substitutions. In short, you keep the prefix with replace(/\b(p|s|ps)(\d)([a-z]*)\b/, '$1') and the digit with replace(/\b(p|s|ps)(\d)([a-z]*)\b/, '$2'), for example (there are other ways).

  • @mgibsonbr, I went through your explanation carefully and everything was clear to me. I stored the result in var s = test[i].match(/\b(p|s|ps)(\d)((?:([a-z])(?!.*\4))*)\b/);, and now access each capture group by its index, e.g. s[0] and/or s[2]..., which is nice and concise. Thanks for the help! However, when the value of the variable test repeats even a single letter of the alphabetic suffix (3rd capture group) anywhere later in the string, there is no match. E.g. var test = 'xyz p1abc cde' does not match because of the letter c in cde, even though that is outside the limits of the search.

  • Indeed, the lookahead checks all the way to the end of the string... Try replacing the . with [a-z], so that it stops looking for repetitions where it shouldn’t: \b(p|s|ps)(\d)((?:([a-z])(?![a-z]*\4))*)\b (example)
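
The difference discussed in the last two comments can be reproduced with a short sketch. The variable names are illustrative; the two patterns and the sample string are the ones quoted in the comments.

```javascript
// Lookahead with ".*" scans to the end of the string for a repeated letter.
const wideLookahead = /\b(p|s|ps)(\d)((?:([a-z])(?!.*\4))*)\b/;
// Lookahead with "[a-z]*" stops at the first non-letter, i.e. at the end of the token.
const scopedLookahead = /\b(p|s|ps)(\d)((?:([a-z])(?![a-z]*\4))*)\b/;

const sample = 'xyz p1abc cde';

const wide = sample.match(wideLookahead);
const scoped = sample.match(scopedLookahead);

console.log(wide);                 // null: the 'c' in "cde" counts as a repetition
console.log(scoped && scoped[0]);  // p1abc
console.log(scoped && scoped[3]);  // abc
```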
