How to validate a regex rule for a user quote?

I’m making a feature on my site to quote users using @:

@bananinha said that

In this case, the name @bananinha will turn into a link when the person submits the comment. I could do it with a lot of if statements, but I saw that the code is cleaner with a regex.

So I’m doing it with a regex, but I can’t get it to handle all the cases.

The rules it needs to follow are:

  • The first character must be @
  • The remaining characters must be A-Z, a-z, 0-9, . or _, and there must be at least one of them.
  • The character @ must appear only once.

This is what I have so far:

function validNickname(str) {
    var pattern = new RegExp('[@][A-Za-z0-9._]', 'i');
    return !!pattern.test(str);
}

function getMention(str) {
    var words = str.split(' ');

    for (var i = 0; i < words.length; i++) {
        console.log(words[i], validNickname(words[i]));
    }
}

var t = '@ @@@@ @bananinha oloco@meu cachorrinho@ @nick_fury @pik4chu @$$money_ @estou_entre_arrobas@ @estou_entre_arrobas@eu_tambem@';

getMention(t);

But there are some cases I can’t validate, such as requiring @ to be the first character and to appear only once.

1 answer


Just change it to:

function validNickname(str) {
    let pattern = /^@[A-Za-z0-9._]+$/;
    return pattern.test(str);
}

function getMention(str) {
    str.split(' ')
       .forEach(word => console.log(word, validNickname(word)));
}
  
let t = '@ @@@@ @bananinha oloco@meu cachorrinho@ @nick_fury @pik4chu @$$money_ @estou_entre_arrobas@ @estou_entre_arrobas@eu_tambem@';
  
getMention(t);

The markers ^ and $ match the beginning and the end of the string, respectively. This ensures that the regex checks the entire string, and makes it explicit that @ can only appear at the beginning.

The expression [A-Za-z0-9._] is a character class that matches any letter from A to Z (upper or lower case), a digit from 0 to 9, a dot, or _. The catch is that the whole class matches only a single character. In other words, your regex only checked whether there was one allowed character after an @, ignoring the rest of the string.

To match more than one character, I used the quantifier + (one or more occurrences), which indicates that letters, digits, dot and _ can repeat. The $ then ensures that only these characters appear up to the end of the string.
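To make the difference concrete, here is a small sketch comparing the pattern from the question with the anchored one. The names loose, strict and samples are only illustrative.

```javascript
// Pattern from the question: finds "@" followed by ONE allowed character,
// anywhere in the string.
const loose = /[@][A-Za-z0-9._]/i;

// Anchored pattern: the whole string must be "@" plus one or more allowed characters.
const strict = /^@[A-Za-z0-9._]+$/;

const samples = ['@bananinha', 'oloco@meu', '@estou_entre_arrobas@', '@'];

// Build [word, loose result, strict result] for each sample and print it.
const results = samples.map(word => [word, loose.test(word), strict.test(word)]);
results.forEach(row => console.log(row.join(' ')));
// @bananinha true true
// oloco@meu true false
// @estou_entre_arrobas@ true false
// @ false false
```

The loose pattern accepts oloco@meu because it only needs to find @m somewhere in the string, while the anchored one rejects it.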

Note that the i flag is not required. It makes the regex case-insensitive, but since you already use A-Za-z, the regex matches both upper and lower case letters anyway.

I also replaced [@] with just @, since the brackets make no difference in this case (both expressions are equivalent).

Finally, the test method already returns a boolean, so you don’t need !! on its return value.
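A quick way to check these three simplifications, as a sketch over a few sample words only (the variable names are illustrative, and agreeing on these samples is not a proof for every input):

```javascript
const words = ['@Bananinha', '@nick_fury', 'cachorrinho@', '@$$money_'];

// Original style: brackets around @ and the i flag.
const withFlag = /^[@][A-Za-z0-9._]+$/i;
// Simplified style: plain @ and no flag.
const withoutFlag = /^@[A-Za-z0-9._]+$/;

// Both patterns should give the same answer for every sample word.
const sameResults = words.every(w => withFlag.test(w) === withoutFlag.test(w));
console.log(sameResults); // true

// test() already yields a boolean, so !! adds nothing.
const resultType = typeof withoutFlag.test('@bananinha');
console.log(resultType); // boolean
```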


Another option is to use:

let pattern = /^@[\w.]+$/;
  
let t = '@ @@@@ @bananinha oloco@meu cachorrinho@ @nick_fury @pik4chu @$$money_ @estou_entre_arrobas@ @estou_entre_arrobas@eu_tambem@';
  
t.split(' ').forEach(word => console.log(word, pattern.test(word)));

This works because the shorthand \w already matches letters, digits, and the character _.
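If you want to convince yourself that \w and [A-Za-z0-9_] behave the same, a minimal sketch is to compare them over every ASCII character. The last line also shows a limitation shared by both versions: accented letters are not matched.

```javascript
// Count ASCII characters on which \w and [A-Za-z0-9_] disagree.
let mismatches = 0;
for (let code = 0; code < 128; code++) {
  const ch = String.fromCharCode(code);
  if (/^\w$/.test(ch) !== /^[A-Za-z0-9_]$/.test(ch)) {
    mismatches++;
  }
}
console.log(mismatches); // 0

// Without the u and i flags, \w does not cover accented letters either.
const accented = /^@[\w.]+$/.test('@joão');
console.log(accented); // false
```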


Keep in mind that these regexes are "naive": they accept strings like @_._..., for example, because every character in [A-Za-z0-9._] may be repeated any number of times:

let pattern = /^@[\w.]+$/;
  
let t = '@__._... @___abc.123....';
  
t.split(' ').forEach(word => console.log(word, pattern.test(word)));

If you want to be more specific, the regex gets a little more complicated. For example, if the criterion is that the dot and the _ cannot appear more than once in a row (@abc.123 is allowed, but @abc..123 and @abc._123 are not), one option would be:

let pattern = /^@[A-Za-z0-9]+([._][A-Za-z0-9]+)*$/;
  
let t = '@__._... @___abc.123.... @abc.123_xyz @abc._123';
  
t.split(' ').forEach(word => console.log(word, pattern.test(word)));

Note that the character class [A-Za-z0-9] now contains only letters and digits. It is followed by a group containing [._] (a dot or _) and then one or more letters/digits. This group is wrapped in parentheses, and the quantifier * indicates that it can repeat zero or more times. That is, the sequence "dot or _, followed by letters/digits" can repeat several times (for cases like @abc.123_xyz) or not appear at all (for cases such as @abc123). As a consequence, the name must also start and end with a letter or digit.
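As a sketch of how each part of this stricter pattern behaves, the samples below exercise the separator rule and the start/end rule. The names strictNick, cases and outcome are only illustrative.

```javascript
const strictNick = /^@[A-Za-z0-9]+([._][A-Za-z0-9]+)*$/;

const cases = ['@abc123', '@abc.123_xyz', '@abc..123', '@abc._123', '@_abc', '@abc.'];

// Map each sample to the result of the stricter pattern.
const outcome = {};
for (const name of cases) {
  outcome[name] = strictNick.test(name);
}
console.log(outcome);
// @abc123       -> true  (the group repeats zero times)
// @abc.123_xyz  -> true  (the group repeats twice)
// @abc..123     -> false (two separators in a row)
// @abc._123     -> false (two separators in a row)
// @_abc         -> false (starts with a separator)
// @abc.         -> false (ends with a separator)
```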

In the end, that’s how regex goes: the more specific you get and the more cases you want to handle, the more complex it becomes. It’s up to you to decide how far to go (if you know you’ll never have cases like @abc....123, for example, you can stick with the first version).
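Whichever version you pick, plugging it into the original goal (turning valid mentions into links) could look roughly like the sketch below. The function name linkMentions and the /users/ path are assumptions made for illustration; adapt them to however your site builds profile URLs.

```javascript
// Hypothetical helper: wraps each valid mention in a link.
// The "/users/" path is an assumption, not something from the question.
function linkMentions(text) {
  const pattern = /^@[A-Za-z0-9._]+$/;
  return text
    .split(' ')
    .map(word => pattern.test(word)
      ? '<a href="/users/' + word.slice(1) + '">' + word + '</a>'
      : word)
    .join(' ');
}

const linked = linkMentions('@bananinha said that to oloco@meu');
console.log(linked);
// <a href="/users/bananinha">@bananinha</a> said that to oloco@meu
```

Since a valid mention contains only letters, digits, dot and _, it is safe to place inside the attribute as-is, but the rest of the comment text would still need HTML escaping before being rendered.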
