Regex in the input pattern attribute is not valid, although the expression works elsewhere

I have a form field that should accept only YouTube or Vimeo URLs. For that, I found the following regex:

(?:(?i)(?:https:|http:)?\/\/)?(?:(?i)(?:www\.youtube\.com\/(?:embed\/|watch\?v=)|youtu\.be\/|youtube\.googleapis\.com\/v\/)(?<YoutubeID>[a-z0-9-_]{11,12})|(?:vimeo\.com\/|player\.vimeo\.com\/video\/)(?<VimeoID>[0-9]+))

Tested here: https://regex101.com/r/PVdjg0/2

However, when I add this regex to the input via the pattern attribute, the field accepts any URL.

<html>
    <head>
        <meta charset="utf-8">
    </head>
    <body>
        <form>
            <input type='url' required pattern='(?:(?i)(?:https:|http:)?\/\/)?(?:(?i)(?:www\.youtube\.com\/(?:embed\/|watch\?v=)|youtu\.be\/|youtube\.googleapis\.com\/v\/)(?<YoutubeID>[a-z0-9-_]{11,12})|(?:vimeo\.com\/|player\.vimeo\.com\/video\/)(?<VimeoID>[0-9]+))' title='URL Vimeo/Youtube.' name='video' placeholder='Video URL' />
            <input type='submit'>
        </form>
    </body>
</html>

What is going on? What did I do wrong?

1 answer

The problem with regular expressions is that they don't work exactly the same way in all languages, engines, and APIs, and a feature that exists in one will not necessarily be supported in another.

JavaScript does not support inline flags in the (?i) form (this flag makes the expression case-insensitive, that is, it stops distinguishing between uppercase and lowercase letters). The documentation says that the value of pattern must be a valid JavaScript regex, and when the expression is invalid, the pattern attribute is ignored.
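
One way to see this outside the browser is to try compiling the expression the way JavaScript would. This is only a minimal sketch: isValidJsRegex is a hypothetical helper name, and the u flag is an assumption about how the browser compiles the attribute (newer browsers may use a different Unicode-aware flag).

```javascript
// Hypothetical helper: returns true if `source` compiles as a JavaScript regex.
// The 'u' default is an assumption about how the pattern attribute is compiled.
function isValidJsRegex(source, flags = 'u') {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    // A SyntaxError here corresponds to the case where the browser ignores the pattern.
    return false;
  }
}

console.log(isValidJsRegex('(?i)youtube')); // inline flag: rejected
console.log(isValidJsRegex('youtube'));     // same expression without it: accepted
```

When the first call fails, the attribute is dropped, which would explain why the field ends up accepting any URL.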

Even on regex101.com, the menu on the left has a FLAVOR option that lets you choose the regex engine. If you choose "ECMAScript (JavaScript)", you will see that the expression is invalid because of the (?i).

If you remove this flag, the expression works, but it is no longer case-insensitive, so it will not accept things like "WWW.youTubE.Com". For example:

<form>
  <input id="campo" type="text" pattern="(?:(?:https?:)?\/\/)?(?:(?:www\.youtube\.com\/(?:embed\/|watch\?v=)|youtu\.be\/|youtube\.googleapis\.com\/v\/)(?<YoutubeID>[a-z0-9-_]{11,12})|(?:vimeo\.com\/|player\.vimeo\.com\/video\/)(?<VimeoID>[0-9]+))" required />
  <input type="submit" value="ok">
</form>

I also replaced (?:https:|http:) with (?:https?:) (the s? means that the letter "s" is optional).
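
As a quick sanity check that the two forms are equivalent, here is a small sketch that compares them on a few sample strings. The names verbose and compact and the sample list are just for illustration.

```javascript
// Both expressions are anchored so that only the scheme part is compared.
const verbose = /^(?:https:|http:)$/;
const compact = /^(?:https?:)$/;

for (const s of ['http:', 'https:', 'httpss:', 'ftp:']) {
  // The two results agree for every sample.
  console.log(s, verbose.test(s), compact.test(s));
}
```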


If you want case-insensitive validation, the way to do it is outside the input, in JavaScript:

// The "i" flag makes the whole expression case-insensitive; ^ and $ anchor it to the entire value.
let regex = /^((?:(?:https?:)?\/\/)?(?:(?:www\.youtube\.com\/(?:embed\/|watch\?v=)|youtu\.be\/|youtube\.googleapis\.com\/v\/)(?<YoutubeID>[a-z0-9-_]{11,12})|(?:vimeo\.com\/|player\.vimeo\.com\/video\/)(?<VimeoID>[0-9]+)))$/i;
document.querySelector('#form').addEventListener('submit', function(event) {
    let url = document.querySelector('#campo').value;
    if (regex.test(url)) return; // valid URL: let the form submit

    alert('Invalid URL');
    event.preventDefault(); // invalid URL: block the submission
});
<form id="form">
  <input id="campo" type="text" required />
  <input type="submit" value="ok">
</form>

Note that I created the regex using /.../i: the "i" at the end makes it case-insensitive. I also added the anchors ^ and $, which mark the start and end of the string, to ensure that the field contains only what the regex specifies (in the pattern attribute this is not necessary, because by default it already requires the whole value to match the regex).
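
The effect of ^ and $ can be checked in isolation. The sketch below uses a deliberately simplified expression (only the youtu.be branch, an assumption made to keep the example short) and a made-up video ID. The ^(?:...)$ wrapping mimics what the pattern attribute does implicitly.

```javascript
// Simplified stand-in for the full expression: only the youtu.be branch.
const source = 'youtu\\.be\\/[a-z0-9_]{11}';

const unanchored = new RegExp(source, 'i');
// Wrapping in ^(?:...)$ forces the whole value to match.
const anchored = new RegExp('^(?:' + source + ')$', 'i');

const value = 'see youtu.be/abcDEF12345 and more text';
console.log(unanchored.test(value)); // true: a match anywhere in the string is enough
console.log(anchored.test(value));   // false: extra text around the URL is rejected
console.log(anchored.test('YOUTU.BE/abcDEF12345')); // true: the i flag ignores case
```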

Unfortunately, there is no practical way to do this in the input's pattern attribute itself. One "solution" (which I consider a horrendous hack; don't use it) would be to replace youtube with [Yy][Oo][Uu][Tt][Uu][Bb][Ee] (and do the same for every letter you want to be case-insensitive). In my opinion that is too ugly; I prefer the second solution suggested above, doing the validation in JavaScript itself, outside the pattern.
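
If someone really did go down that road, the expansion could at least be generated instead of typed by hand. This is a rough sketch with a hypothetical helper, expandCase, which is only safe for literal text: it would mangle escapes such as \d or \w.

```javascript
// Hypothetical helper: expands each ASCII letter into a two-character class.
// Only meant for literal text, not for arbitrary regex syntax.
function expandCase(literal) {
  return literal.replace(/[a-z]/gi, (ch) => '[' + ch.toUpperCase() + ch.toLowerCase() + ']');
}

const expanded = expandCase('youtube');
console.log(expanded); // [Yy][Oo][Uu][Tt][Uu][Bb][Ee]
console.log(new RegExp('^' + expanded + '$').test('youTubE')); // true
```

The output shows how quickly the pattern becomes unreadable, which is the reason for preferring validation in JavaScript.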

But if you only need the video ID to be case-insensitive, just change it to (?<YoutubeID>[a-zA-Z0-9-_]{11,12}): with A-Z added, it also accepts uppercase letters.
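
To illustrate the A-Z variant and the named groups, here is a small sketch run in plain JavaScript rather than in the pattern attribute. The sample URLs and IDs are made up. The hyphen in the class is written as \- here, which is an adjustment of mine: as far as I know, newer browsers compile the pattern attribute with the v flag, which does not accept an unescaped - in that position, so treat that detail as an assumption to verify in your target browsers.

```javascript
// The answer's expression with A-Z added to the ID class, anchored for use with match/test.
// The escaped hyphen (\-) is an adjustment made for this sketch.
const videoUrl = /^(?:(?:https?:)?\/\/)?(?:(?:www\.youtube\.com\/(?:embed\/|watch\?v=)|youtu\.be\/|youtube\.googleapis\.com\/v\/)(?<YoutubeID>[a-zA-Z0-9_\-]{11,12})|(?:vimeo\.com\/|player\.vimeo\.com\/video\/)(?<VimeoID>[0-9]+))$/;

// Made-up sample URLs.
const yt = 'https://www.youtube.com/watch?v=abcDEF12345'.match(videoUrl);
const vm = 'vimeo.com/123456'.match(videoUrl);

console.log(yt.groups.YoutubeID); // abcDEF12345
console.log(vm.groups.VimeoID);   // 123456
// The host is still case-sensitive, so this one is rejected.
console.log(videoUrl.test('https://WWW.youtube.com/watch?v=abcDEF12345')); // false
```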

  • Thank you very much for the explanation! Exactly, I thought regex was "universal", hence the inconsistencies! Unfortunately, I need it to be case-insensitive, because YouTube video parameters contain alphanumeric characters that can be uppercase or lowercase. I will study your solution and adapt it to my situation here!

  • @Maykelesser As I said, there is no way to do it in the pattern. Unless you do something horrible: for each letter, include both the uppercase and lowercase options, i.e. instead of youtube it would be [Yy][Oo][Uu][Tt][Uu][Bb][Ee]. It is a very "disgusting" solution, but it "works" (in this case, I prefer to do it in JavaScript itself, as explained above). Anyway, I updated the answer with this horrible option :-)

  • The problem with regular expressions is that they are not regular hahaha

  • @Maykelesser If you only need the video ID to be case-insensitive (and not the rest of the URL, such as www.youtube.com), you can use (?<YoutubeID>[a-zA-Z0-9-_]{11,12}). I updated the answer and put this option at the end. Otherwise, only the "horrible solution" will do.

  • Yes, only the video ID needs to be case-insensitive. Usually the user will copy and paste the URL, so that is enough. I will test it here!

  • @Woss The worst part is that they really aren't: https://stackoverflow.com/q/36283119

  • @hkotsubo It worked well with your last adjustment! <3
