Problem with regex in php

Hello, in my code I want to split a string, using as the separator a value that is found via a regular expression. Below is what I tried:

    $data = "Amazing.Stories.2020.S01E03.REPACK.720p.WEB.H264-GHOSTS.mkv";

    $pattern = "[#^S\d\dE\d\d$#i]";
    $d = preg_split($pattern, $data);

    echo "<pre>";
    print_r($d);
    echo "</pre>";

In this case the separator should be "S01E03", but the pattern does not find it. The result I get is:

Array
(
    [0] => Amazing.Stories.2020.S01E03.REPACK.720p.WEB.H264-GHOSTS.mkv
)

The desired result would be:

Array
(
   [0] => Amazing.Stories.2020
   [1] => REPACK.720p.WEB.H264-GHOSTS.mkv
)

I confess that I am bad with regular expressions, but this same pattern does find a match when I use it in preg_grep, for example.

Where am I going wrong?

  • I rolled back the edit because changing the question ends up invalidating the answer, and the idea of the site is to have one question per specific problem. If you have another question (even a related one), please post it as a new question (after searching to check whether it already exists on the site, of course).

  • And if the answer below solved the split problem, you can accept it; see here how and why to do it. It is not mandatory, but it is good practice on the site, since it shows future visitors that the answer solved the problem.

  • thanks again @hkotsubo. Your reply was of great value.

1 answer

Change the regex to:

    $pattern = '#\.S\d{2}E\d{2}\.#i';

The anchors ^ and $ mark the beginning and the end of the string, respectively, so it makes no sense to use them in a split, since the pattern will be in the middle of the text. Remove them.

I also removed the square brackets, which in this case were being treated as the delimiters of the expression, meaning the # characters were part of the regex itself. Without the brackets, # becomes the delimiter.
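
To see the effect of the delimiters, here is a small sketch (the variable names are mine and purely illustrative). With the brackets as delimiters, the regex body is `#^S\d\dE\d\d$#i` with no flags, so it looks for a literal # followed by the start of the string, which can never match:

```php
<?php
$data = "Amazing.Stories.2020.S01E03.REPACK.720p.WEB.H264-GHOSTS.mkv";

// "[" and "]" act as the delimiters here, so "#" and "i" are literal
// characters inside the regex, and "^" and "$" are still anchors.
$withBrackets = preg_match('[#^S\d\dE\d\d$#i]', $data);

// "#" is the delimiter and "i" is a flag; anchors removed.
$withHash = preg_match('#S\d\dE\d\d#i', $data);

var_dump($withBrackets); // int(0)
var_dump($withHash);     // int(1)
```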

I also changed the two digits to \d{2} (you can still use \d\d), and included the dots before and after the text, because I understood that they should not be part of the result. Remember that the dot has a special meaning in regex (it means "any character, except line breaks"), so for it to be interpreted as just the character ., I need to escape it with \.
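
A quick illustration of why the escape matters, using a made-up string that is not from the question (so treat it as an assumption for demonstration only):

```php
<?php
// Hypothetical input with no dots around the episode marker.
$noDots = "AmazingXS01E03Yrest";

// Unescaped "." matches any character, so "X" and "Y" are accepted.
$unescaped = preg_match('#.S\d{2}E\d{2}.#', $noDots);

// Escaped "\." requires a literal dot, so nothing matches here.
$escaped = preg_match('#\.S\d{2}E\d{2}\.#', $noDots);

var_dump($unescaped); // int(1)
var_dump($escaped);   // int(0)
```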

The i flag indicates that the regex is case insensitive, that is, it will also match the lowercase letters "s" and "e". If you only want to match capital letters, remove the i at the end of the expression.
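
Putting it together, a minimal sketch of the split with the corrected pattern. The lowercase file name and the variable names other than $data and $pattern are my own, added only to show the effect of the i flag:

```php
<?php
$data = "Amazing.Stories.2020.S01E03.REPACK.720p.WEB.H264-GHOSTS.mkv";
$pattern = '#\.S\d{2}E\d{2}\.#i';

$parts = preg_split($pattern, $data);
print_r($parts);
// Array
// (
//     [0] => Amazing.Stories.2020
//     [1] => REPACK.720p.WEB.H264-GHOSTS.mkv
// )

// Hypothetical lowercase variant of the name, to compare with and without "i".
$lowerName = "Amazing.Stories.2020.s01e03.REPACK.mkv";
$withFlag = preg_split($pattern, $lowerName);                  // splits in two
$withoutFlag = preg_split('#\.S\d{2}E\d{2}\.#', $lowerName);   // no split

echo count($withFlag), "\n";    // 2
echo count($withoutFlag), "\n"; // 1
```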
