Regular Expression to separate Substring from String

I have the following sentence: "SUZANO ZANO ZMES ZDIA ZANO_MES" and I need to pull only "ZANO ZMES ZDIA" out of it, bearing in mind that the "ZANO" inside "SUZANO" and the one in "ZANO_MES" must not be matched.

I tried the following expression, but the results are still not what I need: /[^A-Z]Z[A-Z0-9^_]+/

I managed to improve it a little with /[^A-Za-z ]Z[A-Z0-9]+[^_ ]/, but I still don't know whether this is the right approach.

The rule is: match everything that starts with "Z", is not preceded by a letter, digit, or special character, and does not contain "_".

    Which rule defines which words should be removed? Will it always be these ones? Or everything except the first and the last? Or something else?

Answer

If you want every word that starts with "Z", one alternative is:

$texto = "SUZANO ZANO ZMES ZDIA ZANO_MES";
if (preg_match_all('/\bZ[a-zA-Z0-9]+\b/', $texto, $resultados)) {
    foreach($resultados[0] as $str) {
        echo "$str\n";
    }
}

\b is a "word boundary": it requires that the Z not be preceded by a word character (a letter, digit, or _). Thus the regex only matches words that begin with "Z".

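It wasn't clear which characters the word can contain, so I used [a-zA-Z0-9]+ (one or more letters or digits), followed by another \b. Since _ is also a word character, there is no boundary between "ZANO" and "_" in ZANO_MES, so that word is not matched. preg_match_all stores the matches in $resultados[0], and looping over that array prints the words found: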

ZANO
ZMES
ZDIA
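A quick way to check the boundary behaviour is to run the same pattern against a few individual strings. This is only a sketch: X-ZANO is a hypothetical input that is not in the question, included because \b treats a hyphen as a non-word character, so a Z right after a symbol such as "-" still counts as the start of a word. If that case matters for the rule in the question, \b alone may not be strict enough.

```php
<?php
// Same pattern as above: a whole word that starts with an uppercase Z
$pattern = '/\bZ[a-zA-Z0-9]+\b/';

// 'X-ZANO' is a hypothetical extra case, not taken from the question
$casos = ['SUZANO', 'ZANO', 'ZANO_MES', 'X-ZANO'];

foreach ($casos as $caso) {
    // preg_match returns 1 if the pattern matches somewhere, 0 otherwise
    $ok = preg_match($pattern, $caso, $m);
    echo $caso, ' => ', $ok ? $m[0] : '(no match)', "\n";
}
```

SUZANO and ZANO_MES print "(no match)", while ZANO and X-ZANO both yield ZANO.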

Note that the regex above only considers words that start with an uppercase Z. If you want to accept both upper and lower case, add the i flag, which makes the regex case-insensitive:

$texto = "SUZANO ZANO ZMES zDia ZANO_MES";
if (preg_match_all('/\bZ[A-Z0-9]+\b/i', $texto, $resultados)) {
    foreach($resultados[0] as $str) {
        echo "$str\n";
    }
}

Since the regex is now case-insensitive, [A-Z0-9] is enough (there is no need to add the range a-z). It now picks up words that start with either "Z" or "z":

ZANO
ZMES
zDia
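Since the question asks to separate these words from the rest of the string, one possible sketch is to collect the matches with preg_match_all and strip them from the original with preg_replace. The names $padrao, $extraido and $restante are illustrative only, and the trailing \s* in the replacement pattern (plus the final trim) is an assumption about how you want the leftover spaces handled.

```php
<?php
$texto = "SUZANO ZANO ZMES ZDIA ZANO_MES";
$padrao = '/\bZ[A-Z0-9]+\b/i';

// Words that start with Z/z, joined back into one string
preg_match_all($padrao, $texto, $resultados);
$extraido = implode(' ', $resultados[0]);

// The original text without those words; \s* also removes the space after
// each removed word, and trim() cleans up any space left at either end
$restante = trim(preg_replace('/\bZ[A-Z0-9]+\b\s*/i', '', $texto));

echo $extraido, "\n"; // ZANO ZMES ZDIA
echo $restante, "\n"; // SUZANO ZANO_MES
```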

Why your regex didn't work

Your first attempt uses [^A-Z] (any character that is not a letter from A to Z). The idea was good, but this class matches one character, and that character becomes part of the match. In your string, the space before each Z ends up in the result.
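Running the first pattern on the question's string makes this visible. var_export is used here only so the leading spaces show up in the output. Note that ZANO_MES is matched as well, because of how ^ behaves inside brackets.

```php
<?php
$texto = "SUZANO ZANO ZMES ZDIA ZANO_MES";

// The first pattern from the question: [^A-Z] consumes the character before the Z
preg_match_all('/[^A-Z]Z[A-Z0-9^_]+/', $texto, $resultados);

// The four matches are ' ZANO', ' ZMES', ' ZDIA' and ' ZANO_MES',
// each one starting with the space that precedes the Z
var_export($resultados[0]);
```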

Then you used [A-Z0-9^_]. Inside brackets, ^ negates the class only when it comes right after the opening [; anywhere else it matches a literal ^. So this class does not exclude _; it actually includes it.

But you don't need that anyway, because a bracket class is already restrictive on its own: [A-Z0-9] by itself rules out _ (in fact, it rules out every character other than the listed letters and digits).
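The difference between the two classes can be checked directly. In this sketch both patterns are anchored with ^ and $ so that the whole word has to consist of class characters; Z^1 is a made-up input included only to show that ^ is accepted as a literal.

```php
<?php
// ^ in the middle of a class is a literal, so this class accepts ^ and _
$withCaret = '/^[A-Z0-9^_]+$/';
// Without ^_ the class accepts only uppercase letters and digits
$lettersDigits = '/^[A-Z0-9]+$/';

foreach (['ZANO', 'ZANO_MES', 'Z^1'] as $word) {
    // preg_match returns 1 for a match and 0 otherwise
    printf(
        "%-8s with ^_: %d  without: %d\n",
        $word,
        preg_match($withCaret, $word),
        preg_match($lettersDigits, $word)
    );
}
```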

Your second attempt, [^A-Za-z ], also excludes spaces before the "Z". Since every Z in your string is preceded by either a letter (in SUZANO) or a space, it finds no matches at all.
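Counting the matches confirms this, assuming the second pattern is /[^A-Za-z ]Z[A-Z0-9]+[^_ ]/ as described above:

```php
<?php
$texto = "SUZANO ZANO ZMES ZDIA ZANO_MES";

// The Z must follow a character that is neither a letter nor a space.
// In this string every Z follows a letter (SUZANO) or a space, so nothing qualifies.
$total = preg_match_all('/[^A-Za-z ]Z[A-Z0-9]+[^_ ]/', $texto, $resultados);

echo $total, "\n"; // 0
```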

    It worked perfectly; I didn't know how to use \b. Thanks for the help!