Regex by position in PHP

I have the following string:

'QD 10 LT 20 PA 30'

Using preg_replace() in PHP, I can extract only the numbers.

For example, take the first group of numbers:

$string = 'QD 10 LT 20 PA 30';
preg_replace('/\D+(\d+)\D+(\d+)\D+(\d+)/', '$1', $string);
// returns '10'

Is there any way to make the pattern recursive?

For example, could a regex such as /\D+(\d+)/ split all the numbers into groups, however many groups of numbers the string contains? That is, if my string were longer ('QD 10 LT 20 PA 30 SC 40'), I wouldn’t need to extend my regex.

1 answer

The question says that you want to capture only the first number, but then it also says you want to capture all of them. So I have included a solution for each case.


Capture only the first number

If you only want to extract the part of the string that matches the regex, you can use preg_match, passing the string and a variable that will receive the matches:

$string = 'QD 10 LT 20 PA 30';
preg_match('/^\D+(\d+)\D+/', $string, $matches);
echo($matches[1]);

Here I used the anchor ^, which marks the beginning of the string. The regex then matches up to the first group of digits ((\d+)). Since you only care about the first group, there is no need to check the rest of the string.

The variable $matches will hold the pieces that the regex captures. Because \d+ is in parentheses, it forms a capture group, and since it is the first pair of parentheses, its content is available in $matches[1]. That’s why this code prints 10.
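
To make the layout of $matches more concrete, here is a small sketch that prints both the whole match and the first capture group for the same string and regex. It is only an illustration; the variable name $found is mine.

```php
<?php
// Illustration only: inspect what preg_match() stores in $matches.
$string = 'QD 10 LT 20 PA 30';
$found = preg_match('/^\D+(\d+)\D+/', $string, $matches);

// $found is 1 when the pattern matched and 0 when it did not.
echo $found, "\n";

// $matches[0] is the whole matched text: everything up to (but not
// including) the second number, because the trailing \D+ is greedy.
echo '[', $matches[0], "]\n";   // [QD 10 LT ]

// $matches[1] is the content of the first pair of parentheses.
echo '[', $matches[1], "]\n";   // [10]
```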


There is one detail in the regex. Right after the ^ there is \D+ (one or more non-digit characters). This means the string cannot start with a digit. So the code below prints nothing: because the string starts with digits, preg_match finds no match and returns 0:

$string = '11 QD 10 LT 20 PA 30'; // starts with digits (the if branch is not entered)
if (preg_match('/^\D+(\d+)\D+/', $string, $matches)) {
    echo ($matches[1]);
}

Therefore, an alternative is to replace the quantifier + (one or more occurrences) with * (zero or more occurrences):

$string = '11 QD 10 LT 20 PA 30'; // starts with digits
if (preg_match('/^\D*(\d+)\D+/', $string, $matches)) { // the first \D+ was replaced by \D*
    echo ($matches[1]); // now the if branch is entered and 11 is printed
}

But there is another catch. From what I understand, you also want to validate that the string is in the format "two letters, two digits, two letters, two digits, and so on".

Let’s assume the format is exactly this (always two letters and two digits, separated by a space, with the pattern repeating any number of times). In that case we can replace + with {2}, which means "exactly two occurrences":

$string = 'QD 10 LT 20 PA 30 SC 40';
if (preg_match('/^[A-Z]{2} (\d{2})( [A-Z]{2} \d{2})*$/', $string, $matches)) {
    echo ($matches[1]); // 10
}

First we have the beginning of the string (^), and then [A-Z]{2} (\d{2}):

  • [A-Z]{2} means "two letters from A to Z" (uppercase only). If you want lowercase letters only, switch to [a-z]{2}, and if both uppercase and lowercase are allowed, use [A-Za-z]{2}
  • then we have a space (since the fields are separated by a space)
  • then we have (\d{2}): two digits inside a capture group (so that they end up in $matches[1])

Then we have ( [A-Z]{2} \d{2})*. Inside the parentheses there is a space (notice the space right after the opening parenthesis), followed by two letters, another space and two digits. Outside the parentheses is *, which means this whole sequence (space, 2 letters, space, 2 digits) can repeat zero or more times. This ensures that your string can be either QD 10 LT 20 PA 30 SC 40 or just QD 10.

Finally, we have $, which means "end of string". With this I guarantee that the whole string has the format I need, and I still get the first group of digits in $matches[1].
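
As a quick check of the validation described above, the sketch below runs the same regex against a few valid and invalid strings. The function name firstGroupIfValid and the sample inputs are mine, chosen only for illustration.

```php
<?php
// Hypothetical helper: returns the first two-digit group, or null when
// the string does not follow the "AA 00 AA 00 ..." format.
function firstGroupIfValid(string $s): ?string
{
    $pattern = '/^[A-Z]{2} (\d{2})( [A-Z]{2} \d{2})*$/';
    return preg_match($pattern, $s, $m) === 1 ? $m[1] : null;
}

$samples = [
    'QD 10 LT 20 PA 30 SC 40', // valid: four letter/digit pairs
    'QD 10',                   // valid: the repeated part may occur zero times
    'QD 10 LT',                // invalid: the last pair has no digits
    'qd 10',                   // invalid: lowercase letters
    'QD 100',                  // invalid: three digits
];

foreach ($samples as $s) {
    var_export(firstGroupIfValid($s));
    echo "  <- '$s'\n";
}
```

The first two samples should yield '10' and the other three NULL.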

If the number of letters and digits can vary, simply change the quantifier. Examples:

  • \d{1,5} - between 1 and 5 digits
  • \d{2,} - 2 or more digits (no upper limit)

Adjust to the values you need.
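
For instance, here is a sketch that assumes a looser format of 1 to 3 uppercase letters followed by 1 to 5 digits. These limits are invented for the example and are not taken from the question.

```php
<?php
// Assumed format, for illustration only: 1-3 uppercase letters, a space,
// then 1-5 digits, with that pair repeating any number of times.
$pattern = '/^[A-Z]{1,3} (\d{1,5})( [A-Z]{1,3} \d{1,5})*$/';

// Matches: 'Q' + '7', then 'LOT' + '12345'.
$okResult = preg_match($pattern, 'Q 7 LOT 12345', $m);
echo $okResult, ' ', $m[1], "\n";   // 1 7

// Does not match: six digits exceed the {1,5} limit.
$badResult = preg_match($pattern, 'QD 123456');
echo $badResult, "\n";              // 0
```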


Get all the numbers

In this case, use preg_match_all:

$string = 'QD 10 LT 20 PA 30 SC 40';

if (preg_match_all('/\b(\d{2})\b/', $string, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        echo $m[1] . "\n";
    }
}

Here I’m using \b as a delimiter: it represents a "word boundary", a position with a word character (letter, digit or _) on one side and a non-word character, or the start or end of the string, on the other. In practice it ensures that (\d{2}) is not glued to other letters or digits. The difference from preg_match is that $matches will now contain an array with every 2-digit occurrence in the string. The output is:

10
20
30
40
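
The PREG_SET_ORDER flag only changes how $matches is organized. The sketch below compares it with the default ordering (PREG_PATTERN_ORDER) on the same string. The variable names $byGroup and $byMatch are mine.

```php
<?php
$string = 'QD 10 LT 20 PA 30 SC 40';

// Default (PREG_PATTERN_ORDER): $byGroup[0] lists the full matches and
// $byGroup[1] lists every value captured by group 1.
preg_match_all('/\b(\d{2})\b/', $string, $byGroup);

// PREG_SET_ORDER: $byMatch[i] holds the full match and the groups of
// the i-th match, which is convenient for a foreach over matches.
preg_match_all('/\b(\d{2})\b/', $string, $byMatch, PREG_SET_ORDER);

echo implode(',', $byGroup[1]), "\n";   // 10,20,30,40
echo $byMatch[2][1], "\n";              // 30 (third match, group 1)
```

If you only need a flat list of the numbers, the default ordering with $byGroup[1] may be the simpler choice.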

The catch is that preg_match_all only picks up the digits; it does not actually validate the format of the string. That is, if the string is abcdef 10, xyz!20, it will still extract the numbers.
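
A small sketch of that behavior, using an input I made up: it extends the string above with two extra tokens (QD30 and 400) to show what \b accepts and what it skips.

```php
<?php
// Made-up input: none of it follows the "AA 00" format.
$string = 'abcdef 10, xyz!20 QD30 400';

preg_match_all('/\b(\d{2})\b/', $string, $matches);

// 10 and 20 are extracted even though the format is wrong.
// 30 is skipped: it is glued to "QD", so there is no boundary before it.
// 400 is skipped: there is no boundary after its first two digits.
echo implode(',', $matches[1]), "\n";   // 10,20
```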

But you can use the previous regex only to validate the format, and then use preg_match_all to extract the numbers:

$string = 'QD 10 LT 20 PA 30 SC 40';
// check the format of the string
if (preg_match('/^[A-Z]{2} (\d{2})( [A-Z]{2} \d{2})*$/', $string)) {
    // only extract the numbers if the format is OK
    if (preg_match_all('/\b(\d{2})\b/', $string, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            echo $m[1] . "\n";
        }
    }
}
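
If what you need in the end is "the number at a given position" without growing the regex, one possible sketch is a helper that counts positions from 1. Note that this uses preg_match_all rather than preg_replace, and that the name nthNumber and the simpler pattern \d+ are my assumptions, not something from the question.

```php
<?php
// Hypothetical helper (an assumption, not part of the question): returns
// the $n-th number in $s, counting from 1, or null if there is none.
function nthNumber(string $s, int $n): ?string
{
    // $matches[0] lists every run of digits, in order of appearance.
    preg_match_all('/\d+/', $s, $matches);

    // Convert the 1-based position to the 0-based array index.
    return $matches[0][$n - 1] ?? null;
}

echo nthNumber('QD 10 LT 20 PA 30 SC 40', 1), "\n"; // 10
echo nthNumber('QD 10 LT 20 PA 30 SC 40', 4), "\n"; // 40
var_dump(nthNumber('QD 10', 3));                     // NULL
```

The regex stays the same no matter how many groups the string has. To also enforce the format, the validation regex from the previous example could be run first.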

  • This already helps me a lot. I wanted to do it with preg_replace because the client is the one who defines the position and the regex. So instead of using preg_match_all and typing the position of each number from 0 to 2, for example, they would type 1 to 3, which is visually clearer. I did look at some recursive constructs in regex but could not get them to do what I wanted.
