Extract the price from a text and show it formatted

I need to solve this exercise:

Consider the following excerpt from a flight search page:

-Best price without stops (sem escalas) R$ 1.367
-Best price with stops (com escalas) R$ 994

1 - Including all fees.

Write a regular expression that finds the best price, with or without stops; then use your expression to extract the string for the chosen value and convert the result to a decimal value (float), so that we end up with just "1367.00" or "994.00".

I got as far as the following expression:

$preco = "R$ 1.367";
$validaPreco = preg_match('/^[0-9]$/', $preco);

But I don't know how to remove the R$ and finish solving the problem.

How would you do that?

2 answers

If the text always has this format, just capture the "R$ ..." snippets:

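function formatar($preco) {
    // Remove the thousands dot ("1.367" -> "1367"), then format with 2 decimals
    return number_format((float) str_replace('.', '', $preco), 2, '.', '');
}

// Double quotes so that \n is a real line break (in single quotes it is a literal backslash + n);
// \$ keeps a literal dollar sign inside the double-quoted string
$texto = "Melhor preço sem escalas R\$ 1.367\n-Melhor preço com escalas R\$ 994";
if (preg_match('/sem escalas R\$ (\d+(?:\.\d{3})*).*com escalas R\$ (\d+(?:\.\d{3})*)/s', $texto, $matches)) {
    echo "Sem escalas: ". formatar($matches[1]);
    echo "\nCom escalas: ". formatar($matches[2]);
}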

\d+ matches one or more digits, followed by the snippet \.\d{3} (a dot followed by 3 digits). That snippet is grouped in parentheses with the quantifier * (zero or more occurrences), so "dot followed by 3 digits" can repeat zero or more times. This may be overkill, since a ticket is unlikely to cost more than a million reais, so (\d+(?:\.\d{3})?) would also work; the ? makes the group optional.
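To see the difference between * and ? in practice, here is a minimal sketch that runs both patterns on a few made-up amounts. The helper name firstGroup is hypothetical, not part of the answer. With *, "R$ 1.234.567" is captured whole; with ?, the capture stops after the first thousands group ("1.234").

```php
<?php
// Hypothetical helper: returns capture group 1, or null when there is no match.
function firstGroup(string $pattern, string $text): ?string
{
    return preg_match($pattern, $text, $m) ? $m[1] : null;
}

$repeatable = '/R\$ (\d+(?:\.\d{3})*)/'; // "dot + 3 digits" zero or more times
$optional   = '/R\$ (\d+(?:\.\d{3})?)/'; // "dot + 3 digits" at most once

// Made-up sample amounts, including one above a million
foreach (['R$ 994', 'R$ 1.367', 'R$ 1.234.567'] as $sample) {
    echo $sample, ' => *: ', firstGroup($repeatable, $sample),
        ' | ?: ', firstGroup($optional, $sample), "\n";
}
```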

The part I care about (the numeric value) is in parentheses, which forms a capture group that I can retrieve later. The first price (without stops) will be in the first group ($matches[1]) and the second price in $matches[2]. The "dot followed by 3 digits" part uses (?:, which creates a non-capturing group, so no extra groups end up in the $matches array; I'm only interested in the full prices.
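A quick way to see why (?: matters is to compare the $matches array produced by a non-capturing group and by a plain capturing group. This is a sketch on a single sample line; the variable names are made up for the example.

```php
<?php
$text = 'sem escalas R$ 1.367';

// Non-capturing (?:...): only the full match and the price are stored.
preg_match('/R\$ (\d+(?:\.\d{3})*)/', $text, $nonCapturing);

// Plain (...): PHP also stores the last repetition of the thousands group.
preg_match('/R\$ (\d+(\.\d{3})*)/', $text, $capturing);

print_r($nonCapturing); // two entries: the full match and "1.367"
print_r($capturing);    // three entries: the same two, plus ".367"
```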

I also use .* (zero or more characters), and the s flag makes the dot also match line breaks (since the texts seem to be on different lines).
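The effect of the s flag can be checked by running the same pattern with and without it on the two-line text. A minimal sketch, assuming the sample text from the answer with a real line break between the two prices:

```php
<?php
// "\$" keeps a literal dollar sign inside a double-quoted string; "\n" is a real line break.
$text = "Melhor preço sem escalas R\$ 1.367\n-Melhor preço com escalas R\$ 994";
$pattern = 'sem escalas R\$ (\d+(?:\.\d{3})*).*com escalas R\$ (\d+(?:\.\d{3})*)';

// Without s, ".*" stops at the line break, so "com escalas" is never reached.
$withoutS = preg_match('/' . $pattern . '/', $text);
// With s, "." also matches "\n" and both prices are captured.
$withS = preg_match('/' . $pattern . '/s', $text, $matches);

var_dump($withoutS, $withS); // int(0), then int(1)
```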

Once I have the prices, I can format them however I like. When formatting, I remove the dot because, when a string is converted to a number, the dot is treated as the decimal separator (so 1.367 would be read as "one point three six seven" and not as "one thousand three hundred and sixty-seven"). Then I format the number with exactly two decimal places, using a dot as the decimal separator and no thousands separator (see the number_format documentation for details).
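A minimal sketch of the conversion step, showing what happens when the thousands dot is not removed first:

```php
<?php
$raw = '1.367';

$naive = (float) $raw;                       // dot read as decimal point: 1.367
$clean = (float) str_replace('.', '', $raw); // thousands dot removed first: 1367.0

// Two decimals, "." as decimal separator, no thousands separator
echo number_format($clean, 2, '.', ''), "\n"; // 1367.00
echo number_format(994, 2, '.', ''), "\n";    // 994.00
```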

The output of the code is:

Sem escalas: 1367.00
Com escalas: 994.00

Your regex ^[0-9]$ doesn't work because it uses the anchors ^ and $ (the beginning and the end of the string, respectively) and only checks for a single digit (i.e., the string could have only one character, a digit from 0 to 9).

  • Tested and working: https://ideone.com/iHvAKx
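The point about ^ and $ can be checked with a short sketch. The anchors are not the problem in themselves: a hypothetical anchored pattern that describes the whole string, including the "R$ " prefix, does match and still captures only the number.

```php
<?php
$preco = 'R$ 1.367';

// ^[0-9]$ only accepts a string made of exactly one digit.
$singleDigit = preg_match('/^[0-9]$/', $preco); // 0
$justSeven   = preg_match('/^[0-9]$/', '7');    // 1

// Anchored pattern covering the whole string: "R$ ", then the price.
$whole = preg_match('/^R\$ (\d+(?:\.\d{3})*)$/', $preco, $m); // 1

if ($whole) {
    echo number_format((float) str_replace('.', '', $m[1]), 2, '.', ''), "\n"; // 1367.00
}
```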

(?<=R\$ )\d+\.?\d*

This way, the R$ sits inside a lookbehind assertion: it must be present before the number, but it is not part of the match.
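The lookbehind idea can be combined with the thousands pattern from the first answer and preg_match_all to pull out every price at once. A sketch assuming the same two-line sample text; the variable names are made up for the example.

```php
<?php
$text = "-Melhor preço sem escalas R\$ 1.367\n-Melhor preço com escalas R\$ 994";

// (?<=R\$ ) requires "R$ " right before the number but leaves it out of the match.
preg_match_all('/(?<=R\$ )\d+(?:\.\d{3})*/', $text, $found);

// Remove the thousands dot and format with two decimals.
$prices = array_map(
    fn (string $p): string => number_format((float) str_replace('.', '', $p), 2, '.', ''),
    $found[0]
);

print_r($prices); // "1367.00" and "994.00"
```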
