Regex to extract string content

3

Good evening, everyone,

Thank you for visiting and offering to help. I’m terrible at regex, so I’ve come to ask for your help.

I have the following string, whose content can vary, for example:

string(49) "02/12/2018 (Assessment 2) = /86= | Weight: 50.00%"
string(49) "02/12/2018 (Assessment 2) = 50.83/86= | Weight: 50.00%"

In the first case, even though the mark is missing, I need to get 00.00.

I need to extract the values into an array as follows:

$dados[ "date" ] = "02/12/2018"
$dados[ "markOK" ] = "50"
$dados[ "markTotal" ] = "86"
$dados[ "weight" ] = "50.00"

Other examples of the string:

string(49) "02/12/2018 (Assessment 2) = /86= | Weight: 50.00%"
string(59) "06/11/2018 (Assessment 2) = 22.40/35=32.00 | Weight: 50.00%"
string(49) "04/12/2018 (Assessment 2) = /60= | Weight: 50.00%"
string(59) "11/09/2018 (Assessment 2) = 27.00/40=33.75 | Weight: 50.00%"
string(59) "09/09/2018 (Assessment 2) = 30.00/30=50.00 | Weight: 50.00%"
string(59) "14/08/2018 (Assessment 2) = 31.00/40=38.75 | Weight: 50.00%"
string(59) "19/06/2018 (Assessment 2) = 63.00/72=43.75 | Weight: 50.00%"
string(59) "17/06/2018 (Assessment 2) = 45.00/45=50.00 | Weight: 50.00%"
string(59) "22/05/2018 (Assessment 2) = 11.00/55=10.00 | Weight: 50.00%"
  • Edit the question with the code of what you’ve already tried.

  • @Sam I could not work out any regex for the solution. I only tried substring, but since the position of the data may vary, the code broke =/

  • @Gustavofilgueiras Even so, it is important to put the code you have tried to do and what errors are occurring. And if the string can change, it would be interesting to also put all the possibilities (or if there are many, some examples followed by an explanation of how it can vary). Please click on [Edit] and add these details. And I suggest you read the [tour] and the pages [Ask] and [mcve] to better understand how questions should be.

  • @Sam thanks for the tip. I edited the question with the other example that may vary, but as for the code, I really did not make any progress, sorry.

  • @Sam I think it is in the first case, in /80 (since there is no "50" before the slash, the value must be zero). At least that’s what I understood...

  • @hkotsubo Exactly that, my friend!

  • @Sam, it can be 00 or 00.00, because sometimes the value also has a decimal part, like 50.43

  • @Sam edited it too, sorry

  • Sorry I can’t be of more help, but I have to log off now. Anyway, if the only variation is that the "50" before the slash is optional, I think a combination of strpos, strrpos and substr may be easier than regex. But if you really want to use regex, you can take a look at some tutorials.

  • @hkotsubo Thank you man

  • In the case of 86= you just want 86 (without the =)... and in the case of 40=33.75?
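The strpos, strrpos and substr idea mentioned in the comments could look roughly like the sketch below. It assumes every line follows the layout of the examples in the question (date, label in parentheses, " = ", optional mark, slash, total, "=", then "Weight: ...%"). The function name parseBySubstr is hypothetical and does not come from the thread.

```php
<?php
// Hypothetical helper (not from the thread): parses one line using only
// strpos, strrpos and substr. Assumes the layout shown in the question:
// "<date> (<label>) = <mark>/<total>=<result> | Weight: <weight>%"
function parseBySubstr(string $line): array
{
    $date = substr($line, 0, strpos($line, " "));   // text before the first space

    $start = strpos($line, ") = ") + 4;             // first character after ") = "
    $slash = strpos($line, "/", $start);            // slash between mark and total
    $equal = strpos($line, "=", $slash);            // "=" that follows the total

    $markOK    = substr($line, $start, $slash - $start);
    $markTotal = substr($line, $slash + 1, $equal - $slash - 1);

    $colon  = strrpos($line, ":");                  // the last ":" precedes the weight
    $weight = substr($line, $colon + 2, strrpos($line, "%") - $colon - 2);

    return [
        "date"      => $date,
        "markOK"    => $markOK === "" ? "00.00" : $markOK, // default for a missing mark
        "markTotal" => $markTotal,
        "weight"    => $weight,
    ];
}

print_r(parseBySubstr("02/12/2018 (Assessment 2) = /86= | Weight: 50.00%"));
```

Searching for the slash only after ") = " keeps the slashes inside the date from being picked up. The sketch does no validation, so a line with a different layout would give wrong values rather than an error.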


2 answers

1


I wouldn’t use regex for that. You can split the string into an array on spaces and use a foreach to assign the values:

<?php
$string = "02/12/2018 (Assessment 2) = /86= | Weight: 50.00%";
$array = explode(" ", $string);
$dados = [];

foreach ($array as $item) {
   // count how many slashes "/" the item contains
   preg_match_all("~\/~", $item, $matches);

   // two slashes means the item is a date
   if (sizeof($matches[0]) == 2) {
      $dados[ "date" ] = $item;
   }

   // one slash means the item is the mark pair
   if (sizeof($matches[0]) == 1) {
      $mark = explode("/", $item);
      // if the first part is empty, fall back to 00.00
      $dados[ "markOK" ] = $mark[0] ? $mark[0] : "00.00";
      $dados[ "markTotal" ] = $mark[1];
   }

   // the item containing "%" is the weight (strpos may return 0, so compare with false)
   if (strpos($item, "%") !== false) {
      $dados[ "weight" ] = str_replace("%", "", $item);
   }
}

var_dump($dados);
?>

The result is:

array(4) {
  ["date"] => "02/12/2018"
  ["markOK"] => "00.00"
  ["markTotal"] => "86="
  ["weight"]=> "50.00"
}

IDEONE

  • It is perfect :) Only the array position ["markTotal"] is coming with the equal sign

  • Just make a replace: str_replace("=", "", $mark[1])

  • thanks man worked 100%
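One caveat about the str_replace suggestion, offered as a sketch: for items such as 35=32.00 (the case raised in the comments on the question), removing the "=" would leave 3532.00 instead of 35. Keeping only the text before the first "=" covers both shapes. The helper name extractTotal is hypothetical.

```php
<?php
// Hypothetical helper: keeps only what comes before the first "=".
function extractTotal(string $afterSlash): string
{
    return explode("=", $afterSlash)[0];
}

echo extractTotal("86="), "\n";               // 86
echo extractTotal("35=32.00"), "\n";          // 35
echo str_replace("=", "", "35=32.00"), "\n";  // 3532.00 (the digits run together)
```

In the answer's loop this would replace the direct use of $mark[1].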

0

In your case you will need to use groups in the regex to delimit and identify what you want to extract. A group is written by enclosing part of the pattern in parentheses.

This program uses the samples from your question as its basis. Since the regular expression became quite long, I split it into smaller pieces so that it is easier to read, understand, and adjust:

<?php

    $sampleData = array(
        "02/12/2018 (Assessment 2) = /86= | Weight: 50.00%",
        "06/11/2018 (Assessment 2) = 22.40/35=32.00 | Weight: 50.00%",
        "04/12/2018 (Assessment 2) = /60= | Weight: 50.00%",
        "11/09/2018 (Assessment 2) = 27.00/40=33.75 | Weight: 50.00%",
        "09/09/2018 (Assessment 2) = 30.00/30=50.00 | Weight: 50.00%",
        "14/08/2018 (Assessment 2) = 31.00/40=38.75 | Weight: 50.00%",
        "19/06/2018 (Assessment 2) = 63.00/72=43.75 | Weight: 50.00%",
        "17/06/2018 (Assessment 2) = 45.00/45=50.00 | Weight: 50.00%",
        "22/05/2018 (Assessment 2) = 11.00/55=10.00 | Weight: 50.00%"
    );

    $regex = '/^'.                          // start of the line
        '([0-9]{2}\/[0-9]{2}\/[0-9]{4})'.   // 99/99/9999 (required) [1]
        ' \(Assessment.+\)'.                // " (Assessment *)" -- ignored
        ' = '.                              // " = " -- ignored
        '([0-9.]{5})?'.                     // 99.99 (optional) [2]
        '\/'.                               // "/" -- ignored
        '([0-9]{2})?'.                      // 99 (optional) [3]
        '='.                                // "=" -- ignored
        '([0-9\.]{5})?'.                    // 99.99 (optional) [4]
        '.+Weight: '.                       // "*Weight: " -- ignored
        '([0-9\.]{5})'.                     // 99.99 (required) [5]
        '%$/';                              // "%" and end of the line

    foreach ($sampleData as $i){
        preg_match($regex, $i, $matchList);
        print_r($matchList);
    }

?>

The foreach loop at the end goes through the array item by item, applies the regex to each one, and puts the result of the match in the array $matchList.

Here is a sample of the result:

Array
(
    [0] => 04/12/2018 (Assessment 2) = /60= | Weight: 50.00%
    [1] => 04/12/2018
    [2] => 
    [3] => 60
    [4] => 
    [5] => 50.00
)

The first item is always the full text that matched, while the others are the captured groups; a group that found nothing is left empty.

  • The problem is that [0-9.]{5} also accepts ..... and ..0.. as valid. If you want to avoid these cases, it is better to be a little more precise and use something like [0-9]+(?:\.[0-9]+)?, which accepts 50 and 50.12. Also, {5} limits the match to exactly 5 characters, but I don’t know if that is the case here (it was not explicitly mentioned whether there are values below 10 or above 99, and I would only limit it to exactly 5 if I were sure). You can’t know just from the examples, but it is important to pay attention to these details :-)

  • Precisely. I worked from the data sample and wrote something that fit it, but the regex, which had already grown long, would get even longer.
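To illustrate the point from the comments, here is a sketch of a variant that uses the stricter number pattern [0-9]+(?:\.[0-9]+)? and fills the array asked for in the question. The group names, the ~ delimiter, and the function name parseLine are choices made for this example and are not part of the answer. It also assumes the result after "=" is not needed, so that part is left uncaptured.

```php
<?php
// Sketch only: a more tolerant variant of the pattern, built around the
// number sub-pattern suggested in the comment above.
$num   = '[0-9]+(?:\.[0-9]+)?';
$regex = '~^(?<date>[0-9]{2}/[0-9]{2}/[0-9]{4})'   // dd/mm/yyyy
       . ' \(.+?\) = '                             // " (label) = "
       . "(?<markOK>$num)?"                        // optional mark
       . "/(?<markTotal>$num)="                    // total after the slash
       . "(?:$num)?"                               // optional result, not captured
       . ' \| Weight: '
       . "(?<weight>$num)%\$~";

// Hypothetical helper: returns the array from the question, or null when
// the line does not follow the expected layout.
function parseLine(string $regex, string $line): ?array
{
    if (!preg_match($regex, $line, $m)) {
        return null;
    }
    return [
        "date"      => $m["date"],
        "markOK"    => $m["markOK"] !== "" ? $m["markOK"] : "00.00",
        "markTotal" => $m["markTotal"],
        "weight"    => $m["weight"],
    ];
}

print_r(parseLine($regex, "02/12/2018 (Assessment 2) = /86= | Weight: 50.00%"));
print_r(parseLine($regex, "06/11/2018 (Assessment 2) = 22.40/35=32.00 | Weight: 50.00%"));
```

Using ~ as the delimiter avoids escaping every slash, and named groups make the result readable without counting parentheses.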
