Pick string by format

I have an array with 1000 entries in which text and random numbers are mixed together. I need to pick out the snippets that follow particular formats with a predefined number of characters, such as 65 45 98 12 15 98 (17 characters) or 04668475 03/1980 (a numeric sequence followed by a date). How can I create a filter function that runs over the entries and prints only the strings that fit these formats?

Informal example:

$strings = array("1" => "12 32 87 98 54", "2" => "154654651", "3" => "1354654654  45 45 45 45 45");
$mascara = "## ## ## ## ##"; // desired format: each # stands for one digit

foreach ($strings as $string) {
    // Pseudocode: == only compares literally; what I need here is a format check
    if ($mascara == $string) {
        echo $string . " found\n";
    } else {
        echo $string . " not found\n";
    }
}

In this case each entry would be compared against the mask, and any string that has the mask's format would be printed as found.

  • A regular expression. To give a better answer, though, we need you to spell out the filter rule: exactly which format is wanted, which values should be returned, and which should not.

  • The problem is that the values are random but arranged in fixed formats; I need to match on the formats, not on the values :/

  • Hence the regular expression. Edit the question and explain this sentence: "I need to pick strings that are in particular formats containing a certain amount of numbers in certain formats"

  • For example, I have the following string: $text = ":63544 42168798975 12/1990"; I need a filter function that ignores the random value :63544 and takes the constant snippet, which has 11 characters, together with the date that follows it.

  • So that’s exactly what I need you to explain in the question. Do all lines follow this format: a random number, an 11-digit sequence, and a date? If not, what are all the formats you want to consider?

  • There are two formats, "0000 0000 0000 0000 0000 00/0000" and "0000000000000000 00/0000"; the texts I need to pick up are always in one of these formats.

1 answer

As discussed in the comments, only two formats are allowed: "0000 0000 0000 0000 00/0000 000" and "0000000000000000 00/0000 000", that is, a 16-digit sequence, optionally grouped in fours, followed by a date and a three-digit sequence.

So we can define a regular expression:

((\d{4}\s?){3}\d{4}) (\d{2}\/\d{4}) (\d{3})

Where:

  • ((\d{4}\s?){3}\d{4}) captures the 16-digit sequence, with or without whitespace after every four digits. Read: four digits, \d{4}, followed by an optional whitespace character, \s?, the pair repeated three times, {3}, followed by four more digits, \d{4}, all in parentheses to capture the value;
  • A space;
  • (\d{2}\/\d{4}) captures the date: two digits, a slash, and four digits;
  • Another space;
  • (\d{3}) captures any three-digit number.
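
To illustrate how the captured groups come out, here is a minimal sketch that pulls the snippet out of a line with a random prefix, like the one mentioned in the comments. The function name extractSnippet, the array keys, and the sample text are hypothetical, chosen only for this illustration. Note that group 2 is the inner repeated group, so the date lands in group 3.

```php
<?php
// Sketch only: extractSnippet() and the sample input are hypothetical.
function extractSnippet(string $text): ?array
{
    // Unanchored expression, so anything before the snippet is skipped.
    $pattern = '/((\d{4}\s?){3}\d{4}) (\d{2}\/\d{4}) (\d{3})/';

    if (preg_match($pattern, $text, $m) !== 1) {
        return null; // nothing in the expected format
    }

    return [
        'sequence' => $m[1], // 16 digits, with or without spaces
        'date'     => $m[3], // two digits, slash, four digits
        'code'     => $m[4], // trailing three digits
    ];
}

print_r(extractSnippet(':63544 1234 5678 9012 3456 12/1990 123'));
```

The function returns null when no snippet is found, so the caller can tell a miss apart from a match.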

If you want to validate that a text follows this format strictly, start the expression with ^ and end it with $, which mark the beginning and end of the text; any text containing something outside the format is then rejected.

^((\d{4}\s?){3}\d{4}) (\d{2}\/\d{4}) (\d{3})$

See a simple example:

$tests = [
  "0000 0000 0000 0000 00/0000 000",   // valid
  "0000000000000000 00/0000 000",      // valid
  "00000000 00000000 00/0000 000",     // valid
  "0000000 000000000 00/0000 000",     // invalid: spaces in the wrong places
  "0000 000 0000 0000 00/0000 000",    // invalid: wrong sequence length
  "000000000000000 00/0000 000",       // invalid: wrong sequence length
];

foreach ($tests as $test)
{
  // Only the strings that match the anchored expression are printed
  if (preg_match('/^((\d{4}\s?){3}\d{4}) (\d{2}\/\d{4}) (\d{3})$/', $test, $matches))
  {
    print_r($matches);
  }
}

See it working on Ideone | Repl.it
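
For the filter function asked about in the question, one possible sketch uses preg_grep() with the anchored expression to keep only the array entries that follow the format. The function name filterByFormat and the sample data are hypothetical, and the sketch assumes each array entry holds exactly one candidate string.

```php
<?php
// Sketch only: filterByFormat() and the sample data are hypothetical.
function filterByFormat(array $strings): array
{
    $pattern = '/^((\d{4}\s?){3}\d{4}) (\d{2}\/\d{4}) (\d{3})$/';

    // preg_grep() returns the entries that match and preserves their keys.
    return preg_grep($pattern, $strings);
}

$strings = [
    '1' => '1234 5678 9012 3456 12/1990 123',
    '2' => '154654651',
    '3' => '1234567890123456 03/1980 456',
];

foreach (filterByFormat($strings) as $key => $value) {
    echo $key, ': ', $value, PHP_EOL;
}
```

Because the keys are preserved, the result still tells you which positions of the original array matched. If an entry may end with a line break that should also be rejected, adding the D modifier to the pattern is one option.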
