How to create specific validations using Regex

I have the following data

30140083|MG 13407061|07309482697|12/01/86|EMILLY L V GOMES|4984012024889591|10/2018|557
3283005|165055881|07309480805|11/18/16|AILTON B TEGON|5549320054855241|6/2018|177
3691100|17714445|07309395883|01/07/67|MARGARETE P SAN|319150122924465|12/2023|8671

I would like to create validations and/or conditions using regular expressions to verify that cards starting with 4, 5 or 6 have a security code of 3 (three) digits, while cards starting with 3 have a security code of 4 (four) digits. The respective values are:

  • 4 = Visa (CVV: 3)
  • 5 = Mastercard (CVV: 3)
  • 6 = Elo/Discover/Diners Club (CVV: 3)
  • 3 = American Express (CVV: 4)

To get the values without defining these conditions, I use this pattern: (\d{15,16})[^<](\d{1,2})[^<](\d{2,4})[^<](\d{3}), but it only captures values like "4984012024889591|10/2018|557" and "5549320054855241|6/2018|177"; a value like "319150122924465|12/2023|8671" falls outside the pattern, since its security code has 4 digits. I use the PHP function preg_match_all. I need to get all the respective values, validate them with the Luhn algorithm and return them to the front end.

It is worth mentioning a few points:

Brand | BIN (leading digits)


  • American Express | 34, 37
  • Aura | 50
  • Diners Club | 300, 301, 302, 303, 304, 305, 36, 38, 39
  • Discover | 6011, 622, 64, 65
  • Elo | 401178, 401179, 431274, 438935, 451416, 4573, 4576, 506, 509, 636, 6500, 6504, 6505, 6507, 6509, 6516, 6550, 504175, 627780
  • Hypercard | 606282
  • JCB | 3088, 3096, 3112, 3158, 3337, 35
  • Mastercard | 5, 2
  • Visa | 4

Not every brand is identified by its first digit alone: American Express, Diners Club and JCB all start with 3. This is the part I could not solve using regular expressions alone.

EDIT

I tried adding {3,4} to capture CVVs with either 3 or 4 digits, but the CVV may be immediately followed by a date, for example "5549320054855241|6/2018|17701/01/2021". In that case the regex captures the longest possible value, whereas "5549320054855241|6/2018|177|01/01/2021" would correctly return just "5549320054855241|6/2018|177". Hence the need for a condition based on the card brand: if the number starts with 4 or 5, I know the CVV has only 3 digits. For cases like American Express, which has a 4-digit CVV and a 15-digit card number, I could then tell them apart and capture the values correctly.
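To illustrate the problem described above, here is a minimal sketch that runs the pattern with {3,4} against the two example strings. The variable names are only for illustration.

```php
<?php
// The pattern from the question, with {3,4} for the CVV.
$pattern = '/(\d{15,16})[^<](\d{1,2})[^<](\d{2,4})[^<](\d{3,4})/';

// CVV "177" immediately followed by the date "01/01/2021".
$glued = '5549320054855241|6/2018|17701/01/2021';
// Same data with a delimiter between the CVV and the date.
$separated = '5549320054855241|6/2018|177|01/01/2021';

preg_match($pattern, $glued, $m1);
preg_match($pattern, $separated, $m2);

echo $m1[4], "\n"; // "1770": the greedy {3,4} also takes the first digit of the date
echo $m2[4], "\n"; // "177": the delimiter stops the match after three digits
```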

Thanks in advance!

  • If the idea is just to capture the numbers, wouldn't it be enough to add {3,4} at the end of the regex? /(\d{15,16})[^<](\d{1,2})[^<](\d{2,4})[^<](\d{3,4})/.

  • @Weslleyaraújo no, I actually tried that, but the format of the data I receive can vary. For example, just as in "MG 13407061|0730948269" the number 13407061 sits right next to the delimiter | (which, by the way, may be a different character), the CVV may be followed by other numbers, such as a date. Something like "4984012024889591|10/2018|55701/01/2021": if the pattern contains {3,4}, it will try to capture the longest number, but a Visa card (starting with 4) has only 3 digits in its security code (CVV).

  • Consequently it would capture "4984012024889591|10/2018|5570", which is invalid.

  • This example with a date after the CVV is not in the question. I suggest editing the question to include all possible variations, otherwise any answer will be incomplete... Also, do you just want to validate, or do you want to extract each piece of information separately (the card number, the CVV, etc.)?

  • I edited, and I will add more as I keep trying.

  • @hkotsubo I need to obtain them separately, but as parts of a single group: in a for or foreach loop I need to take each group (card, month, year, CVV) and add it to an array of objects, because I do all four validations: first, whether the card number is valid (whether it passes the validation algorithm); second, whether the card has not expired (its expiration date is not earlier than the current month or year); and whether the CVV corresponds to the brand (be it Amex, Visa, Master, Hyper, Elo, etc.).

  • I hope that this data is fictitious

  • @Eduardobissi they are for illustration only: real-looking but invalid cards that do not belong to any "natural" or "legal" person.


1 answer


Maybe regex is not the best solution. Instead, you could simply split the data and handle each part separately. Assuming that each record is on its own line, a "skeleton" of the general idea would be:

$dados = <<<DADOS
30140083|MG 13407061|07309482697|12/01/86|EMILLY L V GOMES|4984012024889591|10/2018|557
3283005|165055881|07309480805|11/18/16|AILTON B TEGON|5549320054855241|6/2018|177
3691100|17714445|07309395883|01/07/67|MARGARETE P SAN|319150122924465|12/2023|8671
DADOS;
// for each line of the data (\R matches any line-break sequence)
foreach (preg_split('/\R/', $dados) as $line) {
    // split the fields (assuming the separator is always "|")
    $partes = explode('|', $line);
    // assuming the fields are always in the same positions
    $numero = $partes[5];
    $cvv = $partes[7];

    // --------- validations ---------
    // example: Diners Club (list every prefix the card number can start with)
    $tmp = substr($numero, 0, 3);
    if (in_array($tmp, ['300', '301', '302', '303', '304', '305'], true)) {
        // card is Diners
        $tamanho_cvv = 4;
    } else if ($numero[0] == '4') {
        // card is Visa
        $tamanho_cvv = 3;
    } else {
        // etc. (add the other brand-specific conditions here, setting the CVV length for each case)
        $tamanho_cvv = null; // brand not recognized
    }

    // check CVV
    if (strlen($cvv) == $tamanho_cvv) {
        // CVV length ok
    } else {
        // wrong CVV length
    }
}

Of course this can be improved; it is just an initial idea. But once you have the card number and the CVV, you can validate them any way you like. For example:

function getTamanhoCvv($numero_cartao) {
    $digito = intval($numero_cartao[0]);
    if (4 <= $digito && $digito <= 6)
        return 3;

    if ($digito == 3)
        return 4;

    // add more cases if necessary

    // if execution gets here, raise an error (invalid number?) or return some default value, whichever makes more sense in your case
}

function diners($numero_cartao) {
    $tmp = intval(substr($numero_cartao, 0, 3));
    if (300 <= $tmp && $tmp <= 305)
        return TRUE;
    $tmp = intval(substr($numero_cartao, 0, 2));
    return $tmp == 36 || $tmp == 38 || $tmp == 39;
}

function getBandeira($numero_cartao) {
    if (diners($numero_cartao)) {
        return 'Diners Club';
    }

    // test the other brands in a similar way...
}

// ...
$numero = $partes[5]; // card number, obtained as already explained
$cvv = $partes[7]; // CVV, obtained as already explained
$tamanho_cvv = getTamanhoCvv($numero);
$bandeira = getBandeira($numero);

// check CVV
if (strlen($cvv) == $tamanho_cvv) {
    // CVV length ok
} else {
    // wrong CVV length
}

// etc...
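
A minimal sketch of how the per-brand checks could be driven by a prefix table instead of one function per brand. The prefixes are copied from the list in the question. The names PREFIXOS and detectarBandeira are hypothetical, and the rule "longest matching prefix wins" is an assumption made here to resolve overlaps such as Elo 401178 versus Visa 4.

```php
<?php
// Prefixes as listed in the question (hypothetical constant name).
const PREFIXOS = [
    'American Express' => ['34', '37'],
    'Aura'             => ['50'],
    'Diners Club'      => ['300', '301', '302', '303', '304', '305', '36', '38', '39'],
    'Discover'         => ['6011', '622', '64', '65'],
    'Elo'              => ['401178', '401179', '431274', '438935', '451416', '4573', '4576',
                           '506', '509', '636', '6500', '6504', '6505', '6507', '6509',
                           '6516', '6550', '504175', '627780'],
    'Hypercard'        => ['606282'],
    'JCB'              => ['3088', '3096', '3112', '3158', '3337', '35'],
    'Mastercard'       => ['5', '2'],
    'Visa'             => ['4'],
];

// Returns the brand whose prefix is the longest one matching the start of
// the number, or null when no prefix matches (assumed tie-breaking rule).
function detectarBandeira(string $numero): ?string
{
    $melhor = null;
    $tamanho = 0;
    foreach (PREFIXOS as $bandeira => $prefixos) {
        foreach ($prefixos as $p) {
            if (strlen($p) > $tamanho && strncmp($numero, $p, strlen($p)) === 0) {
                $melhor = $bandeira;
                $tamanho = strlen($p);
            }
        }
    }
    return $melhor;
}

var_dump(detectarBandeira('4984012024889591')); // Visa
var_dump(detectarBandeira('319150122924465'));  // NULL: no listed prefix matches
```

With this table, the sample number 5549320054855241 resolves to Mastercard, while 319150122924465 matches none of the listed prefixes, so the caller has to decide how to treat an unknown brand.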

And to handle the cases that have a date after the CVV, just do something like:

// check whether there is a date after the CVV
if (preg_match('#(\d{3,4})\d{2}/\d{2}/\d{4}#', $cvv, $match)) {
    $cvv = $match[1];
}

Since the idea is not to validate the date (just to check whether there is something that looks like "dd/mm/yyyy"), it is enough to test for this and, if found, strip it from the CVV value.
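
The question also mentions validating the number with the Luhn algorithm and checking that the card has not expired. A minimal sketch of both checks follows. The function names luhnValido and naoExpirado are hypothetical, and treating a card as valid through its whole expiry month is an assumption.

```php
<?php
// Luhn check: starting from the rightmost digit, double every second digit,
// subtract 9 from results above 9, and require the total to be a multiple of 10.
function luhnValido(string $numero): bool
{
    if ($numero === '' || !ctype_digit($numero)) {
        return false;
    }
    $soma = 0;
    $dobrar = false; // the rightmost digit itself is not doubled
    for ($i = strlen($numero) - 1; $i >= 0; $i--) {
        $d = (int) $numero[$i];
        if ($dobrar) {
            $d *= 2;
            if ($d > 9) {
                $d -= 9;
            }
        }
        $soma += $d;
        $dobrar = !$dobrar;
    }
    return $soma % 10 === 0;
}

// Expiry check: compares (year, month) against a reference date passed in by
// the caller, so that the function does not depend on the system clock.
function naoExpirado(int $mes, int $ano, DateTimeImmutable $hoje): bool
{
    if ($mes < 1 || $mes > 12) {
        return false;
    }
    $atual = (int) $hoje->format('Y') * 12 + (int) $hoje->format('n');
    return $ano * 12 + $mes >= $atual;
}

var_dump(luhnValido('79927398713')); // bool(true)
var_dump(naoExpirado(6, 2018, new DateTimeImmutable('2018-07-01'))); // bool(false)
```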


Do you really want to use regex?

I don't recommend it because, as you can see, there are many different rules and, although it is possible, the regex would be huge and difficult to understand and maintain. The code above, in my opinion, is simpler, not only to write but also to understand and modify (to add new rules, brands, etc., just extend the algorithms, create new validation functions and so on). It is better organized, and in the end it is more worthwhile.

Just to give you an idea of what it would look like, here is a "simplified" version with only 4 brands:

$regex = '#
  (?: (?P<diners>   30[0-5]\d{12,13}   |   3[689]\d{13,14}   )  \| [^|]+ \|  (?P<cvv_diners>\d{4})  )  |
  (?: (?P<amex>  3[47]\d{14}  )  \| [^|]+ \|  (?P<cvv_amex>\d{4})  )  |
  (?: (?P<mastercard>  [25]\d{15}  )  \| [^|]+ \|  (?P<cvv_mastercard>\d{3})  )  |
  (?: (?P<visa>  4\d{15}  )  \| [^|]+ \|  (?P<cvv_visa>\d{3}) )
#x';

$dados = <<<DADOS
30140083|MG 13407061|07309482697|12/01/86|EMILLY L V GOMES|4984012024889591|10/2018|557
3283005|165055881|07309480805|11/18/16|AILTON B TEGON|5549320054855241|6/2018|177
3691100|17714445|07309395883|01/07/67|MARGARETE P SAN|319150122924465|12/2023|8671
DADOS;

$bandeiras = [ 'diners', 'amex', 'mastercard', 'visa'];
if (preg_match_all($regex, $dados, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        foreach ($bandeiras as $b) { // check whether the named groups were matched
            if (!empty($m[$b])) { // if matched, fill in the information
                $numero_cartao = $m[$b];
                $cvv = $m["cvv_$b"];
                $bandeira = $b;
                break; // once a brand has been found, there is no need to check the others
            }
        }
        echo "cartão $bandeira: $numero_cartao, $cvv\n";
    }
}

I used the modifier x, which allows you to write the regex in several lines and with spaces, so that it is a little less complicated to read (imagine if it was all in one line and no spaces).

Then I use alternation (the character |, which means "or"), to get the various possibilities.

For example, for Diners, I see if it starts with 30[0-5] (the digits 3 and zero, followed by a digit from zero to 5), followed by 12 or 13 digits (\d{12,13}), or starts with 3[689], that takes 36, 38 or 39, followed by 13 or 14 digits.

Then I match \|[^|]+\|, which is the character | (escaped with \ so that it is not confused with the alternation operator), followed by one or more characters other than |, followed by another |, and finally I capture the CVV (which in this case has 4 digits).

After all this there is an alternation (the | at the end of the line) and then another regex for American Express (3[47] matches 34 or 37, followed by 14 digits; adjust the values to whatever you need, since I don't know whether it can have fewer; for example, the 15-digit American Express numbers mentioned in the question would be 3[47]\d{13}), and so on (each brand has its own rules, including the CVV size).

For each piece of information I use named groups. For example, (?P<visa> 4\d{15} ) indicates that if the regex finds a match for 4\d{15} (the digit 4 followed by 15 digits), the result will be in the group called "visa". The same goes for the CVV: each one has a name, so just check whether the group is filled in to know whether that information was found. And since I'm using alternation, only one of the brands will be found at a time, so I just need to go through the groups until I find one of them.

But note that I only included four brands. For each additional one you would have to add another specific regex, and is it really worth it? To me it is already complicated enough, and adding more lines will only make it worse. I still prefer the first option (treat each line separately, split the data, create separate functions for each brand, etc). Regex is nice, but it is not always the best solution.

For example, to handle cases where there may be a date after the CVV, simply add (?:\d{2}/\d{2}/\d{4})? after each CVV group (the ? indicates that this whole section is optional). It would look even worse than it already does:

$regex = '#
  (?: (?P<diners>   30[0-5]\d{12,13}   |   3[689]\d{13,14}   )  \| [^|]+ \|  (?P<cvv_diners>\d{4})  (?:\d{2}/\d{2}/\d{4})?  )  |
  (?: (?P<amex>  3[47]\d{14}  )  \| [^|]+ \|  (?P<cvv_amex>\d{4})   (?:\d{2}/\d{2}/\d{4})?  )  |
  (?: (?P<mastercard>  [25]\d{15}  )  \| [^|]+ \|  (?P<cvv_mastercard>\d{3})  (?:\d{2}/\d{2}/\d{4})?  )  |
  (?: (?P<visa>  4\d{15}  )  \| [^|]+ \|  (?P<cvv_visa>\d{3})  (?:\d{2}/\d{2}/\d{4})?  )
#x';
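
As a quick check of the idea, here is a minimal sketch that applies two of the alternatives above (Mastercard and Visa only, to keep it short) to strings where a date is glued to the CVV. The sample strings come from the question and its comments. The variable $resultados is only for illustration.

```php
<?php
$regex = '#
  (?: (?P<mastercard>  [25]\d{15}  )  \| [^|]+ \|  (?P<cvv_mastercard>\d{3})  (?:\d{2}/\d{2}/\d{4})?  )  |
  (?: (?P<visa>  4\d{15}  )  \| [^|]+ \|  (?P<cvv_visa>\d{3})  (?:\d{2}/\d{2}/\d{4})?  )
#x';

$linhas = [
    '5549320054855241|6/2018|17701/01/2021',
    '4984012024889591|10/2018|55701/01/2021',
];

$resultados = [];
foreach ($linhas as $linha) {
    if (preg_match($regex, $linha, $m)) {
        // only one alternative matches, so only one brand group is non-empty
        $b = !empty($m['mastercard']) ? 'mastercard' : 'visa';
        $resultados[] = [$b, $m[$b], $m['cvv_' . $b]];
    }
}
print_r($resultados);
```

Because each brand fixes its own CVV length, the CVV group stops at 3 digits and the optional date group consumes the rest, so the captured CVVs are 177 and 557 rather than 1770 and 5570.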

  • I will try it and report back here on whether it worked out; I will probably need to make small changes. We're in this together, my friend!!

  • @gleisin-dev I updated the answer to cover the case of a date after the CVV. Anyway, I stand by what I said throughout the answer: perhaps the first solution (explode, then take the parts separately and create specific functions to validate each part) is better than trying to write a single regex that does everything.

  • Great, my friend!! Thanks a lot..
