Return a "block" of text containing a keyword from a .txt file

I have a text file containing coupons (receipts). Here is an example of one coupon:

                 COTIA CENTRO
                ATACADAO S.A.
               PROF JOSE BARRETO
-----------------------------------------------
CNPJ 00.000.000/0000-00
IE 000.000.000.000
IM ISENTO
-----------------------------------------------
G              Extrato No. 182863Gþ
G        CUPOM FISCAL ELETRâNICO - SATGþ
-----------------------------------------------
#|COD|DESC|QTD|UN|VL UN R$|(VL TR R$)*|VL ITEM 
-----------------------------------------------
001 00071162 AGUA COCO C.JORDAO    1X200ML 
     6 UND9 X 1,49 (1,72)                  8,94
desconto sobre item                       -1,20
002 00001650 COCO SECO TROPIC.       1X1Kg 
 0,828 KG9  X 3,99 (0,73)                  3,30

Total bruto de Itens                      12,24
Total de descontos/acrescimos sobre item  -1,20
GTOTAL R$                                  11,04Gþ

Vale Alimentacao                          11,04

-----------------------------------------------
OBSERVACOES DO CONTRIBUINTE

*Valor aproximado dos tributos do item
Valor aproximado dos tributos deste cupom
(conforme Lei Fed.12.741/2012) R$          2,45
Vlr.Aprox.Tributos: Federal R$0,47 (4,26%) 
Vlr.Aprox.Tributos: Estadual R$1,98 (17,93%) 
Fonte: IBPT.
-----------------------------------------------
       CIELO-VEROCARD BENEFICIO
           000000******0000
PDV=75151429 DOC=140068 AUT=938956
VALOR:11,04 S.DISP:1.166,06    (SiTef)
-----------------------------------------------
G              SAT No. 00000000Gþ
            14/06/2021 - 08:27:29

G        0000 0000 0000 0000 0000 0000 Gþ
G          0000 0000 0000 0000 0000 Gþ
CFe35210675315333005925590002843141828638739062|20210614082729|11.04||jf6L4XuLg/T9PyMFRUoWGyqCQZG+YgzerKsDm7GLllv/w6BFvDKIBsRemosUSKyOsDfMkS2Bds+yXqrucQa1zmu2HpVlWxF8qu+M3MB7uMRub5H1NibCZAmQBY7MbXiXQm/0lC4jzG2rnDrmlI19OtJQDgODNDySgTViB3xiQmQVbF/jjM5aLnwZ9wNWReMI4uQHB/Dd3N8w8OVTxEPx7N3p27KGskS/5EmbNc1EX+nhHVNYkOQCzEi5ip0pALN3EzvD/p4b11ThNt697UhM7mRaavjapoEDBBTIrUx1YxOQyWPfeflarB72rePPzpbM9daRvvtkNu7LAxeO/46oOg==
-----------------------------------------------
 TPLinux AT.14.c00N-19.07 - Unisys Brasil Ltda
-----------------------------------------------
4610-CR2 VERSAO:16.05       PDV:002       LJ:059
OPR:000243418Tatiana Lig    14/06/2021 08:27:33

All coupons start with COTIA CENTRO and end with the OPR:... line.

I want to search for a specific coupon by its Extrato number or by its TOTAL and, if it is found, return that whole coupon.

I tried using Regex:

$caminho = 'arqEspelho.txt';

$espelho = file_get_contents ($caminho);

$re = '/(?=\s{17}COTIA CENTRO).+?(.+?OPR:.\d+\w+\s\w+\s+\d{2}\/\d{2}\/\d{4}\s\d{2}:\d{2}:\d{2})/s';

preg_match_all($re, $espelho, $matches, PREG_SET_ORDER, 0);

echo '<pre>';
var_dump($matches);

However, it returns a multidimensional array and I don't know how to search it. How could I perform this search? Is there an easier way to split the file into coupons so I can search them?

Regex is probably not the best option here. Instead, you could read the file line by line and, for each line, check whether it marks the beginning or the end of a coupon, whether it contains the Extrato number, and so on:

$cupons = [];
$cupom = [];
$texto_cupom = '';
$handle = fopen("arqEspelho.txt", "r");
if ($handle) {
    // read the file line by line
    while (($linha = fgets($handle)) !== false) {
        $texto_cupom .= $linha;
        if (strpos($linha, 'COTIA CENTRO') !== false) {
            // start of a new coupon: reset the data, keeping this first line in the text
            $texto_cupom = $linha;
            $cupom = [];
        } else if (preg_match('/Extrato No\. (\d+)/', $linha, $m)) {
            $cupom['extrato'] = $m[1];
        } else if (preg_match('/TOTAL R\$\s+(\d[\d.]*,\d{2})/', $linha, $m)) {
            // [\d.]* also accepts a thousands separator, e.g. 1.166,06
            $cupom['total'] = $m[1];
        } else if (strpos($linha, 'OPR:') === 0) {
            // end of the coupon: save the current text and add it to the coupons array
            $cupom['texto'] = $texto_cupom;
            $cupons[] = $cupom;
        }
    }

    fclose($handle);
} else {
    // error opening the file
}

In the end, $cupons is an array of arrays, each inner array holding one coupon's data. Searching it is then easy:

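foreach($cupons as $cupom) {
    // ?? avoids an "undefined array key" warning if a coupon had no Extrato line
    if (($cupom['extrato'] ?? null) === '123') {
        // found extrato 123: print the total and the coupon text
        echo $cupom['total'] ?? '', PHP_EOL, $cupom['texto'];
    }
}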
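If you want the search to return the coupon instead of printing it, and to look up either field (Extrato or total) as the question asks, the loop can be wrapped in a function. This is a minimal sketch: `buscarCupom` is a hypothetical name, and the `$cupons` data below is made up to match the shape built by the line-by-line code.

```php
<?php
// Hypothetical helper: returns the first coupon whose $campo ('extrato' or
// 'total') equals $valor, or null if no coupon matches.
function buscarCupom(array $cupons, string $campo, string $valor): ?array
{
    foreach ($cupons as $cupom) {
        // ?? avoids a warning if that field was never found in the coupon
        if (($cupom[$campo] ?? null) === $valor) {
            return $cupom;
        }
    }
    return null;
}

// Made-up data with the same shape as the $cupons array built above
$cupons = [
    ['extrato' => '182863', 'total' => '11,04', 'texto' => "...coupon 1...\n"],
    ['extrato' => '182864', 'total' => '25,90', 'texto' => "...coupon 2...\n"],
];

$achado = buscarCupom($cupons, 'total', '11,04');
if ($achado !== null) {
    echo $achado['texto'];
}
```

Comparing with `===` keeps the values as strings, which matters for totals like `11,04` that PHP would not treat as numbers anyway.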

I only used regex to get the Extrato number and the total, which keeps things much simpler. The part I want is inside parentheses, which forms a capture group, so I can retrieve it afterwards with $m[1] (the first group, since it is the first pair of parentheses in the regex).
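A small illustration of how `$m[1]` picks up the captured group, using two lines copied from the sample coupon in the question:

```php
<?php
// Two lines copied from the sample coupon in the question
$linhaExtrato = 'G              Extrato No. 182863Gþ';
$linhaTotal   = 'GTOTAL R$                                  11,04Gþ';

// $m[0] is the whole match; $m[1] is whatever the first (...) captured
preg_match('/Extrato No\. (\d+)/', $linhaExtrato, $m);
$extrato = $m[1];

preg_match('/TOTAL R\$\s+(\d+,\d{2})/', $linhaTotal, $m);
$total = $m[1];

echo $m[0], PHP_EOL;                 // whole match, from "TOTAL R$" up to "11,04"
echo $extrato, ' ', $total, PHP_EOL; // 182863 11,04
```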


With a single regex, it’s much more complicated:

$caminho = 'arqEspelho.txt';
$espelho = file_get_contents($caminho);

$re = '/^ +COTIA CENTRO(?:.*\n)+?.+Extrato No\. (\d+)(?:.*\n)+?.+TOTAL R\$ +(\d[\d.]*,\d{2})(?:.*\n)+?^OPR:.+$/m';
if (preg_match_all($re, $espelho, $matches, PREG_SET_ORDER)) {
    foreach($matches as $m) {
        list($cupom, $extrato, $total) = $m;
        // $cupom has the whole coupon text; $extrato and $total hold the respective values
    }
}

I use (?:.*\n)+? several times (a non-capturing group matching zero or more characters followed by a line break, repeated one or more times) to skip the lines I'm not interested in. I also use the lazy quantifier +?, so that the match does not run all the way to the end of the file and treat all the coupons as a single one (you can read more about lazy quantifiers here and here).
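To see why the lazy +? matters, here is a sketch comparing it with the greedy + on two made-up, heavily shortened coupons. The patterns are simplified versions of the answer's regex that keep only the start and end markers:

```php
<?php
// Two tiny, made-up coupons in one string (real ones have many more lines)
$texto = "   COTIA CENTRO\nExtrato No. 1\nOPR:1 A 14/06/2021 08:00:00\n"
       . "   COTIA CENTRO\nExtrato No. 2\nOPR:2 B 14/06/2021 09:00:00\n";

// Lazy +?: stops at the first OPR line it reaches, so one match per coupon
$lazy = preg_match_all('/^ +COTIA CENTRO(?:.*\n)+?^OPR:.+$/m', $texto, $m1);

// Greedy +: runs as far as possible, so both coupons end up in one match
$greedy = preg_match_all('/^ +COTIA CENTRO(?:.*\n)+^OPR:.+$/m', $texto, $m2);

echo $lazy, ' ', $greedy, PHP_EOL; // 2 1
```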

I also use the m modifier, so that ^ and $ match the beginning and end of each line (without m, they match only the beginning and end of the whole string).
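A quick check of the m modifier's effect, on a made-up three-line string:

```php
<?php
$texto = "COTIA CENTRO\nOPR:000243418 14/06/2021 08:27:33\nfim\n";

// Without m, ^ only matches at the very start of the string ("COTIA...")
$semM = preg_match('/^OPR:.+$/', $texto);

// With m, ^ and $ match at the start and end of every line
$comM = preg_match('/^OPR:.+$/m', $texto, $m);

echo $semM, ' ', $comM, PHP_EOL; // 0 1
echo $m[0], PHP_EOL;             // OPR:000243418 14/06/2021 08:27:33
```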

So the regex takes everything from "COTIA CENTRO", through the Extrato number and the total (again captured with groups), up to the line that starts with "OPR".

Then just iterate over $matches, take the information, and do whatever you need with it.

But I still prefer reading the file line by line, as in the first solution above. Not only is it easier to understand and maintain than the regex, but the resulting array is, in my opinion, better organized. Of course, you could also build the same array using the regex above, but having such a complicated expression makes it not worth it.
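For comparison, a sketch of building that same array from the single regex. It assumes the regex above with the dot in `No.` escaped and a total pattern that also accepts thousands separators (both are my changes, not necessarily the original). `cuponsPorRegex` is a hypothetical name and the sample text is heavily shortened:

```php
<?php
// Single regex, with "No." escaped and thousands separators allowed in the total
$re = '/^ +COTIA CENTRO(?:.*\n)+?.+Extrato No\. (\d+)(?:.*\n)+?.+TOTAL R\$ +(\d[\d.]*,\d{2})(?:.*\n)+?^OPR:.+$/m';

// Builds one array per coupon, like the line-by-line version
function cuponsPorRegex(string $re, string $espelho): array
{
    $cupons = [];
    if (preg_match_all($re, $espelho, $matches, PREG_SET_ORDER)) {
        // each match is [full coupon text, extrato group, total group]
        foreach ($matches as [$texto, $extrato, $total]) {
            $cupons[] = ['extrato' => $extrato, 'total' => $total, 'texto' => $texto];
        }
    }
    return $cupons;
}

// Heavily shortened coupon, for illustration only
$espelho = "     COTIA CENTRO\n"
         . "G     Extrato No. 182863G\n"
         . "GTOTAL R\$        11,04G\n"
         . "OPR:000243418Tatiana Lig    14/06/2021 08:27:33\n";

print_r(cuponsPorRegex($re, $espelho));
```

Even in this form the regex version is harder to adjust than the loop: any change in the coupon layout means re-reading the whole pattern.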

  • Using the file at this link as an example, when I look up the coupon by either the Extrato (182862) or the value (66.81), it returns the same coupon twice, and the second time the coupon comes with extra lines, up to the second copy of the receipt. Examples: 1st coupon link, 2nd coupon link

  • @Leandropio The problem is that in that file the coupons have the OPR line, then a line with "zucchini", then several other lines and another OPR. That is, the code finishes the first coupon, but before starting another one (i.e., before finding "COTIA CENTRO"), it finds another ending (another "OPR"). If a coupon can have more than one OPR, then you have to change the code. But if each coupon can only have one OPR, then the file is wrong.

  • I see. That part is the second copy of the receipt. So would I have to end the coupon as soon as I find the first correct OPR? How could I do that? Thank you very much for the great help.

  • @Leandropio You would probably need one more control variable (something like $encontrei_opr) indicating that you have already found the OPR and have not yet reached another coupon. Then you ignore the second copy, for example...
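A sketch of the control-variable idea from the last comment. It assumes, as described in the comments, that the second copy of the receipt never has its own "COTIA CENTRO" line, so only the first OPR after a "COTIA CENTRO" closes a coupon. `separarCupons` is a hypothetical name; it takes an array of lines so it can be fed with PHP's `file()`:

```php
<?php
// Hypothetical helper: splits the lines of the file into coupons.
// $encontrei_opr is true after a coupon's first OPR line and until the next
// "COTIA CENTRO", so anything in between (such as the second copy of the
// receipt, with its own OPR line) is skipped instead of producing a duplicate.
function separarCupons(iterable $linhas): array
{
    $cupons = [];
    $cupom = null;
    $encontrei_opr = true; // true = not inside a coupon yet

    foreach ($linhas as $linha) {
        if (strpos($linha, 'COTIA CENTRO') !== false) {
            // start of a new coupon, keeping its first line in the text
            $cupom = ['extrato' => null, 'total' => null, 'texto' => $linha];
            $encontrei_opr = false;
            continue;
        }
        if ($encontrei_opr) {
            continue; // between an OPR and the next COTIA CENTRO: ignore
        }
        $cupom['texto'] .= $linha;
        if (preg_match('/Extrato No\. (\d+)/', $linha, $m)) {
            $cupom['extrato'] = $m[1];
        } elseif (preg_match('/TOTAL R\$\s+(\d[\d.]*,\d{2})/', $linha, $m)) {
            $cupom['total'] = $m[1];
        } elseif (strpos($linha, 'OPR:') === 0) {
            // first OPR of this coupon: close it and skip until the next one
            $cupons[] = $cupom;
            $encontrei_opr = true;
        }
    }
    return $cupons;
}
```

With the real file this would be called as `$cupons = separarCupons(file('arqEspelho.txt'));`. Taking lines rather than a path keeps the function easy to try out with a hand-written array.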
