How to validate the structure of a text file in PHP?

I created an admin area in which users upload .txt files to my FTP server.

I would like to enforce a format: for example, every line of the file must contain three columns separated by a semicolon (;).

Example of a valid file:

filename.valido.txt

nom;age;height;
nom;age;height;
nom;age;height;

Any file that does not respect this format should be ignored.

Example of an invalid file:

ficheiro_invalido.txt

house;age;city;height;
village;age;father;children
nom;age;height;

  • You want to check the number of columns, is that it? For example, with a limit of 3: if every row has 3 columns the file is valid, otherwise it is invalid.

  • @Guilhermenascimento Exactly. At upload time I need to check whether the file respects that condition (three columns), no more and no less!

  • But is this upload done directly via FTP, or through PHP?

2 answers

I tried several ways to make this efficient, but none of them could validate everything, so I ended up having to use a while loop with fgets (or fgetcsv).

The format you want is basically CSV. However, CSV is not a very sophisticated format and there is no "practical" way to limit the number of columns, so each line has to be checked. An example of such a check:

<?php
function validaCSV($arquivo, $limite = 3, $delimitador = ';', $tamanho = 0)
{
    $handle = fopen($arquivo, 'rb');
    $valido = true;

    if ($handle) {
        while (feof($handle) === false) {
            $data = fgetcsv($handle, $tamanho, $delimitador);

            if ($data && count($data) !== $limite) {
                $valido = false; // Sets false: this line has the wrong number of columns
                break;
            }
        }

        fclose($handle);
    } else {
        $valido = false; // The file could not be opened, so it cannot be valid
    }

    return $valido;
}

Example of use (the expected number of columns defaults to 3):

var_dump(validaCSV('arquivo.txt')); // Checks whether every line has 3 columns
var_dump(validaCSV('arquivo.txt', 5)); // Checks whether every line has 5 columns

It returns true if the file is valid, otherwise false.
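One caveat with the question's sample: every line ends with a trailing ';', so fgetcsv() returns four fields for "nom;age;height;" (the last one an empty string), and validaCSV('arquivo.txt') with the default of 3 would reject that file. A minimal sketch of a variant that tolerates a single trailing delimiter and skips blank lines, assuming that is the intended rule; validaCSVTolerante() is a hypothetical name:

```php
<?php
// Hypothetical variant of validaCSV(): tolerates one trailing delimiter
// ("nom;age;height;") and skips blank lines. Assumption: that is the rule
// the question wants.
function validaCSVTolerante(string $arquivo, int $limite = 3, string $delimitador = ';'): bool
{
    // Missing or unreadable files are treated as invalid.
    $handle = is_readable($arquivo) ? fopen($arquivo, 'rb') : false;
    if ($handle === false) {
        return false;
    }

    $valido = true;
    // fgetcsv() returns false at the end of the file, which ends the loop.
    while (($data = fgetcsv($handle, 0, $delimitador, '"', '\\')) !== false) {
        if ($data === [null]) {
            continue; // blank line
        }
        // A trailing delimiter leaves one extra, empty field at the end.
        if (count($data) === $limite + 1 && end($data) === '') {
            array_pop($data);
        }
        if (count($data) !== $limite) {
            $valido = false;
            break; // stop at the first bad line
        }
    }

    fclose($handle);
    return $valido;
}

// Usage with a temporary file holding the question's valid sample.
$tmp = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($tmp, "nom;age;height;\nnom;age;height;\nnom;age;height;\n");
var_dump(validaCSVTolerante($tmp)); // bool(true)
unlink($tmp);
```

Empty middle fields such as "a;;c" still count as three columns here; add an extra check if they should be rejected.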


If you want to read the file only when it is valid, use it like this:

To avoid memory spikes when the file turns out to be invalid, I used two while loops: the first only validates, the second reads. It is a little slower, but it does not consume as much of the server's resources (in the case of invalid files).

Note: in the example I used yield, so you can iterate over the rows in your own loop.

function lerCSV($arquivo, $limite = 3, $delimitador = ';', $tamanho = 0)
{
    $handle = fopen($arquivo, 'rb');

    if ($handle) {
        while (feof($handle) === false) {
            $data = fgetcsv($handle, $tamanho, $delimitador);

            if ($data && count($data) !== $limite) {
                fclose($handle); // release the file before aborting
                throw new Exception('The number of columns does not match the limit of ' . $limite);
            }
        }

        // Move the pointer back to the start of the file so the while loop can be used again
        rewind($handle);

        while (feof($handle) === false) {
            $data = fgetcsv($handle, $tamanho, $delimitador);

            if ($data) { // Don't yield the false that fgetcsv() returns at the end of the file
                yield $data;
            }
        }

        fclose($handle);
    } else {
        throw new Exception('Invalid file: ' . $arquivo);
    }
}

Example of use:

foreach(lerCSV('a.csv') as $linha) {
    var_dump($linha);
}

It throws an Exception if the file is invalid or does not exist, or if the number of columns in any line differs from the one given to the function (default is 3).
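Because lerCSV() is a generator, calling it runs none of its body; the validation pass (and its Exception) only happens when the foreach starts iterating, so a try/catch has to wrap the foreach, not just the call. A minimal self-contained sketch of that behaviour, using a hypothetical generator linhasValidadas() over an in-memory array instead of a file:

```php
<?php
// Hypothetical generator with the same two-pass idea as lerCSV(), but over an
// in-memory array so the example does not need a file.
function linhasValidadas(array $linhas, int $limite = 3): Generator
{
    // First pass: validate everything before handing out any row.
    foreach ($linhas as $i => $linha) {
        if (count(explode(';', $linha)) !== $limite) {
            throw new Exception('Line ' . ($i + 1) . ' does not have ' . $limite . ' columns');
        }
    }
    // Second pass: yield the rows.
    foreach ($linhas as $linha) {
        yield explode(';', $linha);
    }
}

// Calling the generator function runs none of its body, so nothing is thrown here.
$gerador = linhasValidadas(['a;b;c', 'a;b']);

$lidas = 0;
try {
    // The body starts (and throws) when the foreach asks for the first row.
    foreach ($gerador as $colunas) {
        $lidas++;
    }
} catch (Exception $e) {
    echo $e->getMessage(), "\n"; // Line 2 does not have 3 columns
}
echo $lidas, "\n"; // 0: no row was handed out before the error
```

This is also why the two-pass design is useful: the consumer never receives part of an invalid file.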


Extra (with SplFileObject)

I was worried about the file staying open when using yield: if there is a break; inside the foreach, the file might never be closed. However, SplFileObject closes the file when the object is destroyed (its internal __destruct runs), so at that moment the file is released, as explained in this question.

The SPL version looks like this:

<?php

function SplLerCSV($arquivo, $limite = 3, $delimiter = ';', $enclosure = '"', $escape = '\\')
{
    $file = new SplFileObject($arquivo);
    $minCol = $limite - 1;

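    while ($file->eof() === false) {
        $data = $file->fgetcsv($delimiter, $enclosure, $escape);

        // Skip blank lines (returned as [null]) and anything that is not an array
        if (!is_array($data) || $data === [null]) {
            continue;
        }

        // Reject lines with more OR fewer columns than $limite
        if (count($data) !== $limite) {
            throw new Exception('The number of columns does not match the limit of ' . $limite);
        }
    }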

    // Move the pointer back to the start of the file so the while loop can be used again
    $file->rewind();

    while ($file->eof() === false) {
        $data = $file->fgetcsv($delimiter, $enclosure, $escape);

        if (isset($data[$minCol])) { // Prevents blank lines from being yielded as [0 => NULL]
            yield $data;
        }
    }
}

// Usage
foreach (SplLerCSV('a.csv') as $value) {
    var_dump($value);
}
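On the concern about break: PHP also runs a generator's finally block when the generator is destroyed before it finishes, so a plain fopen() handle can be closed reliably with try/finally, without depending on SplFileObject. A minimal sketch, where lerLinhas() and the $aoFechar callback are hypothetical names used only to make the cleanup observable:

```php
<?php
// Hypothetical generator: reads a file line by line and closes it in a
// finally block. If the consumer stops early with break, PHP destroys the
// generator and runs the finally block.
function lerLinhas(string $arquivo, ?callable $aoFechar = null): Generator
{
    $handle = fopen($arquivo, 'rb');
    if ($handle === false) {
        throw new Exception('Invalid file: ' . $arquivo);
    }

    try {
        while (($linha = fgets($handle)) !== false) {
            yield rtrim($linha, "\r\n");
        }
    } finally {
        // Runs on normal completion, on an exception, and on early destruction.
        fclose($handle);
        if ($aoFechar !== null) {
            $aoFechar();
        }
    }
}

$tmp = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($tmp, "nom;age;height;\nnom;age;height;\n");

$fechado = false;
foreach (lerLinhas($tmp, function () use (&$fechado) { $fechado = true; }) as $linha) {
    break; // stop after the first line
}
var_dump($linha, $fechado); // string(15) "nom;age;height;" then bool(true)
unlink($tmp);
```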

Validate file structure (.txt)

PHP

<?php
if (isset($_POST['botao'])) {
    $invalido = false;

    // Receive the form data (stop if the upload itself failed)
    if (!isset($_FILES['arquivo']) || $_FILES['arquivo']['error'] !== UPLOAD_ERR_OK) {
        exit('Upload failed');
    }
    $arquivo_tmp = $_FILES['arquivo']['tmp_name'];

    // Read the whole file into an array, one line per element, without line endings
    $dados = file($arquivo_tmp, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

    // Loop over the array to check the structure of each line
    foreach ($dados as $linha) {
        // Must contain 3 columns separated by ; (semicolon).
        // array_filter() drops the empty field left by a trailing ";" (note: it also drops fields equal to "0")
        if (count(array_filter(explode(';', $linha))) !== 3) {
            echo "Nuh-uh-uh, the structure doesn't match";
            // Blocks the upload
            $invalido = true;
            // Stops the foreach at the first invalid line
            break;
        }
    }

    if (!$invalido) {
        echo "structure ok";
        // upload here
    }
}
?>

Form used in the online test:

<form method="POST" action="" enctype="multipart/form-data">
    <label>Arquivo</label>
    <!--Campo para fazer o upload do arquivo com PHP-->
    <input type="file" name="arquivo"><br><br>          
    <button type="submit" name="botao">Upload</button>
</form>

online test here
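A different technique for the same rule is a regular expression per line instead of counting columns. A minimal sketch, assuming a valid line has exactly three non-empty fields separated by ';' with at most one trailing ';' (as in the question's sample); linhaValida() and conteudoValido() are hypothetical names:

```php
<?php
// Hypothetical helper: true if $linha has exactly three non-empty fields
// separated by ';', optionally followed by one trailing ';'.
function linhaValida(string $linha): bool
{
    $linha = rtrim($linha, "\r\n"); // ignore the line ending
    return preg_match('/^[^;]+;[^;]+;[^;]+;?$/', $linha) === 1;
}

// Hypothetical helper: the whole content is valid only if every
// non-blank line is valid; it stops at the first invalid line.
function conteudoValido(string $conteudo): bool
{
    foreach (explode("\n", $conteudo) as $linha) {
        if (rtrim($linha, "\r") === '') {
            continue; // skip blank lines, e.g. after the final newline
        }
        if (!linhaValida($linha)) {
            return false;
        }
    }
    return true;
}

var_dump(conteudoValido("nom;age;height;\nnom;age;height;\nnom;age;height;\n")); // bool(true)
var_dump(conteudoValido("house;age;city;height;\nvillage;age;father;children\nnom;age;height;\n")); // bool(false)
```

For an upload it could be called as conteudoValido(file_get_contents($arquivo_tmp)). Note that [^;]+ accepts any character except ';', including spaces, so a field made only of blanks still passes; tighten the pattern if that matters.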

  • I would say the code can be optimized slightly: if substr_count($z, ';') returns a value other than 2, the file is already invalid, so you could stop the analysis at the first occurrence. If the file has 1000 lines and the first one is already wrong, why analyze the other 999? It is also worth noting that this solution is not recommended if very large files are possible, as it not only stores the file contents in memory but stores them twice: once in the array $dados and again in the string $result.

  • By the way, looking further, the substr_count call could happen in the first foreach, and the second one, which iterates over $result, shouldn't even exist: it results in an error, since $result is a string, and in PHP a string is not an iterable type. I think you adapted the code from the other answer but let a lot slip through; there you also used a foreach on a string, so it's strange that it works on your page.

  • @Andersoncarloswoss, that's true. I edited the answer and I think it's good now. I swear I didn't know that $dados = file($arquivo_tmp); returned an array, which is why I did it that way in the first version!

  • Yeah, it’s gotten a lot better :D

  • I loved the error message for when the structure doesn't match.
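Following the suggestions in the comments (stop at the first invalid line, and do not keep the whole file in memory), a minimal sketch that reads one line at a time with fgets() and counts separators with substr_count(). estruturaValida() is a hypothetical name, and it assumes blank lines are ignored and one trailing ';' is allowed:

```php
<?php
// Hypothetical function: validates the file one line at a time with fgets(),
// so only the current line is kept in memory, and stops at the first bad line.
// Assumptions: blank lines are ignored and one trailing ';' is allowed.
function estruturaValida(string $arquivo, int $colunas = 3): bool
{
    $handle = fopen($arquivo, 'rb');
    if ($handle === false) {
        return false;
    }

    $valido = true;
    while (($linha = fgets($handle)) !== false) {
        $linha = rtrim($linha, "\r\n");
        if ($linha === '') {
            continue;
        }
        // Drop a single trailing ';' so "nom;age;height;" counts as 3 columns.
        if (substr($linha, -1) === ';') {
            $linha = substr($linha, 0, -1);
        }
        // N columns means exactly N - 1 separators.
        if (substr_count($linha, ';') !== $colunas - 1) {
            $valido = false;
            break; // no need to read the remaining lines
        }
    }

    fclose($handle);
    return $valido;
}

$tmp = tempnam(sys_get_temp_dir(), 'txt');
file_put_contents($tmp, "nom;age;height;\nnom;age;height;\nnom;age;height;\n");
var_dump(estruturaValida($tmp)); // bool(true)
unlink($tmp);
```

In the upload handler it would be called as estruturaValida($_FILES['arquivo']['tmp_name']). Counting separators does not look at field contents, so "a;;c" is accepted as three columns.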