Reading a large (80 GB) file in PHP

Viewed 611 times

2

I need to read an 80 GB file using PHP and load it into a PostgreSQL database. The file is a plain-text (.txt) file with a column layout.

The idea is not to split the file, because that would make the process unviable, so I need to read it in a single run, even if that takes several hours.

The problem here I believe is memory.

What is the best way, if there is one, to read this file without having to increase the memory limit? Is it possible to read the file in pieces so that memory does not overflow? Which function would I use for that?

  • 1

    Don’t forget to adjust the configuration directives in php.ini, such as the script’s execution time limit, disable safe mode, and so on. I believe you already know this.

  • @Fabianomonteiro yes.. I’m already taking into account the time limit and safe mode. Thank you!

1 answer

7

Of course it is possible. You don’t need to hold the whole file in memory if you can process each line separately; just keep one line in memory at a time.

$handle = fopen('arquivo.txt', 'r');

if ($handle) {
    while (!feof($handle)) {
        $row = fgets($handle);
        if ($row === false) {
            break; // nothing left to read (or a read error)
        }

        // Do whatever you need with the line from the file.
    }

    fclose($handle);
}

fopen only opens the file and positions a read pointer at its start, regardless of the file’s size. On each iteration, feof checks whether the pointer has reached the end of the file, and fgets reads one line and advances the pointer.
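Since the question mentions a column layout, each line returned by fgets could be split into fields roughly as follows. This is only a sketch that assumes fixed-width columns; the field names, offsets, and lengths in LAYOUT and the helper parseFixedWidthLine are hypothetical and would have to match the real file. Note that substr counts bytes, so a multibyte encoding would need mb_substr instead.

```php
<?php
// Hypothetical layout: field name => [byte offset, length]. Adjust to the real file.
const LAYOUT = [
    'id'   => [0, 10],
    'name' => [10, 20],
    'date' => [30, 8],
];

/**
 * Splits one fixed-width line into an associative array of trimmed fields.
 */
function parseFixedWidthLine(string $line, array $layout): array
{
    $fields = [];
    foreach ($layout as $name => [$offset, $length]) {
        // The cast covers PHP 7, where substr() returns false past the end of a short line.
        $fields[$name] = trim((string) substr($line, $offset, $length));
    }
    return $fields;
}
```

Inside the loop above, this would be called as parseFixedWidthLine($row, LAYOUT), so still only one line is held in memory at a time.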

Obviously, if you run one database operation for each line read from the file, the process will be very slow because of the huge number of single-row inserts. It is worth considering, for example, doing one insert for every N lines read: this needs a little more memory to hold the N lines, but it greatly reduces the number of round trips to the database.
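The "insert every N lines" idea could be sketched like this. It is a minimal illustration, not a tuned loader: the helpers readInBatches and insertBatch, the table import_lines, and its column content are all hypothetical names, and PDO with the pgsql driver is assumed to be available.

```php
<?php
/**
 * Yields arrays of up to $batchSize lines, so at most one batch is in memory.
 */
function readInBatches(string $path, int $batchSize): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Could not open $path");
    }

    try {
        $batch = [];
        while (($line = fgets($handle)) !== false) {
            $batch[] = rtrim($line, "\r\n");
            if (count($batch) === $batchSize) {
                yield $batch;
                $batch = [];
            }
        }
        if ($batch !== []) {
            yield $batch; // last, possibly smaller, batch
        }
    } finally {
        fclose($handle);
    }
}

/**
 * Sends one batch as a single multi-row INSERT.
 * Table "import_lines" and column "content" are hypothetical.
 */
function insertBatch(PDO $pdo, array $lines): void
{
    $placeholders = implode(', ', array_fill(0, count($lines), '(?)'));
    $stmt = $pdo->prepare("INSERT INTO import_lines (content) VALUES $placeholders");
    $stmt->execute($lines);
}

// Usage (connection details are placeholders):
// $pdo = new PDO('pgsql:host=localhost;dbname=mydb', 'user', 'secret');
// foreach (readInBatches('arquivo.txt', 1000) as $batch) {
//     insertBatch($pdo, $batch);
// }
```

The batch size trades memory for fewer round trips; with one bound parameter per row, a batch of 1000 stays well under PostgreSQL's limit on parameters per statement.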

  • On the desktop (Delphi/C#) we do exactly this, and we improve on it further with threads: we create one per line (limited to 100 at a time), and each thread processes its line and does what has to be done.
