filesize for files larger than 2GB on x86 platforms

I was reading the PHP documentation and I noticed this information:

Note: Because PHP's integer type is signed and many platforms use 32-bit integers, some filesystem functions may return unexpected results for files which are larger than 2GB.

In the Portuguese version of the documentation I noticed this:

Note: Because PHP's integer type is signed and many platforms use 32-bit integers, filesize() can return unexpected results for files larger than 2GB. For files between 2GB and 4GB you can work around this problem using sprintf("%u", filesize($file)).

It suggests using sprintf; however, I found this question:

Apparently several methods were tried there. I don't know whether it was a contributor to the Portuguese documentation who added this code:

sprintf("%u", filesize($file))

What I would like to know is whether this code has any problems (since it seems that only the Portuguese documentation team thought of it). For example:

  • Does it fail in some particular situation?
  • Is it inaccurate about the actual size of the file?
  • Or does the code really work to convert the integer size into a numeric string?

  • The problem with using cURL is that, if the file is 4GB, the server takes considerably longer to respond

  • @FABIOMATEUS does it take long even when setting curl_setopt($ch, CURLOPT_NOBODY, 1);? If it still does, then use the first solution, with stat -c and for %F in, which I demonstrated in https://answall.com/a/183202/3635

2 answers

6

It seems that the problem occurs because PHP's integer type is signed and many platforms use 32-bit integers, which is why filesize() sometimes returns unexpected results for files larger than 2GB.

Explaining why this expression is the more appropriate one is a bit complicated, since several users have tried, in various ways, to write their own and even more complex methods to get the actual size of a file.

This prints the result of filesize as an unsigned integer, so it can go up to 4GB. The reason is that a signed integer runs up to 2GB and then flips to -2GB, as the following shows:

file<2GB      = SIGNED:  1048576512 UNSIGNED: 1048576512
file>2GB      = SIGNED: -2100140103 UNSIGNED: 2194827193
file>4GB      = SIGNED:  -100662784 UNSIGNED: 4194304512

The text quoted above was taken from one of the PHP documentation sites; in it, the user explains why the function is used. However, it does not say whether this is the most recommended approach.

In my view, this expression is most likely used because filesize() returns negative values for files between 2GB and 4GB, which can still be corrected with some calculation, whereas for files above 4GB it returns a value that cannot be corrected. It was indeed somewhat alarming that the example appears only in the Portuguese documentation note, but it already existed in the user-contributed notes.
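The wrap-around described above can be reproduced with plain arithmetic. The following is a minimal sketch, assuming a 64-bit PHP build (so the intermediate values fit in a native integer); the helper names wrapToSigned32 and signed32ToUnsigned are hypothetical and only serve the illustration.

```php
<?php
// Hypothetical helpers for illustration; assumes a 64-bit PHP build.

// Simulate what a 32-bit signed integer would hold for a given byte count.
function wrapToSigned32(int $bytes): int
{
    $low = $bytes & 0xFFFFFFFF; // keep only the low 32 bits
    return $low >= 0x80000000 ? $low - 0x100000000 : $low;
}

// Reinterpret a 32-bit signed value as unsigned (what "%u" does on a 32-bit build).
function signed32ToUnsigned(int $value): int
{
    return $value < 0 ? $value + 0x100000000 : $value;
}

// Below 2GB, between 2GB and 4GB, and 5 GiB.
$sizes = [1048576512, 2194827193, 5368709120];

foreach ($sizes as $real) {
    $signed = wrapToSigned32($real);
    printf("real: %d signed: %d unsigned: %d\n", $real, $signed, signed32ToUnsigned($signed));
}
```

For the 5 GiB case the recovered value is 1073741824, which is positive but wrong: the bits above the 32nd are lost, so no calculation can restore them.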

The examples on the PHP site are usually the simplest ones; that does not mean this is the only way to get the actual size of a file. It will very likely require some testing on your part, because there is not much information on why sprintf is used.

Does it fail in some particular situation?

Some users reported failures on some x86-based systems, and some problems were also reported on x64 systems, so it is very likely that there are still some errors. When filesize() itself fails, it returns FALSE and emits an E_WARNING.
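Because sprintf would silently format FALSE as "0", it helps to check the return value first. A minimal sketch, where filesizeUnsigned is a hypothetical helper name and the @ operator is used only to silence the E_WARNING in this example:

```php
<?php
// Hypothetical helper: returns the size as an unsigned decimal string,
// or false when filesize() fails.
function filesizeUnsigned(string $file)
{
    $size = @filesize($file); // @ suppresses the E_WARNING emitted on failure

    if ($size === false) {
        return false; // do not let sprintf() turn false into "0"
    }

    return sprintf('%u', $size);
}

// A file name that is assumed not to exist.
var_dump(filesizeUnsigned('ficheiro-inexistente.zip'));
```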

Is it inaccurate about the actual size of the file?

Within the range it covers, the accuracy is good: it returns the actual size in bytes.

Or does the code really work to convert the integer size into a numeric string?

Yes, it works. This is the output I got in my last test:

$file = "ficheiro.zip";
var_dump(sprintf("%u", filesize($file)));

Output: string(4) "5209" (5.08KB)
Output: string(10) "2092964971" (1.94GB)

There are several examples available of how to get the actual file size on the most diverse platforms; some are even shell-based. All you have to do is look for the one that suits you best. If you need more detail, I believe the only option is to run isolated tests and dig deeper.

Good luck.


  • NB: in Brazilian Portuguese the terms are "integer with sign" and "integer without sign", not a literal translation of "signed" and "unsigned". The literal word for "signed" belongs to another context, so using it as the translation can cause some confusion. (I got confused :-p)

3


It seems that the problem goes further. Despite Edilson's report, I noticed that this does not work well in every environment or PHP version, and the result is not necessarily accurate.

On an x64 system, a file larger than 4GB returned a positive value, but it was not the file size; in other words, this did not work:

 sprintf("%u", filesize($file))

Even in an x64 environment with PHP 5 compiled for x64, the build is not fully 64-bit: on Windows it is x86_64, yet PHP's integer type is still 32 bits wide there (in PHP 7 things worked a little better).
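Whether a given build is affected can be checked from PHP itself through the predefined constants PHP_INT_SIZE and PHP_INT_MAX. A minimal sketch; the function name integersAre64Bit is hypothetical:

```php
<?php
// Hypothetical helper: true when PHP's integer type is 64 bits wide.
function integersAre64Bit(): bool
{
    return PHP_INT_SIZE >= 8; // PHP_INT_SIZE is 4 on builds with 32-bit integers
}

printf("PHP_INT_SIZE: %d bytes, PHP_INT_MAX: %d\n", PHP_INT_SIZE, PHP_INT_MAX);

if (!integersAre64Bit()) {
    echo "filesize() may be wrong for files larger than 2GB on this build\n";
}
```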

The problem is not exactly PHP's fault; it comes from PHP 5 working with 32-bit integers, and even 64-bit has a limit. What I needed was something that works well almost regardless of the environment. I do not need to do calculations with the value; I just need to know the size of a file. I arrived at these solutions:

Software native to the system

This solution depends on stat being available, as it is on Linux, Mac OS X, and BSD servers, for example. I do not know whether it works on all platforms. For Windows I used an answer from SOen (Stack Overflow in English).

Something like:

  • Unix-like: stat -c %s arquivopesado.7z (the command varies between Unix-like systems; on Mac and BSD, for example, the equivalent is stat -f %z arquivopesado.7z, so the command has to be adjusted)

  • Windows: for %F in ("arquivopesado.7z") do @echo %~zF

The script went like this:

<?php
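// Returns the file size in bytes as a numeric string, or false on failure.
function filesizealternativo($arquivo)
{
    if (is_file($arquivo) === false) {
        return false;
    }

    $arqarg = escapeshellarg(realpath($arquivo));

    if (strcasecmp(substr(PHP_OS, 0, 3), 'WIN') === 0) {
        // cmd.exe: %~zF expands to the size of the file held in %F
        $command = 'for %F in (' . $arqarg . ') do @echo %~zF';
    } elseif (PHP_OS === 'Darwin' || stripos(PHP_OS, 'BSD') !== false) {
        // BSD stat (Mac, *BSD) uses -f %z instead of -c %s
        $command = 'stat -f %z ' . $arqarg;
    } else {
        // GNU stat (Linux)
        $command = 'stat -c %s ' . $arqarg;
    }

    $resposta = shell_exec($command);

    // shell_exec() does not return a string when the command fails or prints nothing
    if (!is_string($resposta)) {
        return false;
    }

    $resposta = trim($resposta);

    // Keep the value as a string so it is never truncated to a 32-bit integer
    if (is_numeric($resposta)) {
        return $resposta;
    }

    return false;
}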

$a = filesizealternativo('arquivogrande.7z');

var_dump($a);

Using the file:/// protocol with PHP

The problem with stat is compatibility: some servers lack it or its dependencies, and some block functions such as shell_exec, exec, and system. So I ran a test with cURL and file:// (http://php.net/manual/en/wrappers.file.php), and the result worked well:

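// Returns the file size in bytes as a numeric string, or false on failure.
function filesizealternativo2($arquivo)
{
    if (is_file($arquivo) === false) {
        return false;
    }

    $arquivo = realpath(preg_replace('#^file:#', '', $arquivo));

    // Build a file:/// URL: forward slashes only, with an empty host part
    // (paths with spaces or other special characters may also need URL-encoding)
    $ch = curl_init('file:///' . ltrim(str_replace('\\', '/', $arquivo), '/'));

    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // Return the response instead of printing it
    curl_setopt($ch, CURLOPT_HEADER, 1); // Include the headers in the response
    curl_setopt($ch, CURLOPT_NOBODY, 1); // Do not transfer the body

    $headers = curl_exec($ch);
    curl_close($ch);

    $ch = null;

    // curl_exec() returns false on failure
    if (!is_string($headers)) {
        return false;
    }

    // Extract the size reported in the Content-Length header
    if (preg_match('#^content-length:\s*(\d+)#im', $headers, $matches) > 0) {
        return $matches[1];
    }

    return false;
}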

$a = filesizealternativo2('arquivogrande.7z');

var_dump($a);

This way the only dependency is the cURL extension, which is already enabled on many servers.
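The header-parsing step can be exercised on its own, without cURL or a large file. A minimal sketch, where extractContentLength is a hypothetical helper and the sample header text is made up for the example:

```php
<?php
// Hypothetical helper: pulls the Content-Length value out of a raw header block.
// The value is returned as a string so it is never squeezed into a 32-bit integer.
function extractContentLength(string $headers)
{
    if (preg_match('/^content-length:\s*(\d+)/im', $headers, $matches) === 1) {
        return $matches[1];
    }

    return false;
}

// Made-up header block, for illustration only.
$sample = "Content-Length: 5368709120\r\nAccept-ranges: bytes\r\n\r\n";

var_dump(extractContentLength($sample));
```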
