Return the whole page via cURL

I need to fetch a page that is served as text/html, but the body comes back compressed with zlib. I tried to decode it without success: the zlib_decode function is barely documented, and my searches turned up nothing useful. Here is the response:

'HTTP/1.1 200 OK
Content-Type: text/html
X-Frame-Options: SAMEORIGIN
Vary: Cookie, Accept-Language, Accept-Encoding
Cache-Control: private, no-cache, no-store, must-revalidate
Pragma: no-cache
Expires: Sat, 01 Jan 2000 00:00:00 GMT
Content-Language: pt-br
Content-Encoding: gzip
Date: Wed, 03 Jan 2018 19:03:06 GMT
Strict-Transport-Security: max-age=86400
Set-Cookie: some cookies'... (length=5280)

And here is my request:

function challenge($url) {
    $getCSRF = getCSRF();

    $request = curl_init();
    curl_setopt_array($request, array(
        CURLOPT_URL             => 'https://www.url.com/' . $url,
        CURLOPT_CUSTOMREQUEST   => 'GET',
        CURLOPT_HEADER          => true,
        CURLOPT_RETURNTRANSFER  => true,
        CURLOPT_SSL_VERIFYHOST  => false,
        CURLOPT_SSL_VERIFYPEER  => false,
        CURLOPT_COOKIE          => $getCSRF->cookies,
        CURLOPT_USERAGENT       => $_SERVER['HTTP_USER_AGENT'],
        CURLOPT_HTTPHEADER      => array(
            'accept-encoding:gzip, deflate, br',
            'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
            'accept-language:pt-BR,pt;q=0.9,en-US;q=0.8,en;q=0.7',
        )
    ));
    $response = curl_exec($request);
    curl_close($request);

    return $response;
}

var_dump(challenge('challenge/id/Fn0C4GsZjg/'));
  • Wouldn't it be simpler to change the accept-encoding? Maybe the server is compressing because you asked for it yourself.

  • @Bacco that was already suggested to me, but I can't do it.

1 answer


First, set CURLOPT_HEADER to false:

CURLOPT_HEADER => false,

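With CURLOPT_HEADER enabled, the headers are returned in the same string as the body, so the result cannot be decoded directly. If you do need the headers, you can get some of the information with curl_getinfo, or keep the option on and split the string where two line breaks first occur in a row, but that is a separate matter.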
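As an aside, the split mentioned above could look roughly like this. It is only a sketch: splitResponse is a hypothetical helper name, and it assumes a single header block terminated by CRLF line breaks. With redirects followed, several header blocks can precede the body, in which case the size reported by curl_getinfo with CURLINFO_HEADER_SIZE is a safer cut point.

```php
<?php
// Hypothetical helper (not part of the original answer): split a raw response,
// as returned with CURLOPT_HEADER => true, into header text and body.
function splitResponse(string $raw): array
{
    // The headers end at the first blank line, i.e. two CRLF breaks in a row.
    // Limit 2 keeps any later blank lines inside the body untouched.
    $parts = explode("\r\n\r\n", $raw, 2);

    return array(
        'headers' => $parts[0],
        'body'    => isset($parts[1]) ? $parts[1] : '',
    );
}

// Example with a hand-made response string
$raw   = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>ok</html>";
$parts = splitResponse($raw);
var_dump($parts['headers'], $parts['body']);
```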

You should also follow redirects so that you don't end up with an empty page. For that, use:

CURLOPT_FOLLOWLOCATION  => true,

Finally, you can decode the body with gzdecode:

function challenge($url) {

    $getCSRF = getCSRF();

    $request = curl_init();
    curl_setopt_array($request, array(
        CURLOPT_URL             => 'https://www.url.com/' . $url,
        CURLOPT_CUSTOMREQUEST   => 'GET',
        CURLOPT_HEADER          => false,
        CURLOPT_RETURNTRANSFER  => true,
        CURLOPT_SSL_VERIFYHOST  => false,
        CURLOPT_SSL_VERIFYPEER  => false,
        CURLOPT_FOLLOWLOCATION  => true,
        CURLOPT_USERAGENT       => $_SERVER['HTTP_USER_AGENT'],
        CURLOPT_HTTPHEADER      => array(
            // Ask only for gzip, since gzdecode below cannot handle deflate or br
            'accept-encoding: gzip',
            'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
            'accept-language:pt-BR,pt;q=0.9,en-US;q=0.8,en;q=0.7',
        )
    ));

    $response = curl_exec($request);

    $resposta_http = curl_getinfo($request, CURLINFO_HTTP_CODE);

    // Any status code outside the 200-299 range is probably an error page
    if ($resposta_http < 200 || $resposta_http > 299) {
        $response = null;
    }

    curl_close($request);

    // Decode the gzip-compressed body (false if the request failed)
    return $response ? gzdecode($response) : false;
}

var_dump(challenge('challenge/id/Fn0C4GsZjg/'));

Of course, the example above only works if the body is gzip. If it is deflate, you will probably have to use gzinflate, or gzuncompress when the server sends zlib-wrapped data (I'll prepare a more elaborate example).
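To illustrate the gzip versus deflate distinction, here is a minimal sketch of choosing the decoder from the Content-Encoding value. decodeBody is a hypothetical helper, and it assumes the caller has already read the header value from the response. It tries gzuncompress first for deflate because that header normally means zlib-wrapped data, and falls back to gzinflate for servers that send a raw deflate stream.

```php
<?php
// Hypothetical helper (an assumption, not the answer's code): decode a body
// according to the Content-Encoding header value supplied by the caller.
function decodeBody(string $body, string $contentEncoding)
{
    switch (strtolower(trim($contentEncoding))) {
        case 'gzip':
        case 'x-gzip':
            return gzdecode($body);

        case 'deflate':
            // Try the zlib-wrapped format first, then a raw deflate stream.
            $decoded = @gzuncompress($body);
            return $decoded !== false ? $decoded : @gzinflate($body);

        case '':
        case 'identity':
            // Not compressed at all
            return $body;

        default:
            // For example "br", which the zlib extension cannot decode
            return false;
    }
}

// Round trip using data compressed locally
$html = '<html><body>ok</body></html>';
var_dump(decodeBody(gzencode($html), 'gzip') === $html);
```

The zlib extension also offers zlib_decode, which detects gzip, zlib and raw deflate data on its own, so it could replace the first two branches.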

Another way to solve this is simply not to send the accept-encoding header, as @Bacco said:

curl_setopt_array($request, array(
    CURLOPT_URL             => 'https://www.url.com/' . $url,
    CURLOPT_CUSTOMREQUEST   => 'GET',
    CURLOPT_HEADER          => false,
    CURLOPT_RETURNTRANSFER  => true,
    CURLOPT_SSL_VERIFYHOST  => false,
    CURLOPT_SSL_VERIFYPEER  => false,
    CURLOPT_FOLLOWLOCATION  => true,
    CURLOPT_USERAGENT       => $_SERVER['HTTP_USER_AGENT'],
    CURLOPT_HTTPHEADER      => array(
        'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
        'accept-language:pt-BR,pt;q=0.9,en-US;q=0.8,en;q=0.7',
    )
));

If the page is lightweight, compressing it for the download isn't even necessary.
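A third possibility, not covered above, is to let cURL handle the compression itself through CURLOPT_ENCODING. The sketch below assumes the cURL build has compression support; buildCurlOptions and fetchPage are hypothetical names, and the cookie, user agent and SSL options from the examples above are left out for brevity.

```php
<?php
// Hypothetical helper: options that let cURL negotiate and decode compression.
function buildCurlOptions(string $url): array
{
    return array(
        CURLOPT_URL            => $url,
        CURLOPT_HEADER         => false,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        // Empty string: send an Accept-Encoding header listing every encoding
        // this cURL build supports and decode the response body automatically.
        CURLOPT_ENCODING       => '',
    );
}

// Hypothetical wrapper: returns the already-decoded page, or false on failure.
function fetchPage(string $url)
{
    $request = curl_init();
    curl_setopt_array($request, buildCurlOptions($url));
    $response = curl_exec($request);
    curl_close($request);

    return $response;
}
```

With this option there is no need to send accept-encoding by hand or to call gzdecode on the result.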

  • Thank you, it worked and I have no more questions.

  • Here's the thing, man: it was working, then I returned $response and echoed the function's result, and now it no longer works.

  • @William does var_dump show FALSE or NULL?

  • Nothing, it returns nothing at all.

  • @William what do you mean, nothing? Does it print something like string(0) ""? An empty string prints that, null prints NULL, and false prints bool(false), so var_dump always prints something. If var_dump shows nothing at all, something is wrong in the script: it may be an error 500 or a syntax error.

  • Everything looks OK and the page really is blank, haha, but I'm still trying here. If it worked the first time, it should always work, lol.

  • @Guilherme I think you are mixing things up. There is no way for var_dump to print nothing, so it must be another error: if the var_dump result is not displayed, execution is probably not even reaching var_dump. Post your whole code at http://Pastebin.com and let me know, and I'll take a look. It must be a comma or a } that you forgot.

  • I got it working. However, I want it to display the page: I echo the function's result, and Google Chrome's view-source shows the page's code, but outside view-source nothing appears. Actually, something is there, but the page is not shown.

  • @William put the code on Pastebin, because this looks like a separate, unrelated problem.
