Get external links with PHP Curl

I have this code that fetches the Google home page and prints it.

<?php

$request = curl_init();
curl_setopt_array($request, [
    CURLOPT_URL             => 'https://www.google.com',
    CURLOPT_RETURNTRANSFER  => true,
    CURLOPT_SSL_VERIFYPEER  => false,
]);
$response = curl_exec($request);
curl_close($request);

echo $response;

However, it does not load external resources such as images. Note that instead of requesting them from google.com/..., the browser requests them from my vhost, viperfollowdev.com. See the image below to understand.

Is there any way to fix this?

[image]

My second example was:

<?php

$request = curl_init();

curl_setopt_array($request, array(
    CURLOPT_URL             => 'https://www.instagram.com',
    CURLOPT_RETURNTRANSFER  => true,
    CURLOPT_FOLLOWLOCATION  => true,
    CURLOPT_SSL_VERIFYPEER  => false,
));

$response = curl_exec($request);
curl_close($request);

$response = str_replace('/static/bundles/', 'https://www.instagram.com/static/bundles/', $response);
$response = str_replace('/static/images/', 'https://www.instagram.com/static/images/', $response);
$response = str_replace('/data/manifest.json', 'https://www.instagram.com/data/manifest.json', $response);

echo $response;

It fetches the page, but the content still does not show up on my page. I replaced the paths with full URLs, but it is still not working.

2 answers

1


About the error

This happens because Google does not use full URLs in attributes such as src and srcset. Instead, it uses only the path of the file, e.g. /path/to/image.png

As a result, the browser always looks for these files on the site being accessed, in your case http://viperfollowdev.com.
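To make the resolution rule concrete, here is a minimal sketch of how a reference gets resolved against the page that contains it. The helper name resolveReference is hypothetical and the logic is simplified (it does not normalise ./ or ../ segments), so treat it as an illustration rather than a full implementation of what a browser does.

```php
<?php
// Hypothetical helper: simplified version of how a browser resolves a
// reference found in a page that was served from $pageUrl.
function resolveReference(string $pageUrl, string $ref): string
{
    $page   = parse_url($pageUrl);
    $origin = $page['scheme'] . '://' . $page['host']
        . (isset($page['port']) ? ':' . $page['port'] : '');

    // Already absolute (has a scheme): nothing to resolve.
    if (preg_match('~^[a-z][a-z0-9+.-]*:~i', $ref)) {
        return $ref;
    }
    // Protocol-relative (//host/file): only the scheme is inherited.
    if (strncmp($ref, '//', 2) === 0) {
        return $page['scheme'] . ':' . $ref;
    }
    // Root-relative (/path/file): resolved against the origin of the page.
    if (strncmp($ref, '/', 1) === 0) {
        return $origin . $ref;
    }
    // Document-relative: resolved against the directory of the page path.
    $path = $page['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);

    return $origin . $dir . $ref;
}

// The same path points to different hosts depending on who serves the HTML.
echo resolveReference('https://www.google.com', '/path/to/image.png'), "\n";
echo resolveReference('http://viperfollowdev.com', '/path/to/image.png'), "\n";
```

The second call shows the situation from the question: once the HTML is printed by your own vhost, the same path is requested from viperfollowdev.com.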

Solutions

To fix this, print the tag below before printing the variable $response.

echo '<base href="https://www.google.com/" />';

But this solution will not work in all cases. When your HTML already sets a base URL (as in the example further down), the browser uses the first one and ignores the new "base url".

In that case, only a regex will solve your problem (or at least part of it).
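As a rough sketch of the base-tag approach, the helper below inserts the tag into the fetched markup only when that markup does not already declare one. The name addBaseHref is hypothetical, and the function only sees the string passed to it: if your own surrounding page already sets a base URL, it cannot detect that.

```php
<?php
// Hypothetical helper: adds <base href="..."> to fetched HTML, unless the
// markup already contains a <base> element with an href attribute.
function addBaseHref(string $html, string $baseUrl): string
{
    // An existing <base href> wins in the browser, so leave the markup alone.
    if (preg_match('~<base\b[^>]*\bhref\s*=~i', $html)) {
        return $html;
    }

    $tag = '<base href="' . htmlspecialchars($baseUrl, ENT_QUOTES) . '" />';

    // Insert the tag right after the first opening <head ...> tag.
    $result = preg_replace_callback(
        '~<head\b[^>]*>~i',
        fn (array $m): string => $m[0] . $tag,
        $html,
        1,
        $count
    );

    // No <head> found: fall back to printing the tag before everything else.
    return $count > 0 ? $result : $tag . $html;
}

echo addBaseHref('<html><head><title>t</title></head></html>', 'https://www.google.com/');
```

When the function returns the markup unchanged, rewriting the paths themselves is the remaining option.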

Regex

(src=|href=|srcset=|url)('|"|\()(\/.*?)('|"|\))

The regex above captures the values of the src, srcset and href attributes, as well as url(...), the latter being used in CSS.

Now just use the preg_replace function to replace the values.

Example:

<!DOCTYPE html>
<html>
    <head>
        <title>Title of the document</title>
        <base href="https://www.bing.com.br/" />
    </head>

    <body>

        <?php

            $url = "https://www.google.com";

            $request = curl_init($url);

            curl_setopt_array($request, [
                CURLOPT_RETURNTRANSFER  => true,
                CURLOPT_AUTOREFERER  => true,
                CURLOPT_SSL_VERIFYPEER  => false,
            ]);
            $response = curl_exec($request);
            curl_close($request);

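            // Prefix root-relative paths with $url; the (?!\/) lookahead skips
            // protocol-relative URLs such as //host/file.js, which are already complete.
            echo preg_replace("/(src=|href=|srcset=|url)('|\"|\()(\/(?!\/).*?)('|\"|\))/", "$1$2{$url}$3$4", $response);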

        ?>

        <script src="/ajax/libs/jquery/3.2.1/jquery.min.js"></script>
    </body>
</html>

The regex may vary from site to site. Depending on what you want to change, you will need to customize the regex and make it more complete, but the principle is the same.
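As one possible way to make the regex stricter, the sketch below uses preg_replace_callback and requires the closing quote to match the opening one, while leaving protocol-relative (//host/...) and already absolute URLs untouched. The function name rewriteRootRelative and its $origin parameter are hypothetical, and srcset values with several candidates are deliberately not handled.

```php
<?php
// Hypothetical helper: prefixes root-relative paths in src/href attributes
// and in CSS url(...) with $origin. Other URLs are left as they are.
function rewriteRootRelative(string $html, string $origin): string
{
    $origin = rtrim($origin, '/');

    // src="/x" or href='/x': the closing quote must equal the opening one (\2),
    // and (?!/) rejects protocol-relative URLs that start with //.
    $html = preg_replace_callback(
        '~\b(src|href)=(["\'])(/(?!/)[^"\']*)\2~i',
        fn (array $m): string => $m[1] . '=' . $m[2] . $origin . $m[3] . $m[2],
        $html
    );

    // CSS url(/x), url("/x") or url('/x').
    return preg_replace_callback(
        '~url\(\s*(["\']?)(/(?!/)[^"\')]*)\1\s*\)~i',
        fn (array $m): string => 'url(' . $m[1] . $origin . $m[2] . $m[1] . ')',
        $html
    );
}

$sample = '<img src="/logo.png"><script src="//cdn.example.com/a.js"></script>'
        . '<div style="background:url(/bg.png)"></div>';

echo rewriteRootRelative($sample, 'https://www.google.com/');
```

Using a callback instead of a replacement string also avoids surprises if the origin ever contains characters such as $ that preg_replace would interpret.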

  • Thank you, it worked. Take a look at the second example in my question; it is almost the same, only more organized.

0

In the first case, you retrieved the HTML from the request, but its resources are declared with relative paths that do not exist in your application. You may have already noticed this, since you replaced them in your second attempt. However, there are other validations and procedures that run on the host and prevent you from displaying its content (sessions, tokens, headers, etc.).

I advise that if you want to consume or display the content of other websites and services, you should restrict yourself to their APIs, rules and terms of use.

Take a look at the Google API and the Instagram API; what you are trying to do may even be supported by the two platforms.

  • I agree with "should restrict yourself to their APIs, rules and terms of use." This is for study purposes; note that the first attempt was with Google.
