Get information from Twitter with cURL without using the API

I have the following code:

$url = 'https://twitter.com/' . $username;

$user = curl_init();
curl_setopt_array($user, [
      CURLOPT_URL             => $url,
      CURLOPT_CUSTOMREQUEST   => 'GET',
      CURLOPT_CAINFO          => 'cacert-2017-06-07.pem',
      CURLOPT_RETURNTRANSFER  => true,
      CURLOPT_SSL_VERIFYPEER  => false,
      CURLOPT_SSL_VERIFYHOST  => 2,
      CURLOPT_HTTPHEADER      => [
        "Content-type:text/html;charset=utf-8",
      ],
      CURLOPT_USERAGENT       => $_SERVER['HTTP_USER_AGENT'],
      CURLOPT_HEADER          => true,
      CURLOPT_FOLLOWLOCATION  => true,
      CURLOPT_MAXREDIRS       => 2,
      CURLOPT_REDIR_PROTOCOLS => CURLPROTO_HTTP | CURLPROTO_HTTPS,
      CURLOPT_POSTREDIR       => 2,
      CURLOPT_AUTOREFERER     => 1,
      CURLOPT_ENCODING        => "gzip"
  ]
);
$user_info = json_encode(curl_exec($user));
//$user_info = json_decode(curl_exec($user));

var_dump($user_info);
echo $user_info;

Well, this is what it returns:

[image not available]

I would like to extract information such as:

Screen_name, Name, Profile_img, etc

A friend who owns a site said it is possible, but he would not give in and teach me. What is the logic behind it? Is it possible?

Monitoring the network traffic, I got this:

-H "accept-encoding: gzip, deflate, br"
-H "accept-language: pt-BR,pt;q=0.8,en-US;q=0.6,en;q=0.4"
-H "upgrade-insecure-requests: 1"
-H "user-agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"
-H "accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8"
-H "cache-control: max-age=0"
-H "authority: twitter.com"
  • I've tried everything and have been debugging this since yesterday, and I don't understand this part: "Any particular reason not to use Curl?"

  • I misread the title. My mistake. I had read "Get information without Curl". Sorry.

  • Take a look: https://answall.com/a/218866/3635

  • Oops, it's settled here, but I'll read it.

  • @Guilhermenascimento, I will open a question about Curl and tokens; can you help me?

1 answer


The Twitter page apparently has an input[type=hidden] field containing the data as JSON, which makes our lives much easier. The result obtained from:

$user_info = curl_exec($user);

is nothing more than the full HTTP response, headers included, because CURLOPT_HEADER is set to true. To get only the body of the response, that is, the HTML code, just do:

$header_size = curl_getinfo($user, CURLINFO_HEADER_SIZE);
$header = substr($user_info, 0, $header_size);
$body = substr($user_info, $header_size);
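To see the split in isolation, here is a minimal sketch that runs on a hard-coded response string instead of a live request. The sample response and the $raw variable are made up for illustration. In the real code the header size comes from curl_getinfo($user, CURLINFO_HEADER_SIZE); here it is derived by locating the blank line that ends the headers.

```php
<?php
// Hypothetical raw response, shaped like what curl_exec returns
// when CURLOPT_HEADER is true: headers, a blank line, then the body.
$raw = "HTTP/1.1 200 OK\r\n"
     . "Content-Type: text/html; charset=utf-8\r\n"
     . "\r\n"
     . "<html><body>Hello</body></html>";

// Stand-in for curl_getinfo($user, CURLINFO_HEADER_SIZE): the headers
// end at the first blank line, and the size includes that "\r\n\r\n".
$header_size = strpos($raw, "\r\n\r\n") + 4;

$header = substr($raw, 0, $header_size);
$body   = substr($raw, $header_size);

echo $body, PHP_EOL;
```

When redirects are followed, the response can contain one header block per hop, which is why the real code is better off relying on CURLINFO_HEADER_SIZE than on the first blank line.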

Thus, $header will hold the HTTP response headers and $body the HTML code. To parse this HTML, we use the native DOMDocument class (never use regex for this):

$dom = new DOMDocument();
@$dom->loadHTML($body);

The @ on the second line hides the warnings generated by errors in the HTML of the Twitter page (several elements share the same id). The field mentioned above that holds the JSON is:

<input type="hidden" id="init-data" class="json-data" value="..." />

So we just search for the id init-data in the DOM:

$json = $dom->getElementById("init-data")->getAttribute("value");
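A somewhat more defensive sketch of this lookup, run against a tiny hand-written page rather than the real Twitter HTML (the $html sample and its JSON are invented for illustration). It swaps the @ operator for libxml_use_internal_errors(), which collects parser warnings instead of printing them, and checks for null in case the element is missing.

```php
<?php
// Invented sample page; in the real script this would be $body from cURL.
$html = '<html><body>'
      . '<input type="hidden" id="init-data" class="json-data"'
      . ' value="{&quot;profile_user&quot;:{&quot;name&quot;:&quot;Example&quot;}}" />'
      . '</body></html>';

// Collect libxml warnings internally instead of silencing them with @.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

// Discard the collected warnings and restore the previous setting.
libxml_clear_errors();
libxml_use_internal_errors($previous);

// getElementById returns null when no element has that id.
$input = $dom->getElementById('init-data');
$json  = $input !== null ? $input->getAttribute('value') : null;

var_dump($json);
```

The parser decodes the &quot; entities, so getAttribute hands back plain JSON text ready for json_decode.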

Then we use json_decode to convert it to an object:

$data = json_decode($json);
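json_decode returns null when the input is not valid JSON, so it can be worth checking the result before reading properties. A minimal sketch, using a made-up JSON string in place of the value attribute; the helper name decode_init_data is hypothetical.

```php
<?php
/**
 * Hypothetical helper: decode the JSON taken from the hidden field,
 * throwing instead of silently returning null on malformed input.
 */
function decode_init_data(string $json): stdClass
{
    $data = json_decode($json);

    if (json_last_error() !== JSON_ERROR_NONE) {
        throw new RuntimeException('Invalid JSON: ' . json_last_error_msg());
    }
    if (!$data instanceof stdClass) {
        throw new RuntimeException('Expected a JSON object');
    }

    return $data;
}

// Made-up sample with the same shape the answer reads from.
$json = '{"profile_user":{"name":"Example","screen_name":"example"}}';
$data = decode_init_data($json);

// isset() guards against keys that are missing from the payload.
$name = isset($data->profile_user->name) ? $data->profile_user->name : null;

echo "Name: ", $name, PHP_EOL;
```

Since the page layout is outside our control, failing loudly here makes it easier to notice when the hidden field changes or disappears.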

And we can access the desired information:

echo "Nome: ", $data->profile_user->name, PHP_EOL;
echo "Usuário: ", $data->profile_user->screen_name, PHP_EOL;
echo "Foto de perfil: ", $data->profile_user->profile_image_url, PHP_EOL;

In my case, the output was:

Nome: Anderson Carlos Woss
Usuário: acwoss
Foto de perfil: http://pbs.twimg.com/profile_images/827606791592747008/9EdeoXRp_normal.jpg

The whole code would be something like:

<?php

$url = 'https://twitter.com/' . $username;

$user = curl_init();
curl_setopt_array($user, [
      CURLOPT_URL             => $url,
      CURLOPT_CUSTOMREQUEST   => 'GET',
      CURLOPT_CAINFO          => 'cacert-2017-06-07.pem',
      CURLOPT_RETURNTRANSFER  => true,
      CURLOPT_SSL_VERIFYPEER  => false,
      CURLOPT_SSL_VERIFYHOST  => 2,
      CURLOPT_HTTPHEADER      => [
        "Content-type:text/html;charset=utf-8",
      ],
      CURLOPT_USERAGENT       => $_SERVER['HTTP_USER_AGENT'],
      CURLOPT_HEADER          => true,
      CURLOPT_FOLLOWLOCATION  => true,
      CURLOPT_MAXREDIRS       => 2,
      CURLOPT_REDIR_PROTOCOLS => CURLPROTO_HTTP | CURLPROTO_HTTPS,
      CURLOPT_POSTREDIR       => 2,
      CURLOPT_AUTOREFERER     => 1,
      CURLOPT_ENCODING        => "gzip"
  ]
);

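// Full HTTP response: headers followed by the HTML body
$user_info = curl_exec($user);

// Split the response at the header boundary reported by cURL
$header_size = curl_getinfo($user, CURLINFO_HEADER_SIZE);
$header = substr($user_info, 0, $header_size);
$body = substr($user_info, $header_size);

// Parse the HTML; @ hides the warnings caused by Twitter's invalid markup
$dom = new DOMDocument();
@$dom->loadHTML($body);

// Read the JSON stored in the hidden init-data field and decode it
$json = $dom->getElementById("init-data")->getAttribute("value");
$data = json_decode($json);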

echo "Nome: ", $data->profile_user->name, PHP_EOL;
echo "Usuário: ", $data->profile_user->screen_name, PHP_EOL;
echo "Foto de perfil: ", $data->profile_user->profile_image_url, PHP_EOL;
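
The full script assumes the request succeeds. As a hedged sketch, the fetch step could be wrapped in a function that checks for failures first. The name fetch_page is hypothetical, the option set is trimmed to the essentials, and the timeout values are arbitrary.

```php
<?php
/**
 * Hypothetical wrapper around the cURL calls: returns the response
 * body, or throws when the transfer fails or the status is not 200.
 */
function fetch_page(string $url): string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HEADER         => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_MAXREDIRS      => 2,
        CURLOPT_CONNECTTIMEOUT => 5,   // arbitrary values for this sketch
        CURLOPT_TIMEOUT        => 15,
    ]);

    $response = curl_exec($ch);

    // curl_exec returns false on failure (DNS error, timeout, refused connection).
    if ($response === false) {
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }

    $status      = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    $header_size = curl_getinfo($ch, CURLINFO_HEADER_SIZE);

    if ($status !== 200) {
        throw new RuntimeException('Unexpected HTTP status: ' . $status);
    }

    // Drop the headers and keep only the HTML body.
    return substr($response, $header_size);
}
```

Calling $body = fetch_page($url) inside a try/catch would then stand in for the curl_exec and substr steps, so a failed request surfaces as an exception rather than as a confusing DOM error later on.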

  • Dude, awesome!!! It worked perfectly, and it was really well explained. But is PHP_EOL really necessary?

  • No, I only used it to break the line and display each piece of information on its own line. How they are used will depend on your application.

  • Got it. I just didn't get the part about input[type=hidden]; where do I find it? I think I'll open another, well-elaborated question on how to get information from Twitter with Curl.

  • Looking at the source code of the Twitter page and searching for my name, I found this element on line 21801 of the HTML. It is one of the last elements on the page.

  • Ah, OK, I'll take a look at that while debugging. Thanks, @Anderson Carlos Woss.

  • How do you debug the indices of $data->user_info, to see the contents?

  • Which one would be $data->user_info? Or do you mean $data->profile_user?

  • I'm sorry, I didn't even notice...

  • You can do var_dump($data->profile_user).

  • It worked. I'm returning it with echo json_encode($data->profile_user) to use with ajax; the problem will be recovering it afterwards.

  • A question, friend: I am saving all the data and cookies that come from Twitter in a DB and in a .txt file. In this debugged HTML, is it possible to find something about following and receiving followers? That is, with all the ids saved in my database, for example logged in, I would receive all followers with the ids saved in the database... would I find something like that there? @Anderson
