Get information from Twitter with cURL, without using the API

Question

Asked 7 years, 10 months ago

I have the following code:

$url = 'https://twitter.com/' . $username;

$user = curl_init();
curl_setopt_array($user, [
      CURLOPT_URL             => $url,
      CURLOPT_CUSTOMREQUEST   => 'GET',
      CURLOPT_CAINFO          => 'cacert-2017-06-07.pem',
      CURLOPT_RETURNTRANSFER  => true,
      CURLOPT_SSL_VERIFYPEER  => false,
      CURLOPT_SSL_VERIFYHOST  => 2,
      CURLOPT_HTTPHEADER      => [
        "Content-type:text/html;charset=utf-8",
      ],
      CURLOPT_USERAGENT       => $_SERVER['HTTP_USER_AGENT'],
      CURLOPT_HEADER          => true,
      CURLOPT_FOLLOWLOCATION  => true,
      CURLOPT_MAXREDIRS       => 2,
      CURLOPT_REDIR_PROTOCOLS => CURLPROTO_HTTP | CURLPROTO_HTTPS,
      CURLOPT_POSTREDIR       => 2,
      CURLOPT_AUTOREFERER     => 1,
      CURLOPT_ENCODING        => "gzip"
  ]
);
$user_info = json_encode(curl_exec($user));
//$user_info = json_decode(curl_exec($user));

var_dump($user_info);
echo $user_info;

Well, it returns to me:

I would like to extract information such as:

Screen_name, Name, Profile_img, etc

A friend who owns a website said it is possible, but he would not give in and teach me how. What is the logic behind it? Is it possible?

Monitoring the network traffic, I got this:

-H "accept-encoding: gzip, deflate, br"
-H "accept-language: pt-BR,pt;q=0.8,en-US;q=0.6,en;q=0.4"
-H "upgrade-insecure-requests: 1"
-H "user-agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"
-H "accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8"
-H "cache-control: max-age=0"
-H "authority: twitter.com"

I've been trying everything since yesterday to debug this, and I don't understand this part: "Any particular reason not to use Curl?"

– user76271

2017/07/02 at 23:03

I read the wrong title. My mistake. I had read "Get information without Curl". Sorry.

– Woss

2017/07/02 at 23:04

Take a look: https://answall.com/a/218866/3635

– Guilherme Nascimento

2017/07/07 at 21:09

Oh, it is already solved here, but I'll read it.

– user76271

2017/07/08 at 09:44

@Guilhermenascimento, I am going to open a question about Curl and tokens. Can you help me?

– user76271

2017/07/08 at 09:49

1 answer

by Woss • **73,416** points · Answer 1 · 2017-07-02T23:53:01+00:00

The Twitter page apparently has an input[type=hidden] field containing the data as JSON, which makes our lives much easier. The result obtained from:

$user_info = curl_exec($user);

This is nothing more than the complete HTTP response to the request: because CURLOPT_HEADER is enabled, it contains the response headers followed by the body. To get only the body of the response, that is, the HTML code, just do:

$header_size = curl_getinfo($user, CURLINFO_HEADER_SIZE);
$header = substr($user_info, 0, $header_size);
$body = substr($user_info, $header_size);
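The split above can be tried without a network request. The following is a minimal sketch, assuming a hand-written response string; the function name split_response and the sample response are hypothetical and only stand in for what curl_exec and curl_getinfo would return. Note that when redirects are followed, CURLINFO_HEADER_SIZE covers the headers of every response in the chain, so the same cut still lands at the start of the final body.

```php
<?php

/**
 * Split a raw HTTP response (headers + body) at the given header size.
 * $headerSize is the value curl_getinfo($handle, CURLINFO_HEADER_SIZE) reports.
 */
function split_response(string $raw, int $headerSize): array
{
    return [
        'header' => substr($raw, 0, $headerSize),
        'body'   => substr($raw, $headerSize),
    ];
}

// Hypothetical response, only for illustration.
$headers = "HTTP/1.1 200 OK\r\nContent-Type: text/html; charset=utf-8\r\n\r\n";
$raw = $headers . "<html><body>Hello</body></html>";

$parts = split_response($raw, strlen($headers));

echo $parts['body'], PHP_EOL;
```

If the headers are never needed, leaving CURLOPT_HEADER disabled makes curl_exec return only the body, and no split is required.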

Thus, $header will be the HTTP response headers and $body the HTML code. To parse this code, we use the native class DOMDocument (never use regex):

$dom = new DOMDocument();
@$dom->loadHTML($body);

The @ on the second line hides the warnings generated by errors in the HTML of the Twitter page (several elements with the same id). The field mentioned above, the one that holds the JSON, is:

<input type="hidden" id="init-data" class="json-data" value="..." />

So we just look up the id init-data in the DOM:

$json = $dom->getElementById("init-data")->getAttribute("value");
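The lookup above fails with a fatal error if the page has no element with that id, because getElementById then returns null. A slightly more defensive version of the parsing step might look like the sketch below. It assumes a hypothetical HTML snippet shaped like the field described in this answer, and it swaps the @ operator for libxml_use_internal_errors, which keeps parse errors inside libxml instead of emitting PHP warnings.

```php
<?php

// Hypothetical HTML, shaped like the hidden field described in the answer.
$body = '<html><body>'
      . '<input type="hidden" id="init-data" class="json-data" '
      . 'value="{&quot;profile_user&quot;:{&quot;screen_name&quot;:&quot;example&quot;}}" />'
      . '</body></html>';

// Collect libxml parse errors internally instead of raising PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($body);

// Discard the collected errors and restore the previous setting.
libxml_clear_errors();
libxml_use_internal_errors($previous);

// getElementById() returns null when no element has the requested id.
$input = $dom->getElementById('init-data');
$json = $input !== null ? $input->getAttribute('value') : null;

if ($json === null) {
    echo 'init-data field not found', PHP_EOL;
} else {
    // The parser has already decoded the &quot; entities in the attribute.
    echo $json, PHP_EOL;
}
```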

Then we use json_decode to convert it into an object:

$data = json_decode($json);
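Since the page markup is outside our control, it may be worth checking that the decode actually succeeded before reading properties. This is a small sketch, assuming a hypothetical JSON string in place of the real init-data value; the fallback text '(unknown)' is likewise only illustrative.

```php
<?php

// Hypothetical JSON, standing in for the value of the init-data field.
$json = '{"profile_user":{"name":"Example User","screen_name":"example"}}';

$name = null;
$data = json_decode($json);

// json_decode() returns null on invalid input; json_last_error() tells a
// failed decode apart from a JSON document that is literally "null".
if ($data === null && json_last_error() !== JSON_ERROR_NONE) {
    echo 'Invalid JSON: ', json_last_error_msg(), PHP_EOL;
} else {
    // isset() guards against the expected properties being absent.
    $name = isset($data->profile_user->name)
        ? $data->profile_user->name
        : '(unknown)';
    echo $name, PHP_EOL;
}
```

Passing true as the second argument of json_decode would return associative arrays instead of objects, in which case the access becomes $data['profile_user']['name'].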

And we can access the desired information:

echo "Nome: ", $data->profile_user->name, PHP_EOL;
echo "Usuário: ", $data->profile_user->screen_name, PHP_EOL;
echo "Foto de perfil: ", $data->profile_user->profile_image_url, PHP_EOL;

In my case, the output was:

Nome: Anderson Carlos Woss
Usuário: acwoss
Foto de perfil: http://pbs.twimg.com/profile_images/827606791592747008/9EdeoXRp_normal.jpg

The whole code would be something like:

<?php

// $username must already hold the profile to look up, e.g. "acwoss".
$url = 'https://twitter.com/' . $username;

$user = curl_init();
curl_setopt_array($user, [
      CURLOPT_URL             => $url,
      CURLOPT_CUSTOMREQUEST   => 'GET',
      CURLOPT_CAINFO          => 'cacert-2017-06-07.pem',
      CURLOPT_RETURNTRANSFER  => true,
      CURLOPT_SSL_VERIFYPEER  => false,
      CURLOPT_SSL_VERIFYHOST  => 2,
      CURLOPT_HTTPHEADER      => [
        "Content-type:text/html;charset=utf-8",
      ],
      CURLOPT_USERAGENT       => $_SERVER['HTTP_USER_AGENT'],
      CURLOPT_HEADER          => true,
      CURLOPT_FOLLOWLOCATION  => true,
      CURLOPT_MAXREDIRS       => 2,
      CURLOPT_REDIR_PROTOCOLS => CURLPROTO_HTTP | CURLPROTO_HTTPS,
      CURLOPT_POSTREDIR       => 2,
      CURLOPT_AUTOREFERER     => 1,
      CURLOPT_ENCODING        => "gzip"
  ]
);

// With CURLOPT_HEADER enabled, the result is the headers followed by the body.
$user_info = curl_exec($user);

// curl_exec() returns false when the request fails.
if ($user_info === false) {
    exit('cURL error: ' . curl_error($user));
}

// Split the raw response into headers and HTML body.
$header_size = curl_getinfo($user, CURLINFO_HEADER_SIZE);
$header = substr($user_info, 0, $header_size);
$body = substr($user_info, $header_size);

// Parse the HTML; @ hides the warnings caused by invalid markup.
$dom = new DOMDocument();
@$dom->loadHTML($body);

// Read the JSON stored in the hidden init-data field and decode it.
$json = $dom->getElementById("init-data")->getAttribute("value");
$data = json_decode($json);

echo "Nome: ", $data->profile_user->name, PHP_EOL;
echo "Usuário: ", $data->profile_user->screen_name, PHP_EOL;
echo "Foto de perfil: ", $data->profile_user->profile_image_url, PHP_EOL;