<?php
date_default_timezone_set('Asia/Tokyo');
ini_set('error_reporting', E_ALL & ~E_STRICT & ~E_DEPRECATED); // & ~E_NOTICE
ini_set('log_errors', true);
ini_set('html_errors', false);
ini_set('display_errors', true);
define('CHARSET', 'UTF-8');
ini_set('default_charset', CHARSET);
mb_http_output(CHARSET);
mb_internal_encoding(CHARSET);
mb_regex_encoding(CHARSET);
header('Content-Type: text/html; charset='.CHARSET);
/*
The relevant part starts here. The code above is just bootstrap setup.
*/
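// Placeholder: the email address you want to look up.
$email = 'email@que.deseja.buscar';
// urlencode() escapes "@" and any other reserved characters in the query string.
$url = 'https://www.facebook.com/search/all/?q='.urlencode($email);
$data = file_get_contents($url);
$data = html_entity_decode($data);
// The search results arrive inside HTML comments; strip the comment markers so DOMDocument parses them as real elements.
$data = str_replace(array('<!-- ', ' -->'), '', $data);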
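class Foo {
    private $data;
    private $dom;
    public function __construct($data) {
        $this->data = $data;
        $this->dom = new DOMDocument();
        $this->dom->validateOnParse = false;
        $this->dom->preserveWhiteSpace = true;
    }
    // Despite its name, this method takes an XPath query, not a CSS selector.
    // Returns the HTML of the first matching node, or '' when nothing matches.
    public function htmlGetContentBySelector($query, $data = null) {
        if (!empty($data)) {
            $this->data = $data;
        }
        // Collect libxml warnings about malformed HTML instead of printing them.
        $previous = libxml_use_internal_errors(true);
        // The XML encoding hint stops libxml from reading UTF-8 input as ISO-8859-1.
        $this->dom->loadHTML('<?xml encoding="UTF-8">'.$this->data);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);
        $xpath = new DOMXPath($this->dom);
        $node = $xpath->query($query)->item(0);
        // saveHTML(null) would dump the entire document, so handle "no match" explicitly.
        return $node === null ? '' : $this->dom->saveHTML($node);
    }
}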
$c = new Foo($data);
$query = "//code[@id='u_0_d']";
$rs = $c->htmlGetContentBySelector($query);
// The complete result.
// Prints the whole block:
//echo $rs; exit;
/*
Now we filter and extract what we need.
Here we get the photo.
*/
$query = "//img[@class='_fbBrowseXuiResult__profileImage img']";
$pic = $c->htmlGetContentBySelector($query, $rs);
echo $pic;
/*
Output:
<img class="_fbBrowseXuiResult__profileImage img" src="https://scontent-nrt1-1.xx.fbcdn.net/v/t1.0-1/c17.0.100.100/p100x100/FOTO-DO-PERFIL" width="100" height="100" alt="PROFILE NAME">
*/
/*
The profile name and URL.
*/
$query = "//div[@class='_gll']";
$name = $c->htmlGetContentBySelector($query, $rs);
echo $name;
/*
<div class="_gll"><div><a href="https://www.facebook.com/pagina-da-pessoa"><div class="_5d-4"><div class="_5d-5">PROFILE NAME</div></div></a></div></div>
*/
/*
The company where the person works.
*/
$query = "//div[@class='_glm']";
$job = $c->htmlGetContentBySelector($query, $rs);
echo $job;
/*
<div class="_glm"><div class="_pac" data-bt="{" ct>勤務先: <a href="https://www.facebook.com/pages/pagina-da-empresa">COMPANY NAME</a><div class="_1my"></div>
</div></div>
*/
The results still contain HTML markup, but it is easy to strip it or to extract the individual values from it.
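For example, here is a minimal sketch of pulling the plain values out of those fragments with PHP's DOM extension. The helper name `extractValue` is hypothetical, and the two input strings are hand-written copies modeled on the sample outputs above (placeholder data), not live results.

```php
<?php
// Sketch: extract plain values from the HTML fragments returned by the script.
// $pic and $name are hand-written copies modeled on the sample outputs
// (placeholder data), so this runs on its own without any request.

// Hypothetical helper: parse an HTML fragment, find the first node matching
// an XPath query, and return one of its attributes (or its text if $attr is null).
function extractValue(string $html, string $query, ?string $attr = null): string
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // keep libxml warnings quiet
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html); // treat the input as UTF-8
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $node = (new DOMXPath($dom))->query($query)->item(0);
    if ($node === null) {
        return ''; // nothing matched
    }
    if ($attr === null) {
        return trim($node->textContent); // text with all tags removed
    }
    return $node instanceof DOMElement ? $node->getAttribute($attr) : '';
}

$pic  = '<img class="_fbBrowseXuiResult__profileImage img" src="https://scontent-nrt1-1.xx.fbcdn.net/v/t1.0-1/c17.0.100.100/p100x100/FOTO-DO-PERFIL" width="100" height="100" alt="PROFILE NAME">';
$name = '<div class="_gll"><div><a href="https://www.facebook.com/pagina-da-pessoa"><div class="_5d-4"><div class="_5d-5">PROFILE NAME</div></div></a></div></div>';

$photoUrl   = extractValue($pic, '//img', 'src');
$profileUrl = extractValue($name, '//a', 'href');
$fullName   = extractValue($name, "//div[@class='_5d-5']");

echo $photoUrl, PHP_EOL, $profileUrl, PHP_EOL, $fullName, PHP_EOL;
```

For the name alone, `trim(strip_tags($name))` gives the same text; the DOM route is mainly useful for attributes such as `src` and `href`.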
The variable $rs contains something like this (var_dump output):
string(1558) "<code id="u_0_d"><!-- <div class="_4-u2 _4-u8"><div id="all_search_results" data-bt="{"session_id":"5505924b49749c699b44850e32fe24fa","typeahead_sid":null,"result_type":"all","referrer":"","path":"\\/search\\/all\\/","experience_type":"simplepps"}"><div class="_1yt"><div class="_3u1 _gli _5und" data-bt="{"id":1251714145,"rank":null,"abtest_version":null,"abtest_params":[null],"section":"main_column","owner_id":null,"sub_id":null,"browse_location":null,"query_data":{"q":"email\\u0040que.deseja.buscar"},"is_headline":false}"><div class="_401d"><div class="clearfix"><a class="_fbBrowseXuiResult__profileImageLink _8o _8s lfloat _ohe" href="https://www.facebook.com/pagina.da.pessoa" aria-hidden="true" tabindex="-1"><img class="_fbBrowseXuiResult__profileImage img" src="https://scontent-nrt1-1.xx.fbcdn.net/v/t1.0-1/c17.0.100.100/p100x100/xxxxxx-FOTO-DA-PESSOa-xxxxx_n.jpg?oh=74ae0b9e2cc130f9800f98d35d64ce36&oe=58AAB17B" width="100" height="100" alt="wa wa" /></a><div class="_42ef"><div class="_glj"><div class="clearfix"><div class="_glk rfloat _ohf"></div><div class="_gll"><div><a href="https://www.facebook.com/pagina.da.pessoa"><div class="_5d-4"><div class="_5d-5">PERSON'S NAME </div></div></a></div></div></div><div><div class="_glm"><div class="_pac" data-bt="{"ct":"sub_headers"}">Job: <a href="https://www.facebook.com/pages/página-empresa-onde-trabalha/codigo-qualquer">EMPLOYER NAME</a><div class="_1my"></div></div></div><div class="_glo"></div></div><div class="_glp"></div><div class="_3t0c"></div></div></div></div></div></div></div></div></div> --></code>"
Note: Facebook will obviously not return data from profiles whose privacy settings hide it.
I can't say whether the search can return more than one profile. However, since an email address is unique to a single profile, it seems reasonably safe to extract the name, URL and profile photo from the first result without worrying about that.
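If you would rather not rely on that assumption, one option is to count the result cards before using `item(0)`. A rough sketch, assuming each result carries exactly one image with the `_fbBrowseXuiResult__profileImage img` class used in the answer; `countProfileResults` is a hypothetical helper and the two-result fragment is invented for illustration.

```php
<?php
// Sketch: count the profile results on a search page before trusting item(0).
// Assumes every result card contains exactly one profile image with the class
// used in the answer; the two-result fragment below is invented for illustration.

// Hypothetical helper: number of profile images (one per result card).
function countProfileResults(string $html): int
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate malformed HTML
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    return $xpath->query("//img[@class='_fbBrowseXuiResult__profileImage img']")->length;
}

$rs = '<div>'
    . '<img class="_fbBrowseXuiResult__profileImage img" src="a.jpg">'
    . '<img class="_fbBrowseXuiResult__profileImage img" src="b.jpg">'
    . '</div>';

$count = countProfileResults($rs);
if ($count > 1) {
    echo "Warning: $count profiles matched; only the first one is used.", PHP_EOL;
}
```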
It is also important to note that the values of the class and id attributes can change. Because of that, or for any other reason, the script above may stop working in the future: it is a hack, not an official, documented method.
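One way to soften (not eliminate) that fragility is to match a single class token instead of the exact `class` attribute, using the usual XPath 1.0 idiom. In this sketch `classXPath` is a hypothetical helper, and the `extra` class is invented to show the difference; a renamed `_glm` would still break the query.

```php
<?php
// Sketch: match one class token instead of the exact class attribute.
// "_glm extra" is invented to show the difference; if Facebook renames
// "_glm" itself, this query breaks just like the original one.

// Hypothetical helper building the standard XPath 1.0 class-token test.
function classXPath(string $tag, string $class): string
{
    return sprintf(
        "//%s[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $tag,
        $class
    );
}

$html = '<div class="_glm extra"><a href="https://www.facebook.com/pages/pagina-da-empresa">COMPANY NAME</a></div>';

$dom = new DOMDocument();
$previous = libxml_use_internal_errors(true);
$dom->loadHTML('<?xml encoding="UTF-8">' . $html);
libxml_clear_errors();
libxml_use_internal_errors($previous);
$xpath = new DOMXPath($dom);

$exact = $xpath->query("//div[@class='_glm']")->length;    // exact attribute match
$token = $xpath->query(classXPath('div', '_glm'))->length; // class-token match
echo "exact: $exact, token: $token", PHP_EOL;
```

This prints `exact: 0, token: 1`: the exact comparison fails as soon as another class is added, while the token test still finds the element.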
Be aware that unusual request patterns can get the requesting IP address blocked, so use this sparingly.
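If you need several lookups, a simple precaution is to space the requests out. A sketch assuming a fixed pause between calls; the 5-second default is an arbitrary choice, not a documented Facebook limit, and `fetchThrottled` is a hypothetical helper.

```php
<?php
// Sketch: space out requests when looking up several addresses.
// The 5-second default is an arbitrary assumption, not a documented limit.

// Hypothetical helper: call $fetch for each item, pausing between calls.
function fetchThrottled(array $items, callable $fetch, float $delaySeconds = 5.0): array
{
    $results = [];
    foreach (array_values($items) as $i => $item) {
        if ($i > 0) {
            usleep((int) round($delaySeconds * 1000000)); // pause before every request but the first
        }
        $results[$item] = $fetch($item);
    }
    return $results;
}

// Usage with the request from the answer (commented out to avoid a live request):
// $pages = fetchThrottled(['email@que.deseja.buscar'], function ($email) {
//     return file_get_contents('https://www.facebook.com/search/all/?q=' . urlencode($email));
// });
```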
But doesn't this only work if I'm logged in? How would I log in and run this code?
– Luhhh
No authentication is needed. Just copy it, run it and look at the output.
– Daniel Omine
It returns a message saying the browser is not supported. How can I set a user agent in this example?
– Luhhh
http://stackoverflow.com/questions/2107759/php-file-get-contents-and-headers
– Daniel Omine
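Along the lines of the linked question, `file_get_contents` can send a custom User-Agent through a stream context. A minimal sketch; `userAgentContext` is a hypothetical helper, the UA string is only an example, and there is no guarantee Facebook will serve the results page for it.

```php
<?php
// Sketch: send a custom User-Agent with file_get_contents via a stream context.
// The UA string below is only an example; Facebook may still refuse the request.

// Hypothetical helper: HTTP stream context carrying the given User-Agent.
function userAgentContext(string $userAgent)
{
    return stream_context_create([
        'http' => [
            'method'     => 'GET',
            'user_agent' => $userAgent, // sent as the User-Agent request header
        ],
    ]);
}

$context = userAgentContext('Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0');

// Usage in the answer's script (commented out to avoid a live request):
// $data = file_get_contents($url, false, $context);
```

Alternatively, the same `http` options accept a raw header line, e.g. `'header' => "User-Agent: ...\r\n"`.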