Extract data from a Facebook profile by searching by email


I need to check whether a Facebook profile exists for a given email address.

As far as I can tell, the API offers no way to do this.

But the Facebook site has this URL:

https://www.facebook.com/search/all/?q=@

If I replace @ with a valid email address, it finds the profile.

My question is:

1 - How can I access this URL dynamically from PHP with file_get_contents and get the profile name and photo?

Note that when I open the URL in a browser with a valid email address, it shows the profile's name, photo, and so on.

Thank you

1 answer


<?php
date_default_timezone_set('Asia/Tokyo');

ini_set('error_reporting', E_ALL & ~E_STRICT & ~E_DEPRECATED); // & ~E_NOTICE
ini_set('log_errors', true);
ini_set('html_errors', false);
ini_set('display_errors', true);

define('CHARSET', 'UTF-8');

ini_set('default_charset', CHARSET);
mb_http_output(CHARSET);
mb_internal_encoding(CHARSET);
mb_regex_encoding(CHARSET);

header('Content-Type: text/html; charset='.CHARSET);


/*
The relevant part starts here. The code above is only bootstrap.
*/

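$email = '[email protected]'; // the email address to look up
// URL-encode the address so characters such as "+" or "@" survive in the query string.
$url = 'https://www.facebook.com/search/all/?q='.urlencode($email);
$data = file_get_contents($url);
if ($data === false) {
    exit('The request failed.');
}
$data = html_entity_decode($data);
// The results are embedded inside HTML comments; strip the markers so the parser sees them.
$data = str_replace(array('<!-- ', ' -->'), '', $data);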

class Foo {

    private $data;
    private $dom;

    public function __construct($data) {
        $this->data = $data;
        $this->dom = new DOMDocument();
        $this->dom->validateOnParse = false;
        $this->dom->preserveWhiteSpace = true;
    }

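    public function htmlGetContentBySelector($query, $data = null) {
        if (!empty($data)) {
            $this->data = $data;
        }
        // Collect libxml warnings internally instead of printing them.
        libxml_use_internal_errors(true);
        $this->dom->loadHTML($this->data);
        libxml_clear_errors();
        libxml_use_internal_errors(false);
        $xpath = new DOMXPath($this->dom);
        $xpath_resultset = $xpath->query($query);
        // Nothing matched: return an empty string, because saveHTML(null) would dump the whole document.
        if ($xpath_resultset === false || $xpath_resultset->length === 0) {
            return '';
        }
        return $this->dom->saveHTML($xpath_resultset->item(0));
    }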
}

$c = new Foo($data);

$query = "//code[@id='u_0_d']";
$rs = $c->htmlGetContentBySelector($query);
// The full result.
// Uncomment the next line to print the whole block:
//echo $rs; exit;

/*
Now let's filter and extract what matters.

First, the photo.
*/
$query = "//img[@class='_fbBrowseXuiResult__profileImage img']";
$pic = $c->htmlGetContentBySelector($query, $rs);
echo $pic;

/*
Output:
<img class="_fbBrowseXuiResult__profileImage img" src="https://scontent-nrt1-1.xx.fbcdn.net/v/t1.0-1/c17.0.100.100/p100x100/FOTO-DO-PERFIL" width="100" height="100" alt="PROFILE NAME">
*/

/*
The profile name and URL.
*/
$query = "//div[@class='_gll']";
$name = $c->htmlGetContentBySelector($query, $rs);
echo $name;

/*
Output:
<div class="_gll"><div><a href="https://www.facebook.com/pagina-da-pessoa"><div class="_5d-4"><div class="_5d-5">PROFILE NAME</div></div></a></div></div>
*/

/*
The company where the person works.
*/
$query = "//div[@class='_glm']";
$job = $c->htmlGetContentBySelector($query, $rs);
echo $job;

/*
Output:
<div class="_glm"><div class="_pac" data-bt="{" ct>Job: <a href="https://www.facebook.com/pages/pagina-da-empresa">COMPANY NAME</a><div class="_1my"></div>
</div></div>
*/

The results still contain HTML markup, but they are easy to process if you want to strip the HTML and keep only the data.
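As a rough illustration of stripping that markup, here is a minimal sketch that turns the fragments into plain values. extractProfileFields() is a hypothetical helper, not part of the script above, and it assumes the class names shown in this answer (`_fbBrowseXuiResult__profileImage`, `_gll`) are still in use.

```php
<?php
// Hypothetical helper: pulls the photo URL, profile name and profile URL
// out of an HTML fragment such as $rs. The class names are assumptions
// taken from the sample output in this answer.
function extractProfileFields(string $html): array
{
    $empty = ['photo' => null, 'name' => null, 'url' => null];
    if (trim($html) === '') {
        return $empty; // loadHTML() rejects empty input
    }

    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // the fragment is not valid HTML
    // The XML prolog hints UTF-8 to libxml so non-ASCII names survive parsing.
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors(false);

    $xpath = new DOMXPath($dom);
    $img  = $xpath->query("//img[contains(@class, '_fbBrowseXuiResult__profileImage')]")->item(0);
    $link = $xpath->query("//div[@class='_gll']//a")->item(0);

    return [
        'photo' => $img  ? $img->getAttribute('src') : null,
        'name'  => $link ? trim($link->textContent) : null,
        'url'   => $link ? $link->getAttribute('href') : null,
    ];
}
```

Calling extractProfileFields($rs) would then give an array with the keys photo, name and url, each null when the corresponding element is not found.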

The variable $rs returns something like this:

string(1558) "<code id="u_0_d"><!-- <div class="_4-u2 _4-u8"><div id="all_search_results" data-bt="{"session_id":"5505924b49749c699b44850e32fe24fa","typeahead_sid":null,"result_type":"all","referrer":"","path":"\\/search\\/all\\/","experience_type":"simplepps"}"><div class="_1yt"><div class="_3u1 _gli _5und" data-bt="{"id":1251714145,"rank":null,"abtest_version":null,"abtest_params":[null],"section":"main_column","owner_id":null,"sub_id":null,"browse_location":null,"query_data":{"q":"email\\u0040que.deseja.buscar"},"is_headline":false}"><div class="_401d"><div class="clearfix"><a class="_fbBrowseXuiResult__profileImageLink _8o _8s lfloat _ohe" href="https://www.facebook.com/pagina.da.pessoa" aria-hidden="true" tabindex="-1"><img class="_fbBrowseXuiResult__profileImage img" src="https://scontent-nrt1-1.xx.fbcdn.net/v/t1.0-1/c17.0.100.100/p100x100/xxxxxx-FOTO-DA-PESSOa-xxxxx_n.jpg?oh=74ae0b9e2cc130f9800f98d35d64ce36&oe=58AAB17B" width="100" height="100" alt="wa wa" /></a><div class="_42ef"><div class="_glj"><div class="clearfix"><div class="_glk rfloat _ohf"></div><div class="_gll"><div><a href="https://www.facebook.com/pagina.da.pessoa"><div class="_5d-4"><div class="_5d-5">NOME DA PESSOA   </div></div></a></div></div></div><div><div class="_glm"><div class="_pac" data-bt="{"ct":"sub_headers"}">Job: <a href="https://www.facebook.com/pages/página-empresa-onde-trabalha/codigo-qualquer">NOME DA EMPRESA ONDE TRABALHA</a><div class="_1my"></div></div></div><div class="_glo"></div></div><div class="_glp"></div><div class="_3t0c"></div></div></div></div></div></div></div></div></div> --></code>"

Note: the Facebook URL will obviously not return data for profiles that are configured to hide it.

I can't tell whether the search can return more than one profile. But since an email address is unique to a profile, we can take the risk of extracting data such as name, URL and profile photo from the first result without worrying about it.
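If you would rather not rely on that, a sketch along these lines collects every match instead of only the first one. extractAllProfileLinks() is a hypothetical name, and the `_gll` class is again taken from the sample output, so treat it as an assumption.

```php
<?php
// Hypothetical helper: returns one entry per profile link found in the
// fragment, rather than only item(0) as the script above does.
function extractAllProfileLinks(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input
    }

    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors(false);

    $xpath = new DOMXPath($dom);
    $profiles = [];
    foreach ($xpath->query("//div[@class='_gll']//a") as $link) {
        $profiles[] = [
            'name' => trim($link->textContent),
            'url'  => $link->getAttribute('href'),
        ];
    }
    return $profiles;
}
```

With this, count(extractAllProfileLinks($rs)) tells you whether the address matched zero, one or several profiles, and you can decide what to do in each case.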

It is also important to know that the values of the class and id attributes can change. For this or any other reason, the script above may stop working in the future, because it is a hack and not an official, documented approach.
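One way to soften this is to try several selectors in order and to detect when none of them matches. The sketch below is only an illustration: firstMatchingFragment() is a hypothetical helper, and the fallback selector on `all_search_results` is an assumption based on the id visible in the $rs dump.

```php
<?php
// Hypothetical helper: tries each XPath query in turn and returns the HTML
// of the first match, or null when nothing matches, so the caller can tell
// that the page layout has probably changed.
function firstMatchingFragment(string $html, array $queries): ?string
{
    if (trim($html) === '') {
        return null; // loadHTML() rejects empty input
    }

    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors(false);

    $xpath = new DOMXPath($dom);
    foreach ($queries as $query) {
        $nodes = $xpath->query($query);
        if ($nodes !== false && $nodes->length > 0) {
            return $dom->saveHTML($nodes->item(0));
        }
    }
    return null;
}
```

For example, firstMatchingFragment($data, ["//code[@id='u_0_d']", "//div[@id='all_search_results']"]) would fall back to the second selector if the generated id `u_0_d` changes, and a null result is a signal to inspect the page by hand.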

Be aware that abnormal request patterns can get the IP address you are requesting from blocked. So use this sparingly.
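If you need to look up several addresses, spacing the requests out is one simple precaution. This is a minimal sketch: fetchAllThrottled() is a hypothetical helper, the 5 second default is an arbitrary assumption and not a limit documented by Facebook, and the fetch function is passed in so the pause logic can be tried without touching the network.

```php
<?php
// Hypothetical helper: fetches each URL in turn and pauses between
// requests. $fetch is any callable that takes a URL and returns its body,
// for example 'file_get_contents'.
function fetchAllThrottled(array $urls, callable $fetch, float $delaySeconds = 5.0): array
{
    $results = [];
    $first = true;
    foreach ($urls as $url) {
        // Sleep before every request except the first one.
        if (!$first && $delaySeconds > 0) {
            usleep((int) round($delaySeconds * 1000000));
        }
        $first = false;
        $results[$url] = $fetch($url);
    }
    return $results;
}
```

Usage would look like $pages = fetchAllThrottled($urls, 'file_get_contents'); the result is an array keyed by URL.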

  • But will this only work if I am logged in? How would I log in and run this code?

  • No need to authenticate. Just copy, run and see the return.

  • It returns a message saying that the browser is not compatible. How can I set a user_agent in this example?

  • http://stackoverflow.com/questions/2107759/php-file-get-contents-and-headers
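For the user_agent question, a minimal sketch of what the linked answer describes: file_get_contents accepts a stream context as its third argument, and the `http` context options include `user_agent`. buildHttpContext() is a hypothetical helper, and the user agent string and timeout are placeholder assumptions; whether Facebook accepts a given value is not something this sketch can guarantee.

```php
<?php
// Hypothetical helper: builds a stream context that sends a custom
// User-Agent header with the request.
function buildHttpContext(string $userAgent, int $timeoutSeconds = 10)
{
    return stream_context_create([
        'http' => [
            'method'     => 'GET',
            'user_agent' => $userAgent,
            'timeout'    => $timeoutSeconds,
        ],
    ]);
}

// Usage (performs a real request, so it is left commented out):
// $context = buildHttpContext('Mozilla/5.0 (X11; Linux x86_64)');
// $data = file_get_contents($url, false, $context);
```

In the script above, this would replace the plain file_get_contents($url) call.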
