DOMDocument::loadHTML() error

I have the following controller, which fetches a Twitter user's information by username and parses the page with DOMDocument::loadHTML():

public function info($user = false) {

        if ($user === false) {
            Url::redirect('get');
        }

        $url = 'https://twitter.com';

        $this->CurlTwitterUserInfo = new CurlTwitterUserInfo($url, $followLocation = true, $timeOut = 30, $maxRedirects = 4, $binaryTransfer = false, $includeHeader = true, $noBody = false);
        $this->CurlTwitterUserInfo->setCookieFileLocation(DOCROOT . 'cookies' . DS . $this->CurlTwitterAuth->_authUsername . '.txt');
        $this->CurlTwitterUserInfo->setUserAgent($_SERVER['HTTP_USER_AGENT']);
        $this->CurlTwitterUserInfo->makeCurlInfo($user);

        //var_dump($this->CurlTwitterUserInfo->getHttpStatus());
    }

And here is my helper:

<?php

class CurlTwitterUserInfo {

    protected $_userAgent;
    protected $_url;
    protected $_followLocation;
    protected $_timeOut;
    protected $_maxRedirects;
    protected $_cookie;
    protected $_cookieFileLocation = '';
    protected $_post;
    protected $_postFields;
    protected $_referer = '';
    protected $_session;
    protected $_webPage;
    protected $_includeHeader;
    protected $_noBody;
    protected $_status;
    protected $_binaryTransfer;

    public $_authentication = 0;
    public $_authUsername = '';
    public $_authPassword = '';

    public function __construct($url, $followLocation = true, $timeOut = 30, $maxRedirects = 4, $binaryTransfer = false, $includeHeader = false, $noBody = false) {

        $this->_url                         = $url;
        $this->_followLocation  = $followLocation;
        $this->_timeOut                 = $timeOut;
        $this->_maxRedirects        = $maxRedirects;
        $this->_noBody                  = $noBody;
        $this->_includeHeader   = $includeHeader;
        $this->_binaryTransfer  = $binaryTransfer;
    }

    public function useAuth($use) {
        $this->_authentication = 0;

        if ($use === true) {
            $this->_authentication = 1;
        }
    }

    public function setUsername($username) {
        $this->_authUsername = $username;
    }

    public function setPassword($password) {
        $this->_authPassword = $password;
    }

    public function setReferer($referer) {
        $this->_referer = $referer;
    }

    public function setCookieFileLocation($path) {
        $this->_cookieFileLocation = $path;
    }

    public function setPost($postFields) {
        $this->_post = true;
        $this->_postFields = $postFields;
    }

    public function setUserAgent($userAgent) {
        $this->_userAgent = $userAgent;
    }

    public function makeCurlInfo($user) {

        $get_user_info = curl_init();
        curl_setopt_array($get_user_info, [
            CURLOPT_URL             => $this->_url . '/' . $user,
            CURLOPT_CUSTOMREQUEST   => 'GET',
            CURLOPT_RETURNTRANSFER  => true,
            CURLOPT_SSL_VERIFYPEER  => false,
            CURLOPT_SSL_VERIFYHOST  => 2,
            CURLOPT_HTTPHEADER      => [
                "Content-type:text/html;charset=utf-8",
            ],
            CURLOPT_FOLLOWLOCATION  => $this->_followLocation,
            CURLOPT_USERAGENT       => $this->_userAgent,
            CURLOPT_COOKIEFILE      => $this->_cookieFileLocation,
            CURLOPT_COOKIEJAR       => $this->_cookieFileLocation,
            CURLOPT_COOKIESESSION   => true,
            CURLOPT_REDIR_PROTOCOLS => CURLPROTO_HTTP | CURLPROTO_HTTPS,
            CURLOPT_POSTREDIR       => 2,
            CURLOPT_AUTOREFERER     => 1,
            CURLOPT_MAXREDIRS       => $this->_maxRedirects,
            CURLOPT_ENCODING        => "gzip"
        ]);

        if ($this->_includeHeader) {
            curl_setopt_array($get_user_info, [CURLOPT_HEADER => true]);
        }

        $this->_webPage = curl_exec($get_user_info);

        $this->_status = curl_getinfo($get_user_info, CURLINFO_HEADER_SIZE);
        $header = substr($this->_webPage, 0, $this->_status);
        $body = substr($this->_webPage, $this->_status);

        $dom = new DOMDocument('5.0', 'utf-8');
        @$dom->loadHTML($body);

        $data = json_decode($dom->getElementById('init-data')->getAttribute('value'));

        return $data->profile_user;
    }

    public function getHttpStatus() {
        return $this->_status;
    }

    public function __tostring() {
        return $this->_webPage;
    }

}

But I get the following error:

Error on Aug 23, 2017 19:00PM - DOMDocument::loadHTML(): ID content-main-Heading already defined in Entity, line: 1267 in C:\wamp64\www\mvc\system\helpers\Curltwitteruserinfo.php on line 106

What’s the matter?

Line 106:

@$dom->loadHTML($body);
  • From what I've read, DOMDocument treats something like <span id="nome" name="nome"> as a repeated id, even though it is valid HTML. This seems to solve it, at least by suppressing the error: http://nl3.php.net/manual/en/function.libxml-use-internal-errors.php. In that case you need to check for errors manually.

  • I saw that documentation, @bfavaretto, but I couldn't solve it no matter what I tried, so only now I decided to ask on SO.
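
The manual error checking mentioned in the first comment could look roughly like the sketch below. The helper name loadHtmlCollectingErrors() and the sample markup are hypothetical and not part of the original class. The sketch assumes loadHTML() can return true even when libxml records recoverable warnings, so it reads the error list regardless of the return value.

```php
<?php
// Hypothetical helper, not part of the CurlTwitterUserInfo class from the question.
// Returns [DOMDocument|null, LibXMLError[]].
function loadHtmlCollectingErrors(string $html): array
{
    // Route libxml errors into an internal buffer instead of PHP warnings,
    // remembering the previous setting so it can be restored afterwards.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $loaded = $dom->loadHTML($html);

    // Read the buffer regardless of the return value, then empty it.
    $errors = libxml_get_errors();
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return [$loaded ? $dom : null, $errors];
}

// Sample markup with a repeated id, similar in spirit to the page in the question.
$html = '<html><body><h1 id="a">One</h1><h1 id="a">Two</h1></body></html>';
[$dom, $errors] = loadHtmlCollectingErrors($html);

foreach ($errors as $error) {
    // Each entry is a LibXMLError with level, code, line and message.
    printf("line %d: %s\n", $error->line, trim($error->message));
}
```

Whether a repeated id is actually reported may depend on the installed libxml2 version, so the loop may print nothing on some systems.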

1 answer

Following @bfavaretto's tip, try it this way:

Note: If you have already done this test, please post the result, ok?

// Insert this line above the DOMDocument instantiation
// -- enable internal libxml error handling
var_dump(libxml_use_internal_errors(true));

$dom = new DOMDocument('5.0', 'utf-8');

// Now wrap the load in an if so you can handle the errors if it fails
if (!$dom->loadHTML($body)) {
    foreach (libxml_get_errors() as $error) {
        // print the errors for testing
        var_dump($error);
    }

    libxml_clear_errors();
}

// Now continue with the rest of your code
$data = json_decode($dom->getElementById('init-data')->getAttribute('value'));

return $data->profile_user;
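
One more thing worth guarding against: getElementById() returns null when the element is not found, and the chained getAttribute() call would then fail with a fatal error. Below is a minimal sketch of a defensive version. The function name extractProfileUser() and the screen_name field in the sample data are hypothetical; only the init-data id and the profile_user key come from the question.

```php
<?php
// Hypothetical helper; returns the profile_user object or null if it cannot be found.
function extractProfileUser(DOMDocument $dom): ?object
{
    $node = $dom->getElementById('init-data');
    if ($node === null) {
        // Element missing, e.g. a login page or changed markup was returned.
        return null;
    }

    $data = json_decode($node->getAttribute('value'));
    if (!is_object($data) || !isset($data->profile_user) || !is_object($data->profile_user)) {
        // Attribute was empty, not valid JSON, or lacks the expected key.
        return null;
    }

    return $data->profile_user;
}

// Sample page with made-up data standing in for the real response body.
$sample = '<html><body><input type="hidden" id="init-data" '
        . 'value="{&quot;profile_user&quot;:{&quot;screen_name&quot;:&quot;example&quot;}}">'
        . '</body></html>';

$previous = libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($sample);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$profile = extractProfileUser($dom);
var_dump($profile !== null ? $profile->screen_name : null);
```

Returning null lets the controller decide what to do when the profile data is missing, instead of crashing inside the helper.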

I hope you get it working, my friend!

  • Nice! I had been trying right up until now.
