Capture div by class

Viewed 749 times

2

I am trying to capture a div by its class, without success. I want the div with the class 'm-definicao-conteudo' from the page I fetch with cURL, but I get this warning:

Warning: DOMDocument::loadHTML(): Unexpected end tag : a in Entity, line: 102 in /Applications/XAMPP/xamppfiles/htdocs/teste.php on line 13

$ch = curl_init ("");
curl_setopt($ch, CURLOPT_URL, 'http://dicionarioinformal.com.br/aham/');
curl_setopt($ch, CURLOPT_USERAGENT, "Opera/9.80 (J2ME/MIDP; Opera Mini/4.2.14912/870; U; id) Presto/2.4.15"); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
$html = curl_exec($ch);

$dom = new DOMDocument;
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);

$results = $xpath->query("//*[@class='m-definicao-conteudo']");

if ($results->length > 0) {
    echo $review = $results->item(0)->nodeValue;
}
  • It seems that the HTML is malformed or is not being fetched in full.

  • Strange, because cURL fetches the whole page.

2 answers

3

Change the user agent.

Change:

curl_setopt($ch, CURLOPT_USERAGENT, "Opera/9.80 (J2ME/MIDP; Opera Mini/4.2.14912/870; U; id) Presto/2.4.15");

To:

curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 5.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.104 Safari/537.36"); 

You can use any other user agent, as long as it belongs to a desktop browser.

With the current one (Opera Mini), the site redirects to its mobile version, which does not contain the div. ;)
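
Separately from the user agent, the warning quoted in the question is libxml reporting malformed markup; it does not stop the parse. The thread does not confirm whether the desktop page still triggers it, so the following is only a minimal sketch, assuming you want to collect such errors quietly instead of printing them. The inline `$html` string is a hypothetical stand-in for the fetched page, not the real site's markup.

```php
<?php
// Hypothetical stand-in for the page returned by curl_exec(); note the stray </a>.
$html = '<html><body><p>intro</a></p>'
      . '<div class="m-definicao-conteudo">definition text</div></body></html>';

// Collect libxml parse errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument;
$dom->loadHTML($html);

// Keep the collected errors for inspection, then clear the buffer
// and restore the previous error-handling mode.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($dom);
$results = $xpath->query("//*[@class='m-definicao-conteudo']");

// null means the div was not found in the parsed document.
$text = $results->length > 0 ? $results->item(0)->nodeValue : null;
echo $text, PHP_EOL;
```

If `$text` comes back as null on the real page, the div is genuinely absent from the HTML that was fetched, which is what happens with the mobile version.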

  • +1 Good catch, I had not even noticed the UA in use :)

2

Your problem seems to be the way you are passing the name of the class to look for:

$html = '<div>Ora que raio!</div>
<p>Ola meu nome é pseudomatica (sou normal), etc. Meu nome é assim pq sim</p>
<div class="minhaClasse">Encontrei</div>
<p></p>';

$dom = new DOMDocument;
$dom->loadHTML($html);

$class = 'minhaClasse'; // store the class name in a variable

$procura = new DOMXPath($dom); // instantiate DOMXPath

$div = $procura->query("//*[contains(@class, '$class')]"); // query using the variable

See the example on Ideone:

var_dump($div->item(0)->nodeValue); // string(9) "Encontrei"
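
One caveat with `contains(@class, ...)` is that it is a substring test, so it also matches longer class names that merely contain the one you want. A common XPath 1.0 idiom for matching a whole class token pads the normalized attribute with spaces. The sketch below compares the two; the class names `minhaClasseExtra` and `destaque` are made up for illustration.

```php
<?php
// Two divs: only the second one really carries the class "minhaClasse".
$html = '<div class="minhaClasseExtra">Outro</div>'
      . '<div class="destaque minhaClasse">Encontrei</div>';

$dom = new DOMDocument;
$dom->loadHTML($html);
$procura = new DOMXPath($dom);

$class = 'minhaClasse';

// Substring match: also hits "minhaClasseExtra".
$loose = $procura->query("//*[contains(@class, '$class')]");

// Whole-token match: pad the normalized class list with spaces
// and look for " name " surrounded by spaces.
$strict = $procura->query(
    "//*[contains(concat(' ', normalize-space(@class), ' '), ' $class ')]"
);

echo $loose->length, ' ', $strict->length, PHP_EOL;
echo $strict->item(0)->nodeValue, PHP_EOL;
```

Here the loose query matches both divs, while the strict one matches only the div whose class list contains `minhaClasse` as a separate token.
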
  • I did as you showed and got the same errors. What is wrong with my cURL request?

  • Hmm... hard to say without testing your code... but it is 4:30 in the morning here, and I am in no state to look at it at this hour. I'll try to help more tomorrow!

  • Good night, see you!
