Problems with file_get_contents and DOMDocument

I’m trying to download the content of a site, but it gives this warning:

DOMDocument::loadHTML(): Unexpected end tag : tr in Entity

The warning is repeated for several lines. I also can’t get the accented characters to display correctly.

Could someone help me understand and solve these problems?

$content = http_build_query([
    'Local' => 'Adamantina',
    'Inicio' => '01/01/2015',
    'Final' => '31/12/2015',
]);

$context = stream_context_create([
    'http' => [
        'method'  => 'POST',
        'header'  => 'Content-type: application/x-www-form-urlencoded',
        'content' => $content,
    ]
]);

$contents = utf8_decode(file_get_contents('http://www.ciiagro.sp.gov.br/ciiagroonline/Listagens/BH/LBalancoHidricoLocal.asp', false, $context));

$dom = new DOMDocument();
$dom->loadHTML($contents);
$dom->saveHTML($dom->documentElement);

$xpath = new DomXPath($dom);
$rows = $xpath->query('//table/tr[position()>0]');

foreach ($rows as $row) {
    $tds=$row->getElementsByTagName("td");   

    foreach ($tds as $td) {
        print($td->nodeValue);
        echo "<br>";
    }
}
  • I tried it like this, but it didn’t work:

        $rows = $xpath->query('//table');
        $rows2 = $xpath->query('.//tr', $rows);

        foreach ($rows2 as $row) {
            $tds = $row->getElementsByTagName("td");
            foreach ($tds as $td) {
                print($td->nodeValue);
                echo "<br>";
            }
        }

  • Right, that’s not the problem. See whether the answer helps.

1 answer

Warning problem:

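To get rid of the warnings, call libxml_use_internal_errors() before loadHTML(). Note that this only hides the libxml parse errors (they are buffered internally instead of being emitted as PHP warnings); it does not repair the malformed HTML.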

Use the following:

libxml_use_internal_errors(true);

Source: http://php.net/manual/en/function.libxml-use-internal-errors.php
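
As a minimal sketch of what that call does: the HTML string below is a made-up sample with a stray closing tag, not the real page, and which errors libxml reports may vary with the libxml2 version. With internal errors enabled, loadHTML() stays silent, the buffered errors can still be read with libxml_get_errors(), and the parser recovers well enough for the cells to be queried.

```php
<?php
// Hypothetical malformed sample: the second </tr> has no matching opening tag.
$html = '<table><tr><td>A</td></tr></tr><tr><td>B</td></tr></table>';

// Buffer libxml errors instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

// The parse errors are still available for inspection.
foreach (libxml_get_errors() as $error) {
    printf("line %d: %s\n", $error->line, trim($error->message));
}

// Empty the error buffer and restore the previous setting.
libxml_clear_errors();
libxml_use_internal_errors($previous);

// The parser recovers from the stray tag, so both cells are still reachable.
$xpath = new DOMXPath($dom);
$cells = [];
foreach ($xpath->query('//table//tr/td') as $td) {
    $cells[] = $td->nodeValue;
}

echo implode(', ', $cells), "\n";
```

Calling libxml_clear_errors() matters in long-running scripts, because otherwise the buffer keeps growing with every document parsed.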

Accent problem:

To fix the encoding problem, use mb_convert_encoding(), which converts the accented characters to HTML entities. Remove the earlier utf8_decode() call first!

Use the following:

print(mb_convert_encoding($td->nodeValue, 'HTML-ENTITIES', 'UTF-8'));

Source: http://php.net/manual/en/function.mb-convert-encoding.php
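
One caveat: as of PHP 8.2, both utf8_decode() and the 'HTML-ENTITIES' pseudo-encoding of mbstring are deprecated. A possible alternative, shown here only as a sketch, is mb_encode_numericentity(), which turns every non-ASCII character into a numeric entity. The helper name toEntities and the sample string are illustrative and not part of the original code.

```php
<?php
// Hypothetical helper: encode every non-ASCII character of a UTF-8 string
// as a numeric HTML entity, leaving plain ASCII untouched.
function toEntities(string $text): string
{
    // Conversion map: [first code point, last code point, offset, mask].
    $convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($text, $convmap, 'UTF-8');
}

// DOMDocument returns node values as UTF-8, so that is the source encoding.
// "\u{E3}" is "ã", written as an escape so the file encoding does not matter.
echo toEntities("S\u{E3}o Paulo"), "\n"; // S&#227;o Paulo
```

Inside the loop from the question this would be used as print(toEntities($td->nodeValue));.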

Change the download line to something like this, without the utf8_decode() call:

$contents = file_get_contents('http://www.ciiagro.sp.gov.br/ciiagroonline/Listagens/BH/LBalancoHidricoLocal.asp', false, $context);

Note:

  1. My rule of thumb for when to use HTML-ENTITIES is to check whether the output shows a plain ? instead of <?> (with a black background), or some combination of "random" characters. Of course, this is only how I do it; I keep trying things until it works.

  2. I think it’s best to use cURL instead of file_get_contents().
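
A minimal sketch of the same POST request with cURL, assuming the curl extension is installed. The function name postForm and the timeout value are illustrative choices and do not come from the original code.

```php
<?php
// Hypothetical helper: send a form-encoded POST and return the response body.
function postForm(string $url, array $fields): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => http_build_query($fields), // urlencoded body
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects, if any
        CURLOPT_TIMEOUT        => 30,   // seconds; arbitrary value
    ]);

    $body = curl_exec($ch);
    if ($body === false) {
        // Unlike file_get_contents(), cURL reports why the request failed.
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }

    return $body;
}

// Usage with the form fields from the question:
// $contents = postForm(
//     'http://www.ciiagro.sp.gov.br/ciiagroonline/Listagens/BH/LBalancoHidricoLocal.asp',
//     ['Local' => 'Adamantina', 'Inicio' => '01/01/2015', 'Final' => '31/12/2015']
// );
```

The practical gain over file_get_contents() is mostly explicit error reporting through curl_error() and direct control over timeouts and redirects.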

I’m out of time, sorry, I’ll try to improve the answer soon.

  • I found out that the site I’m trying to get the data from was poorly developed, so that must be the cause. But thanks for the help.
