Simple_html_dom: what is the difference between the two URLs?


URL2 works and the data can be extracted, but URL1 does not.

<?php

include "simple_html_dom.php";

// $URLX holds URL1 or URL2 (its definition is not shown here)
$CARDGALGO = file_get_html($URLX);

echo $CARDGALGO;

?>

1 answer

I debugged the script and found that the page at URL1 exceeds the MAX_FILE_SIZE limit, which is currently 600000 bytes; see simple_html_dom.php, line 66:

 define('MAX_FILE_SIZE', 600000);
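To confirm this diagnosis, the size check can be reproduced on its own. As far as I can tell from the library source, file_get_html() downloads the page with file_get_contents() and returns false instead of a DOM object when the content is longer than MAX_FILE_SIZE bytes; echoing false prints nothing, which would explain why URL1 seems not to work. The helper function and the constant below are hypothetical names, and the HTML strings are generated locally instead of being downloaded:

```php
<?php

// Hypothetical constant mirroring the library's default MAX_FILE_SIZE
const ASSUMED_MAX_FILE_SIZE = 600000;

// Hypothetical helper: true when $html is longer than the limit,
// i.e. when file_get_html() would be expected to give up
function exceedsSimpleHtmlDomLimit(string $html, int $limit = ASSUMED_MAX_FILE_SIZE): bool
{
    // strlen() counts bytes, not characters
    return strlen($html) > $limit;
}

$small = str_repeat('<p>x</p>', 1000);   // 8000 bytes
$large = str_repeat('<p>x</p>', 100000); // 800000 bytes

var_dump(exceedsSimpleHtmlDomLimit($small)); // bool(false)
var_dump(exceedsSimpleHtmlDomLimit($large)); // bool(true)
```

With a real page, you would pass the result of file_get_contents($URL1) to the helper.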

You can either raise this limit, or stop using the extra library and use PHP's native DOM API instead:

Example:

<?php

$URL1 = "http://ladbrokes.365dm.com/greyhounds/profile/dog/oor-millie/3334094";

// Download and parse the page with PHP's built-in DOM extension
$doc = new DOMDocument;
$doc->loadHTMLFile($URL1);
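To check that the native parser copes with a document larger than simple_html_dom's default limit, the sketch below builds a made-up table of well over 600000 bytes in memory and parses it with DOMDocument::loadHTML(), the string counterpart of loadHTMLFile(). The table content is invented for the demonstration:

```php
<?php

// Made-up row; repeated 20000 times it gives well over 600000 bytes
$row  = '<tr><td>1</td><td>Dog</td><td>T1</td><td>Trainer</td></tr>';
$html = '<html><body><table>' . str_repeat($row, 20000) . '</table></body></html>';

echo 'Size in bytes: ', strlen($html), PHP_EOL;

// loadHTML() parses a string; loadHTMLFile() does the same for a file or URL
$doc = new DOMDocument;
$doc->loadHTML($html);

echo 'Rows parsed: ', $doc->getElementsByTagName('tr')->length, PHP_EOL;
```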

To grab the text of a specific element by its ID:

<?php

$URL1 = "http://ladbrokes.365dm.com/greyhounds/profile/dog/oor-millie/3334094";

$doc = new DOMDocument;
$doc->loadHTMLFile($URL1);

// getElementById() returns null if no element has this id
$logo = $doc->getElementById('logo');

if ($logo !== null) {
    echo 'Text:', $logo->textContent, '<br>';
}

This example reads the text from the following part of the page, as it was at the time of writing:

<header id="header" role="banner">
    <div class="hix">
        <a href="greyhounds" id="logo">Ladbrokes</a>
        <div id="nav-mobile-open"></div>
    </div>
</header>
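Because the live page may have changed since then, here is a self-contained sketch that parses the fragment above from a string instead of the URL, and guards against getElementById() returning null. libxml's HTML parser may not recognise HTML5 tags such as header and can warn about them, so errors are collected internally while loading:

```php
<?php

// The header fragment shown above, embedded as a string
$html = <<<'HTML'
<header id="header" role="banner">
    <div class="hix">
        <a href="greyhounds" id="logo">Ladbrokes</a>
        <div id="nav-mobile-open"></div>
    </div>
</header>
HTML;

$doc = new DOMDocument;

// Collect parser warnings (e.g. about <header>) instead of printing them
$previousSetting = libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previousSetting);

// getElementById() returns null when no element has that id
$logo = $doc->getElementById('logo');
echo $logo !== null ? trim($logo->textContent) : 'Element not found', PHP_EOL;
```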

To get all elements of one type, for example all links, use getElementsByTagName():

<?php

$URL1 = "http://ladbrokes.365dm.com/greyhounds/profile/dog/oor-millie/3334094";

$doc = new DOMDocument;
$doc->loadHTMLFile($URL1);

// getElementsByTagName() returns every <a> element in the document
foreach ($doc->getElementsByTagName('a') as $node) {
    echo 'Text:', $node->textContent, '<br>';
}
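If you need the links themselves rather than printing their text, a variant can store each link's href attribute and text in an array. The HTML here is a made-up sample, not the real page, so the sketch runs without network access:

```php
<?php

// Made-up sample HTML; the third link has no href attribute
$html = '<html><body>'
    . '<a href="greyhounds" id="logo">Ladbrokes</a>'
    . '<a href="/results">Results</a>'
    . '<a>No link target</a>'
    . '</body></html>';

$doc = new DOMDocument;
$doc->loadHTML($html);

$links = [];
foreach ($doc->getElementsByTagName('a') as $node) {
    $links[] = [
        // getAttribute() returns '' when the attribute is missing
        'href' => $node->getAttribute('href'),
        'text' => trim($node->textContent),
    ];
}

print_r($links);
```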

Using DOMXPath

The most practical way to grab specific elements, though, is probably XPath. On this page, column 4 of each table row holds the trainer's name, so the XPath expression would be something like:

//tr/td[4]

Example:

<?php

$URL1 = "http://ladbrokes.365dm.com/greyhounds/profile/dog/oor-millie/3334094";

$doc = new DOMDocument;
$doc->loadHTMLFile($URL1);

$xpath = new DOMXPath($doc);

// 4th <td> of every table row: the trainer column on this page
$colunas = $xpath->query("//tr/td[4]");

echo 'Trainers:<br>';

foreach ($colunas as $node) {
    $nome = trim($node->textContent);
    echo ' - ', $nome, '<br>';
}
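To see what the //tr/td[4] query selects without depending on the live page, it can be run against a small made-up table. The dates, tracks and trainer names below are invented; only the assumption that column 4 holds the trainer comes from the page described above. Because the header row uses th cells, td[4] does not match it:

```php
<?php

// Made-up table; column 4 is assumed to be the trainer column
$html = <<<'HTML'
<table>
  <tr><th>Date</th><th>Track</th><th>Pos</th><th>Trainer</th></tr>
  <tr><td>01/01</td><td>Romford</td><td>1</td><td> A. Smith </td></tr>
  <tr><td>08/01</td><td>Hove</td><td>3</td><td>B. Jones</td></tr>
</table>
HTML;

$doc = new DOMDocument;
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$trainers = [];
foreach ($xpath->query('//tr/td[4]') as $node) {
    // trim() removes the padding around " A. Smith "
    $trainers[] = trim($node->textContent);
}

echo implode(', ', $trainers), PHP_EOL;
```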

Suppressing warnings caused by HTML errors on the page

The pages you linked contain many HTML errors, which can make DOMDocument emit a lot of warnings. To keep them from being displayed, enable libxml's internal error handling before loading the page, then clear the collected errors and restore the previous setting:

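<?php

$URL1 = "http://ladbrokes.365dm.com/greyhounds/profile/dog/oor-millie/3334094";

$doc = new DOMDocument;

// Collect parser errors internally instead of emitting warnings;
// the previous setting is kept so it can be restored afterwards
$estadoOriginal = libxml_use_internal_errors(true);

$doc->loadHTMLFile($URL1);

// Discard the collected errors and restore the original setting
libxml_clear_errors();

libxml_use_internal_errors($estadoOriginal);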
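If you want to know what the errors were (for logging, say) rather than discard them, they can be read with libxml_get_errors() before libxml_clear_errors() is called. The broken HTML below is a made-up sample, and the exact messages depend on the libxml version PHP was built with:

```php
<?php

// Made-up malformed HTML: unclosed <p> and <span>, plus an unknown <bogus> tag
$brokenHtml = '<div><p>Unclosed paragraph<span>text</div><bogus>tag</bogus>';

$doc = new DOMDocument;

$previousSetting = libxml_use_internal_errors(true);
$doc->loadHTML($brokenHtml);

// Read the collected LibXMLError objects before clearing them
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previousSetting);

foreach ($errors as $error) {
    printf("Line %d: %s\n", $error->line, trim($error->message));
}
```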

    Impressive, thank you very much. Since the code is already written with Simple HTML DOM, for now I will only change MAX_FILE_SIZE, but in the next update I will rewrite it using DOMXPath. Thank you for solving my problem so quickly.
