-1
Url2 works and can extract the data, Url1 does not.
<?php
include "simple_html_dom.php";
$CARDGALGO = file_get_html($URLX); // $URLX stands for Url1 or Url2
echo $CARDGALGO;
?>
1
I debugged the script and found that the page at URL1 exceeds MAX_FILE_SIZE, which is currently 600000 bytes (see simple_html_dom.php, line 66):
define('MAX_FILE_SIZE', 600000);
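To check whether a given page really is over that limit, here is a minimal sketch. SIZE_LIMIT and exceedsLimit() are names I made up for illustration, and I am assuming the library compares the length of the downloaded string against the constant:

```php
<?php
// Assumption: mirrors the default MAX_FILE_SIZE shown above.
const SIZE_LIMIT = 600000;

// Hypothetical helper: true when the markup is longer (in bytes) than the limit.
function exceedsLimit(string $html, int $limit = SIZE_LIMIT): bool
{
    return strlen($html) > $limit;
}

// Usage (needs network access, so it is left commented out here):
// $html = file_get_contents($URL1);
// var_dump(strlen($html), exceedsLimit($html));
```

If exceedsLimit() returns true for URL1 and false for URL2, the size limit is the likely cause.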
You can either raise this limit or drop the extra library and use PHP's native DOM API:
Example:
<?php
$URL1 = "http://ladbrokes.365dm.com/greyhounds/profile/dog/oor-millie/3334094";
$doc = new DOMDocument;
$doc->loadHTMLFile($URL1);
To grab the text of a specific element by its ID:
<?php
$URL1 = "http://ladbrokes.365dm.com/greyhounds/profile/dog/oor-millie/3334094";
$doc = new DOMDocument;
$doc->loadHTMLFile($URL1);
echo 'Text: ', $doc->getElementById('logo')->textContent, '<br>';
This example takes this part of the current page:
<header id="header" role="banner"> <div class="hix"> <a href="greyhounds" id="logo">Ladbrokes</a> <div id="nav-mobile-open"></div> </div> </header>
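Note that getElementById() returns null when no element has that ID, so reading textContent directly would fail on a page without it. A small sketch of a safer lookup, run against an inline copy of the fragment above; textById() is a hypothetical helper name, not part of the DOM API:

```php
<?php
// Hypothetical helper: trimmed text of the element with the given id,
// or null when no such element exists.
function textById(DOMDocument $doc, string $id): ?string
{
    $element = $doc->getElementById($id);
    return $element === null ? null : trim($element->textContent);
}

// Inline copy of the header fragment, so no network access is needed.
$html = '<header id="header" role="banner"><div class="hix">'
      . '<a href="greyhounds" id="logo">Ladbrokes</a>'
      . '<div id="nav-mobile-open"></div></div></header>';

$doc = new DOMDocument;
$previous = libxml_use_internal_errors(true); // keep parser warnings quiet
$doc->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

var_dump(textById($doc, 'logo'));    // the logo link text
var_dump(textById($doc, 'missing')); // no element with this id
```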
To get all elements of one type, such as all links, you could use something like:
<?php
$URL1 = "http://ladbrokes.365dm.com/greyhounds/profile/dog/oor-millie/3334094";
$doc = new DOMDocument;
$doc->loadHTMLFile($URL1);
foreach ($doc->getElementsByTagName('a') as $node) {
    echo 'Text: ', $node->textContent, '<br>';
}
The most practical way to select specific elements, though, is XPath. On this page the fourth column of each table row holds the trainer's name, so the XPath expression would be something like:
//tr/td[4]
Example:
<?php
$URL1 = "http://ladbrokes.365dm.com/greyhounds/profile/dog/oor-millie/3334094";
$doc = new DOMDocument;
$doc->loadHTMLFile($URL1);
$xpath = new DOMXPath($doc);
$colunas = $xpath->query("//tr/td[4]");
echo 'Trainers:<br>';
foreach ($colunas as $node) {
    $nome = trim($node->textContent);
    echo ' - ', $nome, '<br>';
}
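If you want to try the query without hitting the site, the same idea can be wrapped in a function and run on an inline string. This is only a sketch: columnValues() is a hypothetical helper and the sample table is made up, not copied from the real page:

```php
<?php
// Hypothetical helper: trimmed text of the Nth <td> of every table row.
function columnValues(string $html, int $column): array
{
    $doc = new DOMDocument;
    $previous = libxml_use_internal_errors(true); // keep parser warnings quiet
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $values = [];
    foreach ($xpath->query("//tr/td[$column]") as $node) {
        $values[] = trim($node->textContent);
    }
    return $values;
}

// Made-up sample rows; the real page's columns are not reproduced here.
$sample = '<table>'
        . '<tr><th>A</th><th>B</th><th>C</th><th>Trainer</th></tr>'
        . '<tr><td>1</td><td>x</td><td>y</td><td> Trainer One </td></tr>'
        . '<tr><td>2</td><td>x</td><td>y</td><td>Trainer Two</td></tr>'
        . '</table>';

print_r(columnValues($sample, 4));
```

Because the expression only matches td cells, a header row built from th cells is skipped automatically.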
The pages you linked contain many HTML errors, which can trigger a lot of warnings. To keep them from being displayed, you can enable libxml's internal error handling before loading and restore the previous setting afterwards:
<?php
$URL1 = "http://ladbrokes.365dm.com/greyhounds/profile/dog/oor-millie/3334094";
$doc = new DOMDocument;
$estadoOriginal = libxml_use_internal_errors(true);
$doc->loadHTMLFile($URL1);
libxml_clear_errors();
libxml_use_internal_errors($estadoOriginal);
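If you would rather inspect the parser messages than throw them away, libxml_get_errors() returns them as an array before you clear the buffer. A rough sketch, using an inline broken snippet instead of the real page; loadHtmlCollectingErrors() is a hypothetical helper name, and which messages you get depends on the libxml version installed:

```php
<?php
// Hypothetical helper: parses HTML and returns the document together with
// the parser messages, instead of letting them surface as PHP warnings.
function loadHtmlCollectingErrors(string $html): array
{
    $doc = new DOMDocument;
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    $errors = libxml_get_errors(); // array of LibXMLError objects
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return [$doc, $errors];
}

// Deliberately sloppy markup for demonstration purposes.
[$doc, $errors] = loadHtmlCollectingErrors('<p>Unclosed <b>bold</p><foo>bar</foo>');

foreach ($errors as $error) {
    echo 'line ', $error->line, ': ', trim($error->message), PHP_EOL;
}
echo $doc->getElementsByTagName('p')->item(0)->textContent, PHP_EOL;
```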
Impressive, thank you very much. Since the code is already written with Simple HTML DOM, for now I will only change MAX_FILE_SIZE, but in the next update I will rewrite it using DOMXPath. Thank you for solving my problem so quickly.
– Belks