How to separate the results obtained from loadHTML

Question

Asked 5 years, 3 months ago

0

Well, I have the following code:

<?php
$url = 'https://www.zerozero.pt/edicao.php?id_edicao=135716';
$str = '';

$html = file_get_contents($url);

//debug purposes
//$html = '<div id="pagina">foo</div>';

$doc = new DOMDocument();
$doc->strictErrorChecking = false;
@$doc->loadHTML( $html );

$div = $doc->getElementById( 'edition_table' );

$str = $div->nodeValue;

echo $str;

?>

This code takes all data from the site’s classification table: https://www.zerozero.pt/edicao.php?id_edicao=135716

It returns all the values, but run together. I want each value separated by a space so that they can be told apart:

Example: P J V E D GM GS DG 1 National 50 24 etc...

I want my code to return output like this.

How can I do that?

1

If you're already using DOM, why use explode? Use DOM to get the TR elements and then the TD elements inside them.

– Guilherme Nascimento

2020/04/01 at 02:34
1

Another thing: do not set error_reporting to 0. It is not meant for hiding errors; errors should not appear at all, and if one does, something is wrong. I recommend reading: Why use error_reporting with display_errors and display_startup_errors?

– Guilherme Nascimento

2020/04/01 at 02:37
How would that work with the DOM?

– Gonçalo

2020/04/01 at 02:37
Use https://www.php.net/manual/domdocument.getelementsbytagname.php or learn XPath. PS: not to self-promote, but in my framework I created a CSS selector to XPath converter for use with PHP DOM: https://github.com/inphinit/inphinit/wiki/QuerySelector-%28selectors-CSS%29-com-PHP#reading-a-p%C3%A1gina-external. Of course, you have probably already started your project in pure PHP or another framework, so this is just a suggestion.

– Guilherme Nascimento

2020/04/01 at 02:40
I will edit the question. I did it using only the DOM, but I still cannot get the result I want...

– Gonçalo

2020/04/01 at 19:50
Run your code repeatedly and you will see that the result comes back empty. It seems the website you are pulling the data from performs a reCaptcha check. This means that without reCaptcha validation the table is not in the page, so getElementById( 'edition_table' ) finds nothing.

– Sam

2020/04/03 at 02:47
Or it uses a cookie. Without the cookie, it runs the reCaptcha check to prevent exactly what you are trying to do.

– Sam

2020/04/03 at 02:51
I ran the code about 50 times in a row and did not hit that problem. I don't know whether it has to do with location...

– Gonçalo

2020/04/03 at 14:56
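
For the empty-result case discussed in the comments above, here is a minimal sketch of how the script could detect it instead of failing on a null element. The function name elementTextOrNull and the sample strings are hypothetical, introduced only for illustration; the sketch does not try to get past any reCaptcha or cookie check, it only reports when the element is absent.

```php
<?php
// Hypothetical helper: returns the text of the element with the given id,
// or null when the fetch failed or the element is not in the markup.
function elementTextOrNull($html, string $id): ?string
{
    // file_get_contents() returns false on failure; also treat an empty body as a miss.
    if ($html === false || trim($html) === '') {
        return null;
    }

    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();

    $node = $doc->getElementById($id);

    return $node === null ? null : trim($node->nodeValue);
}

// Usage with made-up markup standing in for the real responses.
var_dump(elementTextOrNull('<div id="edition_table">foo</div>', 'edition_table'));
var_dump(elementTextOrNull('<div id="captcha">blocked</div>', 'edition_table'));
var_dump(elementTextOrNull(false, 'edition_table'));
```

With the real page, the first argument would be the return value of file_get_contents($url), and a null result would mean the table was not served in that response.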

2 answers

3

You can solve this with DOMDocument: use getElementsByTagName to fetch the table, th and td elements:

<?php
$url = 'https://www.zerozero.pt/edicao.php?id_edicao=135716';

$Array = [];
$html = file_get_contents($url);
$doc = new DOMDocument();
$doc->strictErrorChecking = false;
@$doc->loadHTML( $html ); // @ silences warnings about malformed markup

$div = $doc->getElementById( 'edition_table' );

// First table inside the container; stop early if the page did not contain it
$table = $div ? $div->getElementsByTagName('table')->item(0) : null;
if ($table === null) {
  exit('Table not found in the fetched page.');
}

// Header cells
foreach ($table->getElementsByTagName('th') as $th) {
  $Array[] = $th->nodeValue;
}

// Data cells
foreach ($table->getElementsByTagName('td') as $td) {
  $Array[] = $td->nodeValue;
}

print_r($Array);

Here is a repl showing it working: https://repl.it/@Kleberoliveira/How to separate results-obtained-Loadhtml
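
To get the space-separated output the question asks for, the collected cell values can be joined with implode. The following is a rough sketch under stated assumptions: the function name tableRowsAsText and the sample markup are hypothetical, and the real page may be structured differently from this small stand-in.

```php
<?php
// Hypothetical sample markup standing in for the real page.
$html = '<div id="edition_table"><table>'
      . '<tr><th>P</th><th>J</th><th>V</th></tr>'
      . '<tr><td>1</td><td>National</td><td>50</td></tr>'
      . '</table></div>';

// Hypothetical helper: one space-separated string per table row.
function tableRowsAsText(string $html, string $containerId): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();

    $container = $doc->getElementById($containerId);
    if ($container === null) {
        return []; // container not present in this markup
    }

    $lines = [];
    foreach ($container->getElementsByTagName('tr') as $tr) {
        $cells = [];
        foreach ($tr->childNodes as $cell) {
            // Keep th/td elements only; skip whitespace text nodes between cells.
            if ($cell->nodeType === XML_ELEMENT_NODE
                && in_array($cell->nodeName, ['th', 'td'], true)) {
                $cells[] = trim($cell->nodeValue);
            }
        }
        $lines[] = implode(' ', $cells);
    }

    return $lines;
}

// Everything on one line, separated by spaces.
echo implode(' ', tableRowsAsText($html, 'edition_table')), PHP_EOL;
```

Keeping one string per row also makes it easy to print each row on its own line by joining with PHP_EOL instead of a space.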

Perfect. That’s just what I needed! Thank you very much.

– Gonçalo

2020/04/05 at 00:38

by Bruno Henrique • 52 points · Answer 1 · 2020-04-03T11:56:06+00:00

Good morning, Gonçalo!

Another user here on the site asked a similar question; maybe it solves your problem...

The following solution was proposed there:

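// Joins the text of every cell in a row with ", "
function tdrows($elements)
{
    $cells = [];
    foreach ($elements as $element) {
        // childNodes can include whitespace text nodes; keep element nodes only
        if ($element->nodeType === XML_ELEMENT_NODE) {
            $cells[] = $element->nodeValue;
        }
    }

    // implode avoids the trailing separator that plain concatenation leaves behind
    return implode(", ", $cells);
}

function getdata()
{
    $contents = "<table><tr><td>Row 1 Column 1</td><td>Row 1 Column 2</td></tr><tr><td>Row 2 Column 1</td><td>Row 2 Column 2</td></tr></table>";
    $DOM = new DOMDocument;
    $DOM->loadHTML($contents);

    // One <tr> per table row
    $items = $DOM->getElementsByTagName('tr');

    foreach ($items as $node) {
        echo tdrows($node->childNodes) . "<br />";
    }
}

getdata();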

Reference: HTML Table to PHP Array
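
The XPath route suggested in the comments can express the same selection in a single query. This is only a sketch: the sample markup is made up, and the expression assumes the cells are th or td elements directly inside tr elements under the edition_table container.

```php
<?php
// Hypothetical sample markup standing in for the real page.
$html = '<div id="edition_table"><table>'
      . '<tr><th>P</th><th>J</th></tr>'
      . '<tr><td>1</td><td>National</td></tr>'
      . '</table></div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// All header and data cells below the container, in document order.
$cells = $xpath->query('//*[@id="edition_table"]//tr/*[self::th or self::td]');

$values = [];
foreach ($cells as $cell) {
    $values[] = trim($cell->nodeValue);
}

// Join every cell value with a single space.
$result = implode(' ', $values);
echo $result, PHP_EOL;
```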

Regards.