How to separate the results obtained with loadHTML

0

Well, I have the following code:

<?php
$url = 'https://www.zerozero.pt/edicao.php?id_edicao=135716';
$str = '';

$html = file_get_contents($url);

//debug purposes
//$html = '<div id="pagina">foo</div>';

$doc = new DOMDocument();
$doc->strictErrorChecking = false;
@$doc->loadHTML( $html );

$div = $doc->getElementById( 'edition_table' );

$str = $div->nodeValue;

echo $str;

?>

This code takes all the data from the site's standings table: https://www.zerozero.pt/edicao.php?id_edicao=135716

And it returns the following:

[image not available]

It returns all the values, but I want each value separated by a space, so that every value it pulls can be told apart from the next:

Example: P J V E D GM GS DG 1 National 50 24 etc...

I want my code to return output like this.

How can I do that?

  • 1

    If you're already using DOM, why use explode? Use DOM to get each TR and then its TDs.

  • 1

    Another thing: do not set error_reporting to 0. It is not meant for hiding errors. Errors should not appear at all, and if one does, it is because something is wrong. I recommend reading: Why use error_reporting with display_errors and display_startup_errors?

  • How would that look with the DOM?

  • Use https://www.php.net/manual/domdocument.getelementsbytagname.php or learn XPath. P.S.: not to self-promote, but in my framework I created a CSS-selector-to-XPath converter for use with PHP DOM: https://github.com/inphinit/inphinit/wiki/QuerySelector-%28selectors-CSS%29-com-PHP#reading-a-p%C3%A1gina-external, of course you have probably already started your project in pure PHP or another framework, so this is just a suggestion.

  • I will edit the question. I did it with the DOM only, but I still cannot do what I want...

  • Run your code several times in a row and you will see that the result comes back empty. It seems to me that the website you are pulling the data from performs a reCaptcha check. This means that without reCaptcha validation the table is not in the page, so getElementById( 'edition_table' ) finds nothing.

  • Or it uses a cookie. Without the cookie, it runs the reCaptcha check to prevent exactly what you are trying to do.

  • I tested the code about 50 times in a row and did not run into that problem. I don't know whether it has to do with location...


2 answers

3


You can solve this problem with DOMDocument: use getElementsByTagName to fetch the table, th and td elements:

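<?php
$url = 'https://www.zerozero.pt/edicao.php?id_edicao=135716';

$Array = [];
$html = file_get_contents($url);

$doc = new DOMDocument();
$doc->strictErrorChecking = false;
// The @ silences warnings triggered by malformed HTML on the page
@$doc->loadHTML( $html );

// The table lives inside the element with id "edition_table"
$div = $doc->getElementById( 'edition_table' );
if ($div === null) {
  exit('Element "edition_table" was not found in the response.');
}

// item(0) returns the first table inside the div, or null if there is none
$table = $div->getElementsByTagName('table')->item(0);
if ($table === null) {
  exit('No table was found inside "edition_table".');
}

// Header cells first
foreach ($table->getElementsByTagName('th') as $th):
  $Array[] = $th->nodeValue;
endforeach;

// Then the data cells
foreach ($table->getElementsByTagName('td') as $td):
  $Array[] = $td->nodeValue;
endforeach;

print_r($Array);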

Here is a repl showing it working: https://repl.it/@Kleberoliveira/How to separate results-obtained-Loadhtml
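
The comments on the question also suggest XPath. As a minimal sketch of that route, assuming simplified stand-in markup (the `$html` string and its values are hypothetical, not the real page), a single `DOMXPath` query can collect every `th` and `td` in document order, and `implode` can then join them with spaces, which is the output format the question asks for:

```php
<?php
// Hypothetical sample markup standing in for the real page.
$html = '<div id="edition_table"><table>'
      . '<tr><th>P</th><th>J</th><th>V</th></tr>'
      . '<tr><td>1</td><td>National</td><td>50</td></tr>'
      . '</table></div>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Every header or data cell of every row inside the target div.
$cells = $xpath->query('//div[@id="edition_table"]//tr/*[self::th or self::td]');

$values = [];
foreach ($cells as $cell) {
    $values[] = trim($cell->textContent);
}

// One space between each value.
$result = implode(' ', $values);
echo $result, PHP_EOL;
```

Against the live page you would pass the fetched HTML instead of the sample string. As discussed in the comments, there is no guarantee that the table is present in that response.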

  • Perfect. That’s just what I needed! Thank you very much.

-3

Good morning, Gonçalo!

Another user here on the site asked a similar question; maybe it solves your problem...

The following solution was proposed:

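// Join the text of every cell in a row, separated by ", ".
function tdrows($elements)
{
    $values = [];
    foreach ($elements as $element) {
        // childNodes can also contain whitespace text nodes; keep only elements
        if ($element->nodeType === XML_ELEMENT_NODE) {
            $values[] = $element->nodeValue;
        }
    }

    // implode avoids the trailing ", " that plain concatenation leaves behind
    return implode(", ", $values);
}

function getdata()
{
    $contents = "<table><tr><td>Row 1 Column 1</td><td>Row 1 Column 2</td></tr><tr><td>Row 2 Column 1</td><td>Row 2 Column 2</td></tr></table>";
    $DOM = new DOMDocument;
    $DOM->loadHTML($contents);

    // One <tr> per table row
    $items = $DOM->getElementsByTagName('tr');

    foreach ($items as $node) {
        echo tdrows($node->childNodes) . "<br />";
    }
}

getdata();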

Reference: HTML Table to PHP Array
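
Since the referenced question is about turning an HTML table into a PHP array, here is one possible sketch that goes a step further and keys each row by its header text. The function name `tableToArray` and the sample markup are made up for illustration, and the sketch assumes a single header row made of `th` cells:

```php
<?php
// Hypothetical helper: returns one associative array per data row,
// keyed by the text of the header (<th>) cells.
function tableToArray(string $html): array
{
    libxml_use_internal_errors(true); // keep parser warnings out of the output
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $headers = [];
    $rows = [];
    foreach ($doc->getElementsByTagName('tr') as $tr) {
        $cells = [];
        $isHeader = false;
        foreach ($tr->childNodes as $child) {
            if ($child->nodeType !== XML_ELEMENT_NODE) {
                continue; // skip whitespace text nodes
            }
            if ($child->nodeName === 'th') {
                $isHeader = true;
            }
            $cells[] = trim($child->textContent);
        }
        if ($cells === []) {
            continue; // row without cells
        }
        if ($isHeader) {
            $headers = $cells;
        } elseif (count($cells) === count($headers)) {
            $rows[] = array_combine($headers, $cells);
        } else {
            $rows[] = $cells; // no matching header row: keep numeric keys
        }
    }

    return $rows;
}

// Sample data is invented for the example.
$sample = '<table>'
        . '<tr><th>Pos</th><th>Team</th><th>Pts</th></tr>'
        . '<tr><td>1</td><td>Team A</td><td>50</td></tr>'
        . '<tr><td>2</td><td>Team B</td><td>47</td></tr>'
        . '</table>';

$rows = tableToArray($sample);
print_r($rows);
```

Keying by header makes later lookups such as `$rows[0]['Team']` readable, at the cost of assuming the header and data rows have the same number of cells.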

Cheers.

  • While this link may answer the question, it is best to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. - From Review

  • Thank you for the information, veroneseComS. I will edit the answer.
