explode is definitely not one of the best ways to extract HTML elements from a page. Another point is that the site has no elements with the class Numbers.
Instead of explode, you can use regex (although it is not the best option), DOMDocument together with DOMXPath, or a library built for this purpose, such as QueryPath.
Let's start with PHP's native functions.
Capturing HTML from the site.
To get the site's information, we can use file_get_contents to download its HTML.
This function only requires the URL, but older documentation also describes a flag argument:
- FILE_TEXT: reads the content as UTF-8 text
- FILE_BINARY: reads the content in binary mode.
In current PHP versions these two constants have no effect (they are deprecated as of PHP 8.1), and the second parameter is the boolean use_include_path.
There is also the context argument, which takes a stream context resource created with stream_context_create. Through it we can send values such as Cookies, a User-Agent, etc.
$content = file_get_contents("https://www.infomoney.com.br/mercados/cambio");
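For instance, a context that sends a User-Agent and a cookie could be built roughly like this. It is only a minimal sketch: the helper name buildHttpContext, the User-Agent string and the cookie value are made up for illustration, and the actual request is left commented out so the snippet runs without network access.

```php
<?php
// Hypothetical helper (not part of PHP): builds a stream context for HTTP requests.
function buildHttpContext(string $userAgent, string $cookie = '')
{
    $http = [
        'method'     => 'GET',
        'user_agent' => $userAgent, // sent as the User-Agent header
        'timeout'    => 10,         // read timeout, in seconds
    ];

    // Extra headers go in the "header" option, one per line ending in \r\n.
    if ($cookie !== '') {
        $http['header'] = "Cookie: {$cookie}\r\n";
    }

    return stream_context_create(['http' => $http]);
}

// Illustrative values only.
$context = buildHttpContext('Mozilla/5.0 (compatible; ExampleBot/1.0)', 'lang=pt-BR');

// The context is the third argument; the second one is use_include_path.
// $content = file_get_contents("https://www.infomoney.com.br/mercados/cambio", false, $context);
```

Whether the site actually needs a particular User-Agent or cookie is something you would have to check; the point is only where those values are passed.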
Loading the received data
To process this data, we need the DOMDocument class. With it we can load the HTML, find elements by tag name or ID, create and read attributes, etc.
To do this, simply instantiate the object and call the loadHTML method.
$dom = new DOMDocument();
@$dom->loadHTML($content);
The @ suppresses the warnings that loadHTML emits for malformed HTML, for example an unclosed tag.
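If you would rather not silence everything with @, one alternative is to let libxml collect the parse errors internally. The sketch below assumes a small, deliberately malformed HTML string (made up for the example) instead of the real page.

```php
<?php
// Sample malformed HTML, invented for this example: <p> and <span> are never closed.
$html = '<div><p>Unclosed paragraph<span>text</div>';

// Collect libxml errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

// The collected errors can be inspected or logged if needed.
$errors = libxml_get_errors();
libxml_clear_errors();

// Restore the previous error-handling mode.
libxml_use_internal_errors($previous);

// The parser still recovers a usable tree from the broken markup.
$spanText = $dom->getElementsByTagName('span')->item(0)->textContent;
echo $spanText, PHP_EOL;
```

This keeps other warnings on the same line visible, which @ would hide.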
With this it is already possible to list the elements that contain each currency's name, purchase price and sale price. If the HTML had a basic structure, it would be enough to:
- Use the getElementsByTagName method
- Traverse all the necessary elements with a foreach
- Filter them and display them on screen.
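The steps above can be sketched as follows, assuming a very simple table. The HTML string and its values are invented for the example and do not reflect the real site.

```php
<?php
// Invented sample with a flat, simple structure.
$html = '<!DOCTYPE html><html><body><table>'
      . '<tr><td>Dollar</td><td>5.10</td></tr>'
      . '<tr><td>Euro</td><td>5.50</td></tr>'
      . '</table></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$rows = [];
// 1. Get every <tr>; 2. traverse them; 3. keep only rows with two cells.
foreach ($dom->getElementsByTagName('tr') as $tr) {
    $cells = [];
    foreach ($tr->getElementsByTagName('td') as $td) {
        $cells[] = trim($td->textContent);
    }
    if (count($cells) === 2) {
        $rows[] = $cells;
    }
}

// Display on screen.
foreach ($rows as [$name, $price]) {
    echo "{$name}: {$price}", PHP_EOL;
}
```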
But we want something simpler (the structure of the site is not simple enough for that approach), so we will use the DOMXPath class.
Filtering the necessary elements
To filter the data we will use the DOMXPath class. With it we can write queries that make searching for elements simpler.
Unfortunately, this class does not accept jQuery-style selectors. It works with XPath expressions, and through these expressions we can search for the elements. The basic syntax is:
- nodename: selects all elements named nodename
- /: selects from the root element
- //: selects matching elements in the document from the current node, regardless of where they are
- . (dot): selects the current element
- .. (two dots): selects the parent element
- @: selects attributes
To learn more, see the XPath documentation.
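As a quick illustration of these expressions, here is a small sketch run against an invented HTML fragment (the ids, classes and text are assumptions made for the example).

```php
<?php
// Invented sample document.
$html = '<!DOCTYPE html><html><body>'
      . '<div id="box"><p class="a">one</p><p>two</p></div>'
      . '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// "//": every <p>, wherever it is in the document.
$allParagraphs = $xpath->query('//p');

// "/": an absolute path starting at the root element.
$div = $xpath->query('/html/body/div')->item(0);

// ".": relative to the context node given as the second argument.
$insideDiv = $xpath->query('./p', $div);

// "..": the parent of the first <p>.
$parent = $xpath->query('..', $allParagraphs->item(0))->item(0);

// "@": an attribute node.
$class = $xpath->query('//p/@class')->item(0)->nodeValue;

echo $allParagraphs->length, ' ', $parent->nodeName, ' ', $class, PHP_EOL;
```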
Now we can start selecting the elements. First we instantiate the DOMXPath object, and then we use its query method.
$xpath = new DOMXPath($dom);
$tables = $xpath->query("//table[@class=\"table-general\"]");
$values = $xpath->query(".//tbody/tr", $tables->item(0));
This selects every table element that has the class table-general, and then every tbody tr element inside the first table found.
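Here is the same pair of queries run against an invented table, with a guard in case no table matches. Note that @class="table-general" only matches when the class attribute is exactly that string. The sample HTML is an assumption and is much simpler than the real page.

```php
<?php
// Invented sample; the real page has more columns and markup.
$html = '<!DOCTYPE html><html><body>'
      . '<table class="table-general"><tbody>'
      . '<tr><td>Dollar</td></tr><tr><td>Euro</td></tr>'
      . '</tbody></table>'
      . '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$tables = $xpath->query('//table[@class="table-general"]');

$names = [];
// item(0) would be null if nothing matched, so check the length first.
if ($tables->length > 0) {
    $values = $xpath->query('.//tbody/tr', $tables->item(0));
    foreach ($values as $row) {
        $names[] = trim($row->textContent);
    }
}

echo implode(', ', $names), PHP_EOL;
```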
Now we only need to run a foreach over the $values variable to get the td elements (where the values are).
Filtering the necessary values
In the foreach below, we use the query method once again. It returns an object of type DOMNodeList.
That object gives us access to the item method, which in turn returns a DOMNode. From that DOMNode we can read the element's content or one of its attributes.
$currencies = [];
foreach ($values as $value) {
    $currency = $xpath->query(".//td", $value);
    /* Reads the text content of the first TD element */
    $name = trim($currency->item(0)->textContent);
    /**
     * Takes the first child of the second TD element,
     * moves to its "sibling" (the next node under the same parent), in this case the IMG,
     * and then reads the SRC attribute
     */
    $img = trim($currency->item(1)->firstChild->nextSibling->getAttribute("src"));
    /* Reads the text content of the third TD element */
    $purchasePrice = trim($currency->item(2)->textContent);
    /* Reads the text content of the fourth TD element */
    $salePrice = trim($currency->item(3)->textContent);
    /* Store everything in an array to display to users later. */
    $currencies[] = [
        "img" => $img,
        "name" => $name,
        "purchasePrice" => $purchasePrice,
        "salePrice" => $salePrice,
    ];
}
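The loop above assumes every row has at least four td elements. If a row can have fewer, item() returns null for a missing index, and reading textContent on it fails. A possible guard is sketched below; the helper name cellText and the sample row are hypothetical.

```php
<?php
// Hypothetical helper: returns the trimmed text of the cell at $index,
// or an empty string when that cell does not exist.
function cellText(DOMNodeList $cells, int $index): string
{
    $node = $cells->item($index); // null when the index is out of range
    return $node === null ? '' : trim($node->textContent);
}

// Invented sample row with only two cells.
$dom = new DOMDocument();
$dom->loadHTML(
    '<!DOCTYPE html><html><body><table>'
    . '<tr><td> Dollar </td><td>5.10</td></tr>'
    . '</table></body></html>'
);
$xpath = new DOMXPath($dom);
$cells = $xpath->query('//tr/td');

$name    = cellText($cells, 0); // existing cell
$missing = cellText($cells, 5); // no such cell

var_dump($name, $missing);
```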
Displaying the data to the user
This step is optional; it only demonstrates (for anyone in doubt) how to display these values on screen.
I will use Pure CSS for this step, but it is also optional.
<link rel="stylesheet" href="https://unpkg.com/[email protected]/build/pure-min.css" integrity="sha384-nn4HPE8lTHyVtfCBi5yW9d20FjT8BJwUXyWZT9InLYax14RDjBj46LmSztkmNP9w" crossorigin="anonymous">
<table class="pure-table">
    <thead>
        <tr>
            <th></th>
            <th>Currency</th>
            <th>Buy</th>
            <th>Sell</th>
        </tr>
    </thead>
    <tbody>
        <!-- Loops over the whole array we built; htmlspecialchars escapes the scraped text before output -->
        <?php foreach ($currencies as $currency): ?>
        <tr>
            <!-- Concatenates the site address with the image path -->
            <td><img src="<?php echo htmlspecialchars("http://www.infomoney.com.br{$currency['img']}") ?>" /></td>
            <!-- Displays the currency name -->
            <td><?php echo htmlspecialchars($currency['name']) ?></td>
            <!-- Displays the purchase price -->
            <td><?php echo htmlspecialchars($currency['purchasePrice']) ?></td>
            <!-- Displays the sale price -->
            <td><?php echo htmlspecialchars($currency['salePrice']) ?></td>
        </tr>
        <?php endforeach; ?>
    </tbody>
</table>
Using the QueryPath library
I will be brief here and just post the commented code. For more detailed explanations, see the documentation.
<?php
/* Loads the libraries via Composer */
require_once "vendor/autoload.php";
/* Fetches and downloads the site's HTML */
$qp = html5qp("https://www.infomoney.com.br/mercados/cambio");
/* Selects the "tr" elements inside the "tbody" of the first table with the class "table-general" */
$values = $qp->find("table.table-general:first tbody tr");
/* Loops over the selected rows */
$currencies = [];
foreach ($values as $value) {
    /* Store everything in an array to display to users later. */
    $currencies[] = [
        "img" => trim($value->find('td:eq(2) img')->attr("src")),
        "name" => trim($value->find('td:eq(1)')->text()),
        "purchasePrice" => trim($value->find('td:eq(3)')->text()),
        "salePrice" => trim($value->find('td:eq(0)')->text()),
    ];
}
It means that these indices do not exist; you can use isset() to check. – rray
Which are lines 15, 17, 24, 29, 34, 39 and 44?
– Victor Stafusa
I put the lines in the post.
– richardmanzoli