How to extract specific data from an HTML file with PHP?

Viewed 3,277 times

I'd like to know how to extract part of the contents of an HTML file. The file contains dozens of names and emails, and I want to extract that data. Can anyone help me with this?

<div class="tcell tquick">
  <div style="background-color: #ddd; padding: 4px;"> 
      <span> <b class="the_nome">Marcos Vinícius Nascimento Pereira;</b> </span> 
  </div>
  <br>
  <div> </div>
  <div>
    <div class="c the_email">[email protected]</div>
  </div>
  <div> </div>
</div>

In this case, I would like to extract the name (the_nome) and the email using PHP.

  • Is every email inside a div with the class c the_email?

  • Yes, I will generate these classes on the email and name elements, as shown in the example. Thank you!

  • If you are generating the HTML yourself and then extracting data from it, wouldn't it be simpler to generate the data in the format you want in the first place?

2 answers

This is exactly the kind of task PHP's XPath support is meant for. Suppose your HTML output looks like this and is served at a URL such as localhost/emails.html:

<!DOCTYPE html>
<html>
<head></head>
<body>
    <div class="tcell tquick">
        <div style="background-color: #ddd; padding: 4px;">
            <span> <b class="the_nome">Ciclano;</b> </span>
        </div>
        <br>
        <div> </div>
        <div>
            <div class="c the_email">[email protected]</div>
        </div>
        <div> </div>
    </div>
    <div class="tcell tquick">
        <div style="background-color: #ddd; padding: 4px;">
            <span> <b class="the_nome">Fulano;</b> </span>
        </div>
        <br>
        <div> </div>
        <div>
            <div class="c the_email">[email protected]</div>
        </div>
        <div> </div>
    </div>
</body>
</html>

You can then load this content into a DOMDocument and query it with another class, DOMXPath:

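<?php

$html_content = file_get_contents('http://localhost/emails.html');

// loadHTML() is an instance method; calling it statically fails on PHP 8
$dom = new DOMDocument();
libxml_use_internal_errors(true); // suppress warnings about imperfect HTML
$dom->loadHTML($html_content);

$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//div[@class="tcell tquick"]');

foreach ($nodes as $node) {
    // Both queries are relative to the current "tcell tquick" div
    $nome  = $xpath->query('div/span/b[@class="the_nome"]', $node)->item(0);
    $email = $xpath->query('div/div[@class="c the_email"]', $node)->item(0);

    if ($nome === null || $email === null) {
        continue; // skip blocks missing a name or an email
    }

    echo trim($nome->nodeValue)  . PHP_EOL;
    echo trim($email->nodeValue) . PHP_EOL;
}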

This prints each name followed by its email, one value per line.
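A comparison like @class="c the_email" only matches when the attribute is exactly that string, so an extra class or a different class order makes it miss. The following is a minimal sketch of a variation, assuming the same record structure as the example HTML above. It matches individual class tokens with the common XPath idiom contains(concat(' ', normalize-space(@class), ' '), ' name ') and collects the results into an array instead of echoing them. The helper names extract_contacts() and has_class() are hypothetical, chosen only for illustration, and the UTF-8 hint is a widely used workaround whose behaviour you should verify on your libxml version.

```php
<?php
// Hypothetical helper: builds an XPath predicate that is true when the
// element's class list contains $class as a whole token.
function has_class(string $class): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' " . $class . " ')";
}

// Hypothetical helper: returns a list of ['nome' => ..., 'email' => ...] pairs.
// Assumes each record is a div with the class "tcell" containing one
// b.the_nome and one div.the_email, as in the question's markup.
function extract_contacts(string $html): array
{
    $dom = new DOMDocument();
    // Collect libxml warnings instead of printing them (real-world HTML is rarely valid)
    $previous = libxml_use_internal_errors(true);
    // Workaround (verify on your setup): the XML declaration hints that the
    // input is UTF-8, so names such as "Vinícius" are not mangled
    $dom->loadHTML('<?xml encoding="UTF-8" ?>' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $contacts = [];

    foreach ($xpath->query('//div[' . has_class('tcell') . ']') as $record) {
        // ".//" searches at any depth below the current record
        $nome  = $xpath->query('.//b[' . has_class('the_nome') . ']', $record)->item(0);
        $email = $xpath->query('.//div[' . has_class('the_email') . ']', $record)->item(0);

        if ($nome === null || $email === null) {
            continue; // skip incomplete records
        }

        $contacts[] = [
            'nome'  => rtrim(trim($nome->textContent), ';'), // drop the trailing ";"
            'email' => trim($email->textContent),
        ];
    }

    return $contacts;
}
```

Usage would be something like $contacts = extract_contacts(file_get_contents('http://localhost/emails.html')); returning an array makes it easier to save the data to a database or CSV later than echoing it directly.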

  • I adapted my code based on this answer and it returned what I needed. Perfect, thank you!

You can also do it with regular expressions. I did something similar when I needed to get the price of soybeans from the Canal Rural website.
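The regular-expression approach could look roughly like this minimal sketch, assuming the markup keeps exactly the class attributes shown in the question. It uses preg_match_all to pair each the_nome value with the next the_email value that follows it. The helper name extract_contacts_regex() is hypothetical. This is more fragile than the DOM/XPath approach: any change to attribute order, quoting, or class names breaks the pattern.

```php
<?php
// Hypothetical helper: pairs each <b class="the_nome"> with the next
// <div class="c the_email"> using a regular expression.
// Assumes the exact class attributes from the question's markup.
function extract_contacts_regex(string $html): array
{
    // Lazy groups capture the text without surrounding whitespace;
    // the "s" modifier lets ".*?" span the newlines between name and email
    $pattern = '~<b class="the_nome">\s*(.*?)\s*</b>.*?<div class="c the_email">\s*(.*?)\s*</div>~s';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $contacts = [];
    foreach ($matches as $m) {
        $contacts[] = [
            // Decode entities such as &iacute; and drop the trailing ";"
            'nome'  => rtrim(html_entity_decode($m[1], ENT_QUOTES | ENT_HTML5, 'UTF-8'), ';'),
            'email' => html_entity_decode($m[2], ENT_QUOTES | ENT_HTML5, 'UTF-8'),
        ];
    }

    return $contacts;
}
```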
