Getting the contents of divs inside an HTML string

How do I get all the values inside <div class='conteudo'></div>?

I’ve tried it like this:

$links = "<ul><li>CONTEUDO
<div class='conteudo'>CORPO 1</div>
</li></ul>
<ul><li>CONTEUDO
<div class='conteudo'>CORPO 2</div>
</li></ul>
<ul><li>CONTEUDO
<div class='conteudo'>CORPO 3</div>
</li></ul>
<ul><li>CONTEUDO
<div class='conteudo'>CORPO 4</div>
</li></ul>";

$conteudo2 = explode('</li>', $links);

foreach($conteudo2 as $key) {    
    echo ''.$key.'';    
}

But this returns the entire content of the tags, not just the value inside the div.

1 answer

The problem with using explode is that it splits the string without taking HTML semantics into account (i.e., the meaning of each tag, the distinction between a tag and its content, etc.).

To manipulate HTML content the way you need, you can use DOMDocument:

$links = "<ul><li>CONTEUDO
<div class='conteudo'>CORPO 1</div>
</li></ul>
<ul><li>CONTEUDO
<div class='conteudo'>CORPO 2</div>
</li></ul>
<ul><li>CONTEUDO
<div class='conteudo'>CORPO 3</div>
</li></ul>
<ul><li>CONTEUDO
<div class='conteudo'>CORPO 4</div>
</li></ul>";

$dom = new DOMDocument();
$dom->loadHTML($links);
$xpath = new DOMXPath($dom);
// find div elements with class "conteudo"
foreach ($xpath->query('//div[@class="conteudo"]') as $div) {
    echo $div->textContent. "<br>";
}

That is, I search for all div elements whose class is "conteudo" (using XPath syntax) and print their respective values. The output of the code above is:

CORPO 1
CORPO 2
CORPO 3
CORPO 4
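One detail worth knowing: the predicate [@class="conteudo"] compares the whole attribute value, so a div with class="destaque conteudo" would not match. The sketch below uses a common XPath 1.0 idiom to match "conteudo" as one class among several. The function name getConteudoTexts() and the extra class names are hypothetical, chosen only for illustration.

```php
<?php
// Sketch: match <div> elements that have "conteudo" among one or more classes.
// getConteudoTexts() and the classes "destaque"/"conteudo-extra" are hypothetical examples.
function getConteudoTexts(string $html): array
{
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    // Wrap the normalized class list in spaces, then look for " conteudo ",
    // so only the whole class name matches (not, e.g., "conteudo-extra").
    $query = '//div[contains(concat(" ", normalize-space(@class), " "), " conteudo ")]';

    $texts = [];
    foreach ($xpath->query($query) as $div) {
        $texts[] = $div->textContent;
    }
    return $texts;
}

$html = "<div class='conteudo'>CORPO 1</div>
<div class='destaque  conteudo'>CORPO 2</div>
<div class='conteudo-extra'>NOT MATCHED</div>";

print_r(getConteudoTexts($html));
```

With the original //div[@class="conteudo"] query, only the first of these three divs would be returned.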

The code above works if the div contains only plain text (textContent returns just the text, without the tags). But if the div contains other tags and you want all of that content, tags included, you need an auxiliary function to get the HTML of its inner content (the function below was taken from here):

$links = "<ul><li>CONTEUDO
<div class='conteudo'>CORPO 1</div>
</li></ul>
<ul><li>CONTEUDO
<div class='conteudo'>CORPO 2</div>
</li></ul>
<ul><li>CONTEUDO
<div class='conteudo'><p>CORPO 3 <span>teste com <strong>outras tags</strong></span> dentro do div</p></div>
</li></ul>
<ul><li>CONTEUDO
<div class='conteudo'><span>CORPO 4</span></div>
</li></ul>";

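// Returns the HTML of all child nodes of $element (its "inner HTML")
function innerHTML(DOMNode $element) {
    $innerHTML = "";
    $children  = $element->childNodes;
    foreach ($children as $child) {
        // saveHTML() with a node argument serializes only that node
        $innerHTML .= $element->ownerDocument->saveHTML($child);
    }
    return $innerHTML;
}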

$dom = new DOMDocument();
$dom->loadHTML($links);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//div[@class="conteudo"]') as $div) {
    echo innerHTML($div). "<br>";
}

The output is:

CORPO 1
CORPO 2
<p>CORPO 3 <span>teste com <strong>outras tags</strong></span> dentro do div</p>
<span>CORPO 4</span>
  • But if you have <div class='conteudo'><span>CORPO 2</span></div>, wouldn't you be unable to get <span>CORPO 2</span>? When I tested it, it only returned CORPO 2.

  • @Rogériosilva I updated the answer

  • Everything works now, but do you know why, when something in CORPO contains the & character, the error DOMDocument::loadHTML(): htmlParseEntityRef: no name in Entity appears?

  • @Rogériosilva Because & has a special meaning in HTML: it is used for HTML entities (a "loose" & in the text is invalid). But that is already outside the scope of the question (which is how to get the content of a given tag). In any case, if the text needs to contain a &, the correct way is to write it as &amp;
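Following up on that last comment, here is a minimal sketch of two ways to deal with the &, assuming you either build the HTML yourself or receive it already containing a bare &. Escaping with htmlspecialchars() is the fix the comment recommends; silencing the warning with libxml_use_internal_errors() relies on libxml's HTML parser recovering and keeping the bare & as literal text, which is its usual behavior but worth checking on your setup. The function name extractConteudo() and the sample text are hypothetical.

```php
<?php
// Sketch: handling "&" in the content when using DOMDocument::loadHTML().
// extractConteudo() and the sample text "Tom & Jerry" are hypothetical.
function extractConteudo(string $html): array
{
    $dom = new DOMDocument();
    // Collect libxml parse errors internally instead of emitting PHP warnings
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();                 // discard the collected errors
    libxml_use_internal_errors($previous); // restore the previous setting

    $xpath = new DOMXPath($dom);
    $texts = [];
    foreach ($xpath->query('//div[@class="conteudo"]') as $div) {
        $texts[] = $div->textContent; // entities such as &amp; come back decoded
    }
    return $texts;
}

// 1) Preferred: escape the text when building the HTML, so "&" becomes "&amp;"
$escaped = "<div class='conteudo'>" . htmlspecialchars('Tom & Jerry') . "</div>";

// 2) Fallback: the input already has a bare "&"; the recovering parser keeps it as text
$raw = "<div class='conteudo'>Tom & Jerry</div>";

print_r(extractConteudo($escaped));
print_r(extractConteudo($raw));
```

If you need to know what the parser complained about, call libxml_get_errors() before libxml_clear_errors().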
