Get the content of a div without its child elements using XPath

Good afternoon!

I need to extract information contained in divs in an HTML page.

HTML:

<div id="fundo_conteudo_noticia_setor" class="textogeral marrom">
            <div id="data_noticia_setor" class="textogeral_bold verde">Data</div>
            <div id="conteudo_noticia_setor">
                <a href="noticia_interna.asp?id=13692" class="sublinhado verde">
                    <span class="titulo_destaque_bold verde">Título<br>
                        <span class="titulo_destaque verde">Categoria</span>
                    </span>
                    <br><br>
                </a>
                Resumo do conteúdo...
            </div>
            </div>
            <div id="seta_noticia_setor"><i class="fa fa-angle-right fa-3x verde"></i></div>
        </div>

And this is the PHP I’m using to get the information. However, when I read the node value into $result['titulo'], it returns the text of the child elements as well.

if(!$data = file_get_contents("meusiteteste.com.br")){
    $results = false;
}
else {

    $html = iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $data);
    $doc = new DomDocument();
    @$doc->loadHTML($html);
    $xpath = new DomXpath($doc);
    $entries = $xpath->query("//div[@id=\"conteudo_noticia_setor\"]");
    $results = array();

    foreach ($entries as $entry){

        $node = $xpath->query("a/attribute::href", $entry);
        $result['link'] = $node->item(0)->value;

        echo $result['link'].'<br>';

        $node = $xpath->query("a/span[contains(@class, 'titulo_destaque_bold')]", $entry);
        $result['titulo'] = $node->item(0)->nodeValue;

        echo $result['titulo'].'<br><br>';

    }
}

It is printing "TítuloCategoria", with the text of the inner span included. I would like to get the contents of the inner span separately, but I do not know how to proceed.

1 answer

$html = '<div id="fundo_conteudo_noticia_setor" class="textogeral marrom">
            <div id="data_noticia_setor" class="textogeral_bold verde">Data1</div>
            <div id="conteudo_noticia_setor">
                <a href="noticia_interna.asp?id=13692" class="sublinhado verde">
                    <span class="titulo_destaque_bold verde">Título1<br>
                        <span class="titulo_destaque verde">Categoria1</span>
                    </span>
                    <br><br>
                </a>
                Resumo do conteúdo1...
            </div>
            </div>
            <div id="seta_noticia_setor"><i class="fa fa-angle-right fa-3x verde"></i></div>
        </div>
        <div id="fundo_conteudo_noticia_setor" class="textogeral marrom">
            <div id="data_noticia_setor" class="textogeral_bold verde">Data2</div>
            <div id="conteudo_noticia_setor">
                <a href="noticia_interna.asp?id=13692" class="sublinhado verde">
                    <span class="titulo_destaque_bold verde">Título2<br>
                        <span class="titulo_destaque verde">Categoria2</span>
                    </span>
                    <br><br>
                </a>
                Resumo do conteúdo2...
            </div>
            </div>
            <div id="seta_noticia_setor"><i class="fa fa-angle-right fa-3x verde"></i></div>
        </div>

        ';


// Capture each news block: everything between the outer div and the arrow div.
// Flags: i = case-insensitive, s = "." also matches newlines, m = multiline anchors.
preg_match_all('/<div id="fundo_conteudo_noticia_setor"[^>]+>(.*?)<div id="seta_noticia_setor">/ism', $html, $div_conteudo);
foreach($div_conteudo[1] as $div) {

    // In each block, take the text up to the first "<", so child elements are left out.
    $data = preg_match('/<div id="data_noticia_setor"[^>]+>([^<]+)/i', $div, $m) ? $m[1] : NULL;
    $titulo = preg_match('/<span class="titulo_destaque_bold[^"]+">([^<]+)/i', $div, $m) ? $m[1] : NULL;
    $categoria = preg_match('/<span class="titulo_destaque [^"]+">([^<]+)/i', $div, $m) ? $m[1] : NULL;

    echo $data.PHP_EOL;
    echo $titulo.PHP_EOL;
    echo $categoria.PHP_EOL.PHP_EOL;

}
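
Since the question started from DOMXPath, here is a minimal sketch of the same extraction without regular expressions. It relies on text(), which selects only the text nodes that are direct children of an element, whereas nodeValue concatenates the text of every descendant. The sample markup is trimmed from the question, and the UTF-8 prefix passed to loadHTML is an assumption about the page encoding, so treat this as an illustration rather than a drop-in replacement.

```php
<?php
// Sample markup trimmed from the question; two items to show the loop.
$html = <<<'HTML'
<div id="conteudo_noticia_setor">
    <a href="noticia_interna.asp?id=13692" class="sublinhado verde">
        <span class="titulo_destaque_bold verde">Título1<br>
            <span class="titulo_destaque verde">Categoria1</span>
        </span>
        <br><br>
    </a>
    Resumo do conteúdo1...
</div>
<div id="conteudo_noticia_setor">
    <a href="noticia_interna.asp?id=13692" class="sublinhado verde">
        <span class="titulo_destaque_bold verde">Título2<br>
            <span class="titulo_destaque verde">Categoria2</span>
        </span>
        <br><br>
    </a>
    Resumo do conteúdo2...
</div>
HTML;

// Collect parser warnings (e.g. duplicated ids) instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
// Assumption: the input is UTF-8; this prefix is a common hint for the HTML parser.
$doc->loadHTML('<?xml encoding="UTF-8">' . $html);
$xpath = new DOMXPath($doc);

$results = [];
foreach ($xpath->query('//div[@id="conteudo_noticia_setor"]') as $entry) {
    $results[] = [
        // string() returns the attribute value, or '' when there is no link.
        'link' => $xpath->evaluate('string(a/@href)', $entry),
        // First non-blank text node that is a direct child of the outer span,
        // so the text of the inner span is not included.
        'titulo' => $xpath->evaluate(
            "normalize-space(a/span[contains(@class, 'titulo_destaque_bold')]/text()[normalize-space()][1])",
            $entry
        ),
        // The inner span is read on its own.
        'categoria' => $xpath->evaluate(
            "normalize-space(a/span/span[contains(@class, 'titulo_destaque')])",
            $entry
        ),
    ];
}

foreach ($results as $r) {
    echo $r['titulo'], ' | ', $r['categoria'], ' | ', $r['link'], PHP_EOL;
}
```

The query paths are relative to $entry, so each item only sees its own link and spans.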
  • Marcos, that’s a good solution. I tried your example, but when there is more than one news item with this HTML, it only pulls the first one :/ I would need something that finds all the occurrences and then runs preg_match inside each one.
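
For the case raised in this comment, one possible sketch is to cut the page into one chunk per news item first and only then run preg_match inside each chunk. The helper first_match() is hypothetical (it is not part of the answer), and the sample markup is trimmed from the one posted above. It assumes every item starts with a div whose id is fundo_conteudo_noticia_setor.

```php
<?php
// Sample markup trimmed from the answer; two items.
$html = <<<'HTML'
<div id="fundo_conteudo_noticia_setor" class="textogeral marrom">
    <div id="data_noticia_setor" class="textogeral_bold verde">Data1</div>
    <div id="conteudo_noticia_setor">
        <a href="noticia_interna.asp?id=13692" class="sublinhado verde">
            <span class="titulo_destaque_bold verde">Título1<br>
                <span class="titulo_destaque verde">Categoria1</span>
            </span>
        </a>
    </div>
</div>
<div id="fundo_conteudo_noticia_setor" class="textogeral marrom">
    <div id="data_noticia_setor" class="textogeral_bold verde">Data2</div>
    <div id="conteudo_noticia_setor">
        <a href="noticia_interna.asp?id=13692" class="sublinhado verde">
            <span class="titulo_destaque_bold verde">Título2<br>
                <span class="titulo_destaque verde">Categoria2</span>
            </span>
        </a>
    </div>
</div>
HTML;

// Hypothetical helper: first capture group of $pattern in $subject, or null.
function first_match(string $pattern, string $subject): ?string
{
    return preg_match($pattern, $subject, $m) ? trim($m[1]) : null;
}

// Split right before every opening item div (zero-width lookahead), so each
// chunk holds exactly one news item.
$blocks = preg_split(
    '/(?=<div id="fundo_conteudo_noticia_setor")/i',
    $html,
    -1,
    PREG_SPLIT_NO_EMPTY
);

$news = [];
foreach ($blocks as $block) {
    $titulo = first_match('/<span class="titulo_destaque_bold[^"]*">([^<]+)/i', $block);
    if ($titulo === null) {
        continue; // chunk without a title, e.g. markup before the first item
    }
    $news[] = [
        'data'      => first_match('/<div id="data_noticia_setor"[^>]*>([^<]+)/i', $block),
        'titulo'    => $titulo,
        'categoria' => first_match('/<span class="titulo_destaque [^"]*">([^<]+)/i', $block),
    ];
}

foreach ($news as $item) {
    echo $item['data'], PHP_EOL, $item['titulo'], PHP_EOL, $item['categoria'], PHP_EOL, PHP_EOL;
}
```

Splitting on the opening div does not depend on a closing marker being present after every item, which may be one reason a start-to-end pattern finds fewer blocks on the real page.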
