How to get the <a> tag inside a <div> using XPath?

I’m trying to get the data from a DIV that contains the following structure:

<div class="item" style="height:273px">
<a href="/arapiraca/anuncios/detalhes/159695-honda-cg-150-2008">
    <img alt="" src="/i.php?w=148&h=119&f=3,0|4,0&src=uploads/anunciosfotos/2014/04/858a6126588bdace8bc0f144f900d097.jpg"></img>
    <img src="/img/icone-novo.png" alt="" style="position: absolute; z-index: 20; width: 60px; height: 60px; right: -5px; top: -10px; border: 0"></img>
    <strong class="nome" style="font-weight:normal">
        HONDA CG 150 2008 TITAN - KS GASOLINA
    </strong>
    <strong class="valor">
        R$ 4.500,00
    </strong>
    <span class="vendedor">
        <span>
            <img alt="" src="/i.php?w=148&h=60&src=uploads/clientes/2659aa7030bac6f245852b948187188a.jpg"></img>
        </span>
    </span>
</a>
<input class="comparacao" type="checkbox" name="comparacao[159695]" value="159695"></input>
</div>

$dom = new DOMDocument();
@$dom->loadHTML($content);

$xpath = new DOMXPath($dom);
$classname = "item";
$nodes = $xpath->query("//*[@class='" . $classname . "']");

foreach ($nodes as $node) {
     echo $node->nodeValue . " <br> ";
}

With the above code, I can get only the following result:

HONDA CG 150 2008 TITAN - KS GASOLINA R$ 4.500,00 

I also need to get the tags.

  • Do you want all the <img>, <span> and <strong> elements inside the <a>?

  • I want to get what's here: <a href="/arapiraca/anuncios/detalhes/159695-honda-cg-150-2008">

  • $node->attributes?

  • @Beet: Catchable fatal error: Object of class DOMNamedNodeMap could not be converted to string

2 answers

1


The XPath expression you are using returns all elements whose class attribute has the value item:

//*[@class='item']

The result is a collection. Your code iterates over the items in this collection, one of which is the div that you're showing.

If you print an item of this collection as a string (nodeValue), you get only the text content of the elements it contains, without the tags. But you can use more elaborate XPath expressions to get exactly what you want.
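If the goal is to see the tags themselves, one option is to serialize each matched node instead of reading nodeValue. The sketch below assumes a shortened, hypothetical copy of the markup in $content, and uses DOMDocument::saveHTML() with a node argument; treat it as an illustration rather than the only way to do it.

```php
<?php
// Hypothetical, shortened markup standing in for the page from the question.
$content = '<div class="item"><a href="/arapiraca/anuncios/detalhes/159695-honda-cg-150-2008">'
         . '<strong class="nome">HONDA CG 150 2008 TITAN - KS GASOLINA</strong></a></div>';

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($content);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

foreach ($xpath->query("//*[@class='item']") as $node) {
    $textOnly = $node->nodeValue;        // text content only, tags are dropped
    $markup   = $dom->saveHTML($node);   // the element serialized with its tags

    // Escape the markup so a browser displays the tags instead of rendering them.
    echo htmlspecialchars($markup), " <br> ";
}
```

Here libxml_use_internal_errors(true) plays the role of the @ operator in the question: it keeps warnings about imperfect HTML out of the output.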

To obtain the a element that's inside that div, you just need to add one more step:

//*[@class='item']/a
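As a rough sketch of how this expression could be used from PHP: each match is a DOMElement, so a single attribute is read with getAttribute(). Echoing $node->attributes directly fails because it is a DOMNamedNodeMap, which is the error mentioned in the comments. The $content string is a hypothetical, shortened copy of the markup.

```php
<?php
// Hypothetical, shortened markup standing in for the page from the question.
$content = '<div class="item"><a href="/arapiraca/anuncios/detalhes/159695-honda-cg-150-2008">'
         . '<strong class="nome">HONDA CG 150 2008 TITAN - KS GASOLINA</strong></a></div>';

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($content);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

$hrefs = [];
foreach ($xpath->query("//*[@class='item']/a") as $a) {
    // $a is a DOMElement; read one attribute by name rather than
    // trying to print the whole attribute map.
    $hrefs[] = $a->getAttribute('href');
}

print_r($hrefs);
```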

In the case above, the XPath returns an element. If you want the value of the href attribute of the a element, add one more step containing @href (or attribute::href):

//*[@class='item']/a/@href
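A minimal sketch of consuming this expression in PHP, again with a hypothetical shortened $content. With query() each match is an attribute node whose value is available through nodeValue; evaluate() with the XPath string() function is shown as an alternative when only the first match is needed.

```php
<?php
// Hypothetical, shortened markup standing in for the page from the question.
$content = '<div class="item"><a href="/arapiraca/anuncios/detalhes/159695-honda-cg-150-2008">'
         . '<strong class="nome">HONDA CG 150 2008 TITAN - KS GASOLINA</strong></a></div>';

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($content);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// query() returns a list of attribute nodes (DOMAttr).
$values = [];
foreach ($xpath->query("//*[@class='item']/a/@href") as $attr) {
    $values[] = $attr->nodeValue;
}

// evaluate() can return a plain string: string() takes the first matching node.
$firstHref = $xpath->evaluate("string(//*[@class='item']/a/@href)");

print_r($values);
echo $firstHref, "\n";
```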

I wasn't sure whether you also wanted to extract the text inside the <a>. If so (that is, to extract the text content of <strong class='nome'>), you can do this directly in XPath using:

//*[@class='item']//*[@class='nome']/text()

The text() node test selects the text nodes that are children of the element, rather than the element itself. This affects how you use the result: you can read the text (in PHP, through nodeValue), but you can't read the attributes of the element that contains it, because a text node has no attributes.
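A sketch of reading those text nodes from PHP, assuming a hypothetical $content that keeps the indentation of the original markup. With DOMXPath::query() each match is a DOMText node, so the text comes from nodeValue and usually needs trimming; normalize-space() through evaluate() is an alternative that returns a cleaned string for the first match.

```php
<?php
// Hypothetical, shortened markup that keeps the whitespace of the original page.
$content = <<<'HTML'
<div class="item">
<a href="/arapiraca/anuncios/detalhes/159695-honda-cg-150-2008">
    <strong class="nome" style="font-weight:normal">
        HONDA CG 150 2008 TITAN - KS GASOLINA
    </strong>
    <strong class="valor">
        R$ 4.500,00
    </strong>
</a>
</div>
HTML;

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($content);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Each match is a DOMText node; trim() removes the surrounding indentation.
$names = [];
foreach ($xpath->query("//*[@class='item']//*[@class='nome']/text()") as $text) {
    $names[] = trim($text->nodeValue);
}

// Alternative: let XPath return a whitespace-normalized string for the first match.
$firstName = $xpath->evaluate("normalize-space(//*[@class='item']//*[@class='nome'])");

print_r($names);
echo $firstName, "\n";
```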

1

You can run a second query with just //a/@href, or change the query to return the union of two node sets, using the | operator:

//*[@class='item'] | //a/@href

You will then probably have to adjust the foreach loop accordingly.
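One possible way to adjust the loop, sketched with a hypothetical shortened $content: the union mixes element nodes and attribute nodes in a single result list, so the loop checks the type of each node before deciding how to read it.

```php
<?php
// Hypothetical, shortened markup standing in for the page from the question.
$content = '<div class="item"><a href="/arapiraca/anuncios/detalhes/159695-honda-cg-150-2008">'
         . '<strong class="nome">HONDA CG 150 2008 TITAN - KS GASOLINA</strong></a></div>';

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($content);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

$texts = [];
$hrefs = [];
foreach ($xpath->query("//*[@class='item'] | //a/@href") as $node) {
    if ($node instanceof DOMAttr) {
        // Matched by //a/@href: an attribute node holding the link target.
        $hrefs[] = $node->value;
    } elseif ($node instanceof DOMElement) {
        // Matched by //*[@class='item']: the element itself, read as text.
        $texts[] = trim($node->nodeValue);
    }
}

print_r($texts);
print_r($hrefs);
```

Note that //a/@href is not restricted to links inside the item elements, so on a full page it would also collect every other link.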

Good job!
