DOMXPath query with multiple classes

3

I’m parsing an HTML file with the following structure:

<div class="lstImv blackBd12">
    <div class="stCl3 stLeft imvImg">
        <div class="imgBox">            
            <a class="emp-imgs-link">
                <span class="imgFrm frmBig frmLeft">
                    <img class="emp-img-principal">
                </span>
                <span class="imgFrm frmMd frmTop">
                    <img class="emp-img-logo">
                </span>
                <span class="imgFrm frmMd frmBot">
                    <img class="emp-img-foto">
                </span>             
            </a>
        </div>
        <strong class="imvFse emp-fase">Get_text 1</strong>
    </div>
    <div class="imvInf stCl3 stRight">
        <div class="infHd">
            <div class="hdLeft stCl2">
                <strong class="emp-nome infNme colorTxt"></strong>
                <span class="emp-loc-part1 infLoc">Get_text 2</span>
                <span class="emp-loc-part2 infLoc">Get_text 3</span>
            </div>
            <div class="hdRight stCl1">
                <em class="emp-valor-apartir" >Get_text 4</em>
                <strong class="emp-valor infVlr colorTxt">Get_text 5</strong>
            </div>
        </div>
        <div class="infTxt">
            <p class="blackTxt60 emp-descritivo"></p>
            <ul>                
                <li class="txtBed emp-un-dorms">Get_text 6</li>                                 
                <li class="txtArea emp-un-area">Get_text 7</li>
                <li class="txtToilet emp-un-bath">Get_text 8</li>
                <li class="txtCar emp-un-park">Get_text 9</li>
            </ul>
        </div>
        <div class="infBt">
            <a href="/parceiro_cadastro" title="" class="btCadastrese stBt stBtLt colorBg whiteTxt rc9 sh15 emp-btn-cadastre">Get_text 10</a>
        </div>
    </div>
</div>
<div class="lstImv blackBd12">
    <div class="stCl3 stLeft imvImg">
        <div class="imgBox">            
            <a class="emp-imgs-link">
                <span class="imgFrm frmBig frmLeft">
                    <img class="emp-img-principal">
                </span>
                <span class="imgFrm frmMd frmTop">
                    <img class="emp-img-logo">
                </span>
                <span class="imgFrm frmMd frmBot">
                    <img class="emp-img-foto">
                </span>             
            </a>
        </div>
        <strong class="imvFse emp-fase">Other Get_text 1</strong>
    </div>
    <div class="imvInf stCl3 stRight">
        <div class="infHd">
            <div class="hdLeft stCl2">
                <strong class="emp-nome infNme colorTxt"></strong>
                <span class="emp-loc-part1 infLoc">Other Get_text 2</span>
                <span class="emp-loc-part2 infLoc">Other Get_text 3</span>
            </div>
            <div class="hdRight stCl1">
                <em class="emp-valor-apartir" >Other Get_text 4</em>
                <strong class="emp-valor infVlr colorTxt">Other Get_text 5</strong>
            </div>
        </div>
        <div class="infTxt">
            <p class="blackTxt60 emp-descritivo"></p>
            <ul>                
                <li class="txtBed emp-un-dorms">Other Get_text 6</li>                                   
                <li class="txtArea emp-un-area">Other Get_text 7</li>
                <li class="txtToilet emp-un-bath">Other Get_text 8</li>
                <li class="txtCar emp-un-park">Other Get_text 9</li>
            </ul>
        </div>
        <div class="infBt">
            <a href="/parceiro_cadastro" title="" class="btCadastrese stBt stBtLt colorBg whiteTxt rc9 sh15 emp-btn-cadastre">Other Get_text 10</a>
        </div>
    </div>
</div>

Edit

The following block:

<div class="lstImv blackBd12"></div>

wraps the other tags whose textContent I’m after, and it repeats several times (in the example, after editing, I included only 2).

So with this code:

<?php
$html = "exemplo_parse.html";
libxml_use_internal_errors(true);
$dom = new domDocument('1.0', 'utf-8'); 
$dom->loadHTMLFile($html); 
$dom->preserveWhiteSpace = false; 
$xpath = new DOMXPath($dom);
$content = $xpath->query('//div[@class="lstImv blackBd12"]');
foreach($content as $span)
{
    echo "<pre>";
        print_r($span);
    echo "</pre>";
}
?>

I get 2 objects with the following values:

DOMElement Object
(
    [tagName] => div
    [schemaTypeInfo] => 
    [nodeName] => div
    [nodeValue] => 











        Get_text 1





                Get_text 2
                Get_text 3


                Get_text 4
                Get_text 5




            Get_text 6                                  
                Get_text 7
                Get_text 8
                Get_text 9


            Get_text 10



    [nodeType] => 1
    [parentNode] => (object value omitted)
    [childNodes] => (object value omitted)
    [firstChild] => (object value omitted)
    [lastChild] => (object value omitted)
    [previousSibling] => 
    [nextSibling] => (object value omitted)
    [attributes] => (object value omitted)
    [ownerDocument] => (object value omitted)
    [namespaceURI] => 
    [prefix] => 
    [localName] => div
    [baseURI] => 
    [textContent] => 











        Get_text 1





                Get_text 2
                Get_text 3


                Get_text 4
                Get_text 5




            Get_text 6                                  
                Get_text 7
                Get_text 8
                Get_text 9


            Get_text 10



)
DOMElement Object
(
    [tagName] => div
    [schemaTypeInfo] => 
    [nodeName] => div
    [nodeValue] => 











        Other Get_text 1





                Other Get_text 2
                Other Get_text 3


                Other Get_text 4
                Other Get_text 5




            Other Get_text 6                                
                Other Get_text 7
                Other Get_text 8
                Other Get_text 9


            Other Get_text 10



    [nodeType] => 1
    [parentNode] => (object value omitted)
    [childNodes] => (object value omitted)
    [firstChild] => (object value omitted)
    [lastChild] => (object value omitted)
    [previousSibling] => (object value omitted)
    [attributes] => (object value omitted)
    [ownerDocument] => (object value omitted)
    [namespaceURI] => 
    [prefix] => 
    [localName] => div
    [baseURI] => 
    [textContent] => 











        Other Get_text 1





                Other Get_text 2
                Other Get_text 3


                Other Get_text 4
                Other Get_text 5




            Other Get_text 6                                
                Other Get_text 7
                Other Get_text 8
                Other Get_text 9


            Other Get_text 10



)

So this is how I’m currently doing it:

<?php
$html = "exemplo_parse.html";
libxml_use_internal_errors(true);
$dom = new domDocument('1.0', 'utf-8'); 
$dom->loadHTMLFile($html); 
$dom->preserveWhiteSpace = false; 
$xpath = new DOMXPath($dom);
$content = $xpath->query('//strong[@class="imvFse emp-fase"]');
foreach($content as $span)
{
    echo "Key 1 : ".$span->textContent."<br/>";
}
$content = $xpath->query('//strong[@class="emp-nome infNme colorTxt"]');
foreach($content as $span)
{
    echo "Key 2 : ".$span->textContent."<br/>";
}
$content = $xpath->query('//span[@class="emp-loc-part1 infLoc"]');
foreach($content as $span)
{
    echo "Key 3 : ".$span->textContent."<br/>";
}
$content = $xpath->query('//span[@class="emp-loc-part2 infLoc"]');
foreach($content as $span)
{
    echo "Key 4 : ".$span->textContent."<br/>";
}
$content = $xpath->query('//li[@class="txtBed emp-un-dorms"]');
foreach($content as $span)
{
    echo "Key 5 : ".$span->textContent."<br/>";
}
$content = $xpath->query('//li[@class="txtArea emp-un-area"]');
foreach($content as $span)
{
    echo "Key 6 : ".$span->textContent."<br/>";
}
$content = $xpath->query('//li[@class="txtCar emp-un-park"]');
foreach($content as $span)
{
    echo "Key 7 : ".$span->textContent."<br/>";
}
?>

I get the data like this:

Key 1 : Get_text 1
Key 1 : Other Get_text 1
Key 2 : 
Key 2 : 
Key 3 : Get_text 2
Key 3 : Other Get_text 2
Key 4 : Get_text 3
Key 4 : Other Get_text 3
Key 5 : Get_text 6
Key 5 : Other Get_text 6
Key 6 : Get_text 7
Key 6 : Other Get_text 7
Key 7 : Get_text 9
Key 7 : Other Get_text 9

That is, it iterates key by key, but I would like the keys to come out sequentially for each block (K1, K2, ..., K7, K1, K2, ..., K7) rather than grouped the way they are now (K1, K1, K2, K2, ..., K7, K7).

  • You have two blocks with the class lstImv blackBd12 and want to iterate over these blocks? And within each one, iterate only over the child nodes that have the emp classes and so on?

  • That’s it, Marcelo, the ones that contain the text...

  • And it needs to be ordered something like: [node-1][item-1] > Get_text 1 and [node-2][item-1] > Other Get_text 1. Is that right?

2 answers

2

Here is the solution I arrived at:

<?php
$html = <<<HTML
<div class="lstImv blackBd12">
    <div class="stCl3 stLeft imvImg">
        <div class="imgBox">            
            <a class="emp-imgs-link">
                <span class="imgFrm frmBig frmLeft">
                    <img class="emp-img-principal">
                </span>
                <span class="imgFrm frmMd frmTop">
                    <img class="emp-img-logo">
                </span>
                <span class="imgFrm frmMd frmBot">
                    <img class="emp-img-foto">
                </span>             
            </a>
        </div>
        <strong class="imvFse emp-fase">Get_text 1</strong>
    </div>
    <div class="imvInf stCl3 stRight">
        <div class="infHd">
            <div class="hdLeft stCl2">
                <strong class="emp-nome infNme colorTxt"></strong>
                <span class="emp-loc-part1 infLoc">Get_text 2</span>
                <span class="emp-loc-part2 infLoc">Get_text 3</span>
            </div>
            <div class="hdRight stCl1">
                <em class="emp-valor-apartir" >Get_text 4</em>
                <strong class="emp-valor infVlr colorTxt">Get_text 5</strong>
            </div>
        </div>
        <div class="infTxt">
            <p class="blackTxt60 emp-descritivo"></p>
            <ul>                
                <li class="txtBed emp-un-dorms">Get_text 6</li>                                 
                <li class="txtArea emp-un-area">Get_text 7</li>
                <li class="txtToilet emp-un-bath">Get_text 8</li>
                <li class="txtCar emp-un-park">Get_text 9</li>
            </ul>
        </div>
        <div class="infBt">
            <a href="/parceiro_cadastro" title="" class="btCadastrese stBt stBtLt colorBg whiteTxt rc9 sh15 emp-btn-cadastre">Get_text 10</a>
        </div>
    </div>
</div>
<div class="lstImv blackBd12">
    <div class="stCl3 stLeft imvImg">
        <div class="imgBox">            
            <a class="emp-imgs-link">
                <span class="imgFrm frmBig frmLeft">
                    <img class="emp-img-principal">
                </span>
                <span class="imgFrm frmMd frmTop">
                    <img class="emp-img-logo">
                </span>
                <span class="imgFrm frmMd frmBot">
                    <img class="emp-img-foto">
                </span>             
            </a>
        </div>
        <strong class="imvFse emp-fase">Other Get_text 1</strong>
    </div>
    <div class="imvInf stCl3 stRight">
        <div class="infHd">
            <div class="hdLeft stCl2">
                <strong class="emp-nome infNme colorTxt"></strong>
                <span class="emp-loc-part1 infLoc">Other Get_text 2</span>
                <span class="emp-loc-part2 infLoc">Other Get_text 3</span>
            </div>
            <div class="hdRight stCl1">
                <em class="emp-valor-apartir" >Other Get_text 4</em>
                <strong class="emp-valor infVlr colorTxt">Other Get_text 5</strong>
            </div>
        </div>
        <div class="infTxt">
            <p class="blackTxt60 emp-descritivo"></p>
            <ul>                
                <li class="txtBed emp-un-dorms">Other Get_text 6</li>                                   
                <li class="txtArea emp-un-area">Other Get_text 7</li>
                <li class="txtToilet emp-un-bath">Other Get_text 8</li>
                <li class="txtCar emp-un-park">Other Get_text 9</li>
            </ul>
        </div>
        <div class="infBt">
            <a href="/parceiro_cadastro" title="" class="btCadastrese stBt stBtLt colorBg whiteTxt rc9 sh15 emp-btn-cadastre">Other Get_text 10</a>
        </div>
    </div>
</div>
HTML;

$dom = new DOMDocument('1.0', 'utf-8');
$dom->preserveWhiteSpace = false; // has to be set before loading to have any effect
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);


$items = $xpath->query('//div[@class="lstImv blackBd12"]');
for($i = 0; $i < $items->length; $i++)
{
    $status = $xpath->query('//strong[@class="imvFse emp-fase"]');
    echo "Value     :".$status->item($i)->nodeValue."<br/>";    

    $titulo = $xpath->query('//span[@class="emp-loc-part1 infLoc"]');
    echo "Value     :".$titulo->item($i)->nodeValue."<br/>";

    $titulo2 = $xpath->query('//span[@class="emp-loc-part2 infLoc"]');
    echo "Value     :".$titulo2->item($i)->nodeValue."<br/>";   

    $valor = $xpath->query('//em[@class="emp-valor-apartir"]');
    echo "Value     :".$valor->item($i)->nodeValue."<br/>"; 

    $valor2 = $xpath->query('//strong[@class="emp-valor infVlr colorTxt"]');
    echo "Value     :".$valor2->item($i)->nodeValue."<br/>";

    $dorm = $xpath->query('//li[@class="txtBed emp-un-dorms"]');
    echo "Value     :".$dorm->item($i)->nodeValue."<br/>";

    $tam = $xpath->query('//li[@class="txtArea emp-un-area"]');
    echo "Value     :".$tam->item($i)->nodeValue."<br/>";   

}
?>

See on ideone
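
A possible variation on the loop above, offered as a minimal sketch rather than the definitive approach: instead of re-running document-wide queries and indexing the results with $i, each field can be looked up relative to its own block by passing the block as the context node and starting the expression with ".//". The shortened HTML, the $fields map and its key names are assumptions made for illustration; only the class names come from the question.

```php
<?php
// Shortened sample markup: two blocks, three fields each (illustrative only).
$html = <<<HTML
<div class="lstImv blackBd12">
    <div><strong class="imvFse emp-fase">Get_text 1</strong></div>
    <div>
        <span class="emp-loc-part1 infLoc">Get_text 2</span>
        <em class="emp-valor-apartir">Get_text 4</em>
    </div>
</div>
<div class="lstImv blackBd12">
    <div><strong class="imvFse emp-fase">Other Get_text 1</strong></div>
    <div>
        <span class="emp-loc-part1 infLoc">Other Get_text 2</span>
        <em class="emp-valor-apartir">Other Get_text 4</em>
    </div>
</div>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Hypothetical field names mapped to XPath expressions.
// The leading "." keeps each lookup inside the current block.
$fields = [
    'fase'    => './/strong[@class="imvFse emp-fase"]',
    'loc1'    => './/span[@class="emp-loc-part1 infLoc"]',
    'apartir' => './/em[@class="emp-valor-apartir"]',
];

$rows = [];
foreach ($xpath->query('//div[@class="lstImv blackBd12"]') as $block) {
    $row = [];
    foreach ($fields as $name => $expr) {
        // string() returns the text of the first match, or '' if nothing matches.
        $row[$name] = trim($xpath->evaluate("string($expr)", $block));
    }
    $rows[] = $row;
}

// Print block by block, so all fields of block 0 come before those of block 1.
foreach ($rows as $i => $row) {
    foreach ($row as $name => $text) {
        echo "Block $i, $name: $text\n";
    }
}
```

Because every lookup is scoped to one block, a field that is missing from a block simply comes back as an empty string instead of shifting the indexes of the blocks that follow.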

  • I hadn’t seen your own answer. I’ve edited mine as you mentioned.

1


Yes. The query method accepts XPath expressions as its argument, so you can, for example, use the or operator (lowercase, since XPath keywords are case-sensitive) to select the classes you want:

$content = $xpath->query('//strong[@class="imvFse emp-fase" or @class="emp-nome infNme colorTxt"]');
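
One caveat with @class="..." comparisons is that they match the whole attribute string exactly, so a different class order or an extra class makes the query miss. A common XPath 1.0 idiom is to test for a single class token instead; the sketch below combines that idiom with or. The helper name hasClass and the sample markup are assumptions for illustration and are not part of DOMXPath.

```php
<?php
// Hypothetical helper: builds an XPath 1.0 predicate that matches one class
// token wherever it appears in the class attribute.
// Assumes $class contains no quote characters.
function hasClass(string $class): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' $class ')";
}

// Illustrative markup; the second element lists its classes in a different order.
$html = <<<HTML
<div>
    <strong class="imvFse emp-fase">Get_text 1</strong>
    <strong class="colorTxt emp-nome infNme">Name</strong>
    <strong class="emp-valor infVlr colorTxt">Get_text 5</strong>
</div>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// XPath keywords are lowercase: "or", not "OR".
$expr = '//strong[' . hasClass('emp-fase') . ' or ' . hasClass('emp-nome') . ']';

$texts = [];
foreach ($xpath->query($expr) as $node) {
    $texts[] = $node->textContent;
}

print_r($texts);
```

Padding the attribute with spaces on both sides is what keeps a token such as emp-nome from also matching a longer class name that merely contains it.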

After the question was edited:

<?php 

$html = <<<HTML
<div class="lstImv blackBd12">
    <div class="stCl3 stLeft imvImg">
        <div class="imgBox">            
            <a class="emp-imgs-link">
                <span class="imgFrm frmBig frmLeft">
                    <img class="emp-img-principal">
                </span>
                <span class="imgFrm frmMd frmTop">
                    <img class="emp-img-logo">
                </span>
                <span class="imgFrm frmMd frmBot">
                    <img class="emp-img-foto">
                </span>             
            </a>
        </div>
        <strong class="imvFse emp-fase">Get_text 1</strong>
    </div>
    <div class="imvInf stCl3 stRight">
        <div class="infHd">
            <div class="hdLeft stCl2">
                <strong class="emp-nome infNme colorTxt"></strong>
                <span class="emp-loc-part1 infLoc">Get_text 2</span>
                <span class="emp-loc-part2 infLoc">Get_text 3</span>
            </div>
            <div class="hdRight stCl1">
                <em class="emp-valor-apartir" >Get_text 4</em>
                <strong class="emp-valor infVlr colorTxt">Get_text 5</strong>
            </div>
        </div>
        <div class="infTxt">
            <p class="blackTxt60 emp-descritivo"></p>
            <ul>                
                <li class="txtBed emp-un-dorms">Get_text 6</li>                                 
                <li class="txtArea emp-un-area">Get_text 7</li>
                <li class="txtToilet emp-un-bath">Get_text 8</li>
                <li class="txtCar emp-un-park">Get_text 9</li>
            </ul>
        </div>
        <div class="infBt">
            <a href="/parceiro_cadastro" title="" class="btCadastrese stBt stBtLt colorBg whiteTxt rc9 sh15 emp-btn-cadastre">Get_text 10</a>
        </div>
    </div>
</div>
<div class="lstImv blackBd12">
    <div class="stCl3 stLeft imvImg">
        <div class="imgBox">            
            <a class="emp-imgs-link">
                <span class="imgFrm frmBig frmLeft">
                    <img class="emp-img-principal">
                </span>
                <span class="imgFrm frmMd frmTop">
                    <img class="emp-img-logo">
                </span>
                <span class="imgFrm frmMd frmBot">
                    <img class="emp-img-foto">
                </span>             
            </a>
        </div>
        <strong class="imvFse emp-fase">Other Get_text 1</strong>
    </div>
    <div class="imvInf stCl3 stRight">
        <div class="infHd">
            <div class="hdLeft stCl2">
                <strong class="emp-nome infNme colorTxt"></strong>
                <span class="emp-loc-part1 infLoc">Other Get_text 2</span>
                <span class="emp-loc-part2 infLoc">Other Get_text 3</span>
            </div>
            <div class="hdRight stCl1">
                <em class="emp-valor-apartir" >Other Get_text 4</em>
                <strong class="emp-valor infVlr colorTxt">Other Get_text 5</strong>
            </div>
        </div>
        <div class="infTxt">
            <p class="blackTxt60 emp-descritivo"></p>
            <ul>                
                <li class="txtBed emp-un-dorms">Other Get_text 6</li>                                   
                <li class="txtArea emp-un-area">Other Get_text 7</li>
                <li class="txtToilet emp-un-bath">Other Get_text 8</li>
                <li class="txtCar emp-un-park">Other Get_text 9</li>
            </ul>
        </div>
        <div class="infBt">
            <a href="/parceiro_cadastro" title="" class="btCadastrese stBt stBtLt colorBg whiteTxt rc9 sh15 emp-btn-cadastre">Other Get_text 10</a>
        </div>
    </div>
</div>
HTML;

$dom = new DOMDocument('1.0', 'utf-8');
$dom->preserveWhiteSpace = false; // has to be set before loading to have any effect
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$content = $xpath->query('//div[@class="lstImv blackBd12"]');

$return = [];


// Iterating a DOMNodeList yields integer keys (0, 1, ...) and the matched nodes.
foreach($content as $nodeKey => $nodeValue) {

    // The leading "." makes each query relative to the current block ($nodeValue);
    // a bare "//" searches the whole document even when a context node is passed.
    // item(0) is therefore the first match inside this block.
    $return[$nodeKey][1] = $xpath->query('.//strong[@class="imvFse emp-fase"]', $nodeValue)->item(0)->nodeValue;
    $return[$nodeKey][2] = $xpath->query('.//strong[@class="emp-nome infNme colorTxt"]', $nodeValue)->item(0)->nodeValue;
    $return[$nodeKey][3] = $xpath->query('.//span[@class="emp-loc-part1 infLoc"]', $nodeValue)->item(0)->nodeValue;
    $return[$nodeKey][4] = $xpath->query('.//span[@class="emp-loc-part2 infLoc"]', $nodeValue)->item(0)->nodeValue;
    $return[$nodeKey][5] = $xpath->query('.//li[@class="txtBed emp-un-dorms"]', $nodeValue)->item(0)->nodeValue;
    $return[$nodeKey][6] = $xpath->query('.//li[@class="txtArea emp-un-area"]', $nodeValue)->item(0)->nodeValue;
    $return[$nodeKey][7] = $xpath->query('.//li[@class="txtCar emp-un-park"]', $nodeValue)->item(0)->nodeValue;
}

var_dump($return);

  • Marcelo, thanks for your attention. I think my question doesn’t yet express my real goal; I’ll edit it to make it clearer, and I believe you’ll be able to point me in the right direction...

  • Marcelo, I’ve edited it. I apologize for not expressing myself the way I wanted on the first attempt; I’d appreciate it if you could take another look.

  • @Magichat check the edited answer and see whether that’s what you’re trying to get.

  • Marcelo, thank you again: "only I would like the Keys to come sequentially (K1,K2,...,K7, K1,K2,...,K7) and not in the way it is (K1, K1,K2,K2...,K7,K7)". I appreciate the elegance of your code, and I’m sure that with small adjustments it can reach the expected result. This surely happened because of my lack of clarity. Still, I’ll post the solution I’ve been working toward, and I’ll wait for the adjustments to yours so I can validate it as correct too... My solution shows the goal I was after. Thanks, man ;)

  • @Magichat I made the change. It works as long as the structure stays this way.

  • Great, man, thanks for the help... I had already left my +1... The solution I was looking for was exactly the approach you used... I need to study this kind of interaction more...
