I'm parsing an HTML file
with the following structure:
<div class="lstImv blackBd12">
<div class="stCl3 stLeft imvImg">
<div class="imgBox">
<a class="emp-imgs-link">
<span class="imgFrm frmBig frmLeft">
<img class="emp-img-principal">
</span>
<span class="imgFrm frmMd frmTop">
<img class="emp-img-logo">
</span>
<span class="imgFrm frmMd frmBot">
<img class="emp-img-foto">
</span>
</a>
</div>
<strong class="imvFse emp-fase">Get_text 1</strong>
</div>
<div class="imvInf stCl3 stRight">
<div class="infHd">
<div class="hdLeft stCl2">
<strong class="emp-nome infNme colorTxt"></strong>
<span class="emp-loc-part1 infLoc">Get_text 2</span>
<span class="emp-loc-part2 infLoc">Get_text 3</span>
</div>
<div class="hdRight stCl1">
<em class="emp-valor-apartir" >Get_text 4</em>
<strong class="emp-valor infVlr colorTxt">Get_text 5</strong>
</div>
</div>
<div class="infTxt">
<p class="blackTxt60 emp-descritivo"></p>
<ul>
<li class="txtBed emp-un-dorms">Get_text 6</li>
<li class="txtArea emp-un-area">Get_text 7</li>
<li class="txtToilet emp-un-bath">Get_text 8</li>
<li class="txtCar emp-un-park">Get_text 9</li>
</ul>
</div>
<div class="infBt">
<a href="/parceiro_cadastro" title="" class="btCadastrese stBt stBtLt colorBg whiteTxt rc9 sh15 emp-btn-cadastre">Get_text 10</a>
</div>
</div>
</div>
<div class="lstImv blackBd12">
<div class="stCl3 stLeft imvImg">
<div class="imgBox">
<a class="emp-imgs-link">
<span class="imgFrm frmBig frmLeft">
<img class="emp-img-principal">
</span>
<span class="imgFrm frmMd frmTop">
<img class="emp-img-logo">
</span>
<span class="imgFrm frmMd frmBot">
<img class="emp-img-foto">
</span>
</a>
</div>
<strong class="imvFse emp-fase">Other Get_text 1</strong>
</div>
<div class="imvInf stCl3 stRight">
<div class="infHd">
<div class="hdLeft stCl2">
<strong class="emp-nome infNme colorTxt"></strong>
<span class="emp-loc-part1 infLoc">Other Get_text 2</span>
<span class="emp-loc-part2 infLoc">Other Get_text 3</span>
</div>
<div class="hdRight stCl1">
<em class="emp-valor-apartir" >Other Get_text 4</em>
<strong class="emp-valor infVlr colorTxt">Other Get_text 5</strong>
</div>
</div>
<div class="infTxt">
<p class="blackTxt60 emp-descritivo"></p>
<ul>
<li class="txtBed emp-un-dorms">Other Get_text 6</li>
<li class="txtArea emp-un-area">Other Get_text 7</li>
<li class="txtToilet emp-un-bath">Other Get_text 8</li>
<li class="txtCar emp-un-park">Other Get_text 9</li>
</ul>
</div>
<div class="infBt">
<a href="/parceiro_cadastro" title="" class="btCadastrese stBt stBtLt colorBg whiteTxt rc9 sh15 emp-btn-cadastre">Other Get_text 10</a>
</div>
</div>
</div>
Edit:
The following block:
<div class="lstImv blackBd12"></div>
wraps the other tags whose textContent I want to extract,
and it repeats several times (in the example, after editing, I kept only 2).
With this code:
<?php
$html = "exemplo_parse.html";
libxml_use_internal_errors(true);
$dom = new DOMDocument('1.0', 'utf-8');
// preserveWhiteSpace is a parser option, so set it before loading the file
$dom->preserveWhiteSpace = false;
$dom->loadHTMLFile($html);
$xpath = new DOMXPath($dom);
$content = $xpath->query('//div[@class="lstImv blackBd12"]');
foreach($content as $span)
{
echo "<pre>";
print_r($span);
echo "</pre>";
}
?>
I get 2 objects with the following values:
DOMElement Object
(
[tagName] => div
[schemaTypeInfo] =>
[nodeName] => div
[nodeValue] =>
Get_text 1
Get_text 2
Get_text 3
Get_text 4
Get_text 5
Get_text 6
Get_text 7
Get_text 8
Get_text 9
Get_text 10
[nodeType] => 1
[parentNode] => (object value omitted)
[childNodes] => (object value omitted)
[firstChild] => (object value omitted)
[lastChild] => (object value omitted)
[previousSibling] =>
[nextSibling] => (object value omitted)
[attributes] => (object value omitted)
[ownerDocument] => (object value omitted)
[namespaceURI] =>
[prefix] =>
[localName] => div
[baseURI] =>
[textContent] =>
Get_text 1
Get_text 2
Get_text 3
Get_text 4
Get_text 5
Get_text 6
Get_text 7
Get_text 8
Get_text 9
Get_text 10
)
DOMElement Object
(
[tagName] => div
[schemaTypeInfo] =>
[nodeName] => div
[nodeValue] =>
Other Get_text 1
Other Get_text 2
Other Get_text 3
Other Get_text 4
Other Get_text 5
Other Get_text 6
Other Get_text 7
Other Get_text 8
Other Get_text 9
Other Get_text 10
[nodeType] => 1
[parentNode] => (object value omitted)
[childNodes] => (object value omitted)
[firstChild] => (object value omitted)
[lastChild] => (object value omitted)
[previousSibling] => (object value omitted)
[attributes] => (object value omitted)
[ownerDocument] => (object value omitted)
[namespaceURI] =>
[prefix] =>
[localName] => div
[baseURI] =>
[textContent] =>
Other Get_text 1
Other Get_text 2
Other Get_text 3
Other Get_text 4
Other Get_text 5
Other Get_text 6
Other Get_text 7
Other Get_text 8
Other Get_text 9
Other Get_text 10
)
So this is how I'm currently doing it:
<?php
$html = "exemplo_parse.html";
libxml_use_internal_errors(true);
$dom = new DOMDocument('1.0', 'utf-8');
// preserveWhiteSpace is a parser option, so set it before loading the file
$dom->preserveWhiteSpace = false;
$dom->loadHTMLFile($html);
$xpath = new DOMXPath($dom);
$content = $xpath->query('//strong[@class="imvFse emp-fase"]');
foreach($content as $span)
{
echo "Key 1 : ".$span->textContent."<br/>";
}
$content = $xpath->query('//strong[@class="emp-nome infNme colorTxt"]');
foreach($content as $span)
{
echo "Key 2 : ".$span->textContent."<br/>";
}
$content = $xpath->query('//span[@class="emp-loc-part1 infLoc"]');
foreach($content as $span)
{
echo "Key 3 : ".$span->textContent."<br/>";
}
$content = $xpath->query('//span[@class="emp-loc-part2 infLoc"]');
foreach($content as $span)
{
echo "Key 4 : ".$span->textContent."<br/>";
}
$content = $xpath->query('//li[@class="txtBed emp-un-dorms"]');
foreach($content as $span)
{
echo "Key 5 : ".$span->textContent."<br/>";
}
$content = $xpath->query('//li[@class="txtArea emp-un-area"]');
foreach($content as $span)
{
echo "Key 6 : ".$span->textContent."<br/>";
}
$content = $xpath->query('//li[@class="txtCar emp-un-park"]');
foreach($content as $span)
{
echo "Key 7 : ".$span->textContent."<br/>";
}
?>
I get the data in this order:
Key 1 : Get_text 1
Key 1 : Other Get_text 1
Key 2 :
Key 2 :
Key 3 : Get_text 2
Key 3 : Other Get_text 2
Key 4 : Get_text 3
Key 4 : Other Get_text 3
Key 5 : Get_text 6
Key 5 : Other Get_text 6
Key 6 : Get_text 7
Key 6 : Other Get_text 7
Key 7 : Get_text 9
Key 7 : Other Get_text 9
In other words, it iterates key by key, but I would like the keys to come out sequentially for each block (K1, K2, ..., K7, K1, K2, ..., K7) rather than grouped by key as they are now (K1, K1, K2, K2, ..., K7, K7).
You have two blocks with the class
lstImv blackBd12
and want to iterate over these blocks? And within each block, iterate only over the child nodes
that have the emp classes,
and so on? – Marcelo de Andrade
Exactly, Marcelo, the ones that contain the text...
– MagicHat
And it needs to be ordered something like:
[node-1][item-1] > Get_text 1
and [node-2][item-1] > Other Get_text 1
. Is that right? – Marcelo de Andrade
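A minimal sketch of the per-block ordering discussed above, not a definitive solution. It assumes each `lstImv blackBd12` block should be visited once, with every field queried relative to that block by passing it as the context node to `DOMXPath::query()`. The inline `$html` sample, the `$fields` map and the `$rows` array are hypothetical stand-ins for illustration; only three of the question's fields are included, and the key labels reuse the question's numbering.

```php
<?php
// Hypothetical trimmed sample standing in for exemplo_parse.html.
$html = <<<'HTML'
<div class="lstImv blackBd12">
  <strong class="imvFse emp-fase">Get_text 1</strong>
  <span class="emp-loc-part1 infLoc">Get_text 2</span>
  <ul><li class="txtBed emp-un-dorms">Get_text 6</li></ul>
</div>
<div class="lstImv blackBd12">
  <strong class="imvFse emp-fase">Other Get_text 1</strong>
  <span class="emp-loc-part1 infLoc">Other Get_text 2</span>
  <ul><li class="txtBed emp-un-dorms">Other Get_text 6</li></ul>
</div>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Key label => XPath relative to one block (note the leading dot).
$fields = [
    'Key 1' => './/strong[@class="imvFse emp-fase"]',
    'Key 3' => './/span[@class="emp-loc-part1 infLoc"]',
    'Key 5' => './/li[@class="txtBed emp-un-dorms"]',
];

// One row per block, each row holding that block's fields in order.
$rows = [];
foreach ($xpath->query('//div[@class="lstImv blackBd12"]') as $block) {
    $row = [];
    foreach ($fields as $key => $expr) {
        // The second argument makes $block the context node of the query.
        $node = $xpath->query($expr, $block)->item(0);
        $row[$key] = $node !== null ? trim($node->textContent) : '';
    }
    $rows[] = $row;
}

// Print K1, K3, K5 for the first block, then K1, K3, K5 for the second.
foreach ($rows as $row) {
    foreach ($row as $key => $text) {
        echo $key . " : " . $text . "<br/>\n";
    }
}
```

The outer loop runs over blocks and the inner loop over fields, which is the reverse of the nesting in the question's code. That is why the keys come out grouped per block. `$rows[0]['Key 1']` then corresponds to the `[node-1][item-1]` layout mentioned in the comments.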