You can use a regex with preg_match_all:
|<div class="listdoTexto">[\w\W]*<\/ul>|
Here [\w\W]* matches any character, newlines included, between <div class="listdoTexto"> and <\/ul>:
\w -> any alphanumeric character or underscore "_"
\W -> any character that is NOT alphanumeric or underscore "_"
* -> zero or more occurrences; since it is greedy, it takes everything from <div class="listdoTexto"> up to the last </ul>
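To see why [\w\W] is used instead of a plain dot, here is a small PHP sketch (the sample HTML string is made up for illustration). A "." does not match a newline unless the s (DOTALL) modifier is set, while [\w\W] matches every character by construction:

```php
<?php
// Made-up multi-line input, shaped like the HTML in the question.
$html = "<div class=\"listdoTexto\">\n  <ul>\n    <li>texto1</li>\n  </ul>\n</div>";

// [\w\W] is "a word character OR a non-word character", i.e. any character,
// including "\n", so the match can span several lines.
$withClass = preg_match('|<div class="listdoTexto">[\w\W]*</ul>|', $html);

// Without the s modifier, "." does not match "\n", so ".*" cannot reach
// the </ul> that sits on a later line.
$withDot = preg_match('|<div class="listdoTexto">.*</ul>|', $html);

// With the s modifier, "." also matches newlines and behaves like [\w\W].
$withDotAll = preg_match('|<div class="listdoTexto">.*</ul>|s', $html);

var_dump($withClass, $withDot, $withDotAll); // int(1) int(0) int(1)
```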
This returns an array whose index [0] holds the matches you want. In this case I convert that array to a string with implode:
$GetTexto = implode(",", $GetTexto[0]);
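A minimal sketch of what preg_match_all hands back, on a made-up one-line input: with the default PREG_PATTERN_ORDER flag, index [0] is a list of every full-pattern match, which is why implode is applied to $GetTexto[0]:

```php
<?php
// Made-up one-line input for illustration.
$html = '<div class="listdoTexto"><ul><li>texto1</li></ul></div>';

// Returns the number of full matches and fills $GetTexto.
// With PREG_PATTERN_ORDER (the default), $GetTexto[0] holds every full match;
// capture groups, if the pattern had any, would go in $GetTexto[1], [2], ...
$count = preg_match_all('|<div class="listdoTexto">[\w\W]*<\/ul>|', $html, $GetTexto);

// Joining the list of matches turns it into a single string.
$joined = implode(",", $GetTexto[0]);

echo $count, PHP_EOL;  // 1
echo $joined, PHP_EOL; // <div class="listdoTexto"><ul><li>texto1</li></ul>
```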
However, there is a problem if the HTML contains more than one list, because [\w\W]* is greedy. Example:
             |→ <div class="listdoTexto">
             |    <ul>
I only want  |      <li class="texto"><a href="textolink">texto1</a></li>
to grab      |      <li class="texto"><a href="textolink">texto2</a></li>
this         |      <li class="texto"><a href="textolink">texto3</a></li>
part...      |      <li class="texto"><a href="textolink">texto4</a></li>
             |→   </ul>
                </div>
                <div class="listdoTexto2">
                  <ul>
...but the          <li class="texto"><a href="textolink">texto5</a></li>
regex goes          <li class="texto"><a href="textolink">texto6</a></li>
all the way here → </ul>
                </div>
That is, the result of $GetTexto after the implode would be this:
<div class="listdoTexto">
<ul>
<li class="texto"><a href="textolink">texto1</a></li>
<li class="texto"><a href="textolink">texto2</a></li>
<li class="texto"><a href="textolink">texto3</a></li>
<li class="texto"><a href="textolink">texto4</a></li>
</ul>
</div>
<div class="listdoTexto2">
<ul>
<li class="texto"><a href="textolink">texto5</a></li>
<li class="texto"><a href="textolink">texto6</a></li>
</ul>
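A different technique from the one this answer uses: making the quantifier lazy ([\w\W]*? instead of [\w\W]*) stops the match at the first </ul>, which would make the substr/strpos step unnecessary. A short sketch on a shortened, made-up version of the HTML:

```php
<?php
// Shortened, made-up version of the two-list HTML above.
$html = '<div class="listdoTexto"><ul><li>texto1</li><li>texto4</li></ul></div>'
      . '<div class="listdoTexto2"><ul><li>texto5</li></ul></div>';

// Greedy: [\w\W]* takes as much as possible, so the match ends at the LAST </ul>.
preg_match('|<div class="listdoTexto">[\w\W]*</ul>|', $html, $greedy);

// Lazy: [\w\W]*? takes as little as possible, so the match ends at the FIRST </ul>.
preg_match('|<div class="listdoTexto">[\w\W]*?</ul>|', $html, $lazy);

echo $greedy[0], PHP_EOL; // still contains texto5
echo $lazy[0], PHP_EOL;   // stops right after texto4's list
```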
Since I only want to capture up to the first </ul>, I can use substr with strpos:
$GetTexto = substr($GetTexto, 0, strpos($GetTexto, "</ul>"));
The result now is this (the </ul> itself is cut off, because strpos returns the position where it starts):
<div class="listdoTexto">
<ul>
<li class="texto"><a href="textolink">texto1</a></li>
<li class="texto"><a href="textolink">texto2</a></li>
<li class="texto"><a href="textolink">texto3</a></li>
<li class="texto"><a href="textolink">texto4</a></li>
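One pitfall in this step: strpos returns false when "</ul>" is not found, and substr($GetTexto, 0, false) then silently returns an empty string. A guarded version, wrapped in a hypothetical helper named cutBeforeFirstUl, might look like this (note the strict !== comparison, since a match at offset 0 is also falsy):

```php
<?php
// cutBeforeFirstUl is a hypothetical helper name, used only for illustration.
// Returns the part of $s before the first "</ul>" (the tag itself excluded).
function cutBeforeFirstUl(string $s): string
{
    $pos = strpos($s, "</ul>");
    // Strict check: 0 is a valid position, false means "not found".
    if ($pos === false) {
        return $s; // assumed policy: no </ul>, keep the input unchanged
    }
    return substr($s, 0, $pos);
}

echo cutBeforeFirstUl("<ul><li>a</li></ul><ul><li>b</li></ul>"), PHP_EOL; // <ul><li>a</li>
```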
Since I only want the text, I use strip_tags to delete the tags:
$GetTexto = strip_tags($GetTexto);
This returns only the text, but with line breaks and possibly spaces before, after, or between the items:
texto1
texto2
texto3
texto4
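A quick sketch showing exactly what strip_tags leaves behind on a made-up fragment: the tags disappear, but the newlines and indentation between them stay, which is what the next step has to clean up:

```php
<?php
// Made-up fragment shaped like the list in the question.
$fragment = "<ul>\n  <li class=\"texto\"><a href=\"textolink\">texto1</a></li>\n"
          . "  <li class=\"texto\"><a href=\"textolink\">texto2</a></li>\n";

// strip_tags removes the tags but keeps all the text between them,
// including the line breaks and indentation.
$text = strip_tags($fragment);

var_dump($text); // string(18) "\n  texto1\n  texto2\n" (with real newlines)
```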
I can then use preg_replace together with trim to replace line breaks and unwanted runs of whitespace with a ",," placeholder, which will be replaced again in the next step:
$GetTexto = preg_replace("/\s{2,}|\n/", ",,", trim($GetTexto));
Now we have:
texto1,,texto2,,texto3,,texto4
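A small check of the preg_replace step on a made-up string: trim removes the outer whitespace first, the \s{2,} branch catches a newline followed by indentation, and the \n branch catches a bare newline that \s{2,} alone would miss:

```php
<?php
// Made-up text as left by strip_tags; the last pair is separated by a bare "\n".
$text = "\n    texto1\n    texto2\n    texto3\ntexto4\n";

// trim first, so no ",," ends up at the start or end of the string.
// "\n    " matches \s{2,}; the lone "\n" before texto4 matches the \n branch.
$marked = preg_replace("/\s{2,}|\n/", ",,", trim($text));

echo $marked, PHP_EOL; // texto1,,texto2,,texto3,,texto4
```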
Finally, to separate the items with a comma and a space, you can use str_replace to replace the ",,":
$GetTexto = str_replace(",,", ", ", $GetTexto);
Final result:
texto1, texto2, texto3, texto4
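As an alternative to the ",," placeholder (a different technique, not the one used above), the last two steps can be done in one go by splitting on every whitespace run that contains a line break and joining the pieces with ", ". A sketch, assuming each item sits on its own line as in the example HTML:

```php
<?php
// Made-up text as left by strip_tags, including an empty line between items.
$text = "\n    texto1\n    texto2\n\n    texto3\n    texto4\n  ";

// Split on any whitespace run that contains a newline; PREG_SPLIT_NO_EMPTY
// drops empty pieces. Spaces inside an item (e.g. "texto  1") are kept.
$items = preg_split('/\s*\n\s*/', trim($text), -1, PREG_SPLIT_NO_EMPTY);

// Join the items with comma + space.
$result = implode(", ", $items);

echo $result, PHP_EOL; // texto1, texto2, texto3, texto4
```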
Although this gets to the end result, I don't know whether it is the best approach. There may be a more efficient method using the Document Object Model (DOM), but I hope it helps.
Code:
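$url = "http://www.site.com";
$html = file_get_contents($url);
// Greedy: matches from <div class="listdoTexto"> up to the LAST </ul> in the page
$getTextoList = '|<div class="listdoTexto">[\w\W]*<\/ul>|';
preg_match_all($getTextoList, $html, $GetTexto);
$GetTexto = implode(",", $GetTexto[0]);
// Keep only the part before the FIRST </ul> (skipped if there is none, since strpos returns false)
$pos = strpos($GetTexto, "</ul>");
if ($pos !== false) { $GetTexto = substr($GetTexto, 0, $pos); }
// Remove tags, mark line breaks/whitespace runs with ",,", then turn ",," into ", "
$GetTexto = strip_tags($GetTexto);
$GetTexto = preg_replace("/\s{2,}|\n/", ",,", trim($GetTexto));
$GetTexto = str_replace(",,", ", ", $GetTexto);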
Tested on Ideone.
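To try the whole pipeline without fetching a URL, it can be wrapped in a function that takes the HTML as a string. extractListText is a hypothetical name, and the early return for a page without a match and the strpos guard are additions not present in the original code:

```php
<?php
// extractListText is a hypothetical name; the steps are the ones from the answer.
function extractListText(string $html): string
{
    // 1. Greedy match from the div to the last </ul>; no match means nothing to extract.
    if (!preg_match_all('|<div class="listdoTexto">[\w\W]*<\/ul>|', $html, $GetTexto)) {
        return "";
    }
    $GetTexto = implode(",", $GetTexto[0]);

    // 2. Keep only what comes before the first </ul>.
    $pos = strpos($GetTexto, "</ul>");
    if ($pos !== false) {
        $GetTexto = substr($GetTexto, 0, $pos);
    }

    // 3. Remove tags, mark whitespace runs with ",,", then turn ",," into ", ".
    $GetTexto = strip_tags($GetTexto);
    $GetTexto = preg_replace("/\s{2,}|\n/", ",,", trim($GetTexto));
    return str_replace(",,", ", ", $GetTexto);
}

$html = <<<HTML
<div class="listdoTexto">
    <ul>
        <li class="texto"><a href="textolink">texto1</a></li>
        <li class="texto"><a href="textolink">texto2</a></li>
        <li class="texto"><a href="textolink">texto3</a></li>
        <li class="texto"><a href="textolink">texto4</a></li>
    </ul>
</div>
<div class="listdoTexto2">
    <ul>
        <li class="texto"><a href="textolink">texto5</a></li>
        <li class="texto"><a href="textolink">texto6</a></li>
    </ul>
</div>
HTML;

echo extractListText($html), PHP_EOL; // texto1, texto2, texto3, texto4
```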
I think this might help you: http://php.net/manual/en/book.dom.php
– thiagoalessio
@thiagoalessio Do you have an example?
– user81560
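Regarding the DOM approach mentioned in the answer and suggested in the comment: a minimal sketch using PHP's DOMDocument and DOMXPath (from the DOM extension, which is typically enabled by default), on a shortened, made-up version of the HTML. It assumes the div's class attribute is exactly "listdoTexto"; a div with several classes would need a contains() test in the XPath expression instead:

```php
<?php
// Shortened, made-up version of the two-list HTML from the question.
$html = '<div class="listdoTexto"><ul>'
      . '<li class="texto"><a href="textolink">texto1</a></li>'
      . '<li class="texto"><a href="textolink">texto2</a></li>'
      . '</ul></div>'
      . '<div class="listdoTexto2"><ul>'
      . '<li class="texto"><a href="textolink">texto5</a></li>'
      . '</ul></div>';

$doc = new DOMDocument();
// Real pages are often not valid HTML; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Only <li> elements inside the div whose class is exactly "listdoTexto",
// so the "listdoTexto2" list is never touched.
$nodes = $xpath->query('//div[@class="listdoTexto"]//li');

$items = [];
foreach ($nodes as $li) {
    // textContent is the text of the node and its descendants (here, the <a> text).
    $items[] = trim($li->textContent);
}

echo implode(", ", $items), PHP_EOL; // texto1, texto2
```

Because the XPath query selects only the <li> elements inside that div, there is no need to cut at the first </ul> or to clean up runs of whitespace afterwards.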