Given the text below, how can I output only the text of the first span in each entry when the text of the last span matches a given category?
<span class="CVA68e qXLe6d">Colcha Casal e ... - TorraTudo</span> <span class="qXLe6d dXDvrc"> <span class="fYyStc">www.torratudo.com › cama</span> </span>
<span class="CVA68e qXLe6d">Colcha Solteiro e ... - TorraTudo</span> <span class="qXLe6d dXDvrc"> <span class="fYyStc">www.torratudo.com › cama</span> </span>
<span class="CVA68e qXLe6d">Roupão de banho ... - TorraTudo</span> <span class="qXLe6d dXDvrc"> <span class="fYyStc">www.torratudo.com › banho</span> </span>
<span class="CVA68e qXLe6d">Caminho de mesa ... - TorraTudo</span> <span class="qXLe6d dXDvrc"> <span class="fYyStc">www.torratudo.com › mesa</span> </span>
<span class="CVA68e qXLe6d">Cortina para quarto ... - TorraTudo</span> <span class="qXLe6d dXDvrc"> <span class="fYyStc">www.torratudo.com › cama</span> </span>
<span class="CVA68e qXLe6d">Travesseiro de pena com ... - TorraTudo</span> <span class="qXLe6d dXDvrc"> <span class="fYyStc">www.torratudo.com › cama</span> </span>
<span class="CVA68e qXLe6d">Fronha de Solteiro em ... - TorraTudo</span> <span class="qXLe6d dXDvrc"> <span class="fYyStc">www.torratudo.com › cama</span> </span>
<span class="CVA68e qXLe6d">Lençol 70% algodão e ... - TorraTudo</span> <span class="qXLe6d dXDvrc"> <span class="fYyStc">www.torratudo.com › cama</span> </span>
<span class="CVA68e qXLe6d">Pano de prato pintado a ... - TorraTudo</span> <span class="qXLe6d dXDvrc"> <span class="fYyStc">www.torratudo.com › mesa</span> </span>
<span class="CVA68e qXLe6d">Coberto dupla face colo... - TorraTudo</span> <span class="qXLe6d dXDvrc"> <span class="fYyStc">www.torratudo.com › cama</span> </span>
<span class="CVA68e qXLe6d">Toalha de rosto felpudo ... - TorraTudo</span> <span class="qXLe6d dXDvrc"> <span class="fYyStc">www.torratudo.com › banho</span> </span>
Keep in mind that the text above has several paragraphs. The decisive point is to get the titles from the first span, filtered by the › cama/mesa/banho text of the third (last) span.
What I tried: sed together with grep, in their simplest form:
sed 's/\"/\n/g' /tmp/default.htm | grep "TorraTudo"
Meaning of the \" and \n parts: \" matches each double quote, and \n puts a line break in its place, so the text is split at every double quote.
This gives me a list like the one below:
>Colcha Casal e ... - TorraTudo</span> <span class=
>Colcha Solteiro e ... - TorraTudo</span> <span class=
>Roupão de banho ... - TorraTudo</span> <span class=
>Caminho de mesa ... - TorraTudo</span> <span class=
>Cortina para quarto ... - TorraTudo</span> <span class=
>Os Simpsons em Português - YouTube</span> <span class=
>Travesseiro de pena com ... - TorraTudo</span> <span class=
>Fronha de Solteiro em ... - TorraTudo</span> <span class=
>Lençol 70% algodão e ... - TorraTudo</span> <span class=
>Pano de prato pintado a ... - TorraTudo</span> <span class=
>Coberto dupla face colo... - TorraTudo</span> <span class=
>Toalha de rosto felpudo ... - TorraTudo</span> <span class=
But note that this output makes no distinction between bed, table, and bath (cama/mesa/banho).
I even tried something like:
sed 's/\"/\n/g' /tmp/default.htm | grep "TorraTudo\(^.*$\) ›\; cama"
sed 's/\"/\n/g' /tmp/default.htm | grep "TorraTudo\(^.*$\) ›\; mesa"
sed 's/\"/\n/g' /tmp/default.htm | grep "TorraTudo\(^.*$\) ›\; banho"
After several useless attempts besides the ones shown here, I decided to ask people with more experience in this subject (regular expressions).
What I need is to separate the titles according to their category: bed, table, or bath.
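One possible way to approach the filtering described above, as a minimal sketch rather than a definitive solution. It assumes GNU grep and sed, that the class names are exactly those in the sample, and that the separator before the category is either a literal › or the entity &#8250; in the file. The function name titles_by_category is hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Sketch: print the first-span titles whose last span ends in a given category.
# Assumptions: GNU grep/sed; class names as in the sample; the separator is
# either a literal "›" or the entity "&#8250;". The function name is made up.
titles_by_category() {
    category=$1   # e.g. cama, mesa, or banho
    file=$2       # e.g. /tmp/default.htm

    # -o prints each matched title/category pair on its own line, so this
    # works even when the whole page sits on a single line. [^<]* cannot
    # cross a tag, which keeps a match from spilling into the next entry.
    # -a makes grep treat the file as text even if it looks binary.
    grep -aoE "<span class=\"CVA68e qXLe6d\">[^<]*</span> *<span class=\"qXLe6d dXDvrc\"> *<span class=\"fYyStc\">[^<]*(›|&#8250;) *${category}</span>" "$file" |
    # Keep only the text of the first span.
    sed 's/^<span class="CVA68e qXLe6d">\([^<]*\)<\/span>.*/\1/'
}
```

Calling titles_by_category cama /tmp/default.htm would then list only the bed titles, and likewise for mesa and banho. Regular expressions are fragile on HTML, so this only holds while the markup keeps the shape shown in the sample.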
Is there a line break between the spans, or is that just the way you pasted it here?
– Kiritonito
@Kiritonito There is no line break between the spans. This is exactly how the text appears in the file I have. It is a real example: you can save it on your PC and try to filter it, because this text is what reflects my difficulty. – Diego Henrique