I’m trying to get the text that follows a tag inside a div in an HTML page. The problem is that I get an empty string instead of the text. I’ve looked elsewhere and haven’t found anyone with a similar problem :/
Here is the HTML:
<div class="list-view-item-title-wrapper">
  <div class="list-view-item-title-top">
    <div class="list-view-item-type">
      "Webcast"
    </div>
  </div>
  <a href="/resources/actionable-awareness-unlock-your-influence" class="list-view-item-title">
    <h2>
      "Actionable Awareness: Unlock Your Influence"
    </h2>
  </a>
  <div class="list-view-item-date">
    <i class="fa fa-calendar"></i>
    "September 24, 2020"
  </div>
  ...
</div>
And the Python:
def get_posts_elements(self, html):
    # get_posts returns html.xpath("//div[@class='list-view-item-title-wrapper']")
    # html is lxml.html.fromstring(requests.get('https://www.scrum.org/resources'))
    posts = self.get_posts(html)
    for post in posts:
        # --- Retrieved successfully:
        try:
            self.data['Type'].append(post.xpath(".//div[@class='list-view-item-type']")[0].text.strip())
        except (IndexError, AttributeError):
            self.data['Type'].append('')
        try:
            self.data['Title'].append(post.xpath(".//a[@class='list-view-item-title']/h2")[0].text.strip())
        except (IndexError, AttributeError):
            # Append to 'Title' so every column keeps the same length
            self.data['Title'].append('')
        try:
            self.data['Link'].append(urljoin(self.base_url, post.xpath(".//a[@class='list-view-item-title']/@href")[0]))
        except IndexError:
            self.data['Link'].append('')
        # --- Retrieval fails:
        data = post.xpath(".//div[@class='list-view-item-date']")[0].text
        print(data)
In this case, I want to get the date text of each post, the same way I do with the title and type. In the example above that would be "September 24, 2020", but I only get an empty string.
My imports:
import lxml.html as parser
import requests
from urllib.parse import urlsplit, urljoin
Does it have to be XPath? I have used Beautiful Soup with Selenium and always had good results.
– Juan Caio
I have used Beautiful Soup and Selenium too, and also had good results, but this time I need to use only XPath :/
– Wiliane Souza
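A likely explanation, offered as a sketch rather than a confirmed diagnosis: in lxml, an element's `.text` holds only the text that comes before its first child. In the date div the first child is the `<i>` icon, so the date itself ends up in that child's `.tail` and the div's `.text` is just whitespace. The snippet below shows three XPath-only ways to read it. `SAMPLE` and `extract_date` are hypothetical names made up for illustration, and the sample markup assumes the quotes shown in the question come from the browser inspector and are not in the real page source.

```python
import lxml.html

# Hypothetical sample mirroring the date block from the question.
# Assumption: the quotes around the date are an inspector artifact.
SAMPLE = """
<html><body>
<div class="list-view-item-title-wrapper">
  <div class="list-view-item-date">
    <i class="fa fa-calendar"></i>
    September 24, 2020
  </div>
</div>
</body></html>
"""


def extract_date(post):
    """Return the date text of one post element, or '' if the div is missing."""
    nodes = post.xpath(".//div[@class='list-view-item-date']")
    if not nodes:
        return ''
    # text_content() joins the element's own text with the text and
    # tails of all its descendants, so the text after <i> is included.
    return nodes[0].text_content().strip()


root = lxml.html.fromstring(SAMPLE)
post = root.xpath("//div[@class='list-view-item-title-wrapper']")[0]
date_div = post.xpath(".//div[@class='list-view-item-date']")[0]

# What the question's code reads: only the whitespace before <i>.
text_only = date_div.text.strip()

# Option 1: text_content() on the div.
print(extract_date(post))

# Option 2: the date is the tail of the <i> child.
tail_date = post.xpath(".//div[@class='list-view-item-date']/i")[0].tail.strip()

# Option 3: let XPath build the string; normalize-space() also trims it.
xpath_date = post.xpath("normalize-space(.//div[@class='list-view-item-date'])")
```

Inside the loop from the question, `extract_date(post)` could stand in for the failing `.text` lookup. Option 3 returns an empty string when the div is missing, so it would not need a `try`/`except` around it.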