I'm having a problem extracting text while scraping a page.
Basically the beginning of my code is as follows:
import requests
from bs4 import BeautifulSoup
url0 = 'https://www.service.bund.de/Content/DE/Ausschreibungen/Suche/Formular.html?nn=4641482&cl2Addresses_Adresse_State=nordrhein-westfalen&resultsPerPage=100'
r = requests.get(url0,headers={'User-Agent': 'Mozilla/5.0'})
soup = BeautifulSoup(r.text, 'html.parser')
content = soup.find('ul', {"class": "result-list"})
links = content.find_all('a')
Each row of the table on the site I am scraping is an element of the "links" list. From each element I then want the first column (Ausschreibung), which is inside the h3 tag. However, this tag has a second tag embedded in it:
# Using one element of links as an example:
y = links[0]
b = y.find('h3')
b
# output: '<h3><em>Ausschreibung</em>Erneuerung SDRL 3</h3>'
The problem is that when I get the text of this tag, my machine (Windows 10) also "reads" the embedded tag and garbles the result:
c = y.find('h3').text
c
# Output: 'AusschreibungEr\xadneue\xadrung SDRL 3'
Using get_text() gives the same result.
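For what it's worth, the `\xad` in that output appears to be U+00AD (SOFT HYPHEN), an invisible hyphenation hint that is part of the page text, rather than a decoding error on Windows. A minimal check using only the standard library, with the string copied from the output above (the variable names `soft_hyphen_name` and `cleaned` are just for illustration):

```python
import unicodedata

# String copied from the output shown above.
c = 'AusschreibungEr\xadneue\xadrung SDRL 3'

# Look up the Unicode name of the character behind the \xad escape.
soft_hyphen_name = unicodedata.name('\xad')

# Dropping the soft hyphens leaves only the visible text.
cleaned = c.replace('\xad', '')

print(soft_hyphen_name)
print(cleaned)
```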
What I'm interested in inside object b is "Er-Neue-Rung SDRL 3". How can I either get everything as text ("Ausschreibung Er-Neue-Rung SDRL 3") or remove the 'em' tag inside b so that only "Er-Neue-Rung SDRL 3" is left?
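One possible approach, sketched under the assumption that the markup really looks like the h3 shown above with soft hyphens (`\xad`) inside the title: pass a separator to `get_text()` to keep both parts, or remove the `em` tag with `decompose()` before reading the text. The inline `html` string and the names `full_text` and `title_only` are made up for this example and stand in for the live page.

```python
from bs4 import BeautifulSoup

# Assumed markup, based on the output above, with the soft hyphens (\xad) put back.
html = '<h3><em>Ausschreibung</em>Er\xadneue\xadrung SDRL 3</h3>'
b = BeautifulSoup(html, 'html.parser').find('h3')

# Option 1: keep both parts, joined by a space, and drop the soft hyphens.
full_text = b.get_text(separator=' ', strip=True).replace('\xad', '')

# Option 2: remove the <em> tag from the tree, then read what is left.
# Note that decompose() modifies b in place.
b.find('em').decompose()
title_only = b.get_text(strip=True).replace('\xad', '')

print(full_text)
print(title_only)
```

Because `decompose()` changes the parsed tree, Option 1 has to run first here; on the real page, only one of the two would normally be needed per element of `links`.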