I need to extract all the text from an HTML page, so I decided to look at BeautifulSoup to see how to do it. But it ran into a problem right at the beginning. Here's the code:
import requests
from bs4 import BeautifulSoup
url = 'http://servicos2.sjc.sp.gov.br/servicos/horario-e-itinerario.aspx?acao=p&opcao=1&txt='
r = requests.get(url)
print(r.text)
soup = BeautifulSoup(r.text, 'lxml')
lista = soup.find_all('table', class_='textosm')
print(lista)
The error it gives is:
Traceback (most recent call last):
  File "C:/Users/Ariane/PycharmProjects/extracao/teste.py", line 9, in <module>
    soup = BeautifulSoup(r.text, 'lxml')
  File "C:\Users\Ariane\PycharmProjects\extracao\venv\lib\site-packages\bs4\__init__.py", line 196, in __init__
    % ",".join(features))
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?
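bs4 raises FeatureNotFound when the requested parser cannot be imported by the interpreter that is actually running the script. The traceback shows that interpreter lives in the project's venv folder, so one plausible cause (an assumption, not confirmed in the post) is that lxml was installed into a different Python than the venv one. A minimal sketch to check this; the helper name lxml_status is hypothetical:

```python
import importlib.util
import sys


def lxml_status():
    """Return the running interpreter's path and whether lxml is importable from it."""
    # sys.executable is the Python binary running this script. In PyCharm this is
    # usually the project's venv interpreter, which may differ from the Python
    # that a plain `pip install` in a terminal installs into (assumption).
    spec = importlib.util.find_spec("lxml")
    return sys.executable, spec is not None


if __name__ == "__main__":
    interpreter, has_lxml = lxml_status()
    print("Interpreter:", interpreter)
    print("lxml importable:", has_lxml)
    if not has_lxml:
        # Installing via "<interpreter> -m pip" targets exactly this interpreter.
        print(f'Try: "{interpreter}" -m pip install lxml')
```

If it prints False, running the suggested `-m pip install lxml` command with that exact interpreter path should put lxml where bs4 can see it.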
I installed lxml and also switched it to html.parse, but the error remains the same.
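Python's built-in parser is registered in bs4 under the name 'html.parser' (with a final "r"); if the code literally passed 'html.parse', that string alone would also raise FeatureNotFound, because no tree builder has that name. A minimal sketch, assuming the goal is simply to get all the text: try lxml, fall back to 'html.parser', then call get_text. The helper names make_soup and extract_text are hypothetical:

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(html):
    """Parse with lxml when it is available, otherwise use the stdlib parser."""
    try:
        return BeautifulSoup(html, "lxml")
    except FeatureNotFound:
        # "html.parser" ships with Python itself and needs no extra install.
        return BeautifulSoup(html, "html.parser")


def extract_text(html):
    """Return all text from an HTML string, one stripped chunk per line."""
    soup = make_soup(html)
    return soup.get_text(separator="\n", strip=True)


if __name__ == "__main__":
    sample = "<table class='textosm'><tr><td>Linha 1</td><td>Linha 2</td></tr></table>"
    print(extract_text(sample))
```

With the original script, passing r.text to extract_text would replace the direct BeautifulSoup(r.text, 'lxml') call; find_all('table', class_='textosm') still works on the object returned by make_soup.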
Can someone help?
Yes, lxml was installed. It gave the error, so I installed it, but it kept giving the error as if the installation had never happened.
– user124673
@Arianemateu is your Python x64 or x86?
– Guilherme Nascimento
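The interpreter's bitness can be checked from Python itself, which is presumably relevant because binary packages such as lxml are built per architecture. A minimal sketch; the function name python_bitness is hypothetical:

```python
import platform
import struct
import sys


def python_bitness():
    """Return 64 or 32 depending on the pointer size of the running interpreter."""
    # struct.calcsize("P") is the size of a C pointer in bytes: 8 on 64-bit builds, 4 on 32-bit.
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print(f"{python_bitness()}-bit Python at {sys.executable}")
    print("platform.architecture():", platform.architecture()[0])
```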
x64, but I gave up and used the Linux terminal to do what I needed. It worked, thank you.
– user124673
@Arianemateus Strange, because your system looks like Windows, unless you are using Cygwin or some equivalent software.
– Guilherme Nascimento