bs4.FeatureNotFound (BeautifulSoup and parser error)

Question

Asked 6 years, 3 months ago

Viewed 1,214 times

0

I need to extract all the text from an HTML page, so I decided to try BeautifulSoup to see how to do it. But it fails right at the start. Here is the code:

import requests
from bs4 import BeautifulSoup

url = 'http://servicos2.sjc.sp.gov.br/servicos/horario-e-itinerario.aspx?acao=p&opcao=1&txt='
r = requests.get(url)
print(r.text)

soup = BeautifulSoup(r.text, 'lxml')
lista = soup.find_all('table', class_='textosm')
print(lista)

The error it gives is:

Traceback (most recent call last):
  File "C:/Users/Ariane/PycharmProjects/extracao/teste.py", line 9, in <module>
    soup = BeautifulSoup(r.text, 'lxml')
  File "C:\Users\Ariane\PycharmProjects\extracao\venv\lib\site-packages\bs4\__init__.py",
    line 196, in __init__  % ",".join(features))

bs4.FeatureNotFound: 
  Couldn't find a tree builder with the features you requested: lxml.
  Do you need to install a parser library?

I installed lxml and also tried switching to html.parse, but the error remains the same.
Can someone help?

3 answers

2

You requested lxml as the parser. The error message says:

Couldn't find a tree builder with the features you requested: lxml.

In other words:

Beautiful Soup could not find a tree builder (parser) that provides the feature you asked for: lxml

The documentation lists which features are available:

https://www.crummy.com/software/BeautifulSoup/bs4/doc/

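Currently they are html.parser (Python's built-in parser), lxml (lxml's HTML parser), lxml-xml or xml (lxml's XML parser), and html5lib.

For the lxml entries, such as "lxml's XML parser", the documentation also notes: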

External C dependency

That is, an extra library is required, in this case lxml:

https://pypi.org/project/lxml/

To install it, run this in CMD:

pip install lxml
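While the lxml installation is being sorted out, the parser choice could be made at runtime instead of being hard-coded. The sketch below is only an illustration: `pick_parser` is a hypothetical helper, not part of Beautiful Soup, and it assumes that the feature names `lxml` and `html5lib` match the names of the packages that provide them.

```python
import importlib.util


def pick_parser(preferred=("lxml", "html5lib")):
    """Return the first parser name whose package is importable.

    Hypothetical helper: falls back to Python's built-in "html.parser"
    when none of the preferred third-party parsers is installed.
    """
    for name in preferred:
        # find_spec returns None when the interpreter running this
        # script cannot locate the top-level package.
        if importlib.util.find_spec(name) is not None:
            return name
    return "html.parser"


print("parser that would be used:", pick_parser())
```

In the question's code this would be used as BeautifulSoup(r.text, pick_parser()). Note that the built-in parser's feature name is spelled html.parser, with the final "r".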

Yes, lxml was installed. I got the error, installed it, and the error continued as if the installation had never happened.

– user124673

2019/03/31 at 12:26
@Arianemateu is your Python x64 or x86?

– Guilherme Nascimento

2019/03/31 at 19:03
x64, but I gave up and used the Linux terminal to do what I needed. It worked, thank you.

– user124673

2019/04/02 at 11:04
@Arianemateus that is strange, because your system looks like Windows, unless you are using Cygwin or some equivalent software.

– Guilherme Nascimento

2019/04/02 at 11:14

by Icaro Martins • 4,187 points · Answer 1 · 2019-03-31T03:24:56+00:00

0

Judging by the error message, it seems you do not have lxml installed. To install it, use one of the console commands below.

pip install lxml
# or
python3 -m pip install lxml
# ^
# the python you use to run your `.py` files

Beautiful Soup 4 Documentation - Installing a Parser
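To see whether lxml is visible to the interpreter that actually runs the script, a small check can be placed at the top of the file. This is a minimal sketch using only the standard library; `describe_environment` is a hypothetical name chosen for illustration.

```python
import importlib.util
import sys


def describe_environment(module_name="lxml"):
    """Report the running interpreter and whether module_name is importable from it."""
    spec = importlib.util.find_spec(module_name)
    return {
        "interpreter": sys.executable,        # path of the python running this script
        "version": sys.version.split()[0],
        "module": module_name,
        "installed": spec is not None,        # False means this python cannot see it
        "location": getattr(spec, "origin", None),
    }


info = describe_environment()
for key, value in info.items():
    print(f"{key}: {value}")
```

If "installed" is False here while pip reports lxml as installed, pip most likely installed it into a different Python. Running the interpreter path printed above with -m pip install lxml targets the right one.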

1

But lxml is installed.

– user124673

2019/03/31 at 12:26
If you have more than one Python version installed on your PC, you have to install lxml into the version you actually use; that is why I included the second command. If you are using Conda, you can install it with conda install lxml, remembering that the active env must also be the one you use to run the code, i.e. source activate <env_name>

– Icaro Martins

2019/03/31 at 14:38
I say this because the error says it is not installed, and since you say it is, I imagine you may have installed it in another version of Python. =D

– Icaro Martins

2019/03/31 at 14:40
1

I ended up running everything in the Linux terminal inside Windows, and then it worked, hahaha. Thank you.

– user124673

2019/04/02 at 11:05

by Filipe Jorge • 1 point · Answer 2 · 2019-08-16T02:50:56+00:00

Hello! I had the same problem as you.

I use Anaconda, and I installed lxml in the correct environment through CMD. So far so good, because I was running my script directly from cmd. However, when I tried to use the VS Code terminal, I ran into the same problem you reported.

What I did: I went to the VS Code user settings and configured cmd as the "terminal.integrated.shell.windows". I did this because I noticed that the problem occurred when using PowerShell (which is the default integrated terminal in VS Code).

I don't know why PowerShell caused the problem, but after I switched to cmd my script ran smoothly. Perhaps you could check which shell your IDE is using. (In my tests I also tried bash, and the same problem occurred with lxml.)
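One possible explanation, which is only an assumption here, is that each shell resolves python to a different installation or environment. Running the sketch below from each terminal (cmd, PowerShell, bash) and comparing the output would confirm or rule that out. `shell_python_report` is a hypothetical helper name.

```python
import shutil
import sys


def shell_python_report(names=("python", "python3", "pip")):
    """Map each command name to the executable the current PATH resolves it to, or None."""
    return {name: shutil.which(name) for name in names}


report = shell_python_report()
print("running interpreter:", sys.executable)
for name, path in report.items():
    print(f"{name} on PATH -> {path}")
```

If the paths differ between terminals, the shell that shows the error is using a Python where lxml was never installed.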

I hope I helped. Thank you.