urllib.error.HTTPError: HTTP Error 404: Not Found

I have the following simple code:

from urllib.request import urlopen
from bs4 import BeautifulSoup

word_site = urlopen('https://svnweb.freebsd.org/csrg/share/dict/words?view=markup&pathrev=61569')
bs = BeautifulSoup(word_site, 'html.parser')

print(bs.h1)

I have searched everywhere and found no satisfactory answer. This code generates the following error:

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.3.1\plugins\python-ce\helpers\pydev\_pydev_bundle\pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.3.1\plugins\python-ce\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:/Users/joaof/OneDrive/Programacao/Cursos/Cursos/curso-de-programacao-em-python-do-basico-ao-avancado/exercicios/secao08/ex60.py", line 5, in <module>
    word_site = urlopen('https://svnweb.freebsd.org/csrg/share/dict/words?view=markup&pathrev=61569')
  File "C:\Users\joaof\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 214, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\joaof\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 523, in open
    response = meth(req, response)
  File "C:\Users\joaof\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 632, in http_response
    response = self.parent.error(
  File "C:\Users\joaof\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 561, in error
    return self._call_chain(*args)
  File "C:\Users\joaof\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 494, in _call_chain
    result = func(*args)
  File "C:\Users\joaof\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 641, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

Can someone please help me? PS: the URL exists and loads normally.

1 answer

Usually this is not necessary, but the page you are working with requires a User-Agent header. I identified the request as a Mozilla (Firefox) browser.

import bs4
import requests

url = "https://svnweb.freebsd.org/csrg/share/dict/words?view=markup&pathrev=61569"
# Identify the request as a browser; this page rejects the default client User-Agent.
headers = {"User-Agent": "Mozilla/5.0"}
response = requests.get(url, headers=headers)
# Raise an exception if the server still answers with an error status (e.g. 404).
response.raise_for_status()
soup = bs4.BeautifulSoup(response.text, 'html.parser')

# Print everything:
# print(soup.text)

# Print line by line, skipping empty lines
for x in soup.text.splitlines():
    if x:
        print(x)
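Since the question uses `urllib` rather than `requests`, here is a minimal standard-library sketch of the same fix, offered as an illustration rather than a tested solution against this particular server. It assumes the 404 comes from the default `Python-urllib/<version>` User-Agent, as the answer above suggests. The function names `default_user_agent`, `build_request` and `fetch_html`, the 10-second timeout and the UTF-8 fallback are all hypothetical choices made for this example.

```python
from urllib.error import HTTPError
from urllib.request import Request, build_opener, urlopen

# Page URL from the question.
URL = "https://svnweb.freebsd.org/csrg/share/dict/words?view=markup&pathrev=61569"


def default_user_agent():
    """Return the User-Agent urllib sends when the request sets none."""
    # OpenerDirector keeps its default headers as (name, value) pairs,
    # e.g. ("User-agent", "Python-urllib/3.9").
    return dict(build_opener().addheaders)["User-agent"]


def build_request(url, user_agent="Mozilla/5.0"):
    """Build a Request that identifies itself with the given User-Agent."""
    # urllib only adds its default User-agent when the request has none.
    return Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, timeout=10):
    """Download url with a browser-like User-Agent and return it as text."""
    try:
        with urlopen(build_request(url), timeout=timeout) as response:
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    except HTTPError as err:
        # err.code and err.reason hold the status the server sent back (e.g. 404).
        print(f"Request failed: {err.code} {err.reason}")
        raise
```

The returned text can be parsed as in the question, e.g. `BeautifulSoup(fetch_html(URL), 'html.parser').h1`. Building a `Request` object is the usual way to attach headers in `urllib`, because `urlopen` accepts either a plain URL string or a `Request`.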


  • Thank you very much, Gilmar. Based on your explanation and solution, I have rewritten my code.

  • Oh, great! You're welcome.
