I'm stuck on an error that has me banging my head against the wall. I'm trying to write a script that downloads books (PDFs) from a Laotian site, because downloading all of them by hand is nearly impossible. My idea was to do the equivalent of wget https://lao-online.com/books/download/1.html
in Python, changing the number in the link, since the links follow a pattern. The code ended up like this:
    import wget
    from time import sleep  # sleep() is called below, so it has to be imported

    count = 1
    print(f'Going after link number {count}')
    while count < 1800:
        # Build the URL for the current book number
        url = f'https://lao-online.com/books/download/{count}.html'
        print(url)
        sleep(1)  # pause between requests
        wget.download(url)  # download once per URL
        print('Success!')
        count += 1
But for some reason Python's wget library doesn't seem to let me download PDFs, even though I was able to download other kinds of media with it. When I run the script, nothing is downloaded and I get this error:
    Traceback (most recent call last):
      File "/home/mathie/Laos/raspagem.py", line 14, in <module>
        wget.download(url)
      File "/home/mathie/.local/lib/python3.9/site-packages/wget.py", line 526, in download
        (tmpfile, headers) = ulib.urlretrieve(binurl, tmpfile, callback)
      File "/usr/lib/python3.9/urllib/request.py", line 239, in urlretrieve
        with contextlib.closing(urlopen(url, data)) as fp:
      File "/usr/lib/python3.9/urllib/request.py", line 214, in urlopen
        return opener.open(url, data, timeout)
      File "/usr/lib/python3.9/urllib/request.py", line 523, in open
        response = meth(req, response)
      File "/usr/lib/python3.9/urllib/request.py", line 632, in http_response
        response = self.parent.error(
      File "/usr/lib/python3.9/urllib/request.py", line 561, in error
        return self._call_chain(*args)
      File "/usr/lib/python3.9/urllib/request.py", line 494, in _call_chain
        result = func(*args)
      File "/usr/lib/python3.9/urllib/request.py", line 641, in http_error_default
        raise HTTPError(req.full_url, code, msg, hdrs, fp)
    urllib.error.HTTPError: HTTP Error 403: Forbidden
Well, my friend, let's just say that websites in Laos don't have much security in place to limit downloads. Thanks to that loophole, when I access https://lao-online.com/books/download/32.html the site returns the requested PDF. As I mentioned, plain wget in the terminal downloads the PDF, but the same thing doesn't work from Python. Anyway, I'll try something with requests. Thanks for pointing me in the right direction XD
– Mathie Mathieus
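Since terminal wget succeeds while the Python call gets a 403, one possible explanation (not confirmed anywhere in this thread) is that the server rejects the default "Python-urllib" User-Agent header. Below is a minimal sketch of that idea using only the standard library. The names build_request, download_book, BASE_URL and USER_AGENT are hypothetical, the "Wget/1.21" string is an assumed value, and saving the response as a .pdf file assumes the server really returns a PDF for these links.

```python
import shutil
import urllib.request
from pathlib import Path

# URL pattern taken from the question; the book number fills the placeholder.
BASE_URL = "https://lao-online.com/books/download/{}.html"
# Assumption: the server blocks urllib's default User-Agent, so we send
# a hypothetical value that imitates the command-line tool instead.
USER_AGENT = "Wget/1.21"


def build_request(book_id: int) -> urllib.request.Request:
    """Build a GET request for one book with an explicit User-Agent header."""
    url = BASE_URL.format(book_id)
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def download_book(book_id: int, dest_dir: str = ".") -> Path:
    """Download one book and save it as <book_id>.pdf inside dest_dir."""
    target = Path(dest_dir) / f"{book_id}.pdf"
    with urllib.request.urlopen(build_request(book_id), timeout=30) as response:
        with open(target, "wb") as out_file:
            # Stream the response body to disk without loading it all in memory.
            shutil.copyfileobj(response, out_file)
    return target
```

Calling download_book(32) would then attempt to save 32.pdf in the current directory. If the 403 persists, the header is probably not the cause, and comparing the full set of headers that terminal wget sends would be the next thing to check.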