As we know, when accessing a website through a browser, various files such as images, music, scripts, and CSS are downloaded to be used within the HTML page.
Using the requests library, it is possible to use the get function to request the page, but it does not receive the other files, or even the links to download them. So if I want to save a page on my computer, it would be incomplete. Example:
HTML page:
<!DOCTYPE html>
<html>
<head>
    <title>Teste de JavaScript</title>
    <meta charset="utf-8">
</head>
<body>
    <img src="caozinho.png" alt="Cachorrinho">
</body>
</html>
Python code to save the page:
import requests

url = "https://meuSite.com/"
response = requests.get(url)

# Writes only the HTML document to disk
with open("página.html", "wb") as file:
    file.write(response.content)
Final result: the page is saved, but the image caozinho.png is never downloaded, so the copy is incomplete.
Is there any way to get all the files from a request using requests?
Directory listing is not a concept in the HTTP protocol. Whether such a listing exists depends on the server being configured to display an HTML page with the folder structure, and on the semantics with which that HTML is built.
– Augusto Vasques
Requests alone does not, but you can use it in the process. The solution to your problem is to take the contents of response.content, which is the page's HTML, parse the code to extract all the elements you want, and capture their respective URLs. For example, you can search for all <img> elements and extract the value of the src attribute to list all the image URLs on the page. Then make a new request to each of those URLs and download them one by one, exactly as the browser does. – Woss
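A minimal sketch of the approach Woss describes, assuming the beautifulsoup4 package is available for the parsing step and reusing the hypothetical URL from the question; any other HTML parser, or other tags such as <script> and <link>, would follow the same pattern:
import os
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

url = "https://meuSite.com/"
response = requests.get(url)

# Save the HTML itself
with open("página.html", "wb") as file:
    file.write(response.content)

# Parse the HTML and collect the URL of every <img> element
soup = BeautifulSoup(response.content, "html.parser")
for img in soup.find_all("img"):
    src = img.get("src")
    if not src:
        continue
    # Resolve relative paths such as "caozinho.png" against the page URL
    img_url = urljoin(url, src)
    filename = os.path.basename(urlparse(img_url).path) or "imagem"
    # One extra request per file, exactly as the browser would make
    img_response = requests.get(img_url)
    with open(filename, "wb") as img_file:
        img_file.write(img_response.content)
The same loop can be repeated for any other attribute that references an external file (src of <script>, href of <link>, and so on).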
But remember that a URL is, by definition, an opaque value. This means that your copy of the site may not be faithful to the original structure, only reproducing the same (visual) result.
– Woss
Thanks @Woss <3
– JeanExtreme002