Download with requests and open in Python

The following code is part of a loop that downloads multiple files and saves each one to its respective directory.

I'm trying to download the file from the "url" and save it in a directory of my choice. The problem is that the file does not have a fixed name: it is generated randomly, so each URL serves a file with a different, non-standard name.

My question is how to "catch" the file name so I can use it in open('C:/teste/NOME_DO_ARQUIVO.zip'..., since a file name is required to save the download.

import requests

url = "http://www.rad.cvm.gov.br/ENETCONSULTA/frmDownloadDocumento.aspx?CodigoInstituicao=1&NumeroSequencialDocumento=98925"
zip = requests.get(url, verify=False)
with open('C:/teste/NOME_DO_ARQUIVO.zip', 'wb') as teste:
    teste.write(zip.content)
  • Saul, good morning! Do you want to set the name arbitrarily, or do you want the name that the file comes with by default?

  • The default name, thank you.

2 answers

1

Thanks again, @lmonferrari. The solution:

import re

a = re.findall(r'filename=(.*)', zip.headers['content-disposition'])
arq = "".join(a)
with open('C:/teste/' + arq, 'wb') as teste:
    teste.write(zip.content)

Search the response headers for the file name:

a = re.findall(r'filename=(.*)', zip.headers['content-disposition'])

Convert the list of matches to a string:

arq = "".join(a)

Save to the directory by appending the file name to the path:

with open('C:/teste/' + arq, 'wb') as teste:
     teste.write(zip.content)
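
The regular expression above keeps any quotes that surround the name, and indexing the headers raises a KeyError when the server sends no Content-Disposition header. A minimal sketch of a more defensive variant, parsing the header with the standard library's email.message.Message. The helper name filename_from_header and the fallback name download.zip are assumptions for illustration, not part of the original solution:

```python
import os
from email.message import Message


def filename_from_header(content_disposition, default="download.zip"):
    """Return the file name from a Content-Disposition value, or `default`."""
    if not content_disposition:
        return default
    msg = Message()
    msg["Content-Disposition"] = content_disposition
    name = msg.get_filename()  # removes surrounding quotes if present
    if not name:
        return default
    # Keep only the last path component so the name stays inside the target directory.
    return os.path.basename(name.replace("\\", "/")) or default
```

With the variables from the question, it would be called roughly as arq = filename_from_header(zip.headers.get('content-disposition')).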

1


Importing the required packages. Note that I added the warnings filter because an SSL warning is shown otherwise.

import requests
import zipfile
import io
import warnings

warnings.filterwarnings('ignore')

File URL

url = 'http://www.rad.cvm.gov.br/ENETCONSULTA/frmDownloadDocumento.aspx?CodigoInstituicao=1&NumeroSequencialDocumento=98925'

Request for the file

response = requests.get(url, verify = False, stream = True)

Creating the zip file

file = zipfile.ZipFile(io.BytesIO(response.content))

Extracting the file (note that path is the directory where the archive will be unzipped; in this case, the zips folder in the script's directory)

path = './zips'
file.extractall(path)

Code

import requests
import zipfile
import io
import warnings

warnings.filterwarnings('ignore')

url = 'http://www.rad.cvm.gov.br/ENETCONSULTA/frmDownloadDocumento.aspx?CodigoInstituicao=1&NumeroSequencialDocumento=98925'

response = requests.get(url, verify = False, stream = True)

file = zipfile.ZipFile(io.BytesIO(response.content))
path = './zips'
file.extractall(path)
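
If the server returns something other than an archive (an error page, for example), zipfile.ZipFile raises BadZipFile. A small sketch that checks the bytes first, assuming the body is already in memory. extract_zip_bytes is a hypothetical helper, and the archive built at the end only stands in for response.content:

```python
import io
import tempfile
import zipfile
from pathlib import Path


def extract_zip_bytes(data, path):
    """Extract an in-memory zip archive to `path` and return the member names."""
    buffer = io.BytesIO(data)
    if not zipfile.is_zipfile(buffer):
        raise ValueError("downloaded content is not a zip archive")
    with zipfile.ZipFile(buffer) as archive:
        archive.extractall(path)
        return archive.namelist()


# Build a tiny archive in memory to stand in for response.content.
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as archive:
    archive.writestr("hello.txt", "hi")

with tempfile.TemporaryDirectory() as tmp:
    names = extract_zip_bytes(sample.getvalue(), tmp)
    extracted_text = (Path(tmp) / "hello.txt").read_text()
```

In the real script the call would be roughly extract_zip_bytes(response.content, './zips').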

Update

To save without extracting

import requests
import zipfile
import io
import warnings

warnings.filterwarnings('ignore')

url = 'http://www.rad.cvm.gov.br/ENETCONSULTA/frmDownloadDocumento.aspx?CodigoInstituicao=1&NumeroSequencialDocumento=98925'

response = requests.get(url, verify = False, stream = True)

file = zipfile.ZipFile(io.BytesIO(response.content))

name = ''.join(a for a in file.namelist() if a.endswith('itr'))[1:-4]
with open(f'{name}.zip', 'wb') as f:
    for a in response.iter_content(chunk_size=128):
        f.write(a)
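
When the file name is already known (for example, from the response headers), the body can be written to disk chunk by chunk without opening it as a ZipFile. A sketch assuming the chunks come from response.iter_content(...). save_chunks is a hypothetical helper, and the byte strings at the end only simulate streamed chunks:

```python
import os
import tempfile


def save_chunks(chunks, directory, filename):
    """Write an iterable of byte chunks to directory/filename and return the path."""
    os.makedirs(directory, exist_ok=True)
    dest = os.path.join(directory, filename)
    with open(dest, "wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                f.write(chunk)
    return dest


# With requests this would be called roughly as:
#   save_chunks(response.iter_content(chunk_size=8192), 'C:/teste', name)
with tempfile.TemporaryDirectory() as tmp:
    saved = save_chunks([b"PK", b"", b"data"], os.path.join(tmp, "zips"), "file.zip")
    with open(saved, "rb") as f:
        saved_bytes = f.read()
```

Creating the directory inside the helper fits the loop from the question, where each file goes to its own directory.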
  • Thanks, lmonferrari, I will adapt it in my code. Hug!

  • You're welcome, Saul, big hug!

  • lmonferrari, the code was very useful, but a question came up: is it possible to download without unzipping? Thank you.

  • Hi Saul, good afternoon! I added an update to the answer with some modifications. Hug!

  • Perfect, @lmonferrari, it worked. I went a step further and ended up looking up the file name in the response headers; see my answer.
