How to automatically download a PDF with Selenium in Python?

Viewed 2,447 times

5

I am using Python with the Selenium webdriver to automate the download of multiple PDF files. The PDF opens in the preview window, and I would now like to download the file automatically, without the Save As dialog popping up.


I am trying to download the file with the code below, but the Save As window still pops up.

fp = webdriver.FirefoxProfile()

fp.set_preference("browser.download.folderList",2)
fp.set_preference("browser.download.manager.showWhenStarting",False)
fp.set_preference("browser.download.dir", r"C:\downpdf")
fp.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/pdf")

fp.set_preference("pdfjs.disabled", "true")
btndownpdf = firefox.find_element_by_xpath('//*[@id="download"]').click()

2 answers

0

Selenium is an excellent tool, but I don't think it is the usual choice for downloading files. In this case I recommend using another library, such as urllib (built-in).

Here is an example:

from urllib.request import urlopen


def download_file(url):
    # Fetch the URL and write the response body to a local file.
    with urlopen(url) as response:
        with open("file_down.pdf", "wb") as file:
            file.write(response.read())
    print("Done!")


def main():
    download_file("http://www.secom.ba.gov.br/arquivos/File/MANUALMARCAGOVERNO2015.pdf")


main()
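
Since the question mentions multiple PDF files, the example above could be extended along these lines. This is only a sketch: the helper names filename_from_url and download_all, the default name "download.pdf", and the idea of naming each file after the last segment of its URL are my assumptions, not part of the original answer. It also assumes the PDFs can be fetched directly, without the login session or cookies of the browser.

```python
import os
import shutil
from urllib.parse import unquote, urlparse
from urllib.request import urlopen


def filename_from_url(url, default="download.pdf"):
    # Use the last path segment of the URL as the file name (hypothetical helper).
    name = os.path.basename(unquote(urlparse(url).path))
    return name or default


def download_all(urls, target_dir):
    # Download every URL into target_dir and return the list of saved paths.
    os.makedirs(target_dir, exist_ok=True)
    saved = []
    for url in urls:
        path = os.path.join(target_dir, filename_from_url(url))
        with urlopen(url) as response, open(path, "wb") as out:
            # Copy in chunks so large PDFs are not held in memory at once.
            shutil.copyfileobj(response, out)
        saved.append(path)
    return saved
```

Calling download_all with a list of URLs and a folder such as r"C:\downpdf" would save one file per URL. Two URLs that end in the same file name would overwrite each other.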

0

The problem is that it depends on the site: servers are not always configured to return the MIME type application/pdf for PDFs.

Use your browser's developer tools to check exactly which MIME type is returned for that file, then pass that value to .set_preference() in your browser configuration.
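
As an alternative to the developer tools, the MIME type can also be checked from Python. The sketch below is an assumption on my part, not something from the original answer: the function name get_mime_type is hypothetical, and it assumes the file URL is reachable without the browser session. Some servers reject HEAD requests, in which case a normal GET would be needed.

```python
from urllib.request import Request, urlopen


def get_mime_type(url):
    # Send a HEAD request so only the headers are transferred, not the file.
    request = Request(url, method="HEAD")
    with urlopen(request) as response:
        # get_content_type() returns the type without parameters such as charset;
        # it falls back to "text/plain" when the header is missing.
        return response.headers.get_content_type()
```

If the value is not application/pdf (application/octet-stream is a common one), that is the value to put in browser.helperApps.neverAsk.saveToDisk.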

Another detail is that this configuration needs to be done before the creation of the driver:

from selenium import webdriver

fp = webdriver.FirefoxProfile()

fp.set_preference("browser.download.folderList", 2)
fp.set_preference("browser.download.manager.showWhenStarting", False)
fp.set_preference("browser.download.dir", r"C:\downpdf")
fp.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/pdf")  # or another MIME type
fp.set_preference("pdfjs.disabled", True)  # boolean, not the string "true"

# Create the driver only after the profile is fully configured.
firefox = webdriver.Firefox(firefox_profile=fp)
# ... code to reach the element: firefox.get(url) etc.
firefox.find_element_by_xpath('//*[@id="download"]').click()
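
For newer Selenium releases (4.x), where to my knowledge find_element_by_xpath and the firefox_profile argument are no longer available, the same idea might look like the sketch below. The function names build_download_prefs, make_firefox and click_download are hypothetical, and the preference values are the ones from this answer. Selenium is imported inside the functions so that the preference dictionary can be inspected without starting a browser.

```python
def build_download_prefs(download_dir, mime_types=("application/pdf",)):
    # Firefox preferences that save matching files without asking.
    return {
        "browser.download.folderList": 2,  # 2 = use the custom folder below
        "browser.download.manager.showWhenStarting": False,
        "browser.download.dir": download_dir,
        # Comma-separated list of MIME types to save without a dialog.
        "browser.helperApps.neverAsk.saveToDisk": ",".join(mime_types),
        "pdfjs.disabled": True,  # turn off the built-in PDF preview
    }


def make_firefox(download_dir, mime_types=("application/pdf",)):
    # Assumes Selenium 4 is installed; preferences are set before the driver exists.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    for name, value in build_download_prefs(download_dir, mime_types).items():
        options.set_preference(name, value)
    return webdriver.Firefox(options=options)


def click_download(driver, url):
    # Open the page and click the element with id="download", as in the question.
    from selenium.webdriver.common.by import By

    driver.get(url)
    driver.find_element(By.XPATH, '//*[@id="download"]').click()
```

A call such as make_firefox(r"C:\downpdf") followed by click_download(driver, url) would mirror the snippet above. If the server returns a different MIME type, it can be passed in mime_types.
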
  • I checked the MIME type, and it is application/pdf.

  • @Time I edited the answer; the order of the statements matters too.
