As far as I know (and from what I searched), Selenium can't do this directly. What you can do is take the link's URL and download it with Python itself, for example:
If you are using Python 2.x (I couldn't test this, since I don't use Python 2):
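```python
from os.path import basename
from urllib import urlretrieve

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

...

ExportCSV = WebDriverWait(driver, 10).until(
    EC.visibility_of_element_located((By.XPATH, '//div/div[2]/div/a')))
url = ExportCSV.get_attribute('href')
# Remove the query string (if any); split() also works when there is no '?'
arquivo = url.split('?', 1)[0]
# Strip whitespace and slashes
arquivo = arquivo.strip().strip('/')
# Keep only the file name
arquivo = basename(arquivo)
urlretrieve(url, arquivo)
driver.quit()
```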
If you are using Python 3.x:
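```python
from os.path import basename
from urllib import request

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

...

ExportCSV = WebDriverWait(driver, 10).until(
    EC.visibility_of_element_located((By.XPATH, '//div/div[2]/div/a')))
url = ExportCSV.get_attribute('href')
# Remove the query string (if any); split() also works when there is no '?'
arquivo = url.split('?', 1)[0]
# Strip whitespace and slashes
arquivo = arquivo.strip().strip('/')
# Keep only the file name
arquivo = basename(arquivo)
# request.urlopen (there is no request.open) performs the HTTP GET
with request.urlopen(url) as response, open(arquivo, 'wb') as file:
    file.write(response.read())
driver.quit()
```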
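Stripping the query string by hand is easy to get wrong (for example, `str.find` returns -1 when the URL has no `?`). As an alternative, the standard library's `urllib.parse.urlsplit` can isolate the path for you. The sketch below is Python 3 only; `filename_from_url` and the `download.csv` fallback are hypothetical names chosen for illustration, and `example.com` is just a placeholder host.

```python
from posixpath import basename
from urllib.parse import unquote, urlsplit


def filename_from_url(url, default='download.csv'):
    """Return the last path segment of url, ignoring any query string or fragment."""
    # urlsplit separates scheme, host, path, query and fragment,
    # so there is no need to search for '?' by hand.
    path = urlsplit(url.strip()).path
    # Drop trailing slashes, then keep only the final segment
    # (posixpath is used because URL paths always use '/').
    name = basename(path.rstrip('/'))
    # Decode percent-escapes such as %20; fall back to a default if nothing is left.
    return unquote(name) or default


print(filename_from_url('https://example.com/exports/report.csv?token=abc'))
# report.csv
```

The result can be passed as the target file name to `urlretrieve` or `open(..., 'wb')` in place of the hand-built `arquivo`.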
Sessions and cookies
The previous examples are basic and mainly serve to show how you might approach the problem. Keep in mind, however, that websites use cookies, and some links only work with them: if the site relies on a session or on anti-CSRF protection (via a cookie/session), a plain request from Python will not be allowed to access the link. This can be worked around, because Selenium itself provides the `driver.get_cookies()` method. It returns the cookies visible to the current site and path (I believe cookies set exclusively for other paths on the same domain are not returned, similar to `document.cookie` in browser JavaScript, although unlike `document.cookie` it also includes HttpOnly cookies, as the `NID` entry below shows). It returns a list of dictionaries, something like this (this example came from Google's website, with some sensitive cookies omitted):
```python
[{'domain': 'google.com', 'expiry': 1569293037.639768, 'httpOnly': False, 'name': '1P_JAR', 'path': '/', 'secure': False, 'value': '2019-08-25-02'},
 {'domain': 'google.com', 'expiry': 1571885037, 'httpOnly': False, 'name': 'OGPC', 'path': '/', 'secure': False, 'value': '19013527-1:'},
 {'domain': 'www.google.com', 'expiry': 1566787437, 'httpOnly': False, 'name': 'UULE', 'path': '/', 'secure': False, 'value': '...'},
 {'domain': 'google.com', 'expiry': 1582512236.772455, 'httpOnly': True, 'name': 'NID', 'path': '/', 'secure': False, 'value': '...'}]
```
Once you have this list, you need to pass the values to urllib (or to another HTTP library of your choice; Python has more than one, built-in or third-party). With urllib, you can use:
```python
from http.cookiejar import Cookie, CookieJar
```
Then build a "cookie jar" and attach it to urllib like this:
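```python
from urllib import request

jar = CookieJar()
for c in driver.get_cookies():
    # Map each Selenium cookie dict onto the positional Cookie() arguments:
    # version, name, value, port, port_specified, domain, domain_specified,
    # domain_initial_dot, path, path_specified, secure, expires, discard,
    # comment, comment_url, rest, rfc2109
    request_cookie = Cookie(0, c['name'], c['value'], None, False,
                            c['domain'], True, c['domain'].startswith('.'),
                            c.get('path', '/'), True, c.get('secure', False),
                            c.get('expiry'), False, None, None, {}, False)
    jar.set_cookie(request_cookie)
# Requests made through this opener send the matching cookies
opener = request.build_opener(request.HTTPCookieProcessor(jar))
```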
As soon as I can, I will add a complete working example.
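One way to put the CookieJar approach together is sketched below. It assumes the cookie dictionaries have the shape shown earlier (the output of `driver.get_cookies()`); `jar_from_selenium` and `download` are hypothetical helper names, not part of Selenium or urllib. Storing `httpOnly` in the cookie's `rest` attributes is only informational, since urllib does not enforce it on outgoing requests.

```python
from http.cookiejar import Cookie, CookieJar
from shutil import copyfileobj
from urllib import request


def jar_from_selenium(cookies):
    """Build a CookieJar from the list of dicts returned by driver.get_cookies()."""
    jar = CookieJar()
    for c in cookies:
        domain = c['domain']
        expiry = c.get('expiry')
        jar.set_cookie(Cookie(
            version=0,
            name=c['name'],
            value=c['value'],
            port=None,
            port_specified=False,
            domain=domain,
            domain_specified=True,
            domain_initial_dot=domain.startswith('.'),
            path=c.get('path', '/'),
            path_specified=True,
            secure=c.get('secure', False),
            expires=int(expiry) if expiry is not None else None,
            discard=expiry is None,  # session cookies come without an expiry
            comment=None,
            comment_url=None,
            rest={'HttpOnly': None} if c.get('httpOnly') else {},
        ))
    return jar


def download(url, cookies, filename):
    """Download url to filename, sending the browser's cookies with the request."""
    jar = jar_from_selenium(cookies)
    opener = request.build_opener(request.HTTPCookieProcessor(jar))
    with opener.open(url) as response, open(filename, 'wb') as file:
        # Stream the body to disk instead of reading it all into memory.
        copyfileobj(response, file)
```

A possible call is `download(url, driver.get_cookies(), arquivo)`, placed before `driver.quit()`, because the cookies have to be read while the browser session is still open. The jar only sends each cookie to hosts and paths that match its `domain` and `path`, so cookies from unrelated sites are not leaked.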
This is very interesting, a simple solution.
– Spencer Melo