This URL contains several links. I need to take the links for June 2017, download those files, and combine them all into a single dataframe. I'm stuck at this part — how can I do that? I'm trying to use the urllib library, but without success.
import requests
from bs4 import BeautifulSoup
from urllib.request import urlretrieve

url = 'https://s3.amazonaws.com/video.udacity-data.com/topher/2018/November/5bf32290_turnstile/turnstile.html'
# Fetch the page with requests.get
page = requests.get(url)
# Parse the response into a BeautifulSoup object
soup = BeautifulSoup(page.text, 'html.parser')
links = soup.find_all('a')
# Count all the June 2017 links on the page (filenames contain '1706')
totalArquivos = 0
for link in links:
    href = link.get('href')
    if href is not None and '1706' in href:
        totalArquivos += 1
print(totalArquivos)
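Building on the loop above, here is a minimal sketch of the missing step: collect the June 2017 hrefs into a list, read each file with pandas, and combine them with pd.concat. The inline HTML and CSV samples below are stand-ins for the real page and the downloaded turnstile files, which are assumed to be comma-separated:

```python
import pandas as pd
from io import StringIO
from bs4 import BeautifulSoup

# Stand-in for page.text; the real page links to many turnstile files.
html = """
<a href="turnstile_170603.txt">June 03</a>
<a href="turnstile_170610.txt">June 10</a>
<a href="turnstile_170506.txt">May 06</a>
"""
soup = BeautifulSoup(html, 'html.parser')

# Keep only the June 2017 links ('1706' in the filename).
june_links = [a.get('href') for a in soup.find_all('a')
              if a.get('href') is not None and '1706' in a.get('href')]

# In the real script each link would be fetched over the network, e.g.:
#   frames = [pd.read_csv(base_url + href) for href in june_links]
# Here two small CSV samples stand in for the downloaded files.
sample_files = {
    'turnstile_170603.txt': StringIO("station,entries\nA,100\nB,200"),
    'turnstile_170610.txt': StringIO("station,entries\nA,150\nB,250"),
}
frames = [pd.read_csv(sample_files[href]) for href in june_links]

# One dataframe with all the files combined.
df = pd.concat(frames, ignore_index=True)
print(len(june_links), len(df))
```

With the real page you would prepend the S3 base URL to each href before calling pd.read_csv, since the anchors hold relative filenames.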
Thank you very much for the feedback! If my answer helped you with your question, mark it as correct. I recommend reading our Help Center article on how to ask a good question. After that, please post this as a new question and delete this answer — our philosophy is to limit each question to a single scope. – Breno
In this case, you should print the variable href, or add the links (hrefs) to a list. – Breno