If you look, the ASN links you want to capture all follow the same pattern: "https://bgpview.io/asn/" + the ASN number.
So I would start from that principle: capture all the ASN numbers and then request each link. Here is some example code that uses only BeautifulSoup.
Step 1
import urllib.request
from bs4 import BeautifulSoup
import pandas as pd
import re
Here I use urllib.request to access the link and parse the page
Step 2
source = urllib.request.urlopen('https://bgpview.io/reports/countries/BR').read()
soup = BeautifulSoup(source,'lxml')
Now I will access the table and save it in a dataframe
Step 3
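table = soup.find('table', attrs={"id":"country-report"})
table_rows = table.find_all('tr')
titulo = table.find_all('th')
# Column names come from the header cells
colunas = []
for col in titulo:
    colunas.append(col.text)
# One list of cell texts per table row
l = []
for tr in table_rows:
    td = tr.find_all('td')
    row = [cell.text for cell in td]
    l.append(row)
df = pd.DataFrame(l, columns=colunas)
# dropna() discards rows with missing values, e.g. the header row, which has no <td> cells
df = df.dropna()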
Now I will save the ASN column in a list and apply a regex so that only the numbers remain, since those are what we need to put in the links
Step 4
asn = df['ASN'].tolist()
asn_number = []
for a in asn:
    # Strip the letters (e.g. the "AS" prefix) so only the number is left
    num = re.sub("[A-Za-z]", "", a)
    asn_number.append(num)
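As a quick check of Step 4 that runs without network access, here is a minimal sketch. It assumes the ASN column holds strings such as "AS28573"; the sample values are made up for illustration and the helper name asn_to_url is hypothetical. It keeps only the digits and builds the link following the pattern described at the top.

```python
import re

BASE_URL = "https://bgpview.io/asn/"  # pattern taken from the answer above


def asn_to_url(asn_text):
    """Keep only the digits of a value like 'AS28573' and build its link."""
    number = re.sub(r"\D", "", asn_text)  # \D matches any non-digit character
    return BASE_URL + number


# Illustrative values only, not real scraped data
sample = ["AS28573", "AS 4230", "as7738"]
urls = [asn_to_url(a) for a in sample]
print(urls)
```

Using \D instead of [A-Za-z] also removes spaces or punctuation; whether the cells ever contain those is an assumption.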
Now let's loop over the numbers, request each link, and save the extraction result in a dataframe:
Step 5
from http.client import IncompleteRead
l = list()
for asn in asn_number:
    row = []
    try:
        source = urllib.request.urlopen('https://bgpview.io/asn/'+str(asn)).read()
    except IncompleteRead:
        # Skip this ASN if the response was cut short
        continue
    soup = BeautifulSoup(source,'lxml')
    ext = soup.find('div', attrs={"id":"content-info"})
    # The <h4> headings become the column names
    col = ext.find_all('h4')
    colunas = [tr.text.replace(":", "") for tr in col]
    span = ext.find_all('span')
    em = ext.find_all('em')
    # The <span> and <em> texts are the values
    for tr in span:
        row.append(tr.text.replace("\n", ""))
    for tr in em:
        row.append(tr.text.replace("\n", ""))
    # Pair each column name with its value and store the record
    dic = dict(zip(colunas, row))
    l.append(dic)
df = pd.DataFrame(l, columns=colunas)
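The loop in Step 5 skips an ASN as soon as one read fails. One possible variation, sketched below, retries a few times before giving up. The function name fetch, its parameters, and the opener argument are all hypothetical; the opener exists only so the sketch can be tried offline with a stand-in instead of a real request.

```python
import io
import time
import urllib.request
from http.client import IncompleteRead
from urllib.error import URLError


def fetch(url, retries=3, delay=1.0, opener=urllib.request.urlopen):
    """Return the page body as bytes, or None if every attempt fails."""
    for attempt in range(retries):
        try:
            with opener(url) as response:
                return response.read()
        except (IncompleteRead, URLError):
            # Wait a little before the next attempt; no wait after the last one
            if attempt < retries - 1:
                time.sleep(delay)
    return None


# Offline stand-ins (assumptions for the demo, not part of the scraper)
def fake_opener(url):
    return io.BytesIO(b"<html></html>")


def always_fails(url):
    raise URLError("offline")


print(fetch("https://bgpview.io/asn/0", opener=fake_opener))
print(fetch("https://bgpview.io/asn/0", retries=2, delay=0, opener=always_fails))
```

Inside the loop, a None result would then take the place of the continue in the except branch.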
Remember that you can follow this same line of reasoning with Selenium as well, but it is much heavier.
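To save the result as CSV, one option is to write the list of dictionaries directly with the standard csv module. The sketch below is self-contained: the records list stands in for the list built in Step 5, its values are invented for illustration, and the file name asn_details.csv is an assumption.

```python
import csv

# Stand-in for the list of dictionaries from Step 5 (illustrative values only)
records = [
    {"Name": "Example A", "Country": "BR"},
    {"Name": "Example B", "Country": "BR", "Website": "example.com"},
]

# Collect every key that appears, preserving first-seen order
fieldnames = []
for rec in records:
    for key in rec:
        if key not in fieldnames:
            fieldnames.append(key)

# restval fills in columns that a given record does not have
with open("asn_details.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(records)
```

If you already have the dataframe, df.to_csv("asn_details.csv", index=False) should produce an equivalent file with less code.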
Cool, it worked!!! But how do I save this list as CSV?
– user201087
In the last for loop you can create two lists, one with the column names and the other with the values, turn them into a dictionary, and then build the dataframe with pandas.
– Juan Caio
You use the same idea I showed you in step three, only turning the content into a dictionary and appending it to the list!
– Juan Caio