Hi, how's it going? I'm a programming enthusiast and I followed a tutorial on web scraping. I understood the logic, but I run into a problem when one of the data fields is missing on the site. Below are my code and my analysis of the problem:
import requests
from bs4 import BeautifulSoup

URL = "https://www.classcentral.com/subject/data-science"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")

Course = []
Duration = []
Start_Date = []
Offered_By = []
No_Of_Reviews = []
Rating = []

def find_2nd(string, substring):
    return string.find(substring, string.find(substring) + 1)

def find_1st(string, substring):
    return string.find(substring, string.find(substring))

for i in soup.findAll("span", {'class': 'text-1 weight-semi line-tight'}):
    b = str(i)
    #print(b [ find_1st(b,'>')+1 : find_2nd(b,'<') ] )
    Course.append(b[find_1st(b, '>') + 1:find_2nd(b, '<')])

course = []
for i in Course:
    i = i.strip()
    print(i)
    course.append(i)

# Number of reviews
for i in soup.findAll("span", {'class': 'large-down-hidden block line-tight text-4 color-gray'}):
    b = str(i)
    print(b[find_1st(b, '>') + 1:find_2nd(b, '<')])
    No_Of_Reviews.append(b[find_1st(b, '>') + 1:find_2nd(b, '<')])
Looking at the site, I found that one of the courses is listed without reviews. The problem is that when I try to turn the lists into a DataFrame, I get a length error, because the lists no longer have the same number of items. In other words, I can't build the DataFrame because of this missing value. I didn't post the complete code, to keep the question short; the rest is working.
Does anyone know how to help me? How can I make the code recognize that this information is missing, use 0 in its place, and carry on with the rest of the script?
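To illustrate the behaviour being asked for, here is a minimal sketch of one possible approach: loop over each course's container element and look up the review span inside it, so a missing span can fall back to "0" and every course still yields exactly one row. The sample HTML and the `li.course-item` container selector are assumptions made up for this example; the real page's container has to be found by inspecting its markup. Only the two span class lists come from the code above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page: each course sits in its
# own container, and the second course has no review-count span.
SAMPLE_HTML = """
<ul>
  <li class="course-item">
    <span class="text-1 weight-semi line-tight"> Intro to Data Science </span>
    <span class="large-down-hidden block line-tight text-4 color-gray">120 reviews</span>
  </li>
  <li class="course-item">
    <span class="text-1 weight-semi line-tight"> Machine Learning Basics </span>
  </li>
</ul>
"""

# Class lists taken from the question, written as CSS selectors.
TITLE_SELECTOR = "span.text-1.weight-semi.line-tight"
REVIEWS_SELECTOR = "span.large-down-hidden.block.line-tight.text-4.color-gray"


def extract_courses(html):
    """Return one dict per course, using "0" when the review span is absent."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # "li.course-item" is an assumed container selector, not the real one.
    for item in soup.select("li.course-item"):
        title = item.select_one(TITLE_SELECTOR)
        reviews = item.select_one(REVIEWS_SELECTOR)
        rows.append({
            "Course": title.get_text(strip=True) if title else "",
            "No_Of_Reviews": reviews.get_text(strip=True) if reviews else "0",
        })
    return rows


rows = extract_courses(SAMPLE_HTML)
for row in rows:
    print(row)
```

Because each row is built from a single container, the columns cannot drift out of alignment, and a list of dicts like `rows` can be passed directly to `pandas.DataFrame(rows)`.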
Wow, Krossbow, thank you so much for your help, my friend. Thank you very much.
– Guilherme Novais