import requests
from bs4 import BeautifulSoup as bs
headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko)'}
url = "http://quizdomilhao.com.br/category/g1"
question_page = requests.get(url, headers = headers )
question_page.encoding = 'utf-8'
soup = bs(question_page.text, 'html.parser')
ul = soup.find_all('ul', {'class':'square'})
lis = [item.find_all('li') for item in ul]
lis = [item for sublista in lis for item in sublista]
aas = [item.find_all('a') for item in ul]
aas = [item for sublista in aas for item in sublista]
text_link = [[item.text, item2['href']] for item, item2 in zip(lis,aas)]
- Importing the libraries
- Creating a User-Agent header so the site accepts the request
- Making the request with requests
- Parsing the HTML with BeautifulSoup (bs)
- Finding every ul with the square class
- Using the result of the previous search to extract the li elements
- Finding the a elements within each ul
- Building a list of [question text, link] pairs
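One caveat with the steps above: the li and a elements are collected separately and then paired with zip, so if any li has no link the texts and links can drift out of alignment. A minimal sketch of one way around that, reading the text and the link from the same li. The sample markup and the example.com URLs are hypothetical stand-ins for the real page, whose exact structure is assumed here.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the category page (assumption).
sample_html = """
<ul class="square">
  <li><a href="http://example.com/q1">Qual é a capital do Brasil?</a></li>
  <li>Item without a link</li>
  <li><a href="http://example.com/q2">Quem descobriu o Brasil?</a></li>
</ul>
<ul class="square">
  <li><a href="http://example.com/q3">Quanto é 2 + 2?</a></li>
</ul>
"""

soup = BeautifulSoup(sample_html, "html.parser")

text_link = []
# Walk every li inside any ul with the square class.
for li in soup.select("ul.square li"):
    a = li.find("a", href=True)
    if a is None:
        # Skip items that carry no link instead of shifting the pairing.
        continue
    text_link.append([li.get_text(strip=True), a["href"]])

for question, link in text_link:
    print(question, link)
```

With the zip approach, the second li in this sample would have been paired with the link of the third one; taking both values from the same li keeps each pair consistent.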
Update: accessing the answer pages
for item in text_link:
    question, link = item
    print(question)
    print(link)
    # Fetch the answer page for this question
    answer_page = requests.get(link, headers=headers)
    answer_page.encoding = 'utf-8'
    soup = bs(answer_page.text, 'html.parser')
    ul = soup.find('ul', {'class':'square'})
    li = ul.find_all('li')
    # The correct answer is the li whose text is wrapped in <strong>
    answer = [item.find('strong').text for item in li if item.find('strong')]
    print(''.join(answer))
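Since the goal mentioned in the comments below is to save the questions and answers in a database, here is a minimal sketch using the standard-library sqlite3 module. The table name quiz, its columns, and the sample rows are all assumptions for illustration; in practice the rows would be built inside the loop above from question, link and ''.join(answer).

```python
import sqlite3

# Hypothetical rows in the shape (question, link, answer); sample data only.
rows = [
    ("Qual é a capital do Brasil?", "http://example.com/q1", "Brasília"),
    ("Quanto é 2 + 2?", "http://example.com/q3", "4"),
]

# ":memory:" keeps the example self-contained; pass a file path to persist.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS quiz ("
    "link TEXT PRIMARY KEY, question TEXT NOT NULL, answer TEXT)"
)

# INSERT OR REPLACE avoids duplicate rows when the scraper is run again.
conn.executemany(
    "INSERT OR REPLACE INTO quiz (question, link, answer) VALUES (?, ?, ?)",
    rows,
)
conn.commit()

stored = conn.execute(
    "SELECT question, answer FROM quiz ORDER BY link"
).fetchall()
for question, answer in stored:
    print(question, "->", answer)
```

Using the link as the primary key is a design choice of this sketch: it gives each question a stable identifier, so repeated runs update a row rather than add a new one.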
Hello @Imonferrari, how do I make the accented characters display correctly?
– Joa Roque
@Joaroque, good morning! I added an update to the answer: question_page.encoding = 'utf-8'. Hug!
– lmonferrari
@Imonferrari, so far so good! Now, how do I extract the links contained in each li? Whenever I try, I get an error. I want to save the questions and answers in a database, and to get the correct answer I have to open each answer page.
– Joa Roque
@Joaroque, I made another update to resolve the link issue. Hug!
– lmonferrari
@Joaroque, I don't understand. If the doubts in this question have been resolved, it is better to create a new question for the new ones, so it is easier to understand what you need. What do you think? Hug!
– lmonferrari
@Joaroque, I updated the code because the question page contains more than one ul with the square class. Hug!!
– lmonferrari