Requests, Beautifulsoup <Tables>

Asked

Viewed 865 times

0

I have a website that wants to extract specific data from a table inserir a descrição da imagem aqui

I want to extract all information that has "PROLONG".

My difficulty is that all tables have the same name in the "class" class="field". how do I extract the data related to the "PROLONG" element code that I made:

import requests
from bs4 import BeautifulSoup

url = requests.get('http://www.praticagem-rj.com.br/')

soup = BeautifulSoup(url.text, "lxml")

list_return = soup.find_all('td', class_='field')

for list_dados in list_return:
    print(list_dados.next_element)

1 answer

1

The code on this site is a mess. What’s happening in your code, is that you’re searching column by column on every row of the table. You may notice that if you read different items from the list list_return they may or may not belong to the same table row.

What you need then, is to take only the interesting lines, and within them look for those that contain "PROLONG"

import requests
from bs4 import BeautifulSoup

url = requests.get('http://www.praticagem-rj.com.br/')

soup = BeautifulSoup(url.text, "lxml")
list_return = soup.find_all('tr',style="background: #e4e4e4;")
for tr in list_return:
temp = tr.find('td', class_='field', text='PROLONG')
if temp != None:
    print(tr)
    print("-----")

From the 'Soup' object, I separated all the rows belonging to the table in 'list_return'. Then I’ll go over each row looking if it has a column with the text='PROLONG'. The lines that do not have this column return None in the search, the ones you have are called in the print. Algoritmo em ação

PS: The key to searching things with Beautifulsoup is to already know what to look for using the 'inspect object' function in the browser. That’s how I figured out how to isolate table lines using style="background: #e4e4e4;"

Browser other questions tagged

You are not signed in. Login or sign up in order to post.