How to extract all td names in order?


2

I need to extract all the names of people on this site:

Camara.gov.br

I wrote this code in Python3:

from urllib.request import urlopen
from bs4 import BeautifulSoup
import urllib.request, urllib.parse, urllib.error

emendas = urlopen("http://www.camara.gov.br/proposicoesWeb/prop_emendas?idProposicao=2122076&subst=0")

bsObje =  BeautifulSoup(emendas, "lxml")

tabelas = bsObje.findAll("tbody", {"class":"coresAlternadas"})

deputados = []

for linha in tabelas:
    deputados.append(linha.select('td')[3].text.strip())

print(deputados)
Result -> ['Laura Carneiro', 'André Figueiredo']

It didn't work. Does anyone know how to get all the names in order?

2 answers

1


What is the order you want? Alphabetical or in the order in which they were found?

The code below covers both scenarios:

from bs4 import BeautifulSoup as bs
from urllib.request import urlopen

req = urlopen('http://www.camara.gov.br/proposicoesWeb/prop_emendas?idProposicao=2122076&subst=0')
soup = bs(req.read(), 'html.parser')

tables_ele = soup.findAll('tbody', {'class': 'coresAlternadas'})
deputados = []
for table_ele in tables_ele:
    for row in table_ele.findAll('tr'):
        cols = row.findAll('td')
        deputados.append(cols[3].text.strip())

print(deputados)  # in the order they were found in the table
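
To try the row-by-row logic without hitting the site, here is a rough sketch that uses only the standard library's html.parser instead of BeautifulSoup. The FourthCellParser class and the sample HTML are made up for illustration, and it assumes the tables are not nested and that the name is always in the 4th cell of each row:

```python
from html.parser import HTMLParser

class FourthCellParser(HTMLParser):
    """Hypothetical helper: collect the text of the 4th <td> of every
    <tr> inside <tbody class="coresAlternadas">."""

    def __init__(self):
        super().__init__()
        self.in_tbody = False
        self.in_td = False
        self.col = -1      # index of the current <td> within its row
        self.names = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get('class') or '').split()
        if tag == 'tbody' and 'coresAlternadas' in classes:
            self.in_tbody = True
        elif tag == 'tr' and self.in_tbody:
            self.col = -1  # new row: restart the column counter
        elif tag == 'td' and self.in_tbody:
            self.col += 1
            self.in_td = True
            if self.col == 3:
                self.names.append('')  # start a new name

    def handle_endtag(self, tag):
        if tag == 'tbody':
            self.in_tbody = False
        elif tag == 'td':
            self.in_td = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the last name.
        if self.in_tbody and self.in_td and self.col == 3:
            self.names[-1] += data

# Made-up sample: two tables, three rows in total.
html = """
<table><tbody class="coresAlternadas">
<tr><td>1</td><td>a</td><td>b</td><td> Laura Carneiro </td></tr>
<tr><td>2</td><td>a</td><td>b</td><td>André Figueiredo</td></tr>
</tbody></table>
<table><tbody class="coresAlternadas">
<tr><td>3</td><td>a</td><td>b</td><td>Laura Carneiro</td></tr>
</tbody></table>
"""

parser = FourthCellParser()
parser.feed(html)
deputados = [name.strip() for name in parser.names]
print(deputados)
```

Every row contributes one name, not just the first row of each tbody, which is the difference from the code in the question.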

Then, to sort alphabetically, you can do:

...
deputados = sorted(deputados)

To remove duplicates (there are many) and sort alphabetically, you can convert the list to a set and then sort it:

...
deputados = sorted(set(deputados))
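
Note that a set does not keep the table order. If you want to drop duplicates but keep the names in the order they first appear, one possible approach is dict.fromkeys. The list below is made-up sample data standing in for the scraped one ('Bruno Lima' is not from the page):

```python
# Hypothetical sample data standing in for the scraped list.
deputados = ['Laura Carneiro', 'André Figueiredo', 'Laura Carneiro', 'Bruno Lima']

# dict keys are unique and keep insertion order (Python 3.7+),
# so this drops repeats while preserving first-seen order.
unicos = list(dict.fromkeys(deputados))
print(unicos)
```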
  • Thanks, it worked. But what if I also want to extract the last ('tbody', {'class': 'coresAlternadas'}) block, which has 32 more names in the Author column? How do I get just that one?

  • @Reinaldochaves, ah, I see now: there are two tbody elements, and the last one is the plenary. Do you want the names from that one too?

  • Edited to cover this, @Reinaldochaves.

0

From what I saw, the names in the table are not sorted.

So you can collect them normally and, once you have the whole list, sort it in Python, either with the built-in sorted() function or with the list's .sort() method (without assigning the result to a variable, since it sorts the list in place).

Using sorted()

>>> deputados = sorted(deputados)
>>> deputados
['André Figueiredo', 'Laura Carneiro']

Using .sort()

>>> deputados.sort()
>>> deputados
['André Figueiredo', 'Laura Carneiro']
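
One caveat with Portuguese names: both sorted() and .sort() compare strings by code point, so a name starting with an accented capital such as 'Á' usually ends up after 'Z'. A minimal sketch of an accent-insensitive key, using only the standard library (sort_key and the sample names are made up for illustration):

```python
import unicodedata

def sort_key(name):
    # Decompose accented letters (NFKD) and drop the combining marks,
    # then casefold so the comparison ignores accents and case.
    decomposed = unicodedata.normalize('NFKD', name)
    stripped = ''.join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()

nomes = ['Zélia Souza', 'Átila Ramos', 'André Figueiredo']  # hypothetical sample
print(sorted(nomes))                # plain code-point order
print(sorted(nomes, key=sort_key))  # accent-insensitive order
```

The same key also works with the in-place method: nomes.sort(key=sort_key).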
