How to scrape data from a <table> with Python and BeautifulSoup

I'm a beginner and I'm trying to get a table from the Portal da Transparência website, but all I get is the tag with no data in it. When I open the developer tools I can see the data I want, which are the states and the amounts transferred, but when I press Ctrl+U to view the page source the data does not appear, only the tag. This may sound confusing, but the images below show it.

When I look up the tag in Python it comes back with nothing inside, just as when I view the page source with Ctrl+U. What am I doing wrong?

[image: enter image description here]

[image: enter image description here]

import requests
from bs4 import BeautifulSoup

page = requests.get("http://www.portaltransparencia.gov.br/funcoes/12-educacao?ano=2018")
soup = BeautifulSoup(page.content, 'html.parser')
p = soup.find('table', class_='tabelaPrimeiroNivel')
forecast_items = p.find_all('tbody')
print(forecast_items)

1 answer

Your problem is that the data is not in the page's HTML. When you access the page, an empty skeleton is loaded where the data should be. The page then runs JavaScript code that makes a separate request to the server and creates these elements dynamically, after the page has loaded.

Since BeautifulSoup does not execute JavaScript, you only get the page while it is still empty, so you cannot extract this data with it.

You can verify this by opening the developer tools and loading the page with the "Network" tab selected: you will see the page make several requests, which is where the dynamic data comes from.
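
The same check can be done from Python by counting the rows present in the static HTML. The sketch below is only an illustration: it uses the standard library's html.parser so it runs without installing anything, and `RowCounter`, `count_rows` and both HTML strings are hypothetical. Only the class name `tabelaPrimeiroNivel` comes from the question.

```python
from html.parser import HTMLParser


class RowCounter(HTMLParser):
    """Count <tr> tags inside the <tbody> of the table with a given class."""

    def __init__(self, table_class):
        super().__init__()
        self.table_class = table_class
        self.in_table = False
        self.in_tbody = False
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "table" and self.table_class in classes:
            self.in_table = True
        elif tag == "tbody" and self.in_table:
            self.in_tbody = True
        elif tag == "tr" and self.in_tbody:
            self.rows += 1

    def handle_endtag(self, tag):
        if tag == "tbody":
            self.in_tbody = False
        elif tag == "table":
            self.in_table = False


def count_rows(html, table_class="tabelaPrimeiroNivel"):
    """Return how many body rows the target table has in this HTML string."""
    parser = RowCounter(table_class)
    parser.feed(html)
    return parser.rows


# Hypothetical markup: what the server sends before any JavaScript runs ...
SKELETON = (
    '<table class="tabelaPrimeiroNivel">'
    "<thead><tr><th>UF</th><th>Valor</th></tr></thead>"
    "<tbody></tbody></table>"
)
# ... and what the browser shows once JavaScript has filled the table in.
RENDERED = SKELETON.replace(
    "<tbody></tbody>",
    "<tbody><tr><td>AC</td><td>271382820.59</td></tr>"
    "<tr><td>AL</td><td>762900876.36</td></tr></tbody>",
)

print(count_rows(SKELETON))  # 0
print(count_rows(RENDERED))  # 2
```

If the explanation above holds, calling `count_rows(page.text)` on the response from the question's code should also give 0.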

There are two possible solutions:

  1. Use Selenium, a Python library that lets you control a real browser such as Firefox or Chrome. Since real browsers run JavaScript, you will be able to get the data this way. However, this solution is much less efficient, because it has to load a heavy browser and many page elements that don't matter.

  2. Read the page, examine its code and the requests it makes via JavaScript, and then write Python code by hand that mimics those requests. This method usually takes more work, but the result is more efficient, since you end up with code that does only what is necessary to fetch the data you want.

Luckily for you, the Portal da Transparência has an API: an interface for programmers to retrieve the data without having to parse the pages. Its usage is explained at http://www.portaltransparencia.gov.br/api-de-dados
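
For the first option, a minimal sketch might look like the following. It assumes Selenium 4 is installed along with Firefox and its driver. The function name `fetch_rendered_html`, the timeout and the CSS selector are illustrative choices, with the selector guessed from the class name in the question. Treat it as a starting point rather than a tested recipe for this site.

```python
def fetch_rendered_html(url, css_selector, timeout=15):
    """Load `url` in a real browser and return the HTML after JavaScript ran."""
    # Imported here so the function can be defined even where Selenium
    # is not installed; `pip install selenium` is assumed before calling it.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()  # assumes Firefox and its driver are available
    try:
        driver.get(url)
        # Block until at least one element matching the selector exists.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even on a timeout


URL = "http://www.portaltransparencia.gov.br/funcoes/12-educacao?ano=2018"
# Assumed selector: body rows of the table class used in the question.
ROW_SELECTOR = "table.tabelaPrimeiroNivel tbody tr"
```

Calling `fetch_rendered_html(URL, ROW_SELECTOR)` should return HTML that can be passed to BeautifulSoup exactly as in the question, this time with the rows filled in.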

An example:

import requests

# Request the JSON data for function 12 (educacao) in 2018
r = requests.get('http://www.portaltransparencia.gov.br/funcoes/12/mapa',
                 params={'ano': '2018'})
print(r.json())

Result:

[{'codigoIBGE': '',
  'nomeMunicipio': '',
  'nomeUF': 'ACRE',
  'siglaUF': 'AC',
  'valor': 271382820.59},
 {'codigoIBGE': '',
  'nomeMunicipio': '',
  'nomeUF': 'ALAGOAS',
  'siglaUF': 'AL',
  'valor': 762900876.36},
 {'codigoIBGE': '',
  'nomeMunicipio': '',
  'nomeUF': 'AMAPÁ',
  'siglaUF': 'AP',
  'valor': 202949699.19},
 {'codigoIBGE': '',
  'nomeMunicipio': '',
  'nomeUF': 'AMAZONAS',
  'siglaUF': 'AM',
  'valor': 704229532.02},
 {'codigoIBGE': '',
  'nomeMunicipio': '',
  'nomeUF': 'BAHIA',
  'siglaUF': 'BA',
  'valor': 1800232448.53},
 {'codigoIBGE': '',
  'nomeMunicipio': '',
  'nomeUF': 'CEARÁ',
  'siglaUF': 'CE',
  'valor': 1317203323.08},
 {'codigoIBGE': '',
  'nomeMunicipio': '',
  'nomeUF': 'DISTRITO FEDERAL',
  'siglaUF': 'DF',
  'valor': 1702869722.04},
 {'codigoIBGE': '',
  'nomeMunicipio': '',
  'nomeUF': 'ESPÍRITO SANTO',
  'siglaUF': 'ES',
  'valor': 1005278642.49},
 {'codigoIBGE': '',
  'nomeMunicipio': '',
  'nomeUF': 'GOIÁS',
  'siglaUF': 'GO',
  'valor': 1300024908.65},
 {'codigoIBGE': '',
  'nomeMunicipio': '',
  'nomeUF': 'MARANHÃO',
  'siglaUF': 'MA',
  'valor': 904528606.79},
 {'codigoIBGE': '',
  'nomeMunicipio': '',
  'nomeUF': 'MATO GROSSO',
  'siglaUF': 'MT',
  'valor': 848977509.3},
 {'codigoIBGE': '',
  'nomeMunicipio': '',
  'nomeUF': 'MATO GROSSO DO SUL',
  'siglaUF': 'MS',
  'valor': 812220959.61},
 {'codigoIBGE': '',
  'nomeMunicipio': '',
  'nomeUF': 'MINAS GERAIS',
  'siglaUF': 'MG',
  'valor': 5612411096.05},
 {'codigoIBGE': '',
  'nomeMunicipio': '',
  'nomeUF': 'PARANÁ',
  'siglaUF': 'PR',
  'valor': 1913617246.26},
 {'codigoIBGE': '',
  'nomeMunicipio': '',
  'nomeUF': 'PARAÍBA',
  'siglaUF': 'PB',
  'valor': 1626800821.69},
 {'codigoIBGE': '',
  'nomeMunicipio': '',
  'nomeUF': 'PARÁ',
  'siglaUF': 'PA',
  'valor': 1502290653.09},
 {'codigoIBGE': '',
  'nomeMunicipio': '',
  'nomeUF': 'PERNAMBUCO',
  'siglaUF': 'PE',
  'valor': 1793890169.14},
 {'codigoIBGE': '',
  'nomeMunicipio': '',
  'nomeUF': 'PIAUÍ',
  'siglaUF': 'PI',
  'valor': 752510959.88},
 {'codigoIBGE': '',
  'nomeMunicipio': '',
  'nomeUF': 'RIO DE JANEIRO',
  'siglaUF': 'RJ',
  'valor': 5077770452.72},
 {'codigoIBGE': '',
  'nomeMunicipio': '',
  'nomeUF': 'RIO GRANDE DO NORTE',
  'siglaUF': 'RN',
  'valor': 1417979764.75},
 {'codigoIBGE': '',
  'nomeMunicipio': '',
  'nomeUF': 'RIO GRANDE DO SUL',
  'siglaUF': 'RS',
  'valor': 4444340585.5},
 {'codigoIBGE': '',
  'nomeMunicipio': '',
  'nomeUF': 'RONDÔNIA',
  'siglaUF': 'RO',
  'valor': 334773348.77},
 {'codigoIBGE': '',
  'nomeMunicipio': '',
  'nomeUF': 'RORAIMA',
  'siglaUF': 'RR',
  'valor': 226714164.22},
 {'codigoIBGE': '',
  'nomeMunicipio': '',
  'nomeUF': 'SANTA CATARINA',
  'siglaUF': 'SC',
  'valor': 1531789135.42},
 {'codigoIBGE': '',
  'nomeMunicipio': '',
  'nomeUF': 'SERGIPE',
  'siglaUF': 'SE',
  'valor': 622536740.58},
 {'codigoIBGE': '',
  'nomeMunicipio': '',
  'nomeUF': 'SÃO PAULO',
  'siglaUF': 'SP',
  'valor': 1981995537.81},
 {'codigoIBGE': '',
  'nomeMunicipio': '',
  'nomeUF': 'TOCANTINS',
  'siglaUF': 'TO',
  'valor': 424378306.98},
 {'codigoIBGE': '',
  'nomeMunicipio': '',
  'nomeUF': '',
  'siglaUF': 'Nacional',
  'valor': 36308921677.23},
 {'codigoIBGE': '',
  'nomeMunicipio': '',
  'nomeUF': '',
  'siglaUF': 'Centro-Oeste',
  'valor': 0.0},
 {'codigoIBGE': '',
  'nomeMunicipio': '',
  'nomeUF': '',
  'siglaUF': 'Sul',
  'valor': 175660334.04},
 {'codigoIBGE': '',
  'nomeMunicipio': '',
  'nomeUF': '',
  'siglaUF': 'Nordeste',
  'valor': 248671827.99},
 {'codigoIBGE': '',
  'nomeMunicipio': '',
  'nomeUF': '',
  'siglaUF': 'Sudeste',
  'valor': 0.0}]
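
Once the data arrives as a list of dictionaries, it can be handled with plain Python. The sketch below assumes, judging only from the output above, that rows with an empty 'nomeUF' are national or regional aggregates rather than states. `split_states` and `top_states` are hypothetical helper names, and the sample rows are copied from that output.

```python
def split_states(records):
    """Separate per-state rows from aggregate rows (those with an empty nomeUF)."""
    states = [r for r in records if r["nomeUF"]]
    aggregates = [r for r in records if not r["nomeUF"]]
    return states, aggregates


def top_states(records, n=3):
    """Return the n states with the highest 'valor', largest first."""
    states, _ = split_states(records)
    return sorted(states, key=lambda r: r["valor"], reverse=True)[:n]


# A few rows copied from the output above.
sample = [
    {"codigoIBGE": "", "nomeMunicipio": "", "nomeUF": "ACRE",
     "siglaUF": "AC", "valor": 271382820.59},
    {"codigoIBGE": "", "nomeMunicipio": "", "nomeUF": "MINAS GERAIS",
     "siglaUF": "MG", "valor": 5612411096.05},
    {"codigoIBGE": "", "nomeMunicipio": "", "nomeUF": "RIO DE JANEIRO",
     "siglaUF": "RJ", "valor": 5077770452.72},
    {"codigoIBGE": "", "nomeMunicipio": "", "nomeUF": "",
     "siglaUF": "Nacional", "valor": 36308921677.23},
]

for row in top_states(sample, n=2):
    print(f"{row['siglaUF']}: {row['valor']:,.2f}")
# MG: 5,612,411,096.05
# RJ: 5,077,770,452.72
```

With the live response, `top_states(r.json())` would apply the same ranking to the full list.
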
  • thanks for the reply
