Reading a Table in Python with Beautiful Soup

I need to get a table from the Transparency Portal (Portal da Transparência) and then write it to a database. I am using Beautiful Soup.

The response to my request does not include the part of the page that holds the data, so none of the tags I search for are found.

import requests
from bs4 import BeautifulSoup

# Download the page HTML as text
website_url = requests.get('http://www.portaltransparencia.gov.br/despesas/favorecido?paginacaoSimples=true&tamanhoPagina=&offset=&direcaoOrdenacao=asc&colunasSelecionadas=data%2CdocumentoResumido%2ClocalizadorGasto%2Cfase%2Cespecie%2Cfavorecido%2CufFavorecido%2Cvalor%2Cug%2Cuo%2Corgao%2CorgaoSuperior%2Cgrupo%2Celemento%2Cmodalidade&de=20%2F03%2F2019&ate=20%2F03%2F2019&faseDespesa=1&ordenarPor=fase&direcao=asc').text

# Parse the HTML and print it to inspect what the server actually returned
soup = BeautifulSoup(website_url, 'lxml')
print(soup.prettify())

# Look for the data table by its CSS classes
Tabela = soup.find('table', {'class': 'dataTable no-footer'})

Nothing is returned. Even the output of the first print does not contain the table with the data.

Does anyone know what I’m doing wrong? This is my first time scraping.

  • The table is not populated by the server when the page loads; it is filled in afterwards by JavaScript through an AJAX call. If you request the URL that the AJAX call gets this information from, it is easier, and the data comes back as JSON, ready to use.

  • Not to mention that if you get JSON you don’t need Beautiful Soup at all: just use json.loads and the data will be ready.

  • Excellent tip, Fernando. I will change my focus and work with the JSON. Thank you very much.

  • Now my battle is to turn this JSON into a DataFrame with pandas. I don’t know whether I have to do some normalization or conversion, because pandas doesn’t recognize the structure. Do I need to save it to a file first?

  • Before the data there are the record counts: {"draw":0,"recordsTotal":9223372036854775807,"recordsFiltered":9223372036854775807,"date":[{"date":"21/03/2019","document":"110001000012019NE000237". In the result, those counts come out in front of the data, and the records all end up in a single column.
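
For the DataFrame question in the last two comments, here is a minimal sketch of one way it might be handled. It assumes the response has the shape quoted above: a few counters at the top level and the actual records in a nested list. The sample string is trimmed to the two fields visible in the post and closed by hand for illustration, and the key names are copied as quoted, so they should be checked against the real response. With a live request, calling .json() on the requests response should give the same kind of dictionary as json.loads does here.

```python
import json

import pandas as pd

# Trimmed, hand-closed sample shaped like the response quoted in the comment.
# Only the two fields visible in the post are included (assumption: the real
# payload carries more columns per record).
sample_text = (
    '{"draw":0,"recordsTotal":9223372036854775807,'
    '"recordsFiltered":9223372036854775807,'
    '"date":[{"date":"21/03/2019","document":"110001000012019NE000237"}]}'
)

# Parse the text directly; saving it to a file first is not required.
payload = json.loads(sample_text)

# The records are in the nested list, not at the top level. The key "date"
# is taken from the post as quoted and may differ in the real response.
rows = payload["date"]

# A list of dicts becomes one row per record and one column per field.
df = pd.DataFrame(rows)
print(df)
```

Passing the whole dictionary to pandas mixes the counters with the record list, which is probably why everything landed in one column; passing only the list of records avoids that. If some fields turn out to be nested objects, pd.json_normalize(rows) may be worth trying instead of pd.DataFrame(rows).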

No answers
