I need to fetch a table from the Portal da Transparência and then write it to a database. I am using Beautiful Soup.
The response I get from the request does not include the part of the page that holds the data, so none of the tags I search for are found.
import requests
from bs4 import BeautifulSoup

# Download the raw HTML of the page.
url = 'http://www.portaltransparencia.gov.br/despesas/favorecido?paginacaoSimples=true&tamanhoPagina=&offset=&direcaoOrdenacao=asc&colunasSelecionadas=data%2CdocumentoResumido%2ClocalizadorGasto%2Cfase%2Cespecie%2Cfavorecido%2CufFavorecido%2Cvalor%2Cug%2Cuo%2Corgao%2CorgaoSuperior%2Cgrupo%2Celemento%2Cmodalidade&de=20%2F03%2F2019&ate=20%2F03%2F2019&faseDespesa=1&ordenarPor=fase&direcao=asc'
html = requests.get(url).text
soup = BeautifulSoup(html, 'lxml')
print(soup.prettify())
# Look for the data table by its CSS classes.
Tabela = soup.find('table', {'class': 'dataTable no-footer'})
Nothing is returned. Even in the output of the first print, the table with the data is missing.
Does anyone know what I'm doing wrong? This is my first time scraping.
The table is not populated by the server when the page loads; it is filled in afterwards by JavaScript through an AJAX request. It is easier to request the URL that the AJAX call reads from directly, since it returns the data as JSON, ready to use.
– fernandosavio
Not to mention that if you get JSON you don't need Beautiful Soup at all: just use json.loads and the data will be ready.
– fernandosavio
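To illustrate the json.loads suggestion, here is a minimal sketch. The response body is a hypothetical string, and the key names "rows" and "valor" are made up for the example; the real body would come from the AJAX URL, which is not given in this thread.

```python
import json

# Hypothetical response body. A real one would be the text returned by the
# AJAX endpoint; both the endpoint and these key names are assumptions.
raw_body = '{"draw": 0, "rows": [{"valor": "100,00"}, {"valor": "250,50"}]}'

payload = json.loads(raw_body)  # JSON text -> Python dict
rows = payload["rows"]          # plain list of dicts, no HTML parsing needed

for row in rows:
    print(row["valor"])
```

If the body is fetched with requests, calling .json() on the response object should give the same dict without a separate json.loads call.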
Excellent tip, Fernando. I will change my focus and work with the JSON. Thank you very much.
– wmoura12
Now my struggle is turning this JSON into a DataFrame with pandas. I don't know whether I have to normalize or convert it first, because pandas doesn't recognize the structure. Do I need to save it to a file first?
– wmoura12
Before the data, the response carries the record counts: {"draw":0,"recordsTotal":9223372036854775807,"recordsFiltered":9223372036854775807,"date":[{"date":"21/03/2019","document":"110001000012019NE000237". In the resulting DataFrame these counts come out in front, and the data all end up in a single column.
– wmoura12
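One possible way around this, sketched under assumptions: the counts sit at the top level and the records are a list under one key, so passing only that list to pandas should give one column per field. The payload below is hypothetical, modeled on the fragment quoted above; the second record is made up, and the key holding the list ("date" here, as quoted) may be named differently in the real response.

```python
import json

import pandas as pd

# Hypothetical payload shaped like the quoted fragment.
# The second record is invented purely for illustration.
raw_body = """
{"draw": 0,
 "recordsTotal": 9223372036854775807,
 "recordsFiltered": 9223372036854775807,
 "date": [
   {"date": "21/03/2019", "document": "110001000012019NE000237"},
   {"date": "21/03/2019", "document": "EXAMPLE-0002"}
 ]}
"""

payload = json.loads(raw_body)

# Skip the top-level counters and keep only the list of records.
# The key name "date" is taken from the quoted fragment (an assumption).
records = payload["date"]

# A list of dicts becomes one row per record and one column per field.
df = pd.DataFrame(records)
print(df)
```

There is no need to save anything to a file first; the parsed dict can go straight into pandas.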