Web scraping with Python when the CEF website renders results via JavaScript


CEF (Caixa Econômica Federal) changed the way it displays lottery results on its website. Before, I could get the results, which all came as plain HTML, relatively easily via web scraping with BeautifulSoup, but now the results are rendered in the browser via JavaScript. I searched the net for some material but could not understand the process itself. If someone can help me, I'd appreciate it.


  • If possible, post the code you have tried.

3 answers


The Caixa site itself offers a download of all the results in HTML format; it can be downloaded from the page below, http://loterias.caixa.gov.br/wps/portal/loterias/landing/megasena . But if this is for learning purposes, you have two alternatives. One is to explore the endpoint that the JavaScript calls:

http://loterias.caixa.gov.br/wps/portal/loterias/landing/megasena/!ut/p/a1/04_Sj9CPykssy0xPLMnMz0vMAfGjzOLNDH0MPAzcDbwMPI0sDBxNXAOMwrzCjA0sjIEKIoEKnN0dPUzMfQwMDEwsjAw8XZw8XMwtfQ0MPM2I02-AAzgaENIfrh-FqsQ9wNnUwNHfxcnSwBgIDUyhCvA5EawAjxsKckMjDDI9FQE-F4ca/dl5/d5/L2dBISEvZ0FBIS9nQSEh/pw/Z7_HGK818G0KO6H80AU71KG7J0072/res/id=buscaResultado/c=cacheLevelPage/=/?timestampAjax=1528262624920

The only parameter is the timestamp at the end.
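A minimal sketch of calling that endpoint with a fresh timestamp. The full AJAX URL above is abbreviated here as `ENDPOINT`, and the idea that `timestampAjax` is milliseconds since the epoch is an assumption based on the sample value in the URL:

```python
import time

# Stand-in for the long buscaResultado URL quoted above (everything before
# the "?timestampAjax=" query string) -- abbreviation, not the real path
ENDPOINT = "http://loterias.caixa.gov.br/wps/portal/loterias/..."

def build_url():
    # timestampAjax appears to be milliseconds since the epoch, judging by
    # the sample value 1528262624920 in the URL above (an assumption)
    return ENDPOINT + "?timestampAjax=" + str(int(time.time() * 1000))

# import requests
# print(requests.get(build_url()).json())  # live fetch, needs network access
print(build_url())
```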

Alternatively, use the Selenium library to render the JavaScript and then pass the rendered HTML to BeautifulSoup, for example.
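A sketch of that second alternative, assuming Selenium and a Chrome driver are installed; the CSS selector is a guess based on the markup used in the other answer and may have changed:

```python
from bs4 import BeautifulSoup

def extrair_dezenas(html):
    # Pull the <li> items out of the results list; the class name
    # "resultado-loteria" is taken from the other answer (an assumption)
    soup = BeautifulSoup(html, "html.parser")
    return [li.text for li in soup.select("ul.resultado-loteria li")]

if __name__ == "__main__":
    # Hypothetical Selenium usage: let the browser run the JavaScript,
    # then hand the rendered page source to BeautifulSoup
    # from selenium import webdriver
    # driver = webdriver.Chrome()
    # driver.get("http://loterias.caixa.gov.br/wps/portal/loterias/landing/megasena")
    # print(extrair_dezenas(driver.page_source))
    pass
```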

  • Thanks, bro. I can download those files, and even load the games into a database for statistics. The thing is that the Caixa website shows the data very quickly, within a few hours of the draw, which does not happen with the downloaded game files, which are not kept up to date.

  • I’m going to look into Selenium to see if I can render the JavaScript. I only looked at it superficially and could not figure out how to get the full result with the drawn numbers, the number of winners, and the prize value of each tier.

  • The link I posted returns the result of the last game as JSON, with the drawn numbers in the result field: 'result":"18-19-44-54-01-29'. Anyway, if my answer helped you, mark it as correct. Thank you.

  • How did you get the link with the result? It really does have everything in a dictionary.

  • In the browser’s own developer console. Since it is AJAX, the page has to make a GET request to the server for JSON or XML in order to populate the table. Look at the requests made by the page and you will find it.
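The comments above quote a JSON payload whose result field looks like "18-19-44-54-01-29". A minimal sketch of parsing that field into integers; the payload shape is assumed from the comment:

```python
import json

# Sample payload shaped like the one quoted in the comments (assumed structure)
payload = '{"result": "18-19-44-54-01-29"}'

data = json.loads(payload)
# Split on the dashes and sort the drawn numbers numerically
dezenas = sorted(int(n) for n in data["result"].split("-"))
print(dezenas)  # [1, 18, 19, 29, 44, 54]
```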


The URL "http://loterias.caixa.gov.br/wps/portal/loterias" still serves the latest lottery results in its HTML, and you can extract them as follows:

import requests
from bs4 import BeautifulSoup

req = requests.get( "http://loterias.caixa.gov.br/wps/portal/loterias" )

soup = BeautifulSoup( req.content, "html.parser" )

ul = soup.findAll( "ul", class_="resultado-loteria mega-sena" )

for li in ul[0].findAll( "li" ):
    print( li.text )

Here is a function capable of retrieving the Mega-Sena results using BeautifulSoup:

import requests
from bs4 import BeautifulSoup

def obterDezenasMegaSena():
    try:
        req = requests.get( "http://loterias.caixa.gov.br/wps/portal/loterias" )
        soup = BeautifulSoup( req.content, "html.parser" )
        ul = soup.findAll( "ul", class_="resultado-loteria mega-sena" )
        return [ int(li.text) for li in ul[0].findAll( "li" ) ]
    except Exception:  # a bare "except:" would also swallow KeyboardInterrupt
        return None

print( obterDezenasMegaSena() )

Output:

[3, 6, 11, 27, 28, 46]

The same can be done to extract the numbers drawn in the Quina:

import requests
from bs4 import BeautifulSoup

def obterDezenasQuina():
    try:
        req = requests.get( "http://loterias.caixa.gov.br/wps/portal/loterias" )
        soup = BeautifulSoup( req.content, "html.parser" )
        ul = soup.findAll( "ul", class_="resultado-loteria quina" )
        return [ int(li.text) for li in ul[0].findAll( "li" ) ]
    except Exception:  # a bare "except:" would also swallow KeyboardInterrupt
        return None

print( obterDezenasQuina() )

Output:

[21, 25, 40, 66, 67]
  • Thanks, friend. I really do get the drawn numbers, but I need the full result with the number of winners and the prize value for each tier, and that is only available by running the JavaScript.

  • @MJAGO: I think I’ve got it. See my other answer.


You can use the website "http://www.loteriaseresultados.com.br/" to extract all the information about all CEF lottery draws using BeautifulSoup. Check it out:

import requests
from bs4 import BeautifulSoup

def obterPremiacaoMegaSena( soup, premio ):
    # Find the row whose header starts with the tier name (SENA/QUINA/QUADRA)
    td = soup.find( 'th', text=lambda x: x.startswith(premio)).find_parent('tr').findAll("td")
    if( td[1].text == "-" ):
        # No winners in this tier
        return { "Tipo" : premio, "QtdGanhadores" : u"0", "ValorPremio" : u"0,00" }
    else:
        return { "Tipo" : premio, "QtdGanhadores" : td[0].text.split(' ')[0], "ValorPremio" : td[1].text.split(' ')[1] }


def obterResultadoMegaSena( nconcurso ):
    try:
        req = requests.get( "http://www.loteriaseresultados.com.br/megasena/concurso/" + str(nconcurso) )
        soup = BeautifulSoup( req.content, "html.parser" )
        dezenas = [ int(dezena.text) for dezena in soup.findAll( "div", class_="bola bg-success" ) ]
        sena = obterPremiacaoMegaSena( soup, "SENA" )
        quina = obterPremiacaoMegaSena( soup, "QUINA" )
        quadra = obterPremiacaoMegaSena( soup, "QUADRA" )
        return { "Concurso" : nconcurso, "DezenasSorteadas" : dezenas, "Premiacao" : [ sena, quadra, quina ] }
    except Exception:  # a bare "except:" would also swallow KeyboardInterrupt
        return None

print( obterResultadoMegaSena( 2047 ) )

Output:

{
  'Concurso': 2047,
  'DezenasSorteadas': [1, 18, 19, 29, 44, 54],
  'Premiacao': [ {
                   'ValorPremio': u'0,00',
                   'QtdGanhadores': u'0',
                   'Tipo': 'SENA'
                 },
                 { 
                   'ValorPremio': u'1.002,65',
                   'QtdGanhadores': u'2.390',
                   'Tipo': 'QUADRA'
                 },
                 { 
                   'ValorPremio': u'55.914,69',
                   'QtdGanhadores': u'30',
                   'Tipo': 'QUINA'
                 }
               ]
}
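Given a dict in the shape shown above, the per-tier figures can be looked up by name, for example (a sketch over that exact output):

```python
# The dict printed above, reproduced here as sample data
resultado = {
    "Concurso": 2047,
    "DezenasSorteadas": [1, 18, 19, 29, 44, 54],
    "Premiacao": [
        {"Tipo": "SENA", "QtdGanhadores": "0", "ValorPremio": "0,00"},
        {"Tipo": "QUADRA", "QtdGanhadores": "2.390", "ValorPremio": "1.002,65"},
        {"Tipo": "QUINA", "QtdGanhadores": "30", "ValorPremio": "55.914,69"},
    ],
}

# Index the prize tiers by their "Tipo" field for direct lookup
por_tipo = {p["Tipo"]: p for p in resultado["Premiacao"]}
print(por_tipo["QUINA"]["QtdGanhadores"])  # 30
```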
