Web scraping with Python when the CEF website renders results via JavaScript


CEF (Caixa Econômica Federal) changed the way it displays lottery results on its website. Before, I could get the results, which all came as plain HTML, relatively easily via web scraping with BeautifulSoup, but now the results are rendered in the browser via JavaScript. I searched the net for some material but could not understand the process itself. If someone can help me, I'd appreciate it.


  • If possible, post the code you have tried.

3 answers


The Caixa site itself offers a download of all the results in HTML format; it can be downloaded from the page below, http://loterias.caixa.gov.br/wps/portal/loterias/landing/megasena . But if this is for learning purposes, you have two alternatives. One is to explore the endpoint that the JavaScript calls:

http://loterias.caixa.gov.br/wps/portal/loterias/landing/megasena/!ut/p/a1/04_Sj9CPykssy0xPLMnMz0vMAfGjzOLNDH0MPAzcDbwMPI0sDBxNXAOMwrzCjA0sjIEKIoEKnN0dPUzMfQwMDEwsjAw8XZw8XMwtfQ0MPM2I02-AAzgaENIfrh-FqsQ9wNnUwNHfxcnSwBgIDUyhCvA5EawAjxsKckMjDDI9FQE-F4ca/dl5/d5/L2dBISEvZ0FBIS9nQSEh/pw/Z7_HGK818G0KO6H80AU71KG7J0072/res/id=buscaResultado/c=cacheLevelPage/=/?timestampAjax=1528262624920

The only parameter is the timestamp at the end.
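A minimal sketch of calling that endpoint with a fresh timestamp. The full AJAX URL above is abbreviated here as `ENDPOINT`, and the idea that `timestampAjax` is milliseconds since the epoch is an assumption based on the sample value in the URL:

```python
import time

# Stand-in for the long buscaResultado URL quoted above (everything before
# the "?timestampAjax=" query string) -- abbreviation, not the real path
ENDPOINT = "http://loterias.caixa.gov.br/wps/portal/loterias/..."

def build_url():
    # timestampAjax appears to be milliseconds since the epoch, judging by
    # the sample value 1528262624920 in the URL above (an assumption)
    return ENDPOINT + "?timestampAjax=" + str(int(time.time() * 1000))

# import requests
# print(requests.get(build_url()).json())  # live fetch, needs network access
print(build_url())
```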

Alternatively, use the Selenium library to render the JavaScript and then pass the rendered HTML to BeautifulSoup, for example.
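A sketch of that second alternative, assuming Selenium and a Chrome driver are installed; the CSS selector is a guess based on the markup used in the other answer and may have changed:

```python
from bs4 import BeautifulSoup

def extrair_dezenas(html):
    # Pull the <li> items out of the results list; the class name
    # "resultado-loteria" is taken from the other answer (an assumption)
    soup = BeautifulSoup(html, "html.parser")
    return [li.text for li in soup.select("ul.resultado-loteria li")]

if __name__ == "__main__":
    # Hypothetical Selenium usage: let the browser run the JavaScript,
    # then hand the rendered page source to BeautifulSoup
    # from selenium import webdriver
    # driver = webdriver.Chrome()
    # driver.get("http://loterias.caixa.gov.br/wps/portal/loterias/landing/megasena")
    # print(extrair_dezenas(driver.page_source))
    pass
```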

  • Thanks, bro. I can download those files, and even load the games into a database for statistics. The thing is that the Caixa website shows the data very quickly, within a few hours of the draw, which does not happen with the downloaded game files, which are not kept up to date.

  • I’m going to look into Selenium to see if I can render the JavaScript. I only looked at it superficially and could not figure out how to get the full result with the drawn numbers, the number of winners, and the prize value of each tier.

  • The link I posted returns the result of the last game as JSON, with the drawn numbers in the result field: 'result":"18-19-44-54-01-29'. Anyway, if my answer helped you, mark it as correct. Thank you.

  • How did you get the link with the result? It really does have everything in a dictionary.

  • In the browser’s own developer console. Since it is AJAX, the page has to make a GET request to the server for JSON or XML in order to populate the table. Look at the requests made by the page and you will find it.
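The comments above quote a JSON payload whose result field looks like "18-19-44-54-01-29". A minimal sketch of parsing that field into integers; the payload shape is assumed from the comment:

```python
import json

# Sample payload shaped like the one quoted in the comments (assumed structure)
payload = '{"result": "18-19-44-54-01-29"}'

data = json.loads(payload)
# Split on the dashes and sort the drawn numbers numerically
dezenas = sorted(int(n) for n in data["result"].split("-"))
print(dezenas)  # [1, 18, 19, 29, 44, 54]
```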


The URL "http://loterias.caixa.gov.br/wps/portal/loterias" still serves the latest lottery results in its HTML, and you can extract them as follows:

import requests
from bs4 import BeautifulSoup

req = requests.get( "http://loterias.caixa.gov.br/wps/portal/loterias" )

soup = BeautifulSoup( req.content, "html.parser" )

ul = soup.findAll( "ul", class_="resultado-loteria mega-sena" )

for li in ul[0].findAll( "li" ):
    print( li.text )

Here is a function capable of retrieving the Mega-Sena results using BeautifulSoup:

import requests
from bs4 import BeautifulSoup

def obterDezenasMegaSena():
    try:
        req = requests.get( "http://loterias.caixa.gov.br/wps/portal/loterias" )
        soup = BeautifulSoup( req.content, "html.parser" )
        ul = soup.findAll( "ul", class_="resultado-loteria mega-sena" )
        return [ int(li.text) for li in ul[0].findAll( "li" ) ]
    except Exception:  # a bare "except:" would also swallow KeyboardInterrupt
        return None

print( obterDezenasMegaSena() )

Output:

[3, 6, 11, 27, 28, 46]

The same can be done to extract the numbers drawn in the Quina:

import requests
from bs4 import BeautifulSoup

def obterDezenasQuina():
    try:
        req = requests.get( "http://loterias.caixa.gov.br/wps/portal/loterias" )
        soup = BeautifulSoup( req.content, "html.parser" )
        ul = soup.findAll( "ul", class_="resultado-loteria quina" )
        return [ int(li.text) for li in ul[0].findAll( "li" ) ]
    except Exception:  # a bare "except:" would also swallow KeyboardInterrupt
        return None

print( obterDezenasQuina() )

Output:

[21, 25, 40, 66, 67]
  • Thanks, friend. I really do get the drawn numbers, but I need the full result with the number of winners and the prize value for each tier, and that is only available by running the JavaScript.

  • @MJAGO: I think I’ve got it. See my other answer.


You can use the website "http://www.loteriaseresultados.com.br/" to extract all the information about all CEF lottery draws using BeautifulSoup. Check it out:

import requests
from bs4 import BeautifulSoup

def obterPremiacaoMegaSena( soup, premio ):
    # Find the row whose header starts with the tier name (SENA/QUINA/QUADRA)
    td = soup.find( 'th', text=lambda x: x.startswith(premio)).find_parent('tr').findAll("td")
    if( td[1].text == "-" ):
        # No winners in this tier
        return { "Tipo" : premio, "QtdGanhadores" : u"0", "ValorPremio" : u"0,00" }
    else:
        return { "Tipo" : premio, "QtdGanhadores" : td[0].text.split(' ')[0], "ValorPremio" : td[1].text.split(' ')[1] }


def obterResultadoMegaSena( nconcurso ):
    try:
        req = requests.get( "http://www.loteriaseresultados.com.br/megasena/concurso/" + str(nconcurso) )
        soup = BeautifulSoup( req.content, "html.parser" )
        dezenas = [ int(dezena.text) for dezena in soup.findAll( "div", class_="bola bg-success" ) ]
        sena = obterPremiacaoMegaSena( soup, "SENA" )
        quina = obterPremiacaoMegaSena( soup, "QUINA" )
        quadra = obterPremiacaoMegaSena( soup, "QUADRA" )
        return { "Concurso" : nconcurso, "DezenasSorteadas" : dezenas, "Premiacao" : [ sena, quadra, quina ] }
    except Exception:  # a bare "except:" would also swallow KeyboardInterrupt
        return None

print( obterResultadoMegaSena( 2047 ) )

Output:

{
  'Concurso': 2047,
  'DezenasSorteadas': [1, 18, 19, 29, 44, 54],
  'Premiacao': [ {
                   'ValorPremio': u'0,00',
                   'QtdGanhadores': u'0',
                   'Tipo': 'SENA'
                 },
                 { 
                   'ValorPremio': u'1.002,65',
                   'QtdGanhadores': u'2.390',
                   'Tipo': 'QUADRA'
                 },
                 { 
                   'ValorPremio': u'55.914,69',
                   'QtdGanhadores': u'30',
                   'Tipo': 'QUINA'
                 }
               ]
}
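Given a dict in the shape shown above, the per-tier figures can be looked up by name, for example (a sketch over that exact output):

```python
# The dict printed above, reproduced here as sample data
resultado = {
    "Concurso": 2047,
    "DezenasSorteadas": [1, 18, 19, 29, 44, 54],
    "Premiacao": [
        {"Tipo": "SENA", "QtdGanhadores": "0", "ValorPremio": "0,00"},
        {"Tipo": "QUADRA", "QtdGanhadores": "2.390", "ValorPremio": "1.002,65"},
        {"Tipo": "QUINA", "QtdGanhadores": "30", "ValorPremio": "55.914,69"},
    ],
}

# Index the prize tiers by their "Tipo" field for direct lookup
por_tipo = {p["Tipo"]: p for p in resultado["Premiacao"]}
print(por_tipo["QUINA"]["QtdGanhadores"])  # 30
```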
