VBA x IE - collect data from a table

Asked

Viewed 3,006 times

0

I am programming in VBA a collection of information on the website of the central bank. It turns out that the page does not fully load and when the macro ends only a few records were loaded into the spreadsheet. I believe that the page has some event that carries the registration as the user scrolls down the page. Has anyone ever had this problem? How did you solve it? Follow the code.

Sub capDadosTable()

    Dim IE As Object
    Dim e As Object
    Dim el As Object

    Set IE = CreateObject("InternetExplorer.Application")
    With IE
        'página com o formulario a ser preenchido
        .Navigate "https://www3.bcb.gov.br/expectativas/publico/consulta/serieestatisticas"
        .Visible = True

        'aguarda a página carregar por completo
        While .busy Or .ReadyState <> 4
        Application.Wait TimeSerial(Hour(Now), Minute(Now), Second(Now) + 1)
        Wend

        'seleciona o item do combobox 'indicador',pelo Index, neste caso o item 4
        .document.GetElementByID("indicador").selectedindex = 4
        'força rodar o evento do combobox
        .document.all("indicador").FireEvent ("onchange")

        'aguarda a página carregar por completo
        While .busy Or .ReadyState <> 4
        Application.Wait TimeSerial(Hour(Now), Minute(Now), Second(Now) + 1)
        Wend

        'Seleciona o indicador IPCA
        .document.GetElementByID("grupoIndicePreco:opcoes_5").Click

        'seleciona o item do combobox 'calculo' pelo Index, neste caso o 2
        .document.GetElementByID("calculo").selectedindex = 2

        'seleciona a periodicidade anual
        .document.GetElementByID("periodicidade").selectedindex = 2
        'força rodar o evento do combobox
        .document.all("periodicidade").FireEvent ("onchange")

        'aguarda a página carregar por completo
        While .busy Or .ReadyState <> 4
        Application.Wait TimeSerial(Hour(Now), Minute(Now), Second(Now) + 1)
        Wend

        'inclui a data de inicio no combobox tfDataInicial - primeiro dia do ano anterior
        .document.all("tfDataInicial").Value = Format(DateSerial(Year(Now) - 1, 1, 1), "dd/mm/yyyy")

        'inclui a data de hoje no combobox tfDataFinal
        .document.all("tfDataFinal").Value = Format(Now(), "dd/mm/yyyy")

        'testa a lista de anos do combobox AnoInicial, se for igual ao ano corrente, seleciona-o.
        Set e = .document.all("divPeriodoRefereEstatisticas:grupoAnoReferencia:anoReferenciaInicial")

        For Each o In e.Options
        If o.Text = Format(Year(Now), "@") Then
            o.Selected = True
            Exit For
            Set e = Nothing
        End If
        Next

        'testa a lista de anos do combobox AnoFinal, se for igual ao ano corrente, seleciona-o.
        Set e = .document.all("divPeriodoRefereEstatisticas:grupoAnoReferencia:anoReferenciaFinal")

        For Each o In e.Options
        If o.Text = Format(Year(Now), "@") Then
            o.Selected = True
            Exit For
            Set e = Nothing
        End If
        Next


        'clica no botão que gerará a tabela com os resultados
        .document.all("btnConsultar").Click

        'aguarda a página carregar por completo
        While .busy Or .ReadyState <> 4
        Application.Wait TimeSerial(Hour(Now), Minute(Now), Second(Now) + 1)
        Wend

        '**************************************************
            'Carrega os dados da tabela gerada para a Sheets IPCA
            Set sh = Sheets("IPCA")
            Set tabela = IE.document.all.tags("tr")

            linha = 1
            For Each el In tabela

            If el.innertext = "" Then GoTo proximo
            sh.Cells(linha, 1) = el.innertext

            linha = linha + 1

       proximo:
            Next

            End With
End Sub

1 answer

1

Hi, Wanderlei!

First of all, congratulations on the code, very fine stuff!

Incredible as it may seem the problem here is the architecture of the BC page, which only carries the 78 first lines (in your example).

This is to save processing, and avoid Full Table Scan.

I managed to add Scroll control to the page, but need to scroll the frame of the indicators, which I think is not very simple to achieve.

The solution is to increase the last "time" and when the final page appears, you will scroll yourself (which will force the loading of all lines), I suggested 20 seconds for this

So when you sign in to For we’ll have all the TR tags loaded.

I took the liberty of switching to Getelemetsbytagname and columnar date and index.

  .Document.parentWindow.Scroll 0&, 900& 'SCROLL INICIAL
  .Visible = True 
  'ASSIM QUE DER ESSE VISIBLE CORRE, VOCÊ TEM 20 SEGUNDOS PRA DAR SCROLL NO FRAME
   Application.Wait TimeSerial(Hour(Now), Minute(Now), Second(Now) + 20)
        
        '**************************************************
            'Carrega os dados da tabela gerada para a Sheets IPCA
            Set sh = Sheets("IPCA")
            Set tabela = IE.Document.GetElementsByTagName("tr")
            linha1 = 1
            linha2 = 4
            
            sh.Cells(linha2 - 1, 2).Value = "Valor"

            
            For xt = 0 To tabela.Length - 1
                If Len(tabela.Item(xt).innertext) < 2 Then
                'nada a fazer
                
                ElseIf InStr(1, tabela.Item(xt).innertext, "Sistema") > 0 _
                Or InStr(1, tabela.Item(xt).innertext, ",") = 0 Then
                'cabeçalho
                    sh.Cells(linha1, 1).Value = tabela.Item(xt).innertext
                    linha1 = linha1 + 1
                ElseIf InStr(1, tabela.Item(xt).innertext, ",") > 0 Then
                'valor
                    sh.Cells(linha2, 2).Value = tabela.Item(xt).innertext
                    linha2 = linha2 + 1
                End If
            Next
            

            End With

Browser other questions tagged

You are not signed in. Login or sign up in order to post.