Navigating between pages using a website's pagination bar

How can I navigate through the pages listed in a website's pagination bar?

Specific case: when running a query on the TCM-BA website, on the page that records municipal expenses, some of the information is accessible. However, the TCM page limits each page of results to 20 records (rows). To reach the remaining data, the user has to navigate through a bar that links to the subsequent pages (see image below):

enter image description here

Link: Here

enter image description here

Link: Here

As the link above shows, the page is requested with the GET method. When moving between pages, the only query parameter that changes is "pg=". The problem is that each municipality and entity (city hall or municipal chamber) has a different number of pages.
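
Since only the page parameter changes, the URL can be assembled by a small helper. The following is a minimal sketch, not the site's documented API: the function name `build_url` and its arguments are hypothetical, and the query string (including the `pg` parameter) is copied from the example URL that appears in the answer below.

```r
# Hypothetical helper: builds the query URL for one entity, year and page.
# The parameter names are copied from the example URL; nothing here is an
# official API of the site.
build_url <- function(entity_name, entity_id, year, page) {
  paste0(
    "http://www.tcm.ba.gov.br/consulta-de-despesas/",
    "?txtEntidade=", utils::URLencode(entity_name, reserved = TRUE),
    "&ano=", year,
    "&favorecido=&entidade=", entity_id,
    "&orgao=&orcamentaria=&despesa=&recurso=&desp=P",
    "&dtPeriodo1=&dtPeriodo2=&pg=", page
  )
}

# Page 21 for the entity used in the example
url_21 <- build_url("Camara Municipal de SANTO ANTONIO DE JESUS", 763, 2017, 21)
```

Keeping the page number as an argument means a loop only has to increment one value per request.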

I considered writing a loop to scrape this data: when the scraper detected the last page, it would skip (next) to the next iteration of the outer loop, which in this case would be the next municipality.

To detect the last page, I thought of adding error handling for an invalid page number, e.g. 27.

However, instead of an error, the site returns the page shown below. I also considered using an if statement to check whether the table tag (#tabelaResultado) is present, but the tag appears even on a page with no results (image below).

enter image description here

Link: Here

1 answer

There are several possible solutions. This one checks whether the <td> tags of the results table contain anything. Note that tabela1 has length 180 (the total number of table cells), while tabela2 has length 0. So you can test for a length of zero to break out of your loop.

library(XML)

# XPath pattern selecting every <td> cell inside the results table
xp <- "//*[@id='tabelaResultado']//td"

# Example: page 21 (has results)
site <- paste0("http://www.tcm.ba.gov.br/consulta-de-despesas/", 
               "?txtEntidade=Camara%20Municipal%20de%20SANTO%20ANTONIO%20DE%20JESUS&ano=2017",
               "&favorecido=&entidade=763&orgao=&orcamentaria=&despesa=&recurso=&desp=P",
               "&dtPeriodo1=&dtPeriodo2=&pg=21")
h <- htmlParse(site)
tabela1 <- xpathSApply(h, path = xp)
length(tabela1)  # 180, as noted above


# Example: page 27 (past the last page)

site <- paste0("http://www.tcm.ba.gov.br/consulta-de-despesas/", 
               "?txtEntidade=Camara%20Municipal%20de%20SANTO%20ANTONIO%20DE%20JESUS&ano=2017",
               "&favorecido=&entidade=763&orgao=&orcamentaria=&despesa=&recurso=&desp=P",
               "&dtPeriodo1=&dtPeriodo2=&pg=27")

h <- htmlParse(site)
tabela2 <- xpathSApply(h, path = xp)
length(tabela2)  # 0, so the loop can stop here
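
The zero-length check above could be wrapped in a loop along these lines. This is only a sketch under a few assumptions: the names `count_cells`, `last_page`, `make_url`, `counter` and `max_pages` are hypothetical, pages are assumed to be numbered from 1, and the `max_pages` safety cap is an addition that is not part of the original approach. The counting function is passed in as an argument so the loop logic can be tried without hitting the site.

```r
# Count the <td> cells in the results table of one page.
# Calling this performs a network request and needs the XML package.
count_cells <- function(url) {
  doc <- XML::htmlParse(url)
  length(XML::xpathSApply(doc, path = "//*[@id='tabelaResultado']//td"))
}

# Walk pages 1, 2, 3, ... and stop at the first page with zero cells.
# `make_url` maps a page number to a URL; `counter` can be swapped for a
# fake in offline experiments; `max_pages` guards against an endless loop.
last_page <- function(make_url, counter = count_cells, max_pages = 1000) {
  page <- 1
  while (page <= max_pages) {
    if (counter(make_url(page)) == 0) break
    page <- page + 1
  }
  page - 1  # number of the last page that had data (0 if there was none)
}

# Offline illustration with a fake counter that pretends 26 pages have data
fake_counter <- function(url) {
  if (as.integer(sub(".*pg=", "", url)) <= 26) 180 else 0
}
fake_url <- function(p) paste0("http://example.invalid/?pg=", p)
n_pages <- last_page(fake_url, counter = fake_counter)
```

For several municipalities, the same function can be called once per entity inside an outer loop, so no explicit `next` is needed: each call returns as soon as it reaches an empty page.
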
  • Excellent!!! Thank you very much. It worked perfectly!!!
