Scraping with R - xpathSApply returning a list of 0


I’m learning to read XML data in R.

I want to extract information about Brazilian football matches (championship name, home team, result, etc.) from this site: https://www.terra.com.br/esportes/equipes/sao-paulo/lista-de-jogos using the XML package. My code is as follows:

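library(XML)

# Page listing the São Paulo matches
fileUrl <- 'http://www.terra.com.br/esportes/equipes/sao-paulo/lista-de-jogos'
# Parse the HTML into an internal document so XPath queries can be run on it
doc <- htmlTreeParse(fileUrl, useInternalNodes = TRUE)
# Extract the text of every <h3 class="header-matches"> node
championship <- xpathSApply(doc, "//h3[@class='header-matches']", xmlValue)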

However, xpathSApply returns an empty list (a list of length 0). Does anyone know why?

I tested the same code on the Globo Esporte website and it worked, so I imagine the problem has to do with the markup of the Terra site itself.

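  • I inspected the response and, unless I'm mistaken, all of the content is generated by JavaScript, so the nodes you are looking for are not in the HTML that R downloads and XPath will not find them. Try extracting the data with a regex instead: httr::GET(url) %>% httr::content("text") %>% stringr::str_extract_all(....).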
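
A minimal sketch of that idea, using base R so it runs without extra packages. The sample_html string, its JSON-like keys and the extract_field helper are all hypothetical, made up for illustration; the real Terra page may embed its data in a different form, so the pattern would need adapting after inspecting the raw response.

```r
# Hypothetical sample of what a JavaScript-driven page might return:
# the match data sits inside a <script> block, not in <h3> elements.
sample_html <- paste0(
  '<html><body><div id="app"></div>',
  '<script>var matches = [',
  '{"championship":"Campeonato Brasileiro","home":"Sao Paulo","away":"Santos"},',
  '{"championship":"Copa do Brasil","home":"Sao Paulo","away":"Bahia"}',
  '];</script></body></html>'
)

# Step 1: check whether the class targeted by the XPath appears in the raw HTML.
# If it does not, an empty XPath result is expected.
has_header <- grepl("header-matches", sample_html, fixed = TRUE)

# Step 2: pull the values of a given key out of the embedded script text.
# Hypothetical helper; assumes `key` contains no regex metacharacters.
extract_field <- function(html, key) {
  pattern <- sprintf('"%s":"([^"]*)"', key)
  hits <- regmatches(html, gregexpr(pattern, html, perl = TRUE))[[1]]
  sub(pattern, "\\1", hits, perl = TRUE)
}

championships <- extract_field(sample_html, "championship")
print(has_header)
print(championships)

# For the real page, the raw text could be fetched first, for example:
# html <- httr::content(httr::GET(fileUrl), "text")
```

The same extraction could be written with stringr::str_extract_all, as suggested in the comment. If the embedded data turns out to be valid JSON, parsing it with a JSON parser would likely be more robust than a regex.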

No answers
