I’m learning to read XML data in R.
I want to extract information about Brazilian football (championship name, home team, result, etc.) from this site: https://www.terra.com.br/esportes/equipes/sao-paulo/lista-de-jogos with the XML package. My code is as follows:
library(XML)  # provides htmlTreeParse, xpathSApply and xmlValue
fileUrl <- 'http://www.terra.com.br/esportes/equipes/sao-paulo/lista-de-jogos'
doc <- htmlTreeParse(fileUrl, useInternalNodes = TRUE)
championship <- xpathSApply(doc, "//h3[@class='header-matches']", xmlValue)
However, xpathSApply returns a list of length 0. Does anyone know why?
I tested the same code on the Globo Esporte website and it worked, so I imagine the problem has to do with the markup of the Terra site itself.
I parsed the response and, unless I'm mistaken, everything is generated by JavaScript, so you won't get it via XPath. Try using a regex instead:
library(magrittr)  # provides the %>% pipe
httr::GET(url) %>% httr::content("text") %>%
  stringr::str_extract_all(....)
– José
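A minimal sketch of the regex approach suggested above, assuming the page embeds its match data as JSON-like text inside a script block. The sample string, the "championship" key and the variable names are hypothetical, since the real structure of the Terra page is not shown here; in practice page_text would be the text returned by httr::content(). Base R functions are used so the sketch runs without extra packages.

```r
# Hypothetical page source: the match data sits inside a <script> block,
# so there is no <h3 class="header-matches"> element for XPath to find.
page_text <- paste0(
  '<html><body><div id="app"></div><script>',
  'var matches = [{"championship":"Paulista","home":"Sao Paulo"},',
  '{"championship":"Brasileiro","home":"Santos"}];',
  '</script></body></html>'
)

# Match every "championship":"..." pair (the key name is an assumption).
pattern <- '"championship":"([^"]*)"'
hits <- regmatches(page_text, gregexpr(pattern, page_text, perl = TRUE))[[1]]

# Keep only the captured value from each match.
championship <- sub(pattern, "\\1", hits, perl = TRUE)
print(championship)
```

If the embedded data turns out to be valid JSON, extracting the whole array with one regex and handing it to a JSON parser is usually less brittle than one regex per field.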