Good evening. I'm trying to pull data from Google Scholar with RSelenium, but I'm having a hard time getting the metrics for the journals I'm looking for.
Running the code below:
# First I build the vector of journals that I want to look up
teste <- c("Revista de Direito Administrativo", "ARSP. ARCHIV FUR RECHTS- UND SOZIALPHILOSOPHIE",
"ANTITRUST BULLETIN")
Then I run the function below:
get_journal <- function(teste) {
  remDr$navigate("https://scholar.google.com/citations?view_op=top_venues&hl=pt-BR&vq=en")
  final <- c()
  for(i in 1:length(teste)) {
    remDr$refresh()
    Sys.sleep(1)
    address_element <- remDr$findElement(using = "class", value = "gs_in_txt")
    address_element$sendKeysToElement(list(teste[i]))
    button_element <- remDr$findElement(using = "class", value = "gs_wr")
    button_element$clickElement()
    Sys.sleep(3)
    out <- remDr$findElement(using = "class", value = "gsc_mvt_n")
    output <- out$getElementText()
    final <- c(final, output)
  }
  return(final)
}
vector_out <- get_journal(teste)
data.frame(teste, purrr::flatten_chr(vector_out)) %>%
  dplyr::mutate(., vector_out = stringr::str_remove_all(vector_out, "\\(|\\)")) %>%
  tidyr::separate(., vector_out, into = c("H5", "MedianaH5"), sep = ",")
But it returns a data frame with NA in the MedianaH5 column, and the header text "Índice h5" instead of the values (example below):
                                           teste purrr..flatten_chr.vector_out.        H5 MedianaH5
1              Revista de Direito Administrativo                      Índice h5 Índice h5      <NA>
2 ARSP. ARCHIV FUR RECHTS- UND SOZIALPHILOSOPHIE                      Índice h5 Índice h5      <NA>
3                             ANTITRUST BULLETIN                      Índice h5 Índice h5      <NA>
Warning message:
Expected 2 pieces. Missing pieces filled with `NA` in 3 rows [1, 2, 3].
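For what it's worth, the warning seems to be reproducible without Selenium at all. The sketch below assumes the scraped text is the literal string "Índice h5" for every journal, as in the output above; the data frame `scraped` is a hypothetical stand-in for my real result, not something the scraper produced.

```r
library(tidyr)

# Hypothetical stand-in for the scraped text: the same string for every
# journal, with no comma in it (assumption based on the output shown above).
scraped <- data.frame(
  teste = c("Revista de Direito Administrativo", "ANTITRUST BULLETIN"),
  vector_out = c("Índice h5", "Índice h5"),
  stringsAsFactors = FALSE
)

# separate() splits on "," and expects two pieces per row. With no comma
# present it finds a single piece, puts it in H5, fills MedianaH5 with NA,
# and emits the "Expected 2 pieces" warning.
result <- separate(scraped, vector_out, into = c("H5", "MedianaH5"), sep = ",")
print(result)
```

So the NA values look like a symptom of the text I am getting back, not of the `separate()` call itself.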
Can anyone help?