Using Mozilla Firefox, how can I scrape Google Scholar? Where should I start?
I am posting here my web scraping code for Google Scholar, which searches by keyword. With this code I was able to collect the title, authors, abstract, and number of citations for each result on the Google Scholar results page. The code is based on work by the following R programmers: Kay Cichini, Gabor Pozsgai and Rogério Barbosa.
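library(RSelenium)
library(XML)  # provides htmlParse(), xmlRoot() and xpathSApply()
library(xlsx)
# Note: checkForServer() and startServer() come from older RSelenium releases
checkForServer() # downloads a Selenium server (only needs to be done once)
startServer() # keep this window open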
firefox_con <- remoteDriver(remoteServerAddr = "localhost",
port = 4444,
browserName = "firefox"
)
firefox_con$open() # keep this window open
# Open Google Scholar and type the search terms into the search box
firefox_con$navigate("http://scholar.google.com")
busca <- firefox_con$findElement(using = "css selector", value = "#gs_hp_tsi")
busca$sendKeysToElement(list("key word", key = "enter"))
pages.max <- 10 # number of result pages to scrape
# Build a data frame from an already parsed results page (x is the root node).
# The page is parsed by the caller, so nothing is fetched again in here.
scraper_internal <- function(x) {
  tit <- xpathSApply(x, "//h3[@class='gs_rt']", xmlValue)
  aut <- xpathSApply(x, "//div[@class='gs_a']", xmlValue)
  abst <- xpathSApply(x, "//div[@class='gs_rs']", xmlValue)
  others <- xpathSApply(x, "//div[@class='gs_fl']", xmlValue) # footer text, includes "Cited by N"
  data.frame(TITLE = tit, AUTHORS = aut, ABSTRACT = abst, CITED = others,
             stringsAsFactors = FALSE)
}
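# The "start" offset is zero-based: 0 is the first page, 10 the second, and so on
for (i in seq(0, (pages.max - 1) * 10, 10)) {
  baseURL <- paste("http://scholar.google.com/scholar?start=", i, "&q=", "+key+word",
                   "&hl=en&lr=lang_en&num=10&as_sdt=1&as_vis=1",
                   sep = "")
  firefox_con$navigate(baseURL)
  # Parse the HTML that the browser rendered, passed in as text
  pagina <- xmlRoot(htmlParse(
    unlist(firefox_con$getPageSource()), asText = TRUE, encoding = "UTF-8"
  ))
  result <- scraper_internal(pagina)
  # Append one sheet per results page to the same workbook
  write.xlsx(result, "C:/KEYWORD.xlsx",
             sheetName = paste("keyword", i), row.names = TRUE, col.names = TRUE, append = TRUE)
}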
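The CITED column produced above holds the whole footer text of each result, not just a number. As a minimal sketch, assuming the footer contains the English phrase "Cited by N" (the `hl=en` parameter in the URL should keep the interface in English), the count could be pulled out with a regular expression. The helper name `extract_citations` and the sample strings are hypothetical and only serve to illustrate the idea.

```r
# Hypothetical helper: extract the integer that follows "Cited by" in each
# footer string. Returns NA for footers that carry no citation count.
extract_citations <- function(footer) {
  # regexec() captures the digits; regmatches() returns c(full match, digits)
  # for a hit and character(0) for a miss
  m <- regmatches(footer, regexec("Cited by ([0-9]+)", footer))
  vapply(m,
         function(x) if (length(x) == 2) as.integer(x[2]) else NA_integer_,
         integer(1))
}

# Made-up footer strings, for illustration only
footers <- c("Cited by 128 Related articles All 7 versions",
             "Related articles All 2 versions")
cited <- extract_citations(footers)
```

Applied to the scraped data, this would be something like `result$CITED_N <- extract_citations(result$CITED)` before the call to `write.xlsx`.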
Karla, I moved the content from the answer that Tony posted on your behalf to here.
Karla, hello! Thank you for sharing. Can I suggest a change? It might be better to restructure this as a question/answer pair. That helps people whose question ("How do I scrape Google Scholar?") is answered by your content, and it fits the Stack Overflow format.
– OnoSendai
The site uses a question/answer format, preferably with some working code. In your case, I posted an answer to your own question.
– Tony
You should post your own answer, and then I will delete my "answer". I only posted it to show how it would look.
– Tony