Changing the language of results scraped from the IMDb site with rvest

I want to collect information about the IMDb Top 250 using the rvest package. When I visit the page, the film titles appear in their original language, at least in my browser (Firefox 85.0.1, macOS 11.2, both in English):

[Screenshot of the IMDb Top 250 page as displayed in the browser]

However, when I scrape the page, R (4.0.3, locale en_US.UTF-8 / en_US.UTF-8 / en_US.UTF-8/C / en_US.UTF-8 / en_US.UTF-8) downloads the film titles in Portuguese:

library(rvest)
#> Loading required package: xml2
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

imdb_url <- "https://www.imdb.com/chart/top/"

imdb <- 
  read_html(imdb_url) %>%
  html_table(fill = TRUE) %>%
  .[[1]] %>% 
  select("Rank & Title", "IMDb Rating")

head(imdb, 5)
#>                                               Rank & Title IMDb Rating
#> 1          1.\n      Um Sonho de Liberdade\n        (1994)         9.2
#> 2              2.\n      O Poderoso Chefão\n        (1972)         9.1
#> 3           3.\n      O Poderoso Chefão II\n        (1974)         9.0
#> 4 4.\n      Batman: O Cavaleiro das Trevas\n        (2008)         9.0
#> 5       5.\n      12 Homens e uma Sentença\n        (1957)         8.9

Created on 2021-02-07 by the reprex package (v1.0.0)

I would like the film titles to be in English, as on the page I see in my browser. How can I do that?

1 answer

You can use httr::add_headers() to specify the desired language:

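library(rvest)
library(httr)
library(dplyr)

# Same URL as in the question
imdb_url <- "https://www.imdb.com/chart/top/"

imdb <-
  paste0(imdb_url, "/textlist") %>%
  # Ask the server for English content via the Accept-Language header
  # (html_session() was renamed session() in rvest 1.0.0)
  html_session(add_headers("Accept-Language" = "en")) %>%
  read_html() %>%
  html_table(fill = TRUE) %>%
  .[[1]] %>%
  select("Rank & Title", "IMDb Rating")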

> head(imdb)
                                        Rank & Title IMDb Rating
1 1.\n      The Shawshank Redemption\n        (1994)         9.2
2            2.\n      The Godfather\n        (1972)         9.1
3   3.\n      The Godfather: Part II\n        (1974)         9.0
4          4.\n      The Dark Knight\n        (2008)         9.0
5             5.\n      12 Angry Men\n        (1957)         8.9
6         6.\n      Schindler's List\n        (1993)         8.9
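
A likely explanation (an assumption, not something confirmed in this thread) is that the browser sends an Accept-Language header while a bare read_html() call does not, so the site chooses a language some other way. As a rough sketch of the same idea without html_session(), the header can be passed to httr::GET() directly. The helper name read_html_lang and the default language string below are hypothetical, chosen only for illustration:

```r
library(httr)
library(rvest)

# Hypothetical helper (not part of rvest or httr): download a page while
# requesting a specific language through the Accept-Language header,
# then parse the response body as HTML.
read_html_lang <- function(url, lang = "en-US,en;q=0.8") {
  resp <- GET(url, add_headers(`Accept-Language` = lang))
  stop_for_status(resp)  # stop early on HTTP errors (4xx/5xx)
  read_html(content(resp, as = "text", encoding = "UTF-8"))
}

# Inspect the header that would be sent; no request is made here
lang_header <- add_headers(`Accept-Language` = "en-US,en;q=0.8")
lang_header$headers
```

With this in place, read_html_lang(imdb_url) could take the place of read_html(imdb_url) in the question's pipeline. Whether the site honours the header may depend on the page and can change over time.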
  • Excellent. Thank you!