I want to collect information about the IMDb Top 250 using the rvest package. When I visit the page, the movie titles appear in their original language, at least in my browser (Firefox 85.0.1 on macOS 11.2, both set to English).
However, when I scrape the page, R (4.0.3, locale en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8) returns the film titles in Portuguese:
library(rvest)
#> Loading required package: xml2
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
imdb_url <- "https://www.imdb.com/chart/top/"
imdb <-
  read_html(imdb_url) %>%
  html_table(fill = TRUE) %>%
  .[[1]] %>%
  select("Rank & Title", "IMDb Rating")
head(imdb, 5)
#> Rank & Title IMDb Rating
#> 1 1.\n Um Sonho de Liberdade\n (1994) 9.2
#> 2 2.\n O Poderoso Chefão\n (1972) 9.1
#> 3 3.\n O Poderoso Chefão II\n (1974) 9.0
#> 4 4.\n Batman: O Cavaleiro das Trevas\n (2008) 9.0
#> 5 5.\n 12 Homens e uma Sentença\n (1957) 8.9
Created on 2021-02-07 by the reprex package (v1.0.0)
I would like the film titles to be in English, as on the page I see in my browser. How can I get the English titles when scraping?
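One plausible explanation is that the request sent by `read_html()` carries no language preference, so the site falls back to the client's location, whereas the browser sends an `Accept-Language` header. The sketch below tests that idea by sending the header explicitly through httr. It is a minimal sketch under assumptions: that IMDb honours `Accept-Language`, that the page still exposes an HTML table, and that the helper name `read_html_english` (hypothetical, not part of rvest or httr) suits your code.

```r
library(rvest)
library(httr)

# Assumption: the server chooses the title language from the Accept-Language
# header and falls back to the client's location when the header is absent.
english_headers <- add_headers("Accept-Language" = "en-US,en;q=0.8")

# Hypothetical helper: fetch a page while asking for English content.
read_html_english <- function(url) {
  response <- GET(url, english_headers)
  stop_for_status(response)  # fail early on HTTP errors
  read_html(response)        # xml2 provides a read_html() method for httr responses
}

# Usage (requires network access, so it is left commented out):
# imdb <- read_html_english("https://www.imdb.com/chart/top/") %>%
#   html_table(fill = TRUE) %>%
#   .[[1]]
```

If the titles still come back in Portuguese, the header is probably not what the site keys on, and the language would have to be selected some other way.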
Excellent. Thank you!
– Marcus Nunes