This is a character encoding problem. Project Gutenberg uses latin1, but R assumes the text is UTF-8 and then throws this error. The good news is that it is easy to fix: just convert from one encoding to the other, since no information is lost. One way to do this is with the iconv function from base R. It is very simple to use: just specify the original encoding and the target encoding:
library(gutenbergr)
library(dplyr)
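# Download the four books, keeping the title as an extra column
teste <- gutenberg_download(c(40409, 31971, 17515, 42942),
                            meta_fields = "title")
# Re-encode the text column from latin1 to UTF-8
teste %>%
  mutate(text = iconv(text, from = "latin1", to = "UTF-8"))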
## # A tibble: 74,057 x 3
## gutenberg_id text title
## <int> <chr> <chr>
## 1 17515 A RELIQUIA A Relíq…
## 2 17515 "" A Relíq…
## 3 17515 "" A Relíq…
## 4 17515 "" A Relíq…
## 5 17515 "" A Relíq…
## 6 17515 *A Reliquia* A Relíq…
## 7 17515 "" A Relíq…
## 8 17515 "" A Relíq…
## 9 17515 Decidi compôr, nos vagares d'este verão, na minha quinta… A Relíq…
## 10 17515 (antigo solar dos condes de Landoso) as memorias da minh… A Relíq…
## # … with 74,047 more rows
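To see what iconv is doing without downloading anything, here is a minimal sketch on a single word. It assumes nothing beyond base R; the word "compôr" is taken from the output above, and the variable names are only illustrative. The bytes are built by hand so that the example does not depend on how this file itself is encoded.

```r
# "compôr" as latin1 bytes: the ô is the single byte 0xF4
word_latin1 <- rawToChar(as.raw(c(0x63, 0x6f, 0x6d, 0x70, 0xf4, 0x72)))

# A lone 0xF4 byte is not a valid UTF-8 sequence, which is why R complains
validUTF8(word_latin1)

# Convert from latin1 to UTF-8, as in the answer
word_utf8 <- iconv(word_latin1, from = "latin1", to = "UTF-8")

# The result is valid UTF-8: ô is now the two bytes 0xC3 0xB4
validUTF8(word_utf8)
charToRaw(word_utf8)
```

If validUTF8() already returns TRUE for your text before the conversion, the problem is probably not the one described here, and running iconv with from = "latin1" could garble the accents instead of fixing them.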
It is worth noting that I ran this code on a computer set up entirely in English, running R in English with UTF-8 encoding. Your configuration may be different, in which case you may need to change the arguments used in the code above. In any case, if what I posted above doesn’t work, run sessionInfo() on your PC, compare the result with what I got below, and try changing the arguments of iconv to obtain the desired result.
sessionInfo()
## R version 3.5.2 (2018-12-20)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Mojave 10.14.3
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] dplyr_0.8.0.1 gutenbergr_0.1.4
##
## loaded via a namespace (and not attached):
## [1] tidyselect_0.2.5 compiler_3.5.2 magrittr_1.5 assertthat_0.2.0 R6_2.4.0
## [6] pillar_1.3.1 glue_1.3.0 tibble_2.0.1 crayon_1.3.4 Rcpp_1.0.0
## [11] pkgconfig_2.0.2 rlang_0.3.1 purrr_0.3.0
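If your locale line differs from mine, a quicker check than reading the whole sessionInfo() output is to ask R directly. The sketch below uses only base R; choosing the target encoding from the locale is my own suggestion rather than something the setup above requires, and the variable names are hypothetical.

```r
# Show the character-type locale, e.g. "en_US.UTF-8" on the machine above
Sys.getlocale("LC_CTYPE")

# l10n_info() reports whether the session is running in a UTF-8 locale
loc <- l10n_info()

# Assumption: convert to UTF-8 in a UTF-8 session, otherwise to the
# native encoding ("" means the current locale's encoding in iconv)
target <- if (isTRUE(loc[["UTF-8"]])) "UTF-8" else ""

# "verão" as latin1 bytes (ã is 0xE3), converted to the chosen target
sample_latin1 <- rawToChar(as.raw(c(0x76, 0x65, 0x72, 0xe3, 0x6f)))
converted <- iconv(sample_latin1, from = "latin1", to = target)
```

In a non-UTF-8 session iconv may return NA for characters that the native encoding cannot represent, so check the result before applying the same arguments to the whole text column.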
That’s a problem with Encoding or fileEncoding. Which operating system are you using? – Guilherme Parreira
I’m using macOS, entirely in English. – user135517