Download books in Portuguese with gutenbergr

I am trying to download books from Project Gutenberg using the "gutenbergr" package. I use the following command:

teste <- gutenberg_download(c(40409, 31971, 17515, 42942),
                            meta_fields = "title")

The books download normally, but the characters come out garbled, especially the accented ones, as the image below shows. Is there a way to fix this, or should I download the books one by one and join them? Thanks!

Image: character error

  • 1

    That's a problem with Encoding or fileEncoding. Which operating system are you using?

  • I'm using macOS, set up entirely in English.

1 answer

This is a character-encoding problem. Project Gutenberg serves these files in latin1, but R treats them as UTF-8, which produces the garbled characters. The good news is that it is easy to fix: just convert from one encoding to the other, since no information is lost. One way to do this is with the base R function iconv. It is very simple to use: specify the original encoding and the target encoding:

library(gutenbergr)
library(dplyr)

teste <- gutenberg_download(c(40409, 31971, 17515, 42942),
                            meta_fields = "title")

teste %>%
  mutate(text = iconv(text, from = "latin1", to = "UTF-8"))

## # A tibble: 74,057 x 3
##    gutenberg_id text                                                      title   
##           <int> <chr>                                                     <chr>   
##  1        17515 A RELIQUIA                                                A Relíq…
##  2        17515 ""                                                        A Relíq…
##  3        17515 ""                                                        A Relíq…
##  4        17515 ""                                                        A Relíq…
##  5        17515 ""                                                        A Relíq…
##  6        17515 *A Reliquia*                                              A Relíq…
##  7        17515 ""                                                        A Relíq…
##  8        17515 ""                                                        A Relíq…
##  9        17515 Decidi compôr, nos vagares d'este verão, na minha quinta… A Relíq…
## 10        17515 (antigo solar dos condes de Landoso) as memorias da minh… A Relíq…
## # … with 74,047 more rows
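
To see what iconv is doing in the call above, here is a minimal sketch on a single word, independent of gutenbergr. It assumes the problem is exactly latin1 bytes being read as UTF-8; the byte values are written out by hand so the example does not depend on the encoding of your script file.

```r
# The word "verão" as latin1 bytes: 0xe3 is "ã" in latin1
latin1_bytes <- as.raw(c(0x76, 0x65, 0x72, 0xe3, 0x6f))
latin1_text  <- rawToChar(latin1_bytes)

# Read as UTF-8, this byte sequence is invalid, hence the garbled display
validUTF8(latin1_text)   # FALSE

# Re-encode: declare the real source encoding and the target encoding
utf8_text <- iconv(latin1_text, from = "latin1", to = "UTF-8")

validUTF8(utf8_text)     # TRUE
charToRaw(utf8_text)     # 76 65 72 c3 a3 6f ("ã" is now the two bytes c3 a3)
```

The single latin1 byte for "ã" becomes a two-byte UTF-8 sequence, and the rest of the text is unchanged.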

It is worth noting that I ran this code on a computer set up entirely in English, running R in English with UTF-8 encoding. Your configuration may be different, and you may need to change the arguments used in the code above. If the code above doesn't work, run sessionInfo() on your machine, compare the result with mine below, and try changing the arguments of iconv until you get the desired result.

sessionInfo()
## R version 3.5.2 (2018-12-20)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Mojave 10.14.3
## 
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] dplyr_0.8.0.1    gutenbergr_0.1.4
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_0.2.5 compiler_3.5.2   magrittr_1.5     assertthat_0.2.0 R6_2.4.0        
##  [6] pillar_1.3.1     glue_1.3.0       tibble_2.0.1     crayon_1.3.4     Rcpp_1.0.0      
## [11] pkgconfig_2.0.2  rlang_0.3.1      purrr_0.3.0
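
If your setup differs from the one above, a quicker check than reading the whole sessionInfo() output is to ask R directly about the locale. The sketch below also includes a small helper, to_utf8_if_needed, which is a hypothetical name I made up and not part of gutenbergr or base R. It only converts strings that are not already valid UTF-8, on the assumption that any such string is latin1; this avoids double-converting text that arrives correctly encoded.

```r
# What character encoding does this R session expect?
Sys.getlocale("LC_CTYPE")
l10n_info()   # the `UTF-8` element is TRUE in a UTF-8 locale

# Hypothetical helper: convert only the strings that are not valid UTF-8.
# Assumes those strings are encoded as `from` (latin1 by default).
to_utf8_if_needed <- function(x, from = "latin1") {
  bad <- !is.na(x) & !validUTF8(x)
  x[bad] <- iconv(x[bad], from = from, to = "UTF-8")
  x
}

# Example: one clean string, one latin1 string ("verão"), one missing value
x <- c("abc", rawToChar(as.raw(c(0x76, 0x65, 0x72, 0xe3, 0x6f))), NA)
fixed <- to_utf8_if_needed(x)
validUTF8(fixed[!is.na(fixed)])
```

With the data from the answer, this could be used as mutate(text = to_utf8_if_needed(text)) in place of the direct iconv call.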
  • 2

    Thank you very much! I'm also running macOS on a computer set up entirely in English! Your answer helped a lot!
