Reading a file with non-ASCII characters [à="<U+00E0>"]

I’m reading a file in R called roubo2.rds. It is an R-specific format, so I could not open it in Excel. I can import the data into a variable but, within the records, the text is not plain ASCII (Unicode? UTF-8?). I have searched to find out what encoding this is, and I have also tried exporting to CSV, but that doesn’t work. Can anyone shed some light? I need text that currently appears with escapes such as "<U+00E0>" to appear with the proper accented character ("à").

The R code I’m using to read it is this:

dados <- readRDS("roubo2.rds")

The file can be downloaded here: https://www.dropbox.com/s/yp9r0tln0vwdvej/roubo2.rds?dl=0 I am running RStudio on a Mac. The sessionInfo() output is below.

sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.12.1 (Sierra)
  • I downloaded it and was able to read roubo2.rds with the correct accents, also on a Mac, so I couldn’t reproduce your problem. There are some minor differences between our systems, though, as you can see in my session info: R version 3.3.2 (2016-10-31), Platform: x86_64-apple-darwin13.4.0 (64-bit), Running under: macOS Sierra 10.12.2. Could you paste your full sessionInfo() output so I can compare it with mine? For example, the locale setting is missing from yours, and mine is en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8.

  • This is very curious. Supposedly, .rds files are a binary serialization of the table, i.e., more or less a copy of the table’s in-memory state at the time it was saved. That said, I couldn’t reproduce your problem either (Linux Mint 18 here).

  • Really curious. At least two people tried to reproduce the problem and failed.
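
The comments above point to the session locale as the difference between the machines. The following is a minimal sketch for inspecting it, assuming the symptom comes from a session whose locale cannot represent accented characters. The sample string `x` is made up for illustration and is not taken from roubo2.rds.

```r
# Show the locale of the current session. A "C" or empty locale is a
# common reason for accented characters printing as <U+00E0>.
Sys.getlocale()
l10n_info()  # reports whether the session is running in UTF-8

# Hypothetical sample value, written with a Unicode escape so that the
# script itself stays pure ASCII.
x <- "\u00e0"

Encoding(x)        # declared encoding of the string
nchar(x, "bytes")  # bytes used to store the single character

# Assumption: the en_US.UTF-8 locale is installed on the machine.
# Uncomment to switch the running session to it:
# Sys.setlocale("LC_ALL", "en_US.UTF-8")
```

If `Sys.getlocale()` does not mention UTF-8, comparing it with the en_US.UTF-8 setting quoted in the first comment is a reasonable first check.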

1 answer

To export to .csv in the correct encoding, just add the fileEncoding argument to write.csv() (or to write.csv2(), which is used here).

The code would look like this:

dados <- readRDS('roubo2.rds')

write.csv2(dados, 'roubo2.csv', fileEncoding = 'UTF-8')
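
To check that the export behaves as intended, the file can be read back with the same encoding declared. This is only a sketch: the one-row data frame is a hypothetical stand-in for the contents of roubo2.rds, and the temporary file path is an assumption made so the example runs anywhere.

```r
# Hypothetical stand-in for the data read from roubo2.rds.
dados <- data.frame(tipo = "roubo \u00e0 m\u00e3o armada",
                    stringsAsFactors = FALSE)

arquivo <- tempfile(fileext = ".csv")

# write.csv2() uses ';' as the field separator and ',' as the decimal mark.
write.csv2(dados, arquivo, row.names = FALSE, fileEncoding = "UTF-8")

# Read the file back, declaring the same encoding.
lido <- read.csv2(arquivo, fileEncoding = "UTF-8", stringsAsFactors = FALSE)

# Compare the round-tripped text with the original.
identical(enc2utf8(lido$tipo), enc2utf8(dados$tipo))
```

The comparison is expected to hold in a UTF-8 session. In a session whose locale cannot represent the characters, the text may already be escaped when it is written, which is consistent with the locale question raised in the comments.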

I also suggest converting the factor variables to character, since you are working with text. To do this, just use as.character(). Example:

dados$tipo <- as.character(dados$tipo)

When reading a .csv file, you can do this directly by passing the argument stringsAsFactors = FALSE to read.csv().
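
Both suggestions can be sketched together. The data frame below is hypothetical (only the column name `tipo` comes from the answer), and the loop over all factor columns is an illustrative extension of the single-column example, not something the answer prescribes.

```r
# Hypothetical data frame with one factor column, standing in for `dados`.
dados <- data.frame(tipo = factor(c("a", "b", "a")), n = 1:3)

# Convert every factor column to character, leaving other columns alone.
fatores <- vapply(dados, is.factor, logical(1))
dados[fatores] <- lapply(dados[fatores], as.character)

# When reading a CSV, stringsAsFactors = FALSE keeps text as character.
arquivo <- tempfile(fileext = ".csv")
write.csv(dados, arquivo, row.names = FALSE)
lido <- read.csv(arquivo, stringsAsFactors = FALSE)

str(lido)
```

In R 4.0.0 and later, stringsAsFactors = FALSE is already the default for read.csv(); in the R 3.3 sessions shown in this thread it has to be passed explicitly.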


Finally, it would be good to use version 3.2 of R, since the vast majority of packages are developed for this version.
