How to import data (.csv) to R while maintaining the original format

4

I am trying to import data from Excel (already saved in .csv format) into R; the values in the files to be imported use a decimal comma, for example 8509,80...

To import, I am using the command:

variavel=read.table("dados.csv", header=T, dec=",") 

However, when viewing the imported data, I see that R imported only the fractional part of the number (in this case, for the value 8509,80, R would bring in only the 80).

So I kindly ask for help importing the data correctly, that is, so that the value is read as 8509,80 (which would be 8509.80 in the English convention).

2 answers

2


You need to specify the field separator. In your case, since this is presumably a European/Brazilian CSV, the separator is probably ";".

variavel <- read.table("dados.csv", header = TRUE, dec = ",", sep = ";")

A shortcut is the read.csv2 function, which uses sep = ";" and dec = "," by default:

variavel <- read.csv2("dados.csv", header = TRUE)
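
To check that either call fixes the import, here is a minimal sketch on a throwaway file. The file contents, the column names item and valor, and the variable names are made up for illustration; only read.table() and read.csv2() come from the answer itself.

```r
# Build a small semicolon-separated file with decimal commas (made-up values).
path <- tempfile(fileext = ".csv")
writeLines(c("item;valor", "a;8509,80", "b;12,5"), path)

# Explicit separator and decimal mark.
via_table <- read.table(path, header = TRUE, sep = ";", dec = ",")

# read.csv2() assumes sep = ";" and dec = "," unless told otherwise.
via_csv2 <- read.csv2(path)

# Both columns should now be numeric rather than text.
print(is.numeric(via_table$valor))
print(is.numeric(via_csv2$valor))
print(via_csv2$valor)
```

If is.numeric() gives FALSE on your own file, the separator or the decimal mark is probably different from what is assumed here.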

  • Buddy, it worked, thanks.

2

The base functions for reading tables are sufficient for most cases. However, they are relatively slow; if there are many files and/or the files are very large, there are faster alternatives, which also offer a few other small advantages.

The readr package was created precisely to improve on the standard functions, in the following respects:

  • Argument names are more consistent with each other (e.g. col_names and col_types rather than header and colClasses).

  • The functions are approximately 10x faster.

  • A progress bar is shown if reading takes longer than a few seconds.

  • Strings are not converted to factors by default (base R adopted the same default only in version 4.0.0).

  • Column names are not converted into syntactically "valid" R names; that is, columns keep exactly their original names (even if they start with a digit, contain spaces, etc.).
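
For comparison, the last two points can be seen from the base side. The sketch below uses only base R, with a made-up file and column names, and shows how the default check.names = TRUE rewrites a header, and how check.names = FALSE and stringsAsFactors = FALSE give the behaviour described in the list.

```r
# Made-up file whose first column name starts with a digit and contains a space.
path <- tempfile(fileext = ".csv")
writeLines(c("2020 total;nome", "8509,80;ana"), path)

# Base default: names are passed through make.names().
default_read <- read.csv2(path)
print(names(default_read))

# Keep the original names and keep strings as character vectors.
kept <- read.csv2(path, check.names = FALSE, stringsAsFactors = FALSE)
print(names(kept))
print(class(kept$nome))
```

With the original names kept, a column such as "2020 total" has to be accessed with backticks or with kept[["2020 total"]].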

The functions in this package have names similar to those in base, with the dot replaced by an underscore (_). For example:

# base
variavel <- read.table("dados.csv", header = TRUE, dec = ",", sep = ";")
variavel <- read.csv2("dados.csv", header = TRUE)

# readr
library(readr)
variavel <- read_csv2("dados.csv")

Similarly, there are the functions read_csv(), read_table(), read_delim(), read_tsv(), read_lines() and read_fwf().
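
As an illustration of one of these, read_delim() can be given the separator and the decimal mark explicitly, which may be useful when a file does not match the read_csv2() defaults. This is only a sketch: the file, the column names item and valor, and the variable names are made up, and the code is skipped if readr is not installed.

```r
if (requireNamespace("readr", quietly = TRUE)) {
  # Made-up semicolon-separated file with decimal commas.
  path <- tempfile(fileext = ".csv")
  writeLines(c("item;valor", "a;8509,80", "b;12,5"), path)

  dados <- readr::read_delim(
    path,
    delim = ";",                                  # field separator
    locale = readr::locale(decimal_mark = ","),   # decimal comma
    col_types = readr::cols(
      item = readr::col_character(),
      valor = readr::col_double()
    )
  )

  # valor should be parsed as a double column.
  print(dados$valor)
}
```

Declaring col_types up front is optional; it is written out here so that a wrong decimal mark shows up as a parsing problem rather than as a silently guessed text column.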

Another alternative is the fread() function from the data.table package. fread() is even faster (about 2x) than the readr functions, and it tries to detect automatically the separator, whether there are column names, etc. Its arguments have the same names as those of the base functions, such as sep, header and stringsAsFactors. In this example, it would look like this:

library(data.table)
# dec = "," so that values such as 8509,80 are read as numbers
variavel <- fread("dados.csv", sep = ";", dec = ",", header = TRUE)

Depending on the data format, sep and header can be omitted, but when in doubt it is safer to specify them explicitly.
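
A quick way to confirm that the values arrived as numbers is to inspect the column class after reading. The sketch below assumes a made-up file with columns item and valor, passes dec = "," explicitly because the question's values use a decimal comma, and is skipped if data.table is not installed.

```r
if (requireNamespace("data.table", quietly = TRUE)) {
  # Made-up semicolon-separated file with decimal commas.
  path <- tempfile(fileext = ".csv")
  writeLines(c("item;valor", "a;8509,80", "b;12,5"), path)

  # Separator, decimal mark and header are all given explicitly.
  dt <- data.table::fread(path, sep = ";", dec = ",", header = TRUE)

  # A character column here would mean the decimal mark was not recognised.
  print(class(dt$valor))
  print(dt$valor)
}
```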

Finally, it is important to note that these functions only make sense if reading performance is a problem, or if the package is already loaded anyway (in the case of data.table). Otherwise, there is no need to load a package to do something that can be done just as well in base.

  • Thank you very much for the clarification.

  • +1 It is just worth mentioning the caveat that fread often runs into encoding problems if the data contain many accented characters, etc.
