Manipulation of CSV in R

Question

Asked 6 years, 10 months ago

I have several files with the .csv extension and I need to read them all. The problem is that the files come in the following different layouts.

df1 -> Numeric header row before the actual header

1    2     3 
nome idade escolaridade
joao 10    6ano
Bia 20     faculdade

df2 -> Header without the numeric row

nome    idade escolaridade
Joaquim 6     colégio
Andre   1    maternal

df3 -> Separator is #

That’s all I’ve done so far:

filenames = list.files(pattern="*.csv")
if(is.empty(filenames)==FALSE)
for(c in 1:length(filenames)){
  a <- read.table(filenames[c], header=T, sep=";", dec=",")
}

is.empty(filenames)==FALSE is not a good idea. Use !is.empty(filenames) instead.

– Rui Barradas

2018/10/09 at 19:18

What's the difference?

– Brenda Xavier

2018/10/09 at 19:19

You should not test variavel == TRUE or variavel == FALSE, because the variable already is TRUE or FALSE. It is enough to write if(variavel) for the TRUE case, or to negate it with if(!variavel) for the FALSE case.

– Rui Barradas

2018/10/09 at 19:20

Thanks, I'll make that change ^^

– Brenda Xavier

2018/10/09 at 19:21
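
The point made in the comments can be sketched in a few lines. This is only an illustration: is.empty is not a base R function, so the sketch assumes a length() check instead, and the names has_files and msg are made up for the example.

```r
# Pretend that list.files() found no .csv files.
filenames <- character(0)

# The comparison already yields TRUE or FALSE, so it can be used directly.
has_files <- length(filenames) > 0

# No need for has_files == FALSE; negating with ! says the same thing.
if (!has_files) {
  msg <- "no files to read"
} else {
  msg <- "files found"
}
print(msg)
```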

1 answer


by Carlos Eduardo Lagosta • **5,497** points · Answer 1 · 2018-10-10T04:07:07+00:00

The data.table::fread function is a faster alternative to read.table. Its skip option accepts a string: reading starts at the first line that contains that string, so anything before the real header is ignored. The function is also quite good at detecting the separator automatically, which is useful if your files follow different patterns. Take this example:

library(data.table)

dfEx <- fread(
  input = '# um comentário marcado com "hashtag"
           data de criação: 12/10/2018
           1    2     3 
           nome idade escolaridade
           joao 10    6ano
           Bia 20     faculdade',
  skip = 'nome'  
)

> dfEx
   nome idade escolaridade
1: joao    10         6ano
2:  Bia    20    faculdade
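
For the third layout in the question (fields separated by #), a minimal sketch could look like the one below. The sample rows are invented for illustration, and sep is passed explicitly on the assumption that # is not among the separators fread tries on its own.

```r
library(data.table)

# Hypothetical rows mimicking a file whose fields are separated by "#".
linhas <- c(
  "nome#idade#escolaridade",
  "Carla#15#ensino medio",
  "Pedro#8#fundamental"
)

# State the separator explicitly instead of relying on auto-detection.
df3 <- fread(text = linhas, sep = "#")
print(df3)
```

With real files, the same sep = "#" argument would be passed together with the file name.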

By default, fread returns an object that has both the data.table and data.frame classes; you can get a plain data.frame instead with the option data.table = FALSE.
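
A small sketch of the difference, using invented rows in the style of the question (the variable names linhas, dt and df are only for illustration):

```r
library(data.table)

# Hypothetical semicolon-separated rows.
linhas <- c(
  "nome;idade;escolaridade",
  "Joaquim;6;colegio",
  "Andre;1;maternal"
)

dt <- fread(text = linhas)                      # default return type
df <- fread(text = linhas, data.table = FALSE)  # plain data.frame

print(class(dt))  # carries both the data.table and data.frame classes
print(class(df))  # data.frame only
```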

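To read multiple files at once, applying the fread function (or read.table, etc.) over the list of file names is more idiomatic than an explicit loop, and it keeps every result instead of overwriting a single variable on each iteration: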

listaArquivos <- list.files(pattern = '\\.csv$')
# The $ anchors the match at the end, so only names ending in .csv are selected

if( length(listaArquivos) ) listaDados <- lapply(listaArquivos, fread, skip = 'nome')

This will generate a list in which each element is a data.table (or data.frame). You can combine everything into a single table with the data.table::rbindlist function:

dados <- rbindlist(listaDados)
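
Putting the steps together, here is a self-contained sketch that can be run without touching real data. It assumes semicolon-separated files written to a temporary folder; the file names, the folder and the arquivo column are made up for the example.

```r
library(data.table)

# Create two small sample files in a temporary folder.
pasta <- tempfile("csv_demo_")
dir.create(pasta)

# Layout 1: a numeric row sits above the real header.
writeLines(
  c("1;2;3", "nome;idade;escolaridade", "joao;10;6ano", "Bia;20;faculdade"),
  file.path(pasta, "df1.csv")
)
# Layout 2: the header is the first line.
writeLines(
  c("nome;idade;escolaridade", "Joaquim;6;colegio", "Andre;1;maternal"),
  file.path(pasta, "df2.csv")
)

listaArquivos <- list.files(pasta, pattern = "\\.csv$", full.names = TRUE)

# skip = "nome" starts reading at the first line containing "nome",
# so the numeric row in df1.csv is dropped.
listaDados <- lapply(listaArquivos, fread, skip = "nome")
names(listaDados) <- basename(listaArquivos)

# idcol records which file each row came from.
dados <- rbindlist(listaDados, use.names = TRUE, idcol = "arquivo")
print(dados)
```

If the files do not all share the same columns, rbindlist also has a fill = TRUE argument that pads the missing ones with NA.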