Join multiple files from a folder in R


4

I’m trying to combine several xlsx files in R. First I load the following libraries and run this code:

library(readxl)
library(plyr)
larquivos<-list.files("C:\\Users\\tomas.veiga\\Documents\\Financeiro\\dados",full.names=TRUE)
arquivos <- lapply(larquivos, function(x) read_excel(path = x, sheet = 1))

Since the files have different numbers of rows, I tried the following:

extracontab<-data.frame(rbind.fill(arquivos[c(1:12)]))

But I get an error:

Error in vector(type, length) :   vector: cannot make a vector of mode 'NULL'.

I also tried it this way:

extracontab<-data.frame(rbind.fill(arquivos[c(1:12)]))

But nothing works. What should I do? Thank you.

  • Could you provide a reproducible example of data that triggers the same problem? I tried it here with arbitrary tables and the line extracontab<-data.frame(rbind.fill(arquivos[c(1:12)])) works fine (although the c() in [c(1:12)] is redundant). This suggests that either the files were not read correctly or the problem lies in the structure of the tables.

  • The same thing happened to me.
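
One way to check the two possibilities raised in the comment above is to inspect each element of the list before binding. This is only a sketch in base R: the list `tables` is a made-up stand-in for `arquivos`, and the check for empty or missing column names is an assumption about what might be wrong with the files.

```r
# Stand-in for the list returned by lapply(); in the question this
# would be `arquivos`.
tables <- list(
  ok  = data.frame(a = 1:2, b = 3:4),
  bad = setNames(data.frame(1:2, 3:4), c("a", ""))
)

# TRUE for every element that was read as a data frame.
is_df <- vapply(tables, is.data.frame, logical(1))

# TRUE for every element with at least one empty or missing column name.
has_empty_name <- vapply(
  tables,
  function(df) any(is.na(names(df)) | names(df) == ""),
  logical(1)
)
```

Any element flagged in `has_empty_name`, or not flagged in `is_df`, would be the first place to look.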

2 answers

5


I was able to reproduce your error using a spreadsheet with an empty column name. That’s probably your problem.

To fix it, I would do this:

arquivos <- lapply(larquivos, function(x) {
  df <- read_excel(path = x, sheet = 1)
  names(df)[names(df) == ""] <- "x__"
  return(df)
})

And then I’d call rbind.fill the same way you are doing it. The lines I added just rename the empty columns to "x__". You may prefer a different action, such as deleting those columns or giving them another name. To do that, simply modify the function passed to lapply.
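
As a rough sketch of those alternatives, the helper below either renames or drops the columns with empty names, using base R only. The name `fix_empty_names` and its `drop` and `prefix` arguments are hypothetical, not part of readxl or plyr. Treating NA names as empty and numbering the placeholders so they stay distinct are also assumptions.

```r
# Hypothetical helper: handle columns whose names are "" or NA.
fix_empty_names <- function(df, drop = FALSE, prefix = "x__") {
  nms <- names(df)
  empty <- is.na(nms) | nms == ""
  if (drop) {
    # Keep only the columns that have a real name.
    return(df[, !empty, drop = FALSE])
  }
  # Number the placeholders so that several empty columns stay distinct.
  nms[empty] <- paste0(prefix, seq_len(sum(empty)))
  names(df) <- nms
  df
}

# Small in-memory example standing in for one spreadsheet.
example <- data.frame(a = 1:2, b = 3:4, c = 5:6)
names(example) <- c("a", "", "")

renamed <- fix_empty_names(example)               # columns: a, x__1, x__2
dropped <- fix_empty_names(example, drop = TRUE)  # column: a
```

Inside the lapply call, the helper would be applied to the result of read_excel in place of the single renaming line.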

  • Daniel, I did it differently, as I explained in the answer below. Since your approach is much more elegant than removing a column, I’d like to better understand what was done. What does the return function do?

  • It specifies the value the function returns once it has run. The code above would behave the same if you replaced return(df) with df. The difference is that return lets the function exit at other points, not necessarily on the last line.
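
To illustrate the point about return, here is a small sketch. The function names are made up for the example.

```r
# Both functions return the same value: the last evaluated expression
# is returned implicitly, so return() is optional on the last line.
with_return <- function(x) {
  y <- x * 2
  return(y)
}

without_return <- function(x) {
  y <- x * 2
  y
}

# return() matters when the function should exit early.
safe_first <- function(x) {
  if (length(x) == 0) {
    return(NA)  # exit here; the line below is never reached
  }
  x[[1]]
}
```

Here with_return(3) and without_return(3) give the same result, while safe_first uses return to leave the function before reaching its last line when the input is empty.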

0

I did it this way:

library(readxl)
library(data.table)  # provides rbindlist
larquivos<-list.files("C:/Users/tomas.veiga/Documents/Financeiro/dados",full.names=TRUE)
arquivos <- sapply(larquivos, read_excel,simplify = FALSE)
dados <- rbindlist(arquivos, idcol = "id")

And then I removed the columns I wanted.
