How to join more than two dataframes in R?

Asked

Viewed 81 times

2

I have a list of 12 DF's that I need to put together in a single DF. The structure of all is equal, only differing the name of the second column, so the basic structure is: Column 1 is the date and Column 2 the name of the object.

The DF's have different number of lines for the available dates of each series.

I tried to use the function full_join package dplyr, but only worked with 2 objects at a time. The cbind does not work because data have different line numbers.

Example:

library(dplyr)

x <- data.frame(data = seq.Date(from = as.Date("2009-07-01"), to = as.Date("2015-08-31"), 
                                length.out = as.Date("2015-08-31") - as.Date("2009-07-01")), x = c(1:2252))

y <- data.frame(data = seq.Date(from = as.Date("2009-07-01"), to = as.Date("2017-09-15"), 
                                length.out = as.Date("2017-09-15") - as.Date("2009-07-01")), y = c(1:2998))

z <- data.frame(data = seq.Date(from = as.Date("2010-07-01"), to = as.Date("2017-09-15"), 
                                length.out = as.Date("2010-09-15") - as.Date("2009-07-01")), z = c(1:441))

dados <- full_join(x, y, z)

Returns the following error:

Erro: `by` must be a (named) character vector, list, or NULL for natural joins (not recommended in production code), not a `data.frame` object

2 answers

3


The functions *_join, package dplyr, are only set for operations with two objects at a time. Fortunately it is possible to apply them to more than one object simultaneously using the function reduce package purr, which is also part of tidyverse:

library(tidyverse)

x <- data.frame(data = seq.Date(from = as.Date("2009-07-01"), 
                                to = as.Date("2015-08-31"), 
                                length.out = as.Date("2015-08-31") - as.Date("2009-07-01")), 
                x = c(1:2252))

y <- data.frame(data = seq.Date(from = as.Date("2009-07-01"), 
                                to = as.Date("2017-09-15"),
                                length.out = as.Date("2017-09-15") - as.Date("2009-07-01")), 
                y = c(1:2998))

z <- data.frame(data = seq.Date(from = as.Date("2010-07-01"), 
                                to = as.Date("2017-09-15"), 
                                length.out = as.Date("2010-09-15") - as.Date("2009-07-01")), 
                z = c(1:441))

df_final <- list(x, y, z) %>% 
    reduce(full_join, by = "data")

Created on 2020-06-18 by the reprex package (v0.3.0)

  • I noticed that at the time of joining the dates, some data have equal dates, but are appearing with NA.

  • 1

    Try converting the dates to Character before joining. The Date is a "rounded" POSIX display, two equal dates can match different numerical values.

2

To be recorded: the same can be done only with functions of groundwork:

dados <- Reduce(function(a, b) merge(a, b, all = TRUE), list(x, y, z))
  • merge unites two data frames by an identification column (by default, any name common to the two); the option all = TRUE keeps all ids.
  • Reduce applies a function to a list sequentially.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.