How to merge multiple data frames into one

Viewed 14,983 times

6

I want to create a data frame by joining 4 other data frames. I was able to do it with the following commands:

ZHO<-as.data.frame.matrix(zho)
ZHO
ZES<-as.data.frame.matrix(zes)
ZES
ZRE<-as.data.frame.matrix(zre)
ZRE
POP<-as.data.frame.matrix(pop)
POP

dataframe1<-merge(ZHO,ZES)
dataframe1

dataframe1<-merge(dataframe1,ZRE)
dataframe1
dataframe1<- merge(dataframe1,POP)
dataframe1

But I wonder if there is another way, because this approach is laborious and does not scale when I have a very large number of data frames.

5 answers

3

To illustrate, I will create 3 example data.frames for the same individuals (identified by id): one with variable y, another with variable z, and a third with variable x:

### examples ###
set.seed(1)
df1 <- data.frame(id=1:10, y = rnorm(10))
df2 <- data.frame(id=1:10, z = rnorm(10))
df3 <- data.frame(id=1:10, x = rnorm(10))

With base R functions, one way to merge all three at once is to combine Reduce with merge:

resultado <- Reduce(function(x,y) {merge(x,y)}, list(df1, df2, df3))
resultado
   id          y           z           x
1   1 -0.6264538  1.51178117  0.91897737
2   2  0.1836433  0.38984324  0.78213630
3   3 -0.8356286 -0.62124058  0.07456498
4   4  1.5952808 -2.21469989 -1.98935170
5   5  0.3295078  1.12493092  0.61982575
6   6 -0.8204684 -0.04493361 -0.05612874
7   7  0.4874291 -0.01619026 -0.15579551
8   8  0.7383247  0.94383621 -1.47075238
9   9  0.5757814  0.82122120 -0.47815006
10 10 -0.3053884  0.59390132  0.41794156
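
For the situation raised in the question, where there are many data frames, the same Reduce pattern can be applied to a list built programmatically. A minimal sketch, assuming every data frame shares an id column; the names dfs and merged and the column names v1..v5 are made up for illustration:

```r
set.seed(1)

# Build a list of 5 example data frames, each with the shared key `id`
# and one value column named v1, v2, ..., v5 (hypothetical names).
dfs <- lapply(1:5, function(i) {
  df <- data.frame(id = 1:10, value = rnorm(10))
  names(df)[2] <- paste0("v", i)
  df
})

# Fold merge() over the list, always joining on `id`.
merged <- Reduce(function(a, b) merge(a, b, by = "id"), dfs)

dim(merged)    # 10 rows, 6 columns (id plus v1..v5)
names(merged)
```

If the data frames already exist as separate objects, something like mget(c("ZHO", "ZES", "ZRE", "POP")) should collect them into a list by name, which can then be passed to Reduce in the same way.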

The plyr package has the function join_all, which does essentially the same as above with a simpler syntax. Note that it performs a left join by default; the result is the same here because every id appears in all three data.frames:

library(plyr) ### if you don't have the package yet, you need to install it first

resultados <- join_all(list(df1,df2,df3)) 
resultados

   id          y           z           x
1   1 -0.6264538  1.51178117  0.91897737
2   2  0.1836433  0.38984324  0.78213630
3   3 -0.8356286 -0.62124058  0.07456498
4   4  1.5952808 -2.21469989 -1.98935170
5   5  0.3295078  1.12493092  0.61982575
6   6 -0.8204684 -0.04493361 -0.05612874
7   7  0.4874291 -0.01619026 -0.15579551
8   8  0.7383247  0.94383621 -1.47075238
9   9  0.5757814  0.82122120 -0.47815006
10 10 -0.3053884  0.59390132  0.41794156

Or, if you just want a cleaner way to write it, you can use the magrittr package, which provides a forward-pipe operator for R. With this package, the merge of the 3 data.frames can also be done in a single line by chaining the calls with the operator %>%:

 library(magrittr) ### if you don't have the package yet, you need to install it first

 resultado <- df1%>%merge(df2)%>%merge(df3)
 resultado

   id          y           z           x
1   1 -0.6264538  1.51178117  0.91897737
2   2  0.1836433  0.38984324  0.78213630
3   3 -0.8356286 -0.62124058  0.07456498
4   4  1.5952808 -2.21469989 -1.98935170
5   5  0.3295078  1.12493092  0.61982575
6   6 -0.8204684 -0.04493361 -0.05612874
7   7  0.4874291 -0.01619026 -0.15579551
8   8  0.7383247  0.94383621 -1.47075238
9   9  0.5757814  0.82122120 -0.47815006
10 10 -0.3053884  0.59390132  0.41794156

Remember that it is always good practice to specify the identifier columns for merge (the by argument of the function), because otherwise you may end up with a result different from what you expected. In the case above this is not necessary because the data.frames have only one column in common.
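
To see why by matters, here is a small sketch with two hypothetical data frames (a and b, not taken from the question) that share both id and a column named value. Without by, merge joins on every common column:

```r
a <- data.frame(id = 1:3, value = c(10, 20, 30))
b <- data.frame(id = 1:3, value = c(10, 99, 30))

# No `by`: merge() joins on all common columns (id AND value),
# so only the rows that agree on both are kept.
merge(a, b)

# Explicit `by`: join on id only; the clashing `value` columns
# are kept side by side with the default suffixes .x and .y.
merge(a, b, by = "id")
```

The first call drops the row with id 2 because its value differs between the two inputs, while the second keeps all three ids.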

1

With dplyr you can chain the join functions with the pipe operator %>% (inner_join, full_join, left_join, right_join, etc., depending on your goal).

Example with full_join chained:

library(dplyr)

set.seed(1)
df1 <- data.frame(id=1:10, y = rnorm(10))
df2 <- data.frame(id=1:10, z = rnorm(10))
df3 <- data.frame(id=1:10, x = rnorm(10))

df1 %>% full_join(df2) %>% full_join(df3)

   id          y           z           x
1   1 -0.6264538  1.51178117  0.91897737
2   2  0.1836433  0.38984324  0.78213630
3   3 -0.8356286 -0.62124058  0.07456498
4   4  1.5952808 -2.21469989 -1.98935170
5   5  0.3295078  1.12493092  0.61982575
6   6 -0.8204684 -0.04493361 -0.05612874
7   7  0.4874291 -0.01619026 -0.15579551
8   8  0.7383247  0.94383621 -1.47075238
9   9  0.5757814  0.82122120 -0.47815006
10 10 -0.3053884  0.59390132  0.41794156
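
With more than a handful of data frames the chain gets long. One possible sketch combines base R's Reduce with full_join and names the key explicitly, assuming all data frames share the id column (joined is an illustrative name):

```r
library(dplyr)

set.seed(1)
df1 <- data.frame(id = 1:10, y = rnorm(10))
df2 <- data.frame(id = 1:10, z = rnorm(10))
df3 <- data.frame(id = 1:10, x = rnorm(10))

# Apply full_join() pairwise across the list. Passing by = "id" also
# avoids the message dplyr prints when it has to guess the join columns.
joined <- Reduce(function(a, b) full_join(a, b, by = "id"),
                 list(df1, df2, df3))
names(joined)
```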

0

## Load the spreadsheet; the separator is "," (comma-separated file)
obs_1 <- read.csv("constituents-financials-observations-1.csv")

head(obs_1) ## Visualizing data frame obs_1

obs_2 <- read.csv("constituents-financials-observations-2.csv")

head(obs_2) ## Visualizing data frame obs_2

## both data frames have the same structure: 12 columns

newCols = 12 

## new variables for both dataframes will be price_1,price_2,...
newNames = paste("price", seq_len(newCols), sep = "_") 

## set new names of the columns (price_1, price_2,...)

## Data frame obs_1
colnames(obs_1) <- newNames 

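## change the name of the last variable
## (must match the name used for obs_2 below, otherwise merge treats them as different columns)
colnames(obs_1)[12] <- 'Site' 

## change the name of the first variable
colnames(obs_1)[1] <- 'Sign'  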

## Data frame obs_2
colnames(obs_2) <- newNames

colnames(obs_2)[12] <- 'Site'

colnames(obs_2)[1] <- 'Sign'

## full join of both data frames: the result keeps the rows of obs_1 plus the rows of obs_2
observations <- merge(obs_1, obs_2, all = TRUE) 
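
As a small illustration of what this call does when both data frames have identical column names, here is a sketch with made-up data (a and b stand in for obs_1 and obs_2; the values are invented). Because merge then joins on all columns, a row present in both inputs appears only once, whereas rbind simply stacks the rows:

```r
a <- data.frame(Sign = c("A", "B"), price_2 = c(1, 2))
b <- data.frame(Sign = c("B", "C"), price_2 = c(2, 3))

# All columns are common, so they are all used as join keys;
# all = TRUE keeps the unmatched rows from both sides.
merged <- merge(a, b, all = TRUE)
nrow(merged)   # 3: the shared row ("B", 2) is not repeated

# rbind() stacks the rows without matching anything.
stacked <- rbind(a, b)
nrow(stacked)  # 4
```

Which of the two is appropriate depends on whether rows that occur in both files should be kept once or twice.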

0

You can use the cbind function, which is part of base R:

Using the data frames created by Carlos Cinelli:

set.seed(1)
df1 <- data.frame(id=1:10, y = rnorm(10))
df2 <- data.frame(id=1:10, z = rnorm(10))
df3 <- data.frame(id=1:10, x = rnorm(10))
df4 <- cbind(df1,df2,df3)

cbind joins the data.frames by column

rbind joins the data.frames by row

Note, however, that in this example the repeated columns (id) are neither removed nor merged.
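
The effect on the column names can be checked directly. The sketch below also shows one possible workaround, dropping the repeated key before binding (df5 is an illustrative name). This assumes the rows are in the same order in every data frame, since cbind does not match on id:

```r
set.seed(1)
df1 <- data.frame(id = 1:10, y = rnorm(10))
df2 <- data.frame(id = 1:10, z = rnorm(10))
df3 <- data.frame(id = 1:10, x = rnorm(10))

# cbind() pastes the columns side by side, so `id` appears three times.
df4 <- cbind(df1, df2, df3)
names(df4)

# Possible workaround: keep only the value columns of df2 and df3.
# Only safe if the rows are already aligned across the data frames.
df5 <- cbind(df1, df2["z"], df3["x"])
names(df5)
```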

0

For those who want a "new alternative".

There are also the functions:

bind_rows(..., .id = NULL)

bind_cols(...)

combine(...)

They are in the dplyr package, which is well worth studying.

  • André, these functions are not meant to merge data.frames.
