In this case I always use the bind_rows function from dplyr:
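If you want a reproducible example, data frames with the same shape as the output below (df2 carries an extra column x, which bind_rows fills with NA for the rows coming from df1) could look like this; these are hypothetical values, not the question's actual data:

# Hypothetical example data (the actual values in the output below differ):
set.seed(1)
df1 <- data.frame(id = 1:10,  z = rnorm(10))                 # id and z only
df2 <- data.frame(id = 11:20, z = rnorm(10), x = rnorm(10))  # id, z and x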
library(dplyr)
dados <- bind_rows(df1,df2)
> dados
Source: local data frame [20 x 3]
       id          z           x
    (int)      (dbl)       (dbl)
 1      1  0.8179472          NA
 2      2  0.2624969          NA
 3      3 -0.1684590          NA
 4      4 -0.1239140          NA
 5      5  0.4434778          NA
 6      6 -0.8865578          NA
 7      7  0.1160360          NA
 8      8  0.5604733          NA
 9      9 -2.2761215          NA
10     10 -0.7920775          NA
11     11  1.7650167 -1.38172797
12     12 -1.0004357  2.64345620
13     13 -1.6467084 -0.01361806
14     14  0.9055755  2.00354819
15     15 -0.1645952  0.57657614
16     16  0.2675339 -0.01727064
17     17  0.6383209 -0.43920834
18     18 -1.4729775 -0.35907320
19     19  0.9345417 -0.93673279
20     20 -0.7888048  0.36903134
I thought it would be interesting to include a comparison of the running times of all the alternatives (using the microbenchmark package):
> microbenchmark(
+ base = merge(df1, df2, all = TRUE),
+ dplyr = dplyr::bind_rows(df1,df2),
+ data.table = data.table::rbindlist(list(df1,df2), fill = TRUE),
+ plyr = plyr::rbind.fill(df1,df2)
+ )
Unit: microseconds
      expr      min        lq      mean    median        uq        max neval
      base 1370.788 1578.6680 2138.9646 1852.2805 2296.0775   8607.060   100
     dplyr   64.768  111.1450  205.0742  126.2580  161.3900   4055.948   100
data.table  173.051  239.8905 2860.8464  280.5705  352.7535 253411.277   100
      plyr  362.365  440.6795  597.4301  506.5200  622.8745   4323.416   100
Note that the dplyr solution is the fastest of all: more than 10x faster than base merge and about 2x faster than the data.table solution. I'm comparing the medians!
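For concreteness, taking the medians from the run above: 1852.2805 / 126.2580 ≈ 14.7 (dplyr vs. base) and 280.5705 / 126.2580 ≈ 2.2 (dplyr vs. data.table). Exact timings will of course vary between machines and runs.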