How to join two data.frames of different sizes by column in R?

4

Suppose I have two different data frames:

print(DADOS_1)

linha coluna1 coluna2
1     1       3
2     3       4
3     1       1
4     2       2

print(DADOS_2)

linha coluna3 coluna4
1     3       1
2     2       2
3     5       0
4     2       4
5     1       3
6     3       1

I need to join DADOS_1 and DADOS_2 by columns, that is, place them side by side. The result should be this:

print(DADOS_1+2)

    linha coluna1 coluna2 coluna3 coluna4
    1     1       3       3       1
    2     3       4       2       2
    3     1       1       5       0
    4     2       2       2       4
    5     NA      NA      1       3
    6     NA      NA      3       1

I tried to use the bind_cols function, but it returned the following error:

Error: Argument 2 must be length 36550, not 138383

In this case, the error says that my two data.frames cannot be bound because they have different numbers of rows.

What should I do to join my two data frames of different sizes by columns in R?

3 answers

4

Suppose two data frames, a (10 rows) and b (5 rows):

a <- data.frame(
  x = replicate(n = 8, expr = runif(10, 20, 100))
)


b <- data.frame(
  y = replicate(n = 4, expr = runif(5, 20, 100))
)

Now I create a list:

library(tidyverse)

lista <- lst(a, b)

Working with a list lets you join several data frames, not just two. That is the advantage.

With the tidyverse you do it like this:

lista %>% 
  map(~ mutate(., rownames = row.names(.))) %>% 
  reduce(full_join, by = 'rownames')
  • Note that I needed to create a column (rownames) to serve as the join key.
        x.1      x.2      x.3      x.4      x.5      x.6      x.7      x.8 rownames
1  74.69882 35.89753 28.04641 90.21342 81.84718 48.05258 77.96111 57.18316        1
2  77.68592 95.76047 80.74215 60.79755 72.05111 99.42336 42.95387 97.24211        2
3  56.54784 42.12707 85.28353 93.75327 94.64186 57.47894 77.20191 62.89326        3
4  56.06928 44.25626 73.59108 36.27553 80.06120 40.40878 39.21776 96.30845        4
5  82.46494 59.77877 47.20289 52.71778 61.25111 87.92412 39.39340 70.68103        5
6  57.25800 60.69670 21.26649 85.86384 92.79378 74.92121 64.67908 60.38243        6
7  76.81113 89.46213 38.02942 93.48745 44.17187 38.44297 53.09666 85.19333        7
8  34.82535 80.53654 87.08810 21.20205 74.30482 49.67933 51.85050 59.47621        8
9  40.15681 71.67351 20.90501 80.65097 77.12172 27.66269 25.24923 30.93586        9
10 21.09820 66.00663 23.45102 82.09685 26.14959 20.94048 45.73111 53.22275       10
        y.1      y.2      y.3      y.4
1  57.51489 91.06000 50.09318 29.64023
2  85.18483 63.15968 38.07206 64.34042
3  58.85344 21.06321 36.06338 87.25948
4  20.23187 32.84291 83.87627 23.88338
5  43.17968 69.30414 28.58430 39.36796
6        NA       NA       NA       NA
7        NA       NA       NA       NA
8        NA       NA       NA       NA
9        NA       NA       NA       NA
10       NA       NA       NA       NA
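
As a rough sketch of the "not just two" advantage mentioned above, the same pattern can be applied to three data frames. The data frames a, b and c3 below, and the key column name row_id, are made up for illustration and are not from the original answer. The key is stored as an integer here instead of the character row names, which is only one possible choice.

```r
library(dplyr)
library(purrr)

# Three small, hypothetical data frames with different numbers of rows
a  <- data.frame(x = 1:4)
b  <- data.frame(y = 1:6)
c3 <- data.frame(z = 1:2)

combined <- list(a, b, c3) %>%
  # add an integer row index to each data frame to use as the join key
  map(~ mutate(.x, row_id = seq_len(nrow(.x)))) %>%
  # fold the list into one data frame, keeping every row_id that appears
  reduce(full_join, by = "row_id")

print(combined)
```

The result has as many rows as the longest data frame, and the shorter ones are filled with NA.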
  • Thank you for the answer, I will test it.

4


Assuming that the data frames already have a column that serves as an identifier, you can use dplyr::full_join directly. Reusing the simulation from the other answer, we have:

library(dplyr)

a <- data.frame(linha = 1:10,  
  x = replicate(n = 8, expr = runif(10, 20, 100))
)

b <- data.frame(linha = 1:5,
  y = replicate(n = 4, expr = runif(5, 20, 100))
)

full_join(a, b, by = "linha")

That is, just tell dplyr::full_join which data frames to merge and which column name will serve as the key for the join.
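
To tie this back to the question, here is a minimal sketch that rebuilds DADOS_1 and DADOS_2 from the values printed there, assuming linha is a real column in both data frames (if it is only the row label, it would have to be created first). The object name resultado is just an illustrative choice.

```r
library(dplyr)

# Data frames rebuilt from the values shown in the question
DADOS_1 <- data.frame(
  linha   = 1:4,
  coluna1 = c(1, 3, 1, 2),
  coluna2 = c(3, 4, 1, 2)
)

DADOS_2 <- data.frame(
  linha   = 1:6,
  coluna3 = c(3, 2, 5, 2, 1, 3),
  coluna4 = c(1, 2, 0, 4, 3, 1)
)

# Keep every value of linha found in either data frame;
# rows missing from DADOS_1 get NA in coluna1 and coluna2
resultado <- full_join(DADOS_1, DADOS_2, by = "linha")
print(resultado)
```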

  • Thank you for the answer, I will test it.

3

Using only base R, the merge function with the argument all = TRUE does what the question asks.

set.seed(1234)

a <- data.frame(linha = 1:4,  
                x = replicate(n = 2, expr = sample(0:5, 4, TRUE))
)

b <- data.frame(linha = 1:6,
                y = replicate(n = 2, expr = sample(0:5, 6, TRUE))
)

merge(a, b, all = TRUE)
#  linha x.1 x.2 y.1 y.2
#1     1   3   3   3   3
#2     2   1   0   1   5
#3     3   5   4   5   5
#4     4   4   5   1   5
#5     5  NA  NA   5   3
#6     6  NA  NA   5   3
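
If the data frames have no shared key column at all, one possible base R alternative (not part of the answer above) is to pad the shorter one with NA rows and then use cbind. The helper pad_rows below is a hypothetical name introduced for this sketch, and the data reuse the values from the question.

```r
# Hypothetical helper: extend a data frame to n rows, filling with NA
pad_rows <- function(df, n) {
  # indexing past nrow(df) yields rows of NA
  padded <- df[seq_len(n), , drop = FALSE]
  rownames(padded) <- NULL  # reset row names to 1..n
  padded
}

a <- data.frame(coluna1 = c(1, 3, 1, 2), coluna2 = c(3, 4, 1, 2))
b <- data.frame(coluna3 = c(3, 2, 5, 2, 1, 3), coluna4 = c(1, 2, 0, 4, 3, 1))

# Pad both to the length of the longer one, then bind side by side
n <- max(nrow(a), nrow(b))
resultado <- cbind(pad_rows(a, n), pad_rows(b, n))
print(resultado)
```

This matches rows purely by position, so it only makes sense when the row order of both data frames is meaningful.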
