Compare lines between two tables of data?

Asked

Viewed 173 times

3

Considering:

q1 <- data.frame(COD = 1:5, CLAS=letters[1:5])
q2 <- data.frame(COD = c(25,1,31,3,2), CLAS=c(45,letters[1],100,letters[3],letters[10]))

I need to know which lines are common among the data tables, taking into account that the whole line should be the same. Is there a function that returns the index from Q1 to Q2, or vice versa?

In the case of q1 for q2, lines 1 and 3 are in 2 and 4. Or lines 2 and 4 of q2 are equivalent to 1 and 3 of q1.

How can I make this comparison?

2 answers

4

This code below works well. I imagine it’s fast even if your data frames are a little bit large.

q1 <- data.frame(COD = 1:5, CLAS=letters[1:5])
q2 <- data.frame(COD = c(25,1,31,3,2), CLAS=c(45,letters[1],100,letters[3],letters[10]))

q  <- rbind(q1, q2)

duplicados <- duplicated(q)

which(duplicados==TRUE)-dim(q1)[1]

2


I don’t know if you need the indexes. If a data.frame with the common lines is enough, an alternative solution is to use the inner_join:

> q1 <- data.frame(COD = 1:5, CLAS=letters[1:5], stringsAsFactors = F)
> q2 <- data.frame(COD = c(25,1,31,3,2), CLAS=c(45,letters[1],100,letters[3],letters[10]), stringsAsFactors = F)
> library(dplyr)
> inner_join(q1, q2)
Joining by: c("COD", "CLAS")
  COD CLAS
1   1    a
2   3    c

By default the inner_join uses all columns that have the same name in both data.frames, If you didn’t want to use them all, you could use the argument by. Read more on: help("join").


Of course you can adapt this code to store the duplicate indexes of each of the data sets, but then the code no longer looks so elegant.

> inner_join(
+   q1 %>% mutate(id_q1 = 1:nrow(.)), 
+   q2 %>% mutate(id_q2 = 1:nrow(.))
+   )
Joining by: c("COD", "CLAS")
  COD CLAS id_q1 id_q2
1   1    a     1     2
2   3    c     3     4
  • Very good. ... Thanks

Browser other questions tagged

You are not signed in. Login or sign up in order to post.