Removal of lines with non-repeating levels in R

Guys, I have the following df:

df <- data.frame(X =c("a","b","c","a","b","c","a","b","c","d","a","b","c","d","e"),
 Y = c("w","w","w", "K","K","K", "L","L","L","L","Z","Z","Z","Z","Z"))

Note that the first vector has 5 levels and the second has 4 levels. My goal is to select the lines of the df that have all levels of vector 1 in common with vector 2. That is, I want to select the lines that have levels "a", "b" and "c", since "d" appears only twice and "e" appears only once in vector 1.

I tried to build a vector with the levels in common and keep only the lines with those levels by subsetting. However, it doesn't work, because this vector of levels doesn't give the positions of the lines I want to select. Example:

comuns <- c("a","b","c")
df2 <- df[c(comuns),]

In my real df there are 64 levels in common, so doing it by hand is not feasible. Can someone help me?

  • I couldn't understand the phrase "select lines from df which have all levels of vector 1 in common as vector 2". In particular, I don't see how it turned into the next phrase: "select lines that have the levels 'a', 'b' and 'c'". Is vector 1 column X? Is vector 2 column Y? If so, I can't see how X and Y can have levels in common in this specific example. It would help to edit the question and add the expected output.

  • Yes, Marcus. Vector 1 is the X column and Vector 2 is the Y column. I’ve already solved the problem with the help of colleagues below. Thank you!

1 answer

> df[df$X %in% comuns, ]
   X Y
1  a w
2  b w
3  c w
4  a K
5  b K
6  c K
7  a L
8  b L
9  c L
11 a Z
12 b Z
13 c Z
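
As a side note on why the attempt in the question does not work: a character vector in the row position is matched against the row names of df, not against the values of X. The sketch below rebuilds the df from the question to show the difference; the names by_name, keep and by_value are only illustrative.

```r
# Rebuild the example data frame from the question.
df <- data.frame(
  X = c("a","b","c","a","b","c","a","b","c","d","a","b","c","d","e"),
  Y = c("w","w","w","K","K","K","L","L","L","L","Z","Z","Z","Z","Z")
)
comuns <- c("a", "b", "c")

# Character row index: looked up in rownames(df), which are "1" to "15".
# None of "a", "b", "c" is a row name, so the result is three rows of NA.
by_name <- df[comuns, ]

# Logical row index: TRUE wherever the value of X is one of the wanted levels.
keep <- df$X %in% comuns
by_value <- df[keep, ]
```

The logical vector has one entry per line of df, which is what the row position of `[` needs in order to pick lines by their content.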

Finding the common elements:

tabF <- table(df$X, df$Y)
comuns <- rownames(tabF)[apply(tabF > 0, 1, all)]

> comuns
[1] "a" "b" "c"
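
Putting the two steps in order, a self-contained sketch under the assumption that df is the example from the question. The second half (n_y, comuns_alt) is an alternative added here for illustration, not part of the answer above: it counts the distinct Y values seen with each level of X and compares that count with the total number of Y levels.

```r
df <- data.frame(
  X = c("a","b","c","a","b","c","a","b","c","d","a","b","c","d","e"),
  Y = c("w","w","w","K","K","K","L","L","L","L","Z","Z","Z","Z","Z")
)

# Cross-tabulate X against Y: one row per level of X, one column per level of Y.
tabF <- table(df$X, df$Y)

# Keep the levels of X whose count is positive in every column of the table.
comuns <- rownames(tabF)[apply(tabF > 0, 1, all)]

# Filter the lines of df by those levels.
df2 <- df[df$X %in% comuns, ]

# Alternative (illustrative): number of distinct Y values per level of X,
# compared with the total number of distinct Y values.
n_y <- tapply(df$Y, df$X, function(y) length(unique(y)))
comuns_alt <- names(n_y)[n_y == length(unique(df$Y))]
```

Since nothing is typed by hand, the same code should carry over to the case with 64 common levels.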
