How to validate a data frame against a vector of valid values?

Question

I have a vector of all the valid values that may appear in a set of columns of a data frame. The length of this vector differs from the number of rows (observations) in the data frame.

My goal is to identify the invalid observations in the data frame. The solution below works, but I am not happy with it because I would like to avoid the for loop.

DF is a data frame with 4 columns, named da1 to da4, and 9 rows of observations. Both DF and "check" were imported from an Excel spreadsheet; "check" is a table with a single column.
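If "check" arrives as a one-column data frame, as an Excel import usually does, it is safer to pull the column out as a plain vector before matching. A data frame is a list, and match() (which %in% and is.element() use) does not compare against the elements of a list column one by one. The sketch below uses a hypothetical `check_df` standing in for the imported table; the name and values are illustrative only.

```r
# Hypothetical stand-in for the imported one-column Excel table
check_df <- data.frame(valid = c(2, 5, 7))
x <- c(1, 2, 5, 9)

# Matching against the data frame itself is not element-wise:
# match() turns the list column into a single string such as "c(2, 5, 7)"
in_df <- x %in% check_df

# Extract the column as an atomic vector first, then match
check <- check_df[[1]]
in_vec <- x %in% check
in_vec  # FALSE TRUE TRUE FALSE
```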

str(DF)
DF1 <- as.data.frame(DF) # convert DF to a plain data frame
str(DF1)

result is a logical vector whose length equals the number of rows of DF1.

result1 is a logical data frame built from result; the loop fills one column of it for each column of DF1.

result <- logical(length = 9)    # one FALSE per row of DF1
result1 <- as.data.frame(result)

The loop below checks, for each column of DF1, whether each of its values is one of the elements of "check". result1 holds the resulting logical data frame.

for (col in 1:4) {
  diag <- DF1[, col]                         # values in column col
  result1[, col] <- is.element(diag, check)  # TRUE where the value is in check
}
result1
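A slightly more robust version of the same loop, offered as a sketch, takes the sizes from the data instead of hard-coding 9 rows and 4 columns, preallocates a logical matrix, and keeps DF1's column names. The small `DF1` and `check` below are made up for illustration and are not the question's data.

```r
# Made-up example data: 3 rows, 2 columns, valid values 1 to 3
DF1 <- data.frame(da1 = c(1, 4, 2), da2 = c(3, 3, 9))
check <- 1:3

# Preallocate one logical column per column of DF1, named like DF1
result1 <- matrix(FALSE, nrow = nrow(DF1), ncol = ncol(DF1),
                  dimnames = list(NULL, names(DF1)))

for (col in seq_along(DF1)) {
  # TRUE where the value in this column is one of the valid values
  result1[, col] <- DF1[[col]] %in% check
}
result1 <- as.data.frame(result1)
```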

result1 <- sapply(DF1, function(x) x %in% check)

– Rui Barradas, 2020/08/03 at 14:12

1 answer


by Rui Barradas • **15,422** points · Answer 1 · 2020-08-03T14:20:26+00:00

No explicit loop is needed: when you have to loop over the columns, a function from the *apply family is simpler.

result1 <- sapply(DF1, function(x) x %in% check)
result1 <- as.data.frame(result1)

result1
#     V1    V2    V3    V4
#1 FALSE FALSE FALSE  TRUE
#2  TRUE FALSE FALSE  TRUE
#3 FALSE  TRUE  TRUE  TRUE
#4 FALSE  TRUE FALSE  TRUE
#5 FALSE  TRUE FALSE FALSE
#6 FALSE  TRUE  TRUE  TRUE
#7  TRUE  TRUE FALSE FALSE
#8  TRUE  TRUE  TRUE  TRUE
#9 FALSE FALSE FALSE  TRUE
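The question's stated goal is to find the invalid observations, that is, rows where at least one value is not in "check". Given the logical data frame result1, one way to get those rows, shown as a sketch on made-up data rather than the answer's example, is to count the FALSE entries per row:

```r
# Made-up example data
DF1 <- data.frame(da1 = c(1, 4, 2), da2 = c(3, 3, 9))
check <- 1:3
result1 <- as.data.frame(sapply(DF1, function(x) x %in% check))

# !result1 flips TRUE/FALSE; rowSums() counts the invalid values in each row
invalid <- rowSums(!result1) > 0
which(invalid)   # row numbers of the invalid observations
DF1[invalid, ]   # the invalid observations themselves
```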

Since the question uses the function is.element, note that it gives identical results:

identical(
  sapply(DF1, is.element, check), 
  sapply(DF1, function(x) x %in% check)
)
#[1] TRUE
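The two agree because both reduce to match(): R's documentation defines x %in% table as match(x, table, nomatch = 0) > 0 and states that is.element(x, y) is identical to x %in% y. A quick check on a small made-up vector:

```r
x <- c(1, 4, 2, 9)
check <- 1:3

a <- x %in% check
b <- is.element(x, check)
m <- match(x, check, nomatch = 0) > 0  # position of each x in check, 0 if absent

a  # TRUE FALSE TRUE FALSE
```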

Data

set.seed(2020)
n <- 9
DF1 <- replicate(4, sample(10, n, TRUE))
DF1 <- as.data.frame(DF1)
check <- sample(10, 6)
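Putting the pieces together, this sketch rebuilds the answer's example data and checks that the question's for loop and the sapply() one-liner produce the same TRUE/FALSE pattern. It also tries vapply(), which is not used in the answer: it is swapped in here because its FUN.VALUE argument fixes the type and length of each column's result. The values drawn by sample() depend on the R version's random number generator, so no particular output is claimed.

```r
# Answer's example data
set.seed(2020)
n <- 9
DF1 <- as.data.frame(replicate(4, sample(10, n, TRUE)))
check <- sample(10, 6)

# Question's approach: explicit loop, filling a preallocated logical matrix
loop_res <- matrix(FALSE, nrow = n, ncol = ncol(DF1))
for (col in seq_along(DF1)) {
  loop_res[, col] <- is.element(DF1[[col]], check)
}

# Answer's approach: sapply() over the columns
sapply_res <- sapply(DF1, function(x) x %in% check)

# vapply() variant: each column must yield a logical vector of length n
vapply_res <- vapply(DF1, function(x) x %in% check, logical(n))

# Same pattern from all three (column names aside)
identical(unname(loop_res), unname(sapply_res))
identical(unname(sapply_res), unname(vapply_res))
```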