Apply a function to a data frame in R

I wrote a function that checks whether each value of a column is contained in a given vector of values. I could use dplyr::mutate() + ifelse(), but since I need this for many columns, the code would get very long. The function works outside mutate(), but not inside it.

What I did:

see_if_succed <- function(x,y){
  
  if (x %in% y) {
    1
  } else {
    0
  }
  
}

Outside the pipe the function works:

succeed <- c(1,5,8,9)
see_if_succed(100,succeed)
#0

But not inside the pipe:

succeed <- c(1,5,8,9)
a <- c("A","B", "C")
b <- c(1,2,1)

y <- data.frame(a,b)

library(dplyr)

y %>%
  mutate(z = see_if_succed(b, succeed))
#   a b z
# 1 A 1 1
# 2 B 2 1
# 3 C 1 1
# Warning message:
#   Problem with `mutate()` input `z`.
# i the condition has length > 1 and only the first element will be used
# i Input `z` is `see_if_succed(b, succeed)`.

Could someone tell me how to make this function work on a data frame?

2 answers

Here’s a very simple answer to a frequent problem.
When you want to create a binary variable based on a condition, you don't need if or ifelse. The expression x %in% y already returns one TRUE/FALSE value per element of x, and since the logical values FALSE/TRUE are coerced to the integers 0/1, you only need to convert the result of the condition to class "integer":

see_if_succed <- function(x, y) as.integer(x %in% y)

y %>%
  mutate(z = see_if_succed(b, succeed))
#  a b z
#1 A 1 1
#2 B 2 0
#3 C 1 1
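To make the two steps explicit, here is a minimal base-R sketch using the same b and succeed vectors as the question: %in% returns one logical value per element, and as.integer() turns TRUE/FALSE into 1/0. Because the resulting function is vectorized, it can also be applied to several columns at once; the data frame df and its columns b1 and b2 are hypothetical, made up only for this illustration.

```r
succeed <- c(1, 5, 8, 9)
b <- c(1, 2, 1)

# %in% is vectorized: one TRUE/FALSE per element of b
in_set <- b %in% succeed
print(in_set)
# [1]  TRUE FALSE  TRUE

# Coercing the logical vector gives the 0/1 indicator
print(as.integer(in_set))
# [1] 1 0 1

see_if_succed <- function(x, y) as.integer(x %in% y)

# Hypothetical data frame with several columns to check
df <- data.frame(b1 = c(1, 2, 1), b2 = c(5, 5, 7))

# Apply the function to every column; assigning into flags[]
# keeps the data.frame structure and column names
flags <- df
flags[] <- lapply(df, see_if_succed, y = succeed)
print(flags)
```

Inside a dplyr pipeline, something like df %>% mutate(across(c(b1, b2), ~ see_if_succed(.x, succeed))) should give the same result, assuming dplyr 1.0.0 or later (the version that introduced across()).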

Because it relies on the if keyword, your function was written to be applied to one element at a time. That is exactly what the warning says:

the condition has length > 1 and only the first element will be used
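A quick way to see the problem, as a minimal sketch with the question's b and succeed: the condition handed to if has one element per row, while if needs a single TRUE or FALSE. Base R's ifelse(), mentioned in the question, evaluates the condition element by element instead. Note also that since R 4.2.0, a condition of length > 1 in if is an error rather than this warning, so the original code fails outright on current R versions.

```r
succeed <- c(1, 5, 8, 9)
b <- c(1, 2, 1)

cond <- b %in% succeed
# if () expects a single TRUE/FALSE, but this condition has 3 elements
print(length(cond))
# [1] 3

# ifelse() evaluates the condition element by element
z_ifelse <- ifelse(cond, 1, 0)
print(z_ifelse)
# [1] 1 0 1
```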

One way to solve this problem is to vectorize the original function. In my view, the most practical way to do this is with the Vectorize function:

see_if_succed_v <- Vectorize(see_if_succed, vectorize.args = "x")

y %>%
  mutate(z = see_if_succed_v(b, succeed))
#>   a b z
#> 1 A 1 1
#> 2 B 2 0
#> 3 C 1 1

Created on 2021-05-17 by the reprex package (v2.0.0)

Note that I simply created a new function, see_if_succed_v, specifying that argument x of see_if_succed is the one to vectorize over. As a result, column z in the final data frame contains the expected values, as if see_if_succed had been applied to each row in turn.
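Roughly speaking, Vectorize() returns a wrapper that calls mapply(), which in turn calls the original function once per element of the vectorized argument. The base-R sketch below makes that loop explicit with vapply(), which also checks that every call returns a single number; see_if_succed_loop is a hypothetical name used only for this comparison.

```r
# Original scalar version from the question
see_if_succed <- function(x, y) {
  if (x %in% y) 1 else 0
}

succeed <- c(1, 5, 8, 9)
b <- c(1, 2, 1)

# Hypothetical helper: call see_if_succed() once per element of x,
# passing y unchanged to every call
see_if_succed_loop <- function(x, y) {
  vapply(x, see_if_succed, FUN.VALUE = numeric(1), y = y)
}

# Same idea via Vectorize(), as in the answer
see_if_succed_v <- Vectorize(see_if_succed, vectorize.args = "x")

print(see_if_succed_loop(b, succeed))
# [1] 1 0 1
```

Both versions still call the scalar function once per element, so for a simple membership test the as.integer(x %in% y) approach from the other answer is shorter and avoids the per-element loop.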
