Function `recode` (dplyr) does not accept numerical ranges


Consider the vector:

x <- runif(30, 20, 100)

I would like to categorize this vector with the `recode` function from the dplyr package. The ranges can be arbitrary, for example:

  • from 20 to 50 = 1
  • from 51 to 75 = 2
  • from 76 to 100 = 3

I know I could do this with other packages and functions, but I specifically want to use dplyr's `recode`. I have tried several approaches, but none has worked so far.

Answer


The easiest way to do this is in base R with the function `findInterval`. The appropriate dplyr function is `case_when`, not `recode`. Here are both approaches.

library(dplyr)

set.seed(1234)
x <- runif(30, 20, 100)

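# case_when(): conditions are tested in order and the first TRUE wins
y1 <- case_when(
    20 <= x & x <= 50 ~ 1L,   # [20, 50]
    50 < x & x <= 75 ~ 2L,    # (50, 75]
    75 < x ~ 3L,              # above 75
    TRUE ~ NA_integer_        # anything else (below 20 or NA)
)

# findInterval(): index of the interval [20, 50), [50, 75), [75, Inf) holding each value
y2 <- findInterval(x, c(20, 50, 75))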

identical(y1, y2)
#[1] TRUE
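The two results are identical here only because `runif()` practically never returns a value lying exactly on a break. By default `findInterval()` uses left-closed intervals (`[20, 50)`, `[50, 75)`, `[75, Inf)`), whereas the `case_when()` above uses right-closed ones. A minimal sketch with a few hand-picked boundary values (chosen for illustration, not taken from the question) shows the difference, and how the `left.open` and `rightmost.closed` arguments of `findInterval()` reproduce the `case_when()` intervals:

```r
breaks <- c(20, 50, 75)
v <- c(20, 50, 50.5, 75, 99)  # hypothetical test values, including the breaks themselves

# Default: left-closed intervals [20, 50), [50, 75), [75, Inf)
d1 <- findInterval(v, breaks)
# 1 2 2 3 3

# left.open = TRUE: right-closed intervals (20, 50], (50, 75], (75, Inf);
# 20 itself is below the first interval and gets 0
d2 <- findInterval(v, breaks, left.open = TRUE)
# 0 1 2 2 3

# With left.open = TRUE, rightmost.closed = TRUE closes the *leftmost* interval,
# giving [20, 50], (50, 75], (75, Inf) as in the case_when() version
d3 <- findInterval(v, breaks, left.open = TRUE, rightmost.closed = TRUE)
# 1 1 2 2 3
```

For continuous data the choice rarely matters, but for rounded or integer data (for example ages) the boundary convention decides which category 50 and 75 fall into.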

Edit:

A comment by Marcus Nunes reminded me of the base R function `cut`, which can also be used in a pipe (`%>%`). As the result shows, the output is an object of class "factor".

x %>% cut(breaks = c(20, 50, 75, 100), labels = 1:3)
# [1] 1 2 2 2 3 2 1 1 2 2 3 2 1 3 1 3 1 1 1 1 1 1 1 1 1 3 2 3 3
#[30] 1
#Levels: 1 2 3
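`cut()` uses right-closed intervals by default (`right = TRUE`), and a value equal to the lowest break becomes `NA` unless `include.lowest = TRUE`. A short sketch with hypothetical boundary values illustrates this; it also uses `labels = FALSE`, which makes `cut()` return plain integer codes instead of a factor when a factor is not wanted:

```r
breaks <- c(20, 50, 75, 100)
v <- c(20, 50, 50.5, 75, 100)  # hypothetical values lying on or near the breaks

# Default: (20, 50], (50, 75], (75, 100]; 20 falls outside every interval
f1 <- cut(v, breaks = breaks, labels = 1:3)
# level codes: NA, 1, 2, 2, 3

# include.lowest = TRUE closes the first interval: [20, 50], (50, 75], (75, 100]
f2 <- cut(v, breaks = breaks, labels = 1:3, include.lowest = TRUE)

# labels = FALSE returns integer codes directly instead of a factor
y3 <- cut(v, breaks = breaks, labels = FALSE, include.lowest = TRUE)
# 1 1 2 2 3
```

To convert an existing factor such as `f2` back to numbers, `as.integer(as.character(f2))` is the safer route; `as.integer(f2)` returns the level index, which coincides with the labels here only because they are `1:3` in order.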
    Just to complement this answer, the help page of dplyr version 0.7.8 says (emphasis mine): "All replacements **must be the same type**, and must have either length one or the same length as x." Therefore, you can't change something from numeric to factor using `recode`.
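In other words, `recode()` replaces individual values that it matches exactly; it has no notion of a range, so a continuous vector such as `x` will practically never match a named replacement. If `recode()` must appear in the pipeline, one possible workaround is to discretize first and then recode the resulting integer codes. This is a sketch assuming dplyr is installed; the labels `"low"`, `"mid"` and `"high"` are made up for illustration:

```r
library(dplyr)

set.seed(1234)
x <- runif(30, 20, 100)

# recode() only matches exact values: 29.1 and 60.7 are not equal to 30,
# so nothing is replaced and the input comes back unchanged
unchanged <- recode(c(29.1, 60.7), `30` = 1)

# Step 1: turn the ranges into integer codes 1, 2, 3 (left-closed breaks)
codes <- findInterval(x, c(20, 50, 75))

# Step 2: each code is now an exact value, so recode() can map it.
# All replacements share one type (character), as the help page requires.
categories <- recode(codes, `1` = "low", `2` = "mid", `3` = "high")
table(categories)
```

Because every code is matched, no `.default` is needed; if some values could fall outside the breaks (code 0), supplying `.default` would avoid silent `NA`s.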
