How to use the `dplyr::rowwise` function with more than one variable?

Consider the data set below:

df_1 <- data.frame(
  x = replicate(4, runif(30, 20, 100)), 
  y = sample(1:3, 30, replace = TRUE)
)

I did the following analysis:

library(tidyverse)

df_1 %>% 
  select(-y) %>% 
  rowwise() %>% 
  mutate(var = sum(c(x.1, x.3)))

And it works. But I want to analyze all the variables in one go. I tried this in two ways:

1) The usual approach, using `.` as a placeholder for all the variables:

df_1 %>% 
  select(-y) %>% 
  rowwise() %>% 
  mutate(var = sum(.))

2) Using select_if to pick only the numeric variables:

df_1 %>% 
  select(-y) %>% 
  rowwise() %>% 
  mutate(var = sum(select_if(., is.numeric)))

The two methods above return this:

Source: local data frame [30 x 5]
Groups: <by row>

# A tibble: 30 x 5
     x.1   x.2   x.3   x.4   var
   <dbl> <dbl> <dbl> <dbl> <dbl>
 1  32.7  42.7  50.1  20.8 7091.
 2  75.9  71.3  83.6  77.6 7091.
 3  49.6  28.7  97.0  59.7 7091.
 4  47.4  96.1  31.9  79.7 7091.
 5  54.2  47.1  81.7  41.6 7091.
 6  27.9  58.1  97.4  25.9 7091.
 7  61.8  78.3  52.6  67.7 7091.
 8  85.4  51.3  38.8  82.0 7091.
 9  27.9  72.6  68.9  25.2 7091.
10  87.2  42.1  27.6  73.9 7091.
# ... with 20 more rows

Here 7091. is not the sum of each row: the same value is repeated in every row.

How do I fix both approaches? I also tried mutate_at with the first method, but without success.

1 answer

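As I understand it, in method 1 the `.` refers to the entire data frame that was piped in, not to the current row, so sum(.) returns the grand total of every value in the data frame, and that same total is written to each row. Method 2 has the same problem: select_if(., is.numeric) still receives the whole data frame.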
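To check the explanation above, here is a small sketch that compares the value produced by sum(.) under rowwise() with the grand total of the data frame. The names `df`, `res` and `grand_total` are only for illustration, and I am assuming a dplyr version where sum() accepts a row-wise data frame, as it did in the question.

```r
library(dplyr)

set.seed(1)
df <- data.frame(x = replicate(4, runif(30, 20, 100)))

# `.` is the whole piped data frame, even after rowwise()
res <- df %>%
  rowwise() %>%
  mutate(var = sum(.))

# Grand total of every cell in the data frame
grand_total <- sum(as.matrix(df))

# Compare every value of `var` with the grand total
all.equal(res$var, rep(grand_total, nrow(df)))
```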

set.seed(1)
df_1 <- data.frame(
  x = replicate(4, runif(30, 20, 100)), 
  y = sample(1:3, 30, replace = TRUE))

library(dplyr)
df_x <- df_1 %>% 
  dplyr::select(-y) %>% 
  dplyr::mutate(Var = rowSums(.))

> df_x
        x.1      x.2      x.3      x.4      Var
1  41.24069 58.56641 93.03007 39.17035 232.0075
2  49.76991 67.96527 43.48827 24.71475 185.9382
3  65.82827 59.48330 56.72526 71.38306 253.4199
4  92.65662 34.89741 46.59157 90.10154 264.2471
5  36.13455 86.18987 72.06964 82.31317 276.7072
. . .
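If you want to stay with rowwise(), a possible sketch uses c_across(), which collects the selected columns of the current row into a vector. This assumes dplyr 1.0.0 or later, where c_across() is available; `df_rw` is just an illustrative name.

```r
library(dplyr)

set.seed(1)
df_1 <- data.frame(
  x = replicate(4, runif(30, 20, 100)),
  y = sample(1:3, 30, replace = TRUE)
)

# Assumes dplyr >= 1.0.0 (c_across() does not exist in older versions)
df_rw <- df_1 %>%
  select(-y) %>%
  rowwise() %>%
  # c_across() returns the values of x.1 to x.4 for the current row only
  mutate(var = sum(c_across(x.1:x.4))) %>%
  ungroup()

head(df_rw)
```

For a plain sum, rowSums() is still simpler and faster; c_across() is mainly useful when the function you need has no vectorised row version.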

Using a for loop:
library(tidyverse)
df_1 <- df_1 %>% 
  dplyr::select(-y)
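# One-column matrix to hold the sum of each row
df <- matrix(nrow = nrow(df_1), ncol = 1)
for (i in seq_len(nrow(df_1))) {
  # After slice(i), `.` is a one-row data frame, so sum(.) is that row's sum
  valor <- df_1 %>% 
    dplyr::slice(i) %>% 
    dplyr::rowwise() %>% 
    dplyr::mutate(var = sum(.))
  df[i, 1] <- valor$var
}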
df <- as.data.frame(df)
df_1 <- df_1 %>% 
  dplyr::mutate(var = df[,1])
df_1 
> df_1
        x.1      x.2      x.3      x.4      var
1  41.24069 58.56641 93.03007 39.17035 232.0075
2  49.76991 67.96527 43.48827 24.71475 185.9382
3  65.82827 59.48330 56.72526 71.38306 253.4199
4  92.65662 34.89741 46.59157 90.10154 264.2471

. . .
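Another tidyverse option, sketched here under the assumption that purrr is installed, is purrr::pmap_dbl(), which calls a function once per row with that row's values as arguments. The names `x_only` and `df_p` are only for illustration.

```r
library(dplyr)
library(purrr)

set.seed(1)
df_1 <- data.frame(
  x = replicate(4, runif(30, 20, 100)),
  y = sample(1:3, 30, replace = TRUE)
)

x_only <- df_1 %>% select(-y)

# pmap_dbl() walks the columns in parallel: each call receives one row's
# values (x.1, x.2, x.3, x.4) through `...`
df_p <- x_only %>%
  mutate(var = pmap_dbl(x_only, function(...) sum(c(...))))

head(df_p)
```

Passing `x_only` by name instead of `.` keeps it clear which object is being iterated over.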
  • Hello, @bbiasi. Could you provide an example using rowwise or another tidyverse function that works row by row? For example, using purrr::lift_vd?

  • I updated the answer and added an example with a for loop. I will look into that purrr function to see whether it can be done, since I am not familiar with it.

  • https://stackoverflow.com/questions/55922514/apply-dplyrrowwise-in-all-variables ?

  • Although I asked this question, I always like to explore new solutions with the same (or different) functions for the same problem. The example with for is interesting. Thank you.
