Tidyr spread does not return to original data

Asked

Viewed 63 times

1

Consider the data.frame:

df_1 <- data.frame(
  a = replicate(6, runif(30, 20, 100)), 
  b = rep(c(LETTERS[1:5]), times = 1, each = 6)
)

Use of gather:

library(tidyverse)
library(magrittr)

df_1 %<>% as_tibble

x <- df_1 %>% 
  select_at(vars(num_range('a.', 1:3))) %>% 
  gather(key = 'factors', value = 'case') %>% 
  print()

# A tibble: 90 x 2
   factors  case
   <chr>   <dbl>
 1 a.1      91.0
 2 a.1      56.2
 3 a.1      34.0
 4 a.1      85.1
 5 a.1      66.2
 6 a.1      21.7
 7 a.1      29.8
 8 a.1      80.3
 9 a.1      59.8
 10 a.1      85.4
# … with 80 more rows

Use of spread to return to the original data:

y <- x %>% 
  spread(key = factors, value = case)

Error: Duplicate Identifiers for Rows (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30), (31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60), (61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90) Call rlang::last_error() to see a backtrace

To solve this (in part), I used mutate with row_number():

 y <- x %>% 
    mutate(n = row_number()) %>% 
    spread(key = factors, value = case) %>% 
    print()

# A tibble: 90 x 4
       n   a.1   a.2   a.3
   <int> <dbl> <dbl> <dbl>
 1     1  91.0    NA    NA
 2     2  56.2    NA    NA
 3     3  34.0    NA    NA
 4     4  85.1    NA    NA
 5     5  66.2    NA    NA
 6     6  21.7    NA    NA
 7     7  29.8    NA    NA
 8     8  80.3    NA    NA
 9     9  59.8    NA    NA
10    10  85.4    NA    NA
# … with 80 more rows

The three columns are returned, but the cases do not match (that is, next to each value, there is a missing data - NA). How do I adjust this with some function of tidyverse so as to leave mine data.frame with 30 lines and not 90?

  • 2

    It has to include a group_by(factor) before the mutate(n = row_number()).

1 answer

2


It is necessary to activate the function group_by before mutate, as you quoted @Tomás in the comments. The function would look like this:

library(tidyverse)    

y <- x %>% 
  group_by(factors) %>% 
  mutate(n = row_number()) %>% 
  spread(key = factors, value = case) %>% 
  print()

# A tibble: 30 x 4
       n   a.1   a.2   a.3
   <int> <dbl> <dbl> <dbl>
 1     1  85.2  38.8  78.4
 2     2  84.2  82.4  21.2
 3     3  58.1  49.9  77.1
 4     4  57.0  85.9  62.4
 5     5  20.9  93.9  97.8
 6     6  59.3  91.6  59.2
 7     7  33.6  90.8  30.9
 8     8  44.0  50.8  81.4
 9     9  23.7  69.5  33.0
10    10  35.6  62.3  25.1
# ... with 20 more rows

Browser other questions tagged

You are not signed in. Login or sign up in order to post.