Tidyr spread does not return to original data

Question

Tidyr spread does not return to original data

Asked 7 years ago

Viewed 63 times

1

Consider the data.frame:

df_1 <- data.frame(
  a = replicate(6, runif(30, 20, 100)), 
  b = rep(c(LETTERS[1:5]), times = 1, each = 6)
)

Use of gather:

library(tidyverse)
library(magrittr)

df_1 %<>% as_tibble

x <- df_1 %>% 
  select_at(vars(num_range('a.', 1:3))) %>% 
  gather(key = 'factors', value = 'case') %>% 
  print()

# A tibble: 90 x 2
   factors  case
   <chr>   <dbl>
 1 a.1      91.0
 2 a.1      56.2
 3 a.1      34.0
 4 a.1      85.1
 5 a.1      66.2
 6 a.1      21.7
 7 a.1      29.8
 8 a.1      80.3
 9 a.1      59.8
 10 a.1      85.4
# … with 80 more rows

Use of spread to return to the original data:

y <- x %>% 
  spread(key = factors, value = case)

Error: Duplicate Identifiers for Rows (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30), (31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60), (61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90) Call rlang::last_error() to see a backtrace

To solve this (in part), I used mutate with row_number():

 y <- x %>% 
    mutate(n = row_number()) %>% 
    spread(key = factors, value = case) %>% 
    print()

# A tibble: 90 x 4
       n   a.1   a.2   a.3
   <int> <dbl> <dbl> <dbl>
 1     1  91.0    NA    NA
 2     2  56.2    NA    NA
 3     3  34.0    NA    NA
 4     4  85.1    NA    NA
 5     5  66.2    NA    NA
 6     6  21.7    NA    NA
 7     7  29.8    NA    NA
 8     8  80.3    NA    NA
 9     9  59.8    NA    NA
10    10  85.4    NA    NA
# … with 80 more rows

The three columns are returned, but the cases do not match (that is, next to each value, there is a missing data - NA). How do I adjust this with some function of tidyverse so as to leave mine data.frame with 30 lines and not 90?

2

It has to include a group_by(factor) before the mutate(n = row_number()).

– Tomás Barcellos

2019/03/12 at 13:41

1 answer

Browser other questions tagged r tidyverse

You are not signed in. Login or sign up in order to post.

by neves • **5,644** points · Answer 1 · 2019-03-12T15:50:45+00:00

It is necessary to activate the function group_by before mutate, as you quoted @Tomás in the comments. The function would look like this:

library(tidyverse)    

y <- x %>% 
  group_by(factors) %>% 
  mutate(n = row_number()) %>% 
  spread(key = factors, value = case) %>% 
  print()

# A tibble: 30 x 4
       n   a.1   a.2   a.3
   <int> <dbl> <dbl> <dbl>
 1     1  85.2  38.8  78.4
 2     2  84.2  82.4  21.2
 3     3  58.1  49.9  77.1
 4     4  57.0  85.9  62.4
 5     5  20.9  93.9  97.8
 6     6  59.3  91.6  59.2
 7     7  33.6  90.8  30.9
 8     8  44.0  50.8  81.4
 9     9  23.7  69.5  33.0
10    10  35.6  62.3  25.1
# ... with 20 more rows