How to transpose from "wide" to "long" (wide to long) with several variables?

Asked

Viewed 72 times

4

I have a dataframe with multiple variables, as in the example below:

df <- read.table(header=TRUE, 
                 text="
ID  COR TIPO    SITUACAO_2016   SITUACAO_2017   SITUACAO_2018   SITUACAO_2019   SITUACAO_2020   ID_EMPRESA_2016 ID_EMPRESA_2017 ID_EMPRESA_2018 ID_EMPRESA_2019 ID_EMPRESA_2020
578 23  S   45  63  30  51  49  989 989 455 455 455
982 45  G   13  29  22  37  33  179 225 225 613 225
") 

Some of these variables have the suffix of the year because they change over time (situacao and id_empresa). Others are fixed, do not change over time to the same id (cor and tipo).

How do I transpose the variables to the long format, creating a new variable relative to the year of the situacao and of id_empresa?

Expected result:

output <-  read.table(header=TRUE, 
                     text="
ID  ANO COR TIPO    SITUACAO    ID_EMPRESA
578 2016    23  S   45  989
578 2017    23  S   63  989
578 2018    23  S   30  455
578 2019    23  S   51  455
578 2020    23  S   49  455
982 2016    45  G   13  179
982 2017    45  G   29  225
982 2018    45  G   22  225
982 2019    45  G   37  613
982 2020    45  G   33  225
") 

Preferably, I would like a solution tidyverse, because I’m learning to use it and I have some familiarity.

What I did:

library(tidyverse)
output <- df %>% 
  pivot_longer(., cols = situacao_2016:id_empresa_2020)

What I got:

dput(output)

structure(list(ID = c(578L, 578L, 578L, 578L, 578L, 578L, 578L, 
578L, 578L, 578L, 982L, 982L, 982L, 982L, 982L, 982L, 982L, 982L, 
982L, 982L), COR = c(23L, 23L, 23L, 23L, 23L, 23L, 23L, 23L, 
23L, 23L, 45L, 45L, 45L, 45L, 45L, 45L, 45L, 45L, 45L, 45L), 
    TIPO = c("S", "S", "S", "S", "S", "S", "S", "S", "S", "S", 
    "G", "G", "G", "G", "G", "G", "G", "G", "G", "G"), name = c("SITUACAO_2016", 
    "SITUACAO_2017", "SITUACAO_2018", "SITUACAO_2019", "SITUACAO_2020", 
    "ID_EMPRESA_2016", "ID_EMPRESA_2017", "ID_EMPRESA_2018", 
    "ID_EMPRESA_2019", "ID_EMPRESA_2020", "SITUACAO_2016", "SITUACAO_2017", 
    "SITUACAO_2018", "SITUACAO_2019", "SITUACAO_2020", "ID_EMPRESA_2016", 
    "ID_EMPRESA_2017", "ID_EMPRESA_2018", "ID_EMPRESA_2019", 
    "ID_EMPRESA_2020"), value = c(45L, 63L, 30L, 51L, 49L, 989L, 
    989L, 455L, 455L, 455L, 13L, 29L, 22L, 37L, 33L, 179L, 225L, 
    225L, 613L, 225L)), row.names = c(NA, -20L), class = c("tbl_df", 
"tbl", "data.frame"))

I was able to transpose, but the column names did not turn out as I wanted. A year variable was not created. In addition I stayed two lines per subject and per year, one for situation and another for id_empresa. I would like to get the result as mentioned above.

1 answer

5


This pivot is a specific case where you have a two variables in the column name. The function pivot_longer can work with it but it was a relatively recent implementation. The last example here https://tidyr.tidyverse.org/reference/pivot_longer.html covers a similar case.

Basically before pivoting it separates the variables according to the standard provided in names_pattern. And in names_to i signal that the first part of the pattern are the names of the columns using the .value and that the second part goes to a YEAR column.

library(tidyverse)

df <- read.table(header=TRUE, 
                 text="
ID  COR TIPO    SITUACAO_2016   SITUACAO_2017   SITUACAO_2018   SITUACAO_2019   SITUACAO_2020   ID_EMPRESA_2016 ID_EMPRESA_2017 ID_EMPRESA_2018 ID_EMPRESA_2019 ID_EMPRESA_2020
578 23  S   45  63  30  51  49  989 989 455 455 455
982 45  G   13  29  22  37  33  179 225 225 613 225
") 


df %>% 
  pivot_longer(cols = starts_with(c("SITUACAO", "ID_EMPRESA")),
               names_to = c(".value", "ANO"),
               names_pattern = "(.*)_(\\d*)") %>% 
  select(ID, ANO, everything())
#> # A tibble: 10 x 6
#>       ID ANO     COR TIPO  SITUACAO ID_EMPRESA
#>    <int> <chr> <int> <chr>    <int>      <int>
#>  1   578 2016     23 S           45        989
#>  2   578 2017     23 S           63        989
#>  3   578 2018     23 S           30        455
#>  4   578 2019     23 S           51        455
#>  5   578 2020     23 S           49        455
#>  6   982 2016     45 G           13        179
#>  7   982 2017     45 G           29        225
#>  8   982 2018     45 G           22        225
#>  9   982 2019     45 G           37        613
#> 10   982 2020     45 G           33        225

Created on 2020-12-07 by the reprex package (v0.3.0)

  • Thank you @Jorge! Second reply!

Browser other questions tagged

You are not signed in. Login or sign up in order to post.