4
I have a dataframe with multiple variables, as in the example below:
df <- read.table(header=TRUE,
text="
ID COR TIPO SITUACAO_2016 SITUACAO_2017 SITUACAO_2018 SITUACAO_2019 SITUACAO_2020 ID_EMPRESA_2016 ID_EMPRESA_2017 ID_EMPRESA_2018 ID_EMPRESA_2019 ID_EMPRESA_2020
578 23 S 45 63 30 51 49 989 989 455 455 455
982 45 G 13 29 22 37 33 179 225 225 613 225
")
Some of these variables have the suffix of the year because they change over time (situacao
and id_empresa
). Others are fixed, do not change over time to the same id
(cor
and tipo
).
How do I transpose the variables to the long format, creating a new variable relative to the year of the situacao
and of id_empresa
?
Expected result:
output <- read.table(header=TRUE,
text="
ID ANO COR TIPO SITUACAO ID_EMPRESA
578 2016 23 S 45 989
578 2017 23 S 63 989
578 2018 23 S 30 455
578 2019 23 S 51 455
578 2020 23 S 49 455
982 2016 45 G 13 179
982 2017 45 G 29 225
982 2018 45 G 22 225
982 2019 45 G 37 613
982 2020 45 G 33 225
")
Preferably, I would like a solution tidyverse
, because I’m learning to use it and I have some familiarity.
What I did:
library(tidyverse)
output <- df %>%
pivot_longer(., cols = situacao_2016:id_empresa_2020)
What I got:
dput(output)
structure(list(ID = c(578L, 578L, 578L, 578L, 578L, 578L, 578L,
578L, 578L, 578L, 982L, 982L, 982L, 982L, 982L, 982L, 982L, 982L,
982L, 982L), COR = c(23L, 23L, 23L, 23L, 23L, 23L, 23L, 23L,
23L, 23L, 45L, 45L, 45L, 45L, 45L, 45L, 45L, 45L, 45L, 45L),
TIPO = c("S", "S", "S", "S", "S", "S", "S", "S", "S", "S",
"G", "G", "G", "G", "G", "G", "G", "G", "G", "G"), name = c("SITUACAO_2016",
"SITUACAO_2017", "SITUACAO_2018", "SITUACAO_2019", "SITUACAO_2020",
"ID_EMPRESA_2016", "ID_EMPRESA_2017", "ID_EMPRESA_2018",
"ID_EMPRESA_2019", "ID_EMPRESA_2020", "SITUACAO_2016", "SITUACAO_2017",
"SITUACAO_2018", "SITUACAO_2019", "SITUACAO_2020", "ID_EMPRESA_2016",
"ID_EMPRESA_2017", "ID_EMPRESA_2018", "ID_EMPRESA_2019",
"ID_EMPRESA_2020"), value = c(45L, 63L, 30L, 51L, 49L, 989L,
989L, 455L, 455L, 455L, 13L, 29L, 22L, 37L, 33L, 179L, 225L,
225L, 613L, 225L)), row.names = c(NA, -20L), class = c("tbl_df",
"tbl", "data.frame"))
I was able to transpose, but the column names did not turn out as I wanted. A year variable was not created. In addition I stayed two lines per subject and per year, one for situation and another for id_empresa
. I would like to get the result as mentioned above.
Thank you @Jorge! Second reply!
– Nayla