What does it mean that a table is in wide format? And long?
- A database in format wide is the one in which the variables are unpaired (one separate from the other).
- A database in format long is the one in which the variables are stacked (one on top of the other).
The following figure exemplifies this:
Note that in the format wide, x
and y
are the variable names. When you convert the database to the format long, the values of these variables will be stacked with the respective name of the variable next to it (i.e., in another column), indicating that these values correspond to it.
Which packages/functions in R can be used to turn a wide table into long and vice versa?
In the r utils
you can use the function stack
(stack) to convert a data.frame
of wide for long:
dataset<-data.frame(matrix(runif(6*5,1,10),ncol=6)) # criação do data.frame
dataset
X1 X2 X3 X4 X5 X6
1 7.349284 6.028351 2.688078 6.125223 1.221548 9.612955
2 4.069976 2.874686 6.672611 7.392773 8.788791 1.947049
3 5.601091 5.088117 6.642646 1.919682 5.083521 2.890271
4 8.972191 5.440744 1.900963 2.321034 4.617486 3.135706
5 6.863326 3.664501 8.406267 2.357013 7.787931 5.592315
empilhar<-stack(dataset,select=(1:6)) # converte para o formato long
empilhar
values ind
1 7.349284 X1
2 4.069976 X1
3 5.601091 X1
4 8.972191 X1
5 6.863326 X1
6 6.028351 X2
7 2.874686 X2
8 5.088117 X2
9 5.440744 X2
10 3.664501 X2
11 2.688078 X3
12 6.672611 X3
13 6.642646 X3
14 1.900963 X3
15 8.406267 X3
16 6.125223 X4
17 7.392773 X4
18 1.919682 X4
19 2.321034 X4
20 2.357013 X4
21 1.221548 X5
22 8.788791 X5
23 5.083521 X5
24 4.617486 X5
25 7.787931 X5
26 9.612955 X6
27 1.947049 X6
28 2.890271 X6
29 3.135706 X6
30 5.592315 X6
The function unstack
performs the reverse process. That is, converts from long for wide:
desempilhar<-unstack(empilhar) # converte para o formato wide
X1 X2 X3 X4 X5 X6
1 7.349284 6.028351 2.688078 6.125223 1.221548 9.612955
2 4.069976 2.874686 6.672611 7.392773 8.788791 1.947049
3 5.601091 5.088117 6.642646 1.919682 5.083521 2.890271
4 8.972191 5.440744 1.900963 2.321034 4.617486 3.135706
5 6.863326 3.664501 8.406267 2.357013 7.787931 5.592315
In the tidyverse
, the package tidyr
It is also useful to do what you want.
To stack, you can use the function gather
. Considering the same data set:
library(tidyr)
res1<-gather(dataset, key='factor', value = 'my', 1:6)
head(res1,10)
factor my
1 X1 5.938725
2 X1 4.367486
3 X1 3.220609
4 X1 3.357561
5 X1 9.275956
6 X2 2.260197
7 X2 5.880264
8 X2 2.891555
9 X2 7.641574
10 X2 9.611466
- where:
key
is the name of the vector that will bear the names of the stacked variables;
value
represents the variables to be stacked.
The reverse process of the function gather
is given by the function spread
(similar to unstack
). More details on documentation.
There are other Packages who do the same (as reshape2
).
Main data utilities in format long:
Automatically creates groups for variables (useful in some analyses, such as ANOVA). Moreover, in some functions of tidyverse
, as ggplot::facet_wrap
, the data must be in this format.
Prevents the execution of loops, when using the group variable as if they were columns of a data.frame
.