Suppose I have the following data set:
dados <- structure(list(Ano = c(2001L, 2001L, 2001L, 2002L, 2002L, 2002L,
2003L, 2003L, 2003L, 2004L, 2004L, 2004L, 2005L, 2005L, 2005L,
2005L), Categoria = structure(c(1L, 2L, 4L, 1L, 3L, 4L, 1L, 2L,
3L, 2L, 3L, 4L, 1L, 2L, 3L, 4L), .Label = c("a", "b", "c", "d"
), class = "factor"), Valor = c(15, 21, 15, 14, 18, 20, 17, 21,
20, 23, 20, 22, 25, 18, 21, 16)), .Names = c("Ano", "Categoria",
"Valor"), row.names = c(NA, -16L), class = "data.frame")
dados
Ano Categoria Valor
1 2001 a 15
2 2001 b 21
3 2001 d 15
4 2002 a 14
5 2002 c 18
6 2002 d 20
7 2003 a 17
8 2003 b 21
9 2003 c 20
10 2004 b 23
11 2004 c 20
12 2004 d 22
13 2005 a 25
14 2005 b 18
15 2005 c 21
16 2005 d 16
This data set has three columns: Ano (year), Categoria (category) and Valor (value). By definition, the years range from 2001 to 2005 and the only possible categories are a, b, c and d. However, some categories were not recorded in some years. For example, category c did not occur in 2001, category b did not occur in 2002, and so on.
I would like to create a new data frame with 5*4 = 20 rows, the product of the number of years and the number of distinct categories. That is, the new data frame should contain every possible combination of Ano and Categoria. Each combination that does not exist in the original data frame must be assigned the value 0. In other words, I want the following result:
Ano Categoria Valor
1 2001 a 15
2 2001 b 21
3 2001 c 0
4 2001 d 15
5 2002 a 14
6 2002 b 0
7 2002 c 18
8 2002 d 20
9 2003 a 17
10 2003 b 21
11 2003 c 20
12 2003 d 0
13 2004 a 0
14 2004 b 23
15 2004 c 20
16 2004 d 22
17 2005 a 25
18 2005 b 18
19 2005 c 21
20 2005 d 16
How can I do this in an automated way?
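One possible direction, offered as a minimal sketch rather than a definitive answer: build every Ano/Categoria pair with base R's `expand.grid`, left-join the original data onto that grid with `merge(..., all.x = TRUE)`, and replace the resulting `NA` values with 0. The names `grade` and `completo` are made up for illustration. The sketch also assumes that every year and every category shows up at least once in `dados`; if that might not hold, `2001:2005` and the known category list could be passed to `expand.grid` instead.

```r
# Example data rebuilt so the sketch runs on its own (same values as dados above)
dados <- data.frame(
  Ano = c(2001L, 2001L, 2001L, 2002L, 2002L, 2002L, 2003L, 2003L,
          2003L, 2004L, 2004L, 2004L, 2005L, 2005L, 2005L, 2005L),
  Categoria = factor(c("a", "b", "d", "a", "c", "d", "a", "b",
                       "c", "b", "c", "d", "a", "b", "c", "d"),
                     levels = c("a", "b", "c", "d")),
  Valor = c(15, 21, 15, 14, 18, 20, 17, 21,
            20, 23, 20, 22, 25, 18, 21, 16)
)

# Every Ano x Categoria pair (assumption: all years and all
# categories appear at least once in dados)
grade <- expand.grid(Ano = unique(dados$Ano),
                     Categoria = levels(dados$Categoria))

# Left join: keep every pair from the grid; pairs missing from
# dados get NA in Valor
completo <- merge(grade, dados, by = c("Ano", "Categoria"), all.x = TRUE)

# Replace the NA values with 0
completo$Valor[is.na(completo$Valor)] <- 0

# Sort by year, then category, and reset the row names
completo <- completo[order(completo$Ano, completo$Categoria), ]
rownames(completo) <- NULL

completo
```

With the example data, `completo` has 20 rows, and the four pairs missing from `dados` (2001 c, 2002 b, 2003 d, 2004 a) carry the value 0.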