Create new rows and columns for non-existent values

Asked

Viewed 42 times

3

I have a date frame where the "years" column is not filled in all the data I need. I would need, for each observation, the 1988 to 2014 scale, filling with 0 (zero) the years whose values do not appear in the original database.

The data frame is as follows:

structure(list(A1 = c(110001, 110001, 110001, 110001, 110001, 
110002, 110002, 110002, 110002, 110002, 110002, 110002, 110002, 
110002, 110003, 110003, 110003, 110003), anos = c(1999, 2003, 
2006, 2008, 2012, 1996, 1998, 2001, 2004, 2005, 2009, 2010, 2011, 
2013, 1991, 2006, 2007, 2010), n = c(1L, 2L, 2L, 1L, 1L, 1L, 
1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L), cs2 = c(1L, 
3L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 6L, 7L, 8L, 9L, 11L, 2L, 3L, 
4L, 5L)), class = c("grouped_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -18L), groups = structure(list(A1 = c(110001, 
110002, 110003), .rows = list(1:5, 6:14, 15:18)), row.names = c(NA, 
-3L), class = c("tbl_df", "tbl", "data.frame"), .drop = TRUE))

Note that for the first observation only the years 1999, 2003, 2006, 2008 and 2012 appear. I would need every year from 1988 to 2014, filling with 0 in columns "n" and "cs2" cases where there is no value in the original bank. It would look something like this (I did 'no hand' to illustrate):

structure(list(A1 = c(110001, 110001, 110001, 110001, 110001, 
110001, 110001, 110001, 110001, 110001, 110001, 110001, 110001, 
110001, 110001, 110001, 110001, 110001, 110001, 110001, 110001, 
110001, 110001, 110001, 110001, 110001), anos = c(1988, 1989, 
1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 
2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 
2012, 2013), `n ` = c(0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 2, 0, 0, 2, 0, 1, 0, 0, 0, 1, 0), cs2 = c(0, 0, 0, 1, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 5, 0, 6, 0, 0, 0, 7, 7
)), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-26L), spec = structure(list(cols = list(A1 = structure(list(), class = c("collector_double", 
"collector")), anos = structure(list(), class = c("collector_double", 
"collector")), `n ` = structure(list(), class = c("collector_double", 
"collector")), cs2 = structure(list(), class = c("collector_double", 
"collector"))), default = structure(list(), class = c("collector_guess", 
"collector")), skip = 1), class = "col_spec"))

I tried to use the mutate plus the combination if_else, but without success.

1 answer

4


The first thing I would do is warn R what are the values that anos can assume. In this case, these values go from 1988 to 2014. For me, the best way to do this is by converting the column anos in factor and explicitly saying what values it can assume:

dados$anos <- factor(dados$anos, levels = seq(1988, 2014))

This done, just use the function complete of dplyr to find all possible combinations between anos and A1:

dados %>%
  complete(anos, nesting(A1), fill = list(n = 0, cs2 = 0))
# A tibble: 81 x 4
# Groups:   A1 [3]
   anos      A1     n   cs2
   <fct>  <dbl> <dbl> <dbl>
 1 1988  110001     0     0
 2 1989  110001     0     0
 3 1990  110001     0     0
 4 1991  110001     0     0
 5 1992  110001     0     0
 6 1993  110001     0     0
 7 1994  110001     0     0
 8 1995  110001     0     0
 9 1996  110001     0     0
10 1997  110001     0     0
11 1998  110001     0     0
12 1999  110001     1     1
13 2000  110001     0     0
14 2001  110001     0     0
15 2002  110001     0     0
16 2003  110001     2     3
17 2004  110001     0     0
18 2005  110001     0     0
19 2006  110001     2     5
20 2007  110001     0     0
# … with 61 more rows

Note that it was enough to inform what value I would like n and cs2 assume so that the R automatically put it in its place.

To get again anos as a numeric variable, rotate

dados$anos <- as.numeric(levels(dados$anos))
  • 2

    May also be mutate(anos = factor(anos, levels = 1988:2014)) %>% before the complete(...). It’s all over again.

  • In fact, it’s more organized.

  • Thank you Marcus and Rui, it worked perfectly!

Browser other questions tagged

You are not signed in. Login or sign up in order to post.