include missing data

Asked

Viewed 282 times

5

Good afternoon,

i have the following problem. I need to put together two datasets that have 2 variables in common, "uor" and "mes". Only 1 variable has no values for a given month. And what I wanted was that for the months that have no information, this particular variable, assumed the value 0 and included those missing months. Below I send a code to exemplify. The idea is to include the value 0 in variable x for months 4, 5 and 6 of uor "B".

example of code

library(data.table)
uor <- c(rep("A", 8), rep("B", 5), rep("C", 8), rep("D", 8))
uor2 <- c(rep("A", 8), rep("B", 8), rep("C", 8), rep("D", 8))
mes1 <- c(1:8, 1,2,3,7,8, 1:8, 1:8)
mes2 <- rep(1:8, 4)

DT <- data.table(uor, mes1, x = rnorm(length(uor)))
DT2 <- data.table(uor2, mes2, x1 = rnorm(length(uor2)), x2 = rpois(length(uor2), 4))

1 answer

2


It is possible to settle with dplyr:

If you have a data frame completo with all possible combinations of uor and mes, it is possible to make a Join with your data frame incompleto:

completo <- data.frame(uor=uor2, mes=mes2)
incompleto <- data.frame(uor=uor, mes=mes1, x= rnorm(length(mes1)))

The function left_joindplyr returns all rows of the first data frame and a combination of columns of the two data frames:

> set.seed(3)
> resultado <- left_join(completo, incompleto, by=c("uor","mes"))
> resultado
   uor mes           x
1    A   1  0.70290393
2    A   2  0.25700951
3    A   3  2.69017371
4    A   4  0.23524010
5    A   5 -2.75759062
6    A   6  1.03083168
7    A   7 -1.04026148
8    A   8 -0.72519990
9    B   1 -0.50549942
10   B   2  1.68207108
11   B   3  0.75289951
12   B   4          NA
13   B   5          NA
14   B   6          NA
15   B   7 -0.24574331
16   B   8  0.83446856
17   C   1 -1.01048869
18   C   2 -0.27242506
19   C   3  0.49329590
20   C   4  0.99004920
21   C   5  0.81256775
22   C   6 -0.75664209
23   C   7  0.41740965
24   C   8 -1.40809870
25   D   1 -0.81222459
26   D   2  0.73522139
27   D   3  0.43398983
28   D   4 -0.51572523
29   D   5  1.21112742
30   D   6  0.88230864
31   D   7 -1.71555588
32   D   8 -0.08829608 

After that, just set all the NA to 0:

resultado[is.na(resultado$x), ]$x <- 0 

Or you can use mutate:

resultado <- left_join(completo, incompleto, by=c("uor","mes")) %>%
                 mutate(x=ifelse(is.na(x), 0, x))

Browser other questions tagged

You are not signed in. Login or sign up in order to post.