Replace NA, based on other data frame by equal columns. R

Asked

Viewed 43 times

0

I have two tables of data. I would like to replace the Nas in Table1 with the values in Table2 using two reference columns that are in the two tables. I would like to replace only os Nas. Example:

Table1

State Shop shopping
SP E01 5
SP E02 IN
RJ E01 IN
MT E03 IN

Table2

State Shop shopping
SP E01 3
SP E02 10
RJ E01 9
RJ E02 7
MT E03 4
Tabela1 <- structure(list(
  Estado = structure(c(3L, 3L, 2L, 1L), .Label = c("MT", "RJ", "SP"), class = "factor"),
  Loja = structure(c(1L, 2L, 1L, 3L), .Label = c("E01", "E02", "E03"), class = "factor"),
  compras = c(5L, NA, NA, NA)), class = "data.frame", row.names = c(NA, -4L))

Tabela2 <- structure(list(
  Estado = structure(c(3L, 3L, 2L, 2L, 1L), .Label = c("MT", "RJ", "SP"), class = "factor"),
  Loja = structure(c(1L, 2L, 1L, 2L, 3L), .Label = c("E01", "E02", "E03"), class = "factor"),
  compras = c(3L, 10L, 9L, 7L, 4L)), class = "data.frame", row.names = c(NA, -5L))
  • Welcome to Sopt. Always provide the example in a format that is easily read by anyone who wants to help you. See this topic on how to generate reproducible examples in R: https://answall.com/questions/264168/quais-as-principais-fun%C3%A7%C3%b5es-to-create-an-example-m%C3%adnimo-reproduce%C3%advel-em-r

1 answer

1


Can use merge to create a data.frame with Table 1 data aligned with Table 2:

TabelaM <- merge(Tabela1, Tabela2,
             by = c("Estado", "Loja"),
             all.x = TRUE,
             suffix = c("", ".y"))

TabelaM
#>   Estado Loja compras compras.y
#> 1     MT  E03      NA         4
#> 2     RJ  E01      NA         9
#> 3     SP  E01       5         3
#> 4     SP  E02      NA        10

And then use logical indexing to identify lines that have NA for compras and shall be replaced by compras.y (from the Table 2):

nas <- is.na(TabelaM$compras)

TabelaM$compras[nas] <- TabelaM$compras.y[nas]

within(TabelaM, rm(compras.y))
#>   Estado Loja compras
#> 1     MT  E03       4
#> 2     RJ  E01       9
#> 3     SP  E01       5
#> 4     SP  E02      10

Dplyr

The package dplyr has the function coalesce to fill Nas from one column with values from another. As in the above option, first join Table 2 data in Table 1 to ensure matching:

library(dplyr)

Tabela1 %>%
  left_join(Tabela2, by = c("Estado", "Loja"), suffix = c("", ".y")) %>%
  mutate(compras = coalesce(compras, compras.y)) %>%
  select(-compras.y)
#>   Estado Loja compras
#> 1     SP  E01       5
#> 2     SP  E02      10
#> 3     RJ  E01       9
#> 4     MT  E03       4

Date.table

The package data table. has the function fcoalesce ("fast coalesce") for the same purpose. As data.table works by reference, Table 1 will be modified directly:

library(data.table)

setDT(Tabela1)
setDT(Tabela2)

Tabela1[Tabela2, on = .(Estado, Loja), compras := fcoalesce(compras, i.compras)]

Tabela1
#>    Estado Loja compras
#> 1:     SP  E01       5
#> 2:     SP  E02      10
#> 3:     RJ  E01       9
#> 4:     MT  E03       4

Browser other questions tagged

You are not signed in. Login or sign up in order to post.