Fill column of a data frame with data from another data frame in R

I have the following df (data1):

ITEM    CLASSIFICACAO
123     AZUL
456     AMARELO
789 
234     VERDE
345     PRETO
456 
567 
678     ROSA

I need to fill in the blank rows of the CLASSIFICACAO column using another data frame (data2):

ITEM    CLASSIFICACAO
789     LARANJA
456     MARROM
567     BRANCO
100     CASA
200     BOLA

How do I fill in the blank rows of data1? Thanks in advance.

3 answers

5


I would use the dplyr package. With dplyr you chain simple operations until you get the result you want:

First, the data frames:

dados1 <- data.frame(
  ITEM = c(123,456,789,234,345,456,567,678),
  CLASSIFICACAO = c("AZUL", "AMARELO", NA, "VERDE", "PRETO", NA, NA, "ROSA"), stringsAsFactors = FALSE)
dados2 <- data.frame(
  ITEM = c(789, 456, 567, 100, 200),
  CLASSIFICACAO = c("LARANJA", "MARROM", "BRANCO", "CASA", "BOLA"), stringsAsFactors = FALSE)

Now the pipeline:

library(dplyr)

dados1 %>% 
  filter(is.na(CLASSIFICACAO)) %>% # keep only the rows with a missing value
  select(-CLASSIFICACAO) %>% # drop the empty CLASSIFICACAO column
  left_join(dados2, by = "ITEM") %>% # join with the other data frame
  bind_rows(dados1 %>% filter(!is.na(CLASSIFICACAO))) # stack the filled rows on the rows that were already complete
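Note that the pipeline above puts the filled rows first, so the original row order is lost. If the order matters, one possible variation (a sketch, not the method from this answer) is to join the whole table and take the first non-missing value with coalesce. The column name NOVA is a hypothetical helper name introduced only for this example.

```r
library(dplyr)

dados1 <- data.frame(
  ITEM = c(123, 456, 789, 234, 345, 456, 567, 678),
  CLASSIFICACAO = c("AZUL", "AMARELO", NA, "VERDE", "PRETO", NA, NA, "ROSA"),
  stringsAsFactors = FALSE)
dados2 <- data.frame(
  ITEM = c(789, 456, 567, 100, 200),
  CLASSIFICACAO = c("LARANJA", "MARROM", "BRANCO", "CASA", "BOLA"),
  stringsAsFactors = FALSE)

resultado <- dados1 %>%
  # NOVA is a hypothetical temporary name for the lookup values
  left_join(rename(dados2, NOVA = CLASSIFICACAO), by = "ITEM") %>%
  # keep the existing value when present, otherwise use the looked-up one
  mutate(CLASSIFICACAO = coalesce(CLASSIFICACAO, NOVA)) %>%
  select(-NOVA)

resultado
```

This assumes ITEM is unique in dados2; if it were not, the join would multiply rows in the result.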

4

Ideally you would have shared your data (using the dput function), or at least part of it.

In the example you gave, if the blank rows are NA, you can use the FillIn function from the DataCombine package.
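For reference, a minimal sketch of sharing data with dput, using the values from the question (the variable name amostra is just illustrative). dput prints R code that rebuilds the object, which can be pasted straight into a question.

```r
dados1 <- data.frame(
  ITEM = c(123, 456, 789, 234, 345, 456, 567, 678),
  CLASSIFICACAO = c("AZUL", "AMARELO", NA, "VERDE", "PRETO", NA, NA, "ROSA"),
  stringsAsFactors = FALSE)

# Share only the first few rows if the full data set is large
amostra <- head(dados1, 3)

# Prints a structure(...) call that recreates amostra when evaluated
dput(amostra)
```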

dados1 <- data.frame(
  ITEM = c(123,456,789,234,345,456,567,678),
  CLASSIFICACAO = c("AZUL", "AMARELO", NA, "VERDE", "PRETO", NA, NA, "ROSA"))
dados2 <- data.frame(
  ITEM = c(789,456,567,100,200),
  CLASSIFICACAO = c("LARANJA", "MARROM", "BRANCO", "CASA", "BOLA"))

DataCombine::FillIn(dados1, dados2, Var1 = "CLASSIFICACAO", Var2 = "CLASSIFICACAO",
                    KeyVar = "ITEM")

  ITEM CLASSIFICACAO
1  123          AZUL
2  234         VERDE
3  345         PRETO
4  456       AMARELO
5  456        MARROM
6  567        BRANCO
7  678          ROSA
8  789       LARANJA

3

Here is another approach that uses no packages, only base R.

In your example, item 456 is used for both AMARELO and MARROM. I created a new code (457 for MARROM) to avoid the duplicate, though I don't know whether the duplicate was intentional.

First I build the lookup rule: for the rows of df1 where CLASS is NA, match finds the position of the same ITEM in df2. Then I apply that rule to df1, filling the NA rows with the corresponding CLASS from df2.
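A quick way to spot repeated keys like this, sketched in base R with the data from the question:

```r
dados1 <- data.frame(
  ITEM = c(123, 456, 789, 234, 345, 456, 567, 678),
  CLASSIFICACAO = c("AZUL", "AMARELO", NA, "VERDE", "PRETO", NA, NA, "ROSA"),
  stringsAsFactors = FALSE)
dados2 <- data.frame(
  ITEM = c(789, 456, 567, 100, 200),
  CLASSIFICACAO = c("LARANJA", "MARROM", "BRANCO", "CASA", "BOLA"),
  stringsAsFactors = FALSE)

# Items that appear more than once in the table being filled
repetidos <- dados1$ITEM[duplicated(dados1$ITEM)]
repetidos

# anyDuplicated returns 0 when every key in the lookup table is unique
anyDuplicated(dados2$ITEM)
```

Since match returns only the first position it finds, what matters most for the lookup is that the keys in the second data frame are unique.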

df1 <- data.frame(ITEM = c(123, 456, 789, 234, 345, 457, 567, 678),
                  CLASS = c("AZUL", "AMARELO", NA, "VERDE", "PRETO", NA, NA, "ROSA"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(ITEM = c(457, 567, 100, 200, 789),
                  CLASS = c("MARROM", "BRANCO", "CASA", "BOLA", "LARANJA"),
                  stringsAsFactors = FALSE)

r <- match(df1[is.na(df1$CLASS), "ITEM"], df2$ITEM)

df1[is.na(df1$CLASS), "CLASS" ] <- df2[r, "CLASS"]

print(df1)

  ITEM   CLASS
1  123    AZUL
2  456 AMARELO
3  789 LARANJA
4  234   VERDE
5  345   PRETO
6  457  MARROM
7  567  BRANCO
8  678    ROSA
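If this needs to be reused, the two steps can be wrapped in a small function. This is only a sketch: fill_from and its argument names are hypothetical, not part of base R. It also shows what happens when an item has no match in the lookup table: match returns NA, so the row simply stays NA.

```r
# Hypothetical helper: fill NA values of `value` in `target`
# by looking up `key` in `lookup`
fill_from <- function(target, lookup, key, value) {
  missing_rows <- is.na(target[[value]])
  # position in lookup of each key that needs filling (NA when absent)
  idx <- match(target[[key]][missing_rows], lookup[[key]])
  target[[value]][missing_rows] <- lookup[[value]][idx]
  target
}

df1 <- data.frame(ITEM = c(123, 789, 999),
                  CLASS = c("AZUL", NA, NA),
                  stringsAsFactors = FALSE)
df2 <- data.frame(ITEM = c(789, 100),
                  CLASS = c("LARANJA", "CASA"),
                  stringsAsFactors = FALSE)

# Item 999 does not exist in df2, so its CLASS remains NA
filled <- fill_from(df1, df2, key = "ITEM", value = "CLASS")
print(filled)
```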
