Aggregating strings in R

2

I have the following data set:

cidade      a                 b         c
AGRONOMICA  CRESO             NA        NA
AGRONOMICA  NA                SICOOB    NA
ALFREDO     CREDIVERTENTES    NA        NA
ALMIRANTE   SICOPER           NA        NA
ALMIRANTE   NA                SICRED    NA
ALTO        SICOPER           NA        NA
ALTO        NA                SICOOB    NA
ALTO        NA                NA        SICRED

The idea is to aggregate the data so that it ends up like this:

cidade      a                 b         c
AGRONOMICA  CRESO             SICOOB    NA
ALFREDO     CREDIVERTENTES    NA        NA
ALMIRANTE   SICOPER           SICRED    NA    
ALTO        SICOPER           SICOOB    SICRED

aggregate requires the values to be numeric. How can I do this with these nominal variables?

  • 1

    Why does ALMIRANTE not appear in the desired result?

  • It was supposed to show up. That was a mistake when I wrote out the expected result.

  • Why does the ALFREDO row in the result have column b equal to SICRED when it is NA in the input table?

  • Another mistake. It has been fixed.

3 answers

3


I believe this code answers the question, but note the following: in the desired result shown in the question, the ALFREDO row has column b equal to SICRED, whereas in the input table it is NA. This code keeps the value NA in the result.

res <- lapply(split(dados, dados$cidade), zoo::na.locf)
res <- lapply(res, zoo::na.locf, fromLast = TRUE)
res <- do.call(rbind, res)
res <- res[!duplicated(res), ]
row.names(res) <- NULL
res
#      cidade              a      b      c
#1 AGRONOMICA          CRESO SICOOB   <NA>
#2    ALFREDO CREDIVERTENTES   <NA>   <NA>
#3  ALMIRANTE        SICOPER SICRED   <NA>
#4       ALTO        SICOPER SICOOB SICRED

Explanation.
Step by step, the above code works as follows.

  1. First, it uses split to split the input data.frame by cidade.
  2. Next, it applies the function na.locf from package zoo to each sub-df to carry the previous non-NA value forward.
  3. Then it does the same in the other direction, carrying the next non-NA value backward.
  4. Then it joins the sub-dfs back together with do.call/rbind.
  5. And it keeps only the non-duplicated rows.
  6. The result has row names; to number the rows consecutively, just assign NULL to them.
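To see why the forward and backward passes (steps 2 and 3) leave every row of a group identical, here is a minimal sketch on a single column of a single group. It does not call zoo; fill_forward and fill_backward are hypothetical base R helpers written only for this illustration, and they emulate "last observation carried forward" in a way that keeps leading NAs.

```r
# Hypothetical helper (not part of zoo): carry the last non-NA value forward.
fill_forward <- function(x) {
  idx <- cumsum(!is.na(x))        # position of the most recent non-NA value
  c(NA, x[!is.na(x)])[idx + 1]    # idx == 0 means nothing seen yet, so NA
}

# Backward fill is a forward fill applied to the reversed vector.
fill_backward <- function(x) rev(fill_forward(rev(x)))

b_alto <- c(NA, "SICOOB", NA)     # column b for cidade == "ALTO"
step2 <- fill_forward(b_alto)     # the leading NA is still there
step3 <- fill_backward(step2)     # now every position holds "SICOOB"
step3
```

Once each column has been filled in both directions, all rows of a group are equal, which is why dropping duplicates in step 5 leaves one row per cidade. This holds as long as each column has at most one distinct non-NA value per group, as in the sample data.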

Data.

dados <-
structure(list(cidade = c("AGRONOMICA", "AGRONOMICA", "ALFREDO", 
"ALMIRANTE", "ALMIRANTE", "ALTO", "ALTO", "ALTO"), a = c("CRESO", 
NA, "CREDIVERTENTES", "SICOPER", NA, "SICOPER", NA, NA), b = c(NA, 
"SICOOB", NA, NA, "SICRED", NA, "SICOOB", NA), c = c(NA, NA, 
NA, NA, NA, NA, NA, "SICRED")), .Names = c("cidade", "a", "b", 
"c"), class = "data.frame", row.names = c(NA, -8L))

0

No need for anything beyond dplyr:

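library(dplyr)

dados %>%
  group_by(cidade) %>%
  # for each column, keep the first non-NA value in the group (NA if there is none)
  summarise_all(function(x) {
    res <- x[!is.na(x)]
    if (length(res) == 0) NA_character_ else res[1]
  })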
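For comparison, the same "first non-NA value per group" idea can be sketched in base R. The data.frame method of aggregate applies any function to each column within each group, so it is not limited to numeric values. The helper name first_non_na and the result name res_base are made up for this example, and the data is retyped from the table in the question.

```r
# Sample data reproducing the table from the question.
dados <- data.frame(
  cidade = c("AGRONOMICA", "AGRONOMICA", "ALFREDO", "ALMIRANTE",
             "ALMIRANTE", "ALTO", "ALTO", "ALTO"),
  a = c("CRESO", NA, "CREDIVERTENTES", "SICOPER", NA, "SICOPER", NA, NA),
  b = c(NA, "SICOOB", NA, NA, "SICRED", NA, "SICOOB", NA),
  c = c(NA, NA, NA, NA, NA, NA, NA, "SICRED"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: first non-NA value of a vector, or NA if there is none.
first_non_na <- function(x) {
  res <- x[!is.na(x)]
  if (length(res) == 0) NA_character_ else res[1]
}

# The data.frame method of aggregate() passes NA values through to FUN,
# so the helper decides what to do with them.
res_base <- aggregate(dados[c("a", "b", "c")],
                      by = list(cidade = dados$cidade),
                      FUN = first_non_na)
res_base
```

The data.frame method is used here on purpose: the formula method of aggregate defaults to na.action = na.omit, which would drop every row containing an NA before the function is ever called.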

0

The purrrlyr package makes @Rui’s excellent solution simpler:

library(dplyr)
library(purrrlyr)
library(zoo)

dados %>% 
  group_by(cidade) %>% 
  dmap(na.locf) %>% 
  distinct(cidade, .keep_all = TRUE)
