R - How to generalize a function when data is missing from the data.frame

I wrote the function below to compute descriptive statistics for a set of companies, broken down by the 5 regions of Brazil plus the national total, and to present the results in a table.

library(tidyverse)
library(kableExtra)

teste_medias_regiao <- function(base,var_conta,var_regiao){

  var_conta <- enquo(var_conta)
  var_regiao <- enquo(var_regiao)
  base <- base

  resultado_reg <- base %>% group_by(!!var_regiao) %>% summarise(Minímo = min(!!var_conta),
                                                                 Média = mean(!!var_conta),
                                                                 Mediana = median(!!var_conta),
                                                                 Máximo = max(!!var_conta),
                                                                 Variância =  var(!!var_conta),
                                                                 Desv.Pad. =  sd(!!var_conta)
                                                                 ) 

  resultado_br <- base %>% summarise(regiao = "Brasil", 
                                     Minímo = min(!!var_conta),
                                     Média = mean(!!var_conta),
                                     Mediana = median(!!var_conta),
                                     Máximo = max(!!var_conta),
                                     Variância =  var(!!var_conta),
                                     Desv.Pad. =  sd(!!var_conta)
                                     )

  resultado <- merge(resultado_br,resultado_reg, all=TRUE); remove(resultado_reg,resultado_br)

  resultado <- as.data.frame(t(resultado))
  resultado <- resultado %>% mutate(`Teste/Região`=row.names(.)) %>% select(`Teste/Região`,everything())

  names(resultado)[2:7] <- c("Brasil","C-Oeste","Nordeste", "Norte", "Sudeste","Sul")  
  resultado <- resultado[-1,]

  rownames(resultado) <- NULL

  resultado %>%  kable("markdown", escape = F) %>%
                 kable_styling("striped", full_width = F)

}



teste_medias_regiao(empresa_pl, CircRLP,regiao)

Problem: when the data.frame has no company in one of the 5 regions, the function breaks at names(resultado)[2:7], because the table then has fewer than 7 columns.

Any suggestions on how to resolve this, and how to fill the missing region's column with NA or 0?

Basic example with all regions:

structure(list(CircRLP = c(26240195.62, 136394073.76, 520685437.41, 141563722.92, 1116797.53, 6944476.06, 826787775.92, 61622254.35, 418733960.49, 3830627358.88), regiao = c("Northeast", "Southeast", "South", "Midwest", "Southeast", "Southeast", "Southeast", "North", "Southeast", "Southeast")), row.names = c(NA, 10L), class = "data.frame")

Basic example with regions missing:

structure(list(CircRLP = c(26240195.62, 136394073.76, 520685437.41, 141563722.92, 1116797.53, 6944476.06, 826787775.92, 61622254.35, 418733960.49, 3830627358.88), regiao = c("Northeast", "Southeast", "Northeast", "Midwest", "Southeast", "Southeast", "Southeast", "Northeast", "Southeast", "Southeast")), row.names = c(NA, 10L), class = "data.frame")

  • Try the argument na.rm = TRUE; it makes the summary functions ignore missing values.

  • Good morning, I don't know if you've solved it yet. The error is probably due to a mismatch between the number of titles and the number of columns in the result. It would be good to see the error message. If I understood the situation correctly, you could assign the titles directly from the summarise output, since they would then follow the regions actually present in the data, eliminating the need for the line where the error occurs.

  • Hello @Yvescavalcanti, thanks for commenting! I fixed the line where the error appeared as follows: names(resultado)[2:ncol(resultado)] <- resultado[1,2:ncol(resultado)]. I then created an empty table with all the regions and the row names, and assigned the values of the computed table (resultado) to it.
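
The column-count mismatch discussed in the comments can be reproduced in isolation. The sketch below is only an illustration: the table and its values are made up, standing in for the transposed summary when just two regions are present. It shows that assigning names to positions 2:7 fails on a table with fewer than 7 columns.

```r
# Toy stand-in (made-up values) for the transposed summary table:
# one label column, "Brasil", and only two of the five regions.
resultado <- data.frame(
  "Teste/Região" = c("Média", "Mediana"),
  "Brasil"       = c("10", "9"),
  "Nordeste"     = c("4", "4"),
  "Sudeste"      = c("12", "11"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# The hard-coded range asks for positions 2..7, but there are only 4 columns,
# so the names vector would become longer than the table is wide.
tentativa <- tryCatch(
  {
    names(resultado)[2:7] <- c("Brasil", "C-Oeste", "Nordeste",
                               "Norte", "Sudeste", "Sul")
    "ok"
  },
  error = function(e) "erro"
)

print(tentativa)        # the assignment fails
print(ncol(resultado))  # the table keeps its original columns
```

Note that na.rm = TRUE would not change this outcome: it handles NA values inside a column, not regions that are absent from the data altogether.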

1 answer

I fixed the line where the error appeared as follows: names(resultado)[2:ncol(resultado)] <- resultado[1,2:ncol(resultado)]. I then created an empty table with all the regions and the row names, and assigned the values of the computed table (resultado) to it.

It was something like:

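  # Take the column titles from the first row, so they follow the regions present in the data
  names(resultado)[2:ncol(resultado)] <- resultado[1, 2:ncol(resultado)]
  resultado <- resultado[-1, ]

  # Empty table with one column per expected region.
  # check.names = FALSE keeps names such as `Centro-Oeste` and `Teste/Região` intact;
  # the first column is named as in the function so that the merge keys match.
  tabela_completa <- data.frame(`Teste/Região` = character(),
                                Brasil = character(),
                                `Centro-Oeste` = character(),
                                Nordeste = character(),
                                Norte = character(),
                                Sudeste = character(),
                                Sul = character(),
                                check.names = FALSE, stringsAsFactors = FALSE)

  # all = TRUE keeps the computed rows; regions absent from the data come out as NA
  tabela_completa <- merge(tabela_completa, resultado, all = TRUE)
  # Drop the intermediates (skip any that were already removed earlier in the function)
  remove(resultado_reg, resultado_br, resultado)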
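
A different route to the same result, sketched here in base R and not taken from the answer above, is to fix the set of regions up front with a factor, so that regions without companies still get a column. The names `regioes` and `estatisticas` are hypothetical, the data are made up, and the region labels are assumed to be spelled as in the table above.

```r
# All regions the table should always show (assumed spellings).
regioes <- c("Centro-Oeste", "Nordeste", "Norte", "Sudeste", "Sul")

# Made-up data with no company in Norte or Sul.
empresa_pl <- data.frame(
  CircRLP = c(10, 20, 30, 40, 50, 60),
  regiao  = c("Nordeste", "Sudeste", "Nordeste",
              "Centro-Oeste", "Sudeste", "Sudeste"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the six statistics, or NA when a region has no rows.
estatisticas <- function(x) {
  if (length(x) == 0) {
    valores <- rep(NA_real_, 6)
  } else {
    valores <- c(min(x), mean(x), median(x), max(x), var(x), sd(x))
  }
  names(valores) <- c("Mínimo", "Média", "Mediana",
                      "Máximo", "Variância", "Desv.Pad.")
  valores
}

# split() on a factor keeps empty levels, so every region gets an entry.
por_regiao <- split(empresa_pl$CircRLP,
                    factor(empresa_pl$regiao, levels = regioes))

# One column for the whole country, then one per region.
tabela <- cbind(Brasil = estatisticas(empresa_pl$CircRLP),
                sapply(por_regiao, estatisticas))

print(round(tabela, 2))
```

Because the columns come from `regioes` and not from the data, no names need to be assigned by position afterwards. Replacing the NA cells with 0 is possible with `tabela[is.na(tabela)] <- 0`, but that would also overwrite the variance and standard deviation of regions with a single company, which are NA as well.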
