I wrote the function below to compute descriptive statistics for a set of companies, broken down by the 5 regions of Brazil plus the national total, and to present the result in a table.
library(tidyverse)
library(kableExtra)
teste_medias_regiao <- function(base, var_conta, var_regiao) {
  var_conta <- enquo(var_conta)
  var_regiao <- enquo(var_regiao)
  base <- base
  resultado_reg <- base %>%
    group_by(!!var_regiao) %>%
    summarise(Minímo = min(!!var_conta),
              Média = mean(!!var_conta),
              Mediana = median(!!var_conta),
              Máximo = max(!!var_conta),
              Variância = var(!!var_conta),
              Desv.Pad. = sd(!!var_conta)
    )
  resultado_br <- base %>%
    summarise(regiao = "Brasil",
              Minímo = min(!!var_conta),
              Média = mean(!!var_conta),
              Mediana = median(!!var_conta),
              Máximo = max(!!var_conta),
              Variância = var(!!var_conta),
              Desv.Pad. = sd(!!var_conta)
    )
  resultado <- merge(resultado_br, resultado_reg, all = TRUE)
  remove(resultado_reg, resultado_br)
  resultado <- as.data.frame(t(resultado))
  resultado <- resultado %>%
    mutate(`Teste/Região` = row.names(.)) %>%
    select(`Teste/Região`, everything())
  names(resultado)[2:7] <- c("Brasil", "C-Oeste", "Nordeste", "Norte", "Sudeste", "Sul")
  resultado <- resultado[-1, ]
  rownames(resultado) <- NULL
  resultado %>%
    kable("markdown", escape = F) %>%
    kable_styling("striped", full_width = F)
}
teste_medias_regiao(empresa_pl, CircRLP, regiao)
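Problem: when the data.frame has no company in one of the 5 regions, the function breaks at the line that assigns names(resultado)[2:7], because the table then has fewer columns than the six names being assigned.
Any suggestions on how to resolve this, and on how to fill the missing region's column with NA or 0?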
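The failure can be reproduced without the rest of the function. The sketch below uses a made-up 6-column data frame (`df` is a hypothetical stand-in for the table when one region is absent) and only base R; it shows that assigning into positions 2:7 fails once the table is narrower than seven columns.

```r
# Hypothetical stand-in: 1 label column + Brasil + only 4 regions = 6 columns
df <- as.data.frame(matrix(0, nrow = 2, ncol = 6))

# Positions 2:7 reach one column past the end, so R refuses the assignment
res <- tryCatch({
  names(df)[2:7] <- c("Brasil", "C-Oeste", "Nordeste", "Norte", "Sudeste", "Sul")
  "ok"
}, error = function(e) {
  message("Assignment failed: ", conditionMessage(e))
  "error"
})

print(res)
```

With all five regions present the table has seven columns and the same assignment goes through, which is why the problem only shows up on incomplete data.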
Basic example with all regions:
structure(list(CircRLP = c(26240195.62, 136394073.76, 520685437.41, 141563722.92, 1116797.53, 6944476.06, 826787775.92, 61622254.35, 418733960.49, 3830627358.88), regiao = c("Northeast", "Southeast", "South", "Midwest", "Southeast", "Southeast", "Southeast", "North", "Southeast", "Southeast")), row.names = c(NA, 10L), class = "data.frame")
Basic example missing a region:
structure(list(CircRLP = c(26240195.62, 136394073.76, 520685437.41, 141563722.92, 1116797.53, 6944476.06, 826787775.92, 61622254.35, 418733960.49, 3830627358.88), regiao = c("Northeast", "Southeast", "Northeast", "Midwest", "Southeast", "Southeast", "Southeast", "Northeast", "Southeast", "Southeast")), row.names = c(NA, 10L), class = "data.frame")
Try the argument na.rm = TRUE; it makes the function ignore missing values. – Alexandre Sanches
Good morning. I don't know whether you've solved it yet. The error is probably caused by a mismatch between the number of column titles and the number of columns in the result. It would help to see the error message. If I understood the situation correctly, you could take the titles from the regions that summarise() already returns, since they reflect the data actually present; that would eliminate the line where the error occurs. – Yves Cavalcanti
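A minimal sketch of that idea, assuming it is acceptable for absent regions to simply not appear in the table. It uses only base R rather than the question's dplyr pipeline, the toy values are invented, `estatisticas` is a hypothetical helper, and the statistic names are written without accents to keep the snippet ASCII-only.

```r
# Toy data with only three regions present (values invented for illustration)
base <- data.frame(
  CircRLP = c(10, 20, 30, 40, 50),
  regiao  = c("Nordeste", "Sudeste", "Sudeste", "C-Oeste", "Sudeste")
)

# Hypothetical helper: the six statistics for one numeric vector
estatisticas <- function(x) {
  c(Minimo = min(x), Media = mean(x), Mediana = median(x),
    Maximo = max(x), Variancia = var(x), Desv.Pad. = sd(x))
}

# split() creates one group per region found in the data, so the column
# titles come from the data instead of a hard-coded vector of six names
por_regiao <- sapply(split(base$CircRLP, base$regiao), estatisticas)

# Put the national total in front of the per-region columns
resultado <- cbind(Brasil = estatisticas(base$CircRLP), por_regiao)

print(colnames(resultado))
print(resultado)
```

A region with a single company gets NA for the variance and standard deviation, because both need at least two observations.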
Hi @Yvescavalcanti, thanks for commenting! I fixed the line where the error appeared as follows: names(resultado)[2:ncol(resultado)] <- resultado[1,2:ncol(resultado)]. I then created an empty table with all the regions and the row names, and assigned the values from the calculated table (resultado) to it. – RxT
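The "empty table, then fill" step might look roughly like the sketch below. This is not RxT's actual code, which was not posted; it is a base R illustration under the assumption that the five region labels are known in advance. The names `regioes`, `nomes_est`, `estatisticas`, and `tabela` are hypothetical, and the data values are invented.

```r
# Assumed fixed list of regions, using the labels from the question's table
regioes   <- c("C-Oeste", "Nordeste", "Norte", "Sudeste", "Sul")
nomes_est <- c("Minimo", "Media", "Mediana", "Maximo", "Variancia", "Desv.Pad.")

# Toy data in which Norte and Sul have no companies
base <- data.frame(
  CircRLP = c(10, 20, 30, 40, 50),
  regiao  = c("Nordeste", "Sudeste", "Sudeste", "C-Oeste", "Sudeste")
)

# Hypothetical helper: the six statistics, named in a fixed order
estatisticas <- function(x) {
  setNames(c(min(x), mean(x), median(x), max(x), var(x), sd(x)), nomes_est)
}

# Empty table: one row per statistic, one column for Brasil and each region
tabela <- matrix(NA_real_,
                 nrow = length(nomes_est), ncol = length(regioes) + 1,
                 dimnames = list(nomes_est, c("Brasil", regioes)))

# Fill the national column, then only the regions that occur in the data
tabela[, "Brasil"] <- estatisticas(base$CircRLP)
for (r in intersect(regioes, unique(base$regiao))) {
  tabela[, r] <- estatisticas(base$CircRLP[base$regiao == r])
}

print(tabela)
```

Columns for absent regions stay NA; running `tabela[is.na(tabela)] <- 0` afterwards would give zeros instead, though that also overwrites the NA variance of single-company regions.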