I want to use the dplyr package to calculate the relative frequency by group. My data looks like the first three columns below, and the last column shows the result I would like to obtain:
CNPJ    Central                  depositos  Resultado final
315406  SICOOB CECRESP           4,61E+13    97,78%
512839  SICOOB CECRESP           1,05E+12     2,22%
68987   SICOOB CREDIMINAS        5,22E+13    33,00%
429890  SICOOB CREDIMINAS        3,88E+13    24,54%
803287  SICOOB CREDIMINAS        3,82E+13    24,15%
804046  SICOOB CREDIMINAS        2,90E+13    18,31%
694877  SICOOB PLANALTO CENTRAL  5,01E+13   100,00%
694389  SICOOB SC/RS             8,75E+13    67,28%
707903  SICOOB SC/RS             4,25E+13    32,72%
Any suggestions? I don't know much about the dplyr package, but I made some unsuccessful attempts such as:
dados <- dados %>%
  group_by(CENTRAL, depositos) %>%
  summarise(value = sum(value)) %>%
  mutate(csum = cumsum(value))
And what about the cumulative relative frequency by CENTRAL?
Rafael, how would you go about computing the cumulative relative frequency?
– T. Veiga
Hi Veiga, I would continue the pipe and use cumsum: mutate(freq_cum = cumsum(value) / sum(value)).
– Rafael Toledo
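For anyone following along, here is a minimal sketch of how the suggestion in the comment might be applied to the table from the question. It assumes the columns are named Central and depositos as in the table header, and that depositos has already been read as numeric (the decimal commas shown above would need converting first). The names resultado and freq are only illustrative; freq_cum follows the comment.

```r
library(dplyr)

# Data retyped from the question's table; deposits entered as plain numbers.
dados <- data.frame(
  CNPJ = c(315406, 512839, 68987, 429890, 803287, 804046,
           694877, 694389, 707903),
  Central = c(rep("SICOOB CECRESP", 2), rep("SICOOB CREDIMINAS", 4),
              "SICOOB PLANALTO CENTRAL", rep("SICOOB SC/RS", 2)),
  depositos = c(4.61e13, 1.05e12, 5.22e13, 3.88e13, 3.82e13, 2.90e13,
                5.01e13, 8.75e13, 4.25e13)
)

resultado <- dados %>%
  group_by(Central) %>%                      # one group per Central
  mutate(
    # share of each row in its group's total deposits
    freq = depositos / sum(depositos),
    # running share within the group, in the current row order
    freq_cum = cumsum(depositos) / sum(depositos)
  ) %>%
  ungroup()

print(resultado)
```

Because mutate() is used instead of summarise(), every original row is kept and the shares within each Central add up to 1. The cumulative column depends on row order, so an arrange() step before the mutate() may be wanted if a particular ordering matters.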