Quantitative position

Asked

Viewed 55 times

1

I need to calculate the percentage of days that are in a certain range of values in the analyzed column (values are in percentage) in relation to the total quantity of days for that month.

Initially I survey how many days each month has, I do this mainly due to the month of February that may vary:

qtd_dias_mes <- dados %>% 
  group_by(mes) %>% 
  select(analise) %>%
  dplyr::summarise(n()) %>%          
  dplyr::rename("qtd_dias_mes" = "n()")
  View(qtd_dias_mes)

Now I survey how many days are between the interval that is greater than 75 and less than 100:

analise_75_e_100 <- dados %>%
filter(dados$analise > 75 & dados$analise < 100) %>% 
  group_by(mes) %>%
  dplyr::summarise(n())

the values delivered by dplyr::summarise(n()) do not contemplate month 6 (with the complete dataframe), and when making the calculation I need (below), end up appearing month 6 (which should not) and wrong values from it, and should appear 0 for month 6:

porc_dias_entre_75_e_100 = (analise_75_e_100$`n()` / qtd_dias_mes$qtd_dias_mes) * 100

In order to simplify my question here, I presented above the calculation with this interval, however I will calculate for different percentage intervals, and may appear less not contemplated, and should then be filled with zero. What alternatives for these calculations to be correct?

My data have 10957 lines (1/1/1990 to 12/31/2019), maybe making with the sample below change the month 6 that I exemplified above, however the reasoning is the same.

> dput(head(dados, 200))
structure(list(data = c("1/1/1990", "1/2/1990", "1/3/1990", "1/4/1990", 
"1/5/1990", "1/6/1990", "1/7/1990", "1/8/1990", "1/9/1990", "1/10/1990", 
"1/11/1990", "1/12/1990", "1/13/1990", "1/14/1990", "1/15/1990", 
"1/16/1990", "1/17/1990", "1/18/1990", "1/19/1990", "1/20/1990", 
"1/21/1990", "1/22/1990", "1/23/1990", "1/24/1990", "1/25/1990", 
"1/26/1990", "1/27/1990", "1/28/1990", "1/29/1990", "1/30/1990", 
"1/31/1990", "2/1/1990", "2/2/1990", "2/3/1990", "2/4/1990", 
"2/5/1990", "2/6/1990", "2/7/1990", "2/8/1990", "2/9/1990", "2/10/1990", 
"2/11/1990", "2/12/1990", "2/13/1990", "2/14/1990", "2/15/1990", 
"2/16/1990", "2/17/1990", "2/18/1990", "2/19/1990", "2/20/1990", 
"2/21/1990", "2/22/1990", "2/23/1990", "2/24/1990", "2/25/1990", 
"2/26/1990", "2/27/1990", "2/28/1990", "3/1/1990", "3/2/1990", 
"3/3/1990", "3/4/1990", "3/5/1990", "3/6/1990", "3/7/1990", "3/8/1990", 
"3/9/1990", "3/10/1990", "3/11/1990", "3/12/1990", "3/13/1990", 
"3/14/1990", "3/15/1990", "3/16/1990", "3/17/1990", "3/18/1990", 
"3/19/1990", "3/20/1990", "3/21/1990", "3/22/1990", "3/23/1990", 
"3/24/1990", "3/25/1990", "3/26/1990", "3/27/1990", "3/28/1990", 
"3/29/1990", "3/30/1990", "3/31/1990", "4/1/1990", "4/2/1990", 
"4/3/1990", "4/4/1990", "4/5/1990", "4/6/1990", "4/7/1990", "4/8/1990", 
"4/9/1990", "4/10/1990", "4/11/1990", "4/12/1990", "4/13/1990", 
"4/14/1990", "4/15/1990", "4/16/1990", "4/17/1990", "4/18/1990", 
"4/19/1990", "4/20/1990", "4/21/1990", "4/22/1990", "4/23/1990", 
"4/24/1990", "4/25/1990", "4/26/1990", "4/27/1990", "4/28/1990", 
"4/29/1990", "4/30/1990", "5/1/1990", "5/2/1990", "5/3/1990", 
"5/4/1990", "5/5/1990", "5/6/1990", "5/7/1990", "5/8/1990", "5/9/1990", 
"5/10/1990", "5/11/1990", "5/12/1990", "5/13/1990", "5/14/1990", 
"5/15/1990", "5/16/1990", "5/17/1990", "5/18/1990", "5/19/1990", 
"5/20/1990", "5/21/1990", "5/22/1990", "5/23/1990", "5/24/1990", 
"5/25/1990", "5/26/1990", "5/27/1990", "5/28/1990", "5/29/1990", 
"5/30/1990", "5/31/1990", "6/1/1990", "6/2/1990", "6/3/1990", 
"6/4/1990", "6/5/1990", "6/6/1990", "6/7/1990", "6/8/1990", "6/9/1990", 
"6/10/1990", "6/11/1990", "6/12/1990", "6/13/1990", "6/14/1990", 
"6/15/1990", "6/16/1990", "6/17/1990", "6/18/1990", "6/19/1990", 
"6/20/1990", "6/21/1990", "6/22/1990", "6/23/1990", "6/24/1990", 
"6/25/1990", "6/26/1990", "6/27/1990", "6/28/1990", "6/29/1990", 
"6/30/1990", "7/1/1990", "7/2/1990", "7/3/1990", "7/4/1990", 
"7/5/1990", "7/6/1990", "7/7/1990", "7/8/1990", "7/9/1990", "7/10/1990", 
"7/11/1990", "7/12/1990", "7/13/1990", "7/14/1990", "7/15/1990", 
"7/16/1990", "7/17/1990", "7/18/1990", "7/19/1990"), dia_da_semana = c(1L, 
2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 
4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 
6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 
1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 
3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 
5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 
7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 
2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 
4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 
6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 
1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 
3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 
5L, 6L, 7L, 1L, 2L, 3L, 4L), mes = c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 
7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 
7L, 7L), ano = c(1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 
1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 
1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 
1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 
1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 
1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 
1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 
1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 
1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 
1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 
1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 
1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 
1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 
1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 
1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 
1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 
1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 
1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 
1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 
1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 
1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 
1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 
1990L, 1990L, 1990L, 1990L), analise = c(0, 0, 0, 0, 100, 100, 
100, 100, 100, 100, 100, 78, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 100, 56.01, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
100, 100, 100, 100, 100, 100, 36.23, 0, 0, 100, 100, 100, 100, 
100, 100, 100, 100, 100, 100, 59.74, 0, 0, 0, 0, 100, 78, 0, 
0, 0, 0, 0, 0, 0, 0, 100, 100, 100, 66.46, 100, 56.01, 0, 0, 
0, 0, 100, 100, 72.23, 0, 0, 0, 0, 0, 100, 100, 100, 100, 78, 
100, 100, 100, 44.46, 100, 100, 100, 100, 100, 100, 100, 59.74, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 100, 56.01, 0, 0, 0, 0, 100, 
100, 100, 54.97, 0, 0, 0, 0, 100, 56.01, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 100, 100, 75.35, 0, 0, 0)), row.names = c(NA, 
200L), class = "data.frame")
  • When you say month, means year and month?

1 answer

2


Here are two solutions, one for calculating the percentages of values per year/month and the other only per month.

Per year/month

Lines are counted with count twice, without and with the column filter analise be between 75 and 100. And these two results are joined with left_join. Then just split the count columns.

library(dplyr)

left_join(
  dados %>% count(ano, mes),
  dados %>% filter(analise > 75 & analise < 100) %>% count(ano, mes),
  by = c("ano", "mes"),
  suffix = c(".total", ".entre_75_e_100")
) %>%
  mutate(porcentagem = 100*n.entre_75_e_100/n.total)
#   ano mes n.total n.entre_75_e_100 porcentagem
#1 1990   1      31                1    3.225806
#2 1990   2      28               NA          NA
#3 1990   3      31                1    3.225806
#4 1990   4      30                1    3.333333
#5 1990   5      31               NA          NA
#6 1990   6      30               NA          NA
#7 1990   7      19                1    5.263158

Only per month

Simply remove the column ano of the above code.

left_join(
  dados %>% count(mes),
  dados %>% filter(analise > 75 & analise < 100) %>% count(mes),
  by = "mes",
  suffix = c(".total", ".entre_75_e_100")
) %>%
  mutate(porcentagem = 100*n.entre_75_e_100/n.total)
#  mes n.total n.entre_75_e_100 porcentagem
#1   1      31                1    3.225806
#2   2      28               NA          NA
#3   3      31                1    3.225806
#4   4      30                1    3.333333
#5   5      31               NA          NA
#6   6      30               NA          NA
#7   7      19                1    5.263158

Note

In both cases, if lines without data are not accurate (there is no analise between the desired values) just use inner_join instead of left_join.

  • Rui Barradas thanks for the answer! It took me a long time to return because it wasn’t working initially, but then it worked. I just didn’t understand the final note you put in. " In both cases, if lines without data are not accurate (there is no analysis between the desired values) just use inner_join instead of left_join. What do you mean if "they’re not needed"?

  • @Bruna The 2nd, 5th and 6th lines have NA's, you want those lines? If you don’t want them, use the inner_join.

  • I had understood precise with a sense of "accuracy". Thank you!!!

Browser other questions tagged

You are not signed in. Login or sign up in order to post.