Name columns generated by Summarise() and use division in the same function

Asked

Viewed 56 times

0

df is the data frame with the data that I will work with

df <- data.frame(matrix(NA,nrow = 30, ncol = 6))
colnames(df) <- c("ano","mes","x1","x2","x3","x4")

df$ano <- c(2002,2002,2002,2003,2003,2003,2004,2004,2004,2004,2002,2002,2002,2003,2003,2003,2004,2004,2004,2004,2002,2002,2002,2003,2003,2003,2004,2004,2004,2004)
df$mes <- c(5,9,7,3,8,2,4,1,6,10,2,11,12,10,12,2,8,9,10,7,6,4,5,7,3,1,9,5,2,6)
df$x1 <- c(2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60)
df$x2 <- c(3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,48,51,54,57,60,63,66,69,72,75,78,81,84,87,90)
df$x3 <- c(4,8,12,16,20,24,28,32,36,40,44,48,52,56,60,64,68,72,76,80,84,88,92,96,100,104,108,112,116,120)
df$x4 <- c(5,10,15,20,25,30,35,40,45,50,55,60,65,70,75,80,85,90,95,100,105,110,115,120,125,130,135,140,145,150)

View(df)

resumo is the data frame where I want my analysis to be recorded in order to export to . csv at the end

resumo <- data.frame(matrix(NA,nrow = 12, ncol = 7))
colnames(resumo) <- c("mes","media_diaria_X1","media_mensal_X1","media_diaria_X4","media_mensal_X4","media_diaria_x3_divido_x4","media_mensal_x3_divido_x4")
resumo$mes <- c(1,2,3,4,5,6,7,8,9,10,11,12)

View(resumo)

library(dplyr)

When I use the dplyr package it returns me values with the columns named equal to the columns of the data frame df as the selection is performed by the columns of df. The big question is that I would like the results obtained below to be recorded in the data frame resumo with the columns named according to what I defined above .

Below are the calculations I managed to do and I intend to do with the help of someone:

X1 and X4 Daily Averages Calculation

media_diaria <- df %>%
  group_by(mes) %>%
  summarise_at(vars(x1,x4), mean, na.rm = TRUE)

dput(media_diaria)
structure(list(mes = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12), 
    x1 = c(34, 31, 29, 29, 34.6666666666667, 40, 31.3333333333333, 
    22, 31.3333333333333, 28.6666666666667, 24, 28), x4 = c(85, 
    77.5, 72.5, 72.5, 86.6666666666667, 100, 78.3333333333333, 
    55, 78.3333333333333, 71.6666666666667, 60, 70)), row.names = c(NA, 
-12L), class = c("tbl_df", "tbl", "data.frame"))

Calculation of monthly averages (referring to monthly sum) of X1 and X4

soma_mensal <- df %>% 
  group_by(ano,mes) %>%
  summarise_at(vars(x1,x4), sum, na.rm = TRUE)
media_mensal <- soma_mensal %>% 
  group_by(mes) %>% 
  summarise_at(vars(x1,x4), mean, na.rm = TRUE)

dput(soma_mensal)
structure(list(ano = c(2002, 2002, 2002, 2002, 2002, 2002, 2002, 
2002, 2003, 2003, 2003, 2003, 2003, 2003, 2003, 2004, 2004, 2004, 
2004, 2004, 2004, 2004, 2004, 2004), mes = c(2, 4, 5, 6, 7, 9, 
11, 12, 1, 2, 3, 7, 8, 10, 12, 1, 2, 4, 5, 6, 7, 8, 9, 10), x1 = c(22, 
44, 48, 42, 6, 4, 24, 26, 52, 44, 58, 48, 10, 28, 30, 16, 58, 
14, 56, 78, 40, 34, 90, 58), x4 = c(55, 110, 120, 105, 15, 10, 
60, 65, 130, 110, 145, 120, 25, 70, 75, 40, 145, 35, 140, 195, 
100, 85, 225, 145)), row.names = c(NA, -24L), groups = structure(list(
    ano = c(2002, 2003, 2004), .rows = structure(list(1:8, 9:15, 
        16:24), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), row.names = c(NA, -3L), class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"))

dput(media_mensal)
structure(list(mes = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12), 
    x1 = c(34, 41.3333333333333, 58, 29, 52, 60, 31.3333333333333, 
    22, 47, 43, 24, 28), x4 = c(85, 103.333333333333, 145, 72.5, 
    130, 150, 78.3333333333333, 55, 117.5, 107.5, 60, 70)), row.names = c(NA, 
-12L), class = c("tbl_df", "tbl", "data.frame"))

From that point on the calculations didn’t work out and I can’t solve.

Calculations of the daily average with the result obtained from the divorce of X3 by X4. Did not work!

media_diaria_divisao <- df %>%
  group_by(mes) %>%
  summarise_at((df$x3/df$x4), mean, na.rm = TRUE)

Error: `.vars` must be a character/numeric vector or a `vars()` object, not a double vector.
Run `rlang::last_error()` to see where the error occurred.

Calculation of monthly averages (referring to monthly sum) of X1 and X4. Also failed!

soma_mensal <- df %>% 
  group_by(ano,mes) %>%
  summarise_at((df$x3/df$x4), sum, na.rm = TRUE)
media_mensal <- soma_mensal %>% 
  group_by(mes) %>% 
  summarise_at((df$x3/df$x4), mean, na.rm = TRUE)

Error: `.vars` must be a character/numeric vector or a `vars()` object, not a double vector.
Run `rlang::last_error()` to see where the error occurred.

Summing up my two doubts:

  1. How I can perform the above 2 analyses in relation to the X3/X4 division with the dplyr? If using the dplyr not be a good option, what would be a possible solution using functions? I’m trying to avoid working with loops.

  2. How do I feed the dataframe resumo with the results of the above analyses?

  • I didn’t understand the difference between media_diaria and media_mensal. The results are equal, at least in the example provided.

  • Marcus Nunes considering the X1 column for example: for the calculation of media_diary, a grouping was performed with all the data referring to the month of January, for example, which is represented by the number 1 in the dataframe, and then I averaged. For the monthly media_first, a sum of the values of each month in relation to its year was performed and then the average of this sum was performed. January/2002 had a summation, January/2003 had another summation and so on. Then I performed the average of this sum and obtained the monthly medium.

  • Images are hard to see, post values and format with 4 margin spaces, please. As for the code, you will not be looking for summarise(med_dia_divisao = mean(x3/x4, na.rm = TRUE)) in the first that did not work? In the others the solution should be similar.

  • Rui Barradas removed the images and placed dput of each of the data frames. As for the suggestion you presented me, it did not work, I had tried this way before, but it returns me a single value for all months (value=0.8), and it should be a specific value for each month. Marcus Nunes, maybe with the dput that I added, the visualization is better. As there are few values, in some cases media_diaria and media_monthly are coinciding, but if you observe the month 10 (October) already has difference between them.

1 answer

2


Uses `mutate;

media_diaria_divisao <- df %>% mutate(x3_x4=x3/x4) %>%
  group_by(mes) %>% 
  summarise_at(vars(x1,x4,x3_x4), mean, na.rm = TRUE)

soma_mensal_div <- df %>% mutate(x3_x4=x3/x4) %>%
  group_by(ano,mes) %>%
  summarise_at(vars(x1,x4,x3_x4), sum, na.rm = TRUE)
media_mensal_div <- soma_mensal_div %>% 
  group_by(mes) %>% 
  summarise_at(vars(x1,x4,x3_x4), mean, na.rm = TRUE)

rtemp=merge(media_diaria_divisao,media_mensal_div,by="mes")
resumo[]=rtemp[,c(1,2,5,3,6,4,7)]


#> resumo
#   mes media_diaria_X1 media_mensal_X1 media_diaria_X4 media_mensal_X4 media_diaria_x3_divido_x4 media_mensal_x3_divido_x4
#1    1        34.00000        34.00000        85.00000        85.00000                       0.8                  0.800000
#2    2        31.00000        41.33333        77.50000       103.33333                       0.8                  1.066667
#3    3        29.00000        58.00000        72.50000       145.00000                       0.8                  1.600000
#4    4        29.00000        29.00000        72.50000        72.50000                       0.8                  0.800000
#5    5        34.66667        52.00000        86.66667       130.00000                       0.8                  1.200000
#6    6        40.00000        60.00000       100.00000       150.00000                       0.8                  1.200000
#7    7        31.33333        31.33333        78.33333        78.33333                       0.8                  0.800000
#8    8        22.00000        22.00000        55.00000        55.00000                       0.8                  0.800000
#9    9        31.33333        47.00000        78.33333       117.50000                       0.8                  1.200000
#10  10        28.66667        43.00000        71.66667       107.50000                       0.8                  1.200000
#11  11        24.00000        24.00000        60.00000        60.00000                       0.8                  0.800000
#12  12        28.00000        28.00000        70.00000        70.00000                       0.8                  0.800000
  • 1

    Robert thank you so much! It worked! Only the media_diaria_x3_divido_x4 that was a little different! But I believe that it is a matter of interpretation, the essential part that was the use of the functions together has already clarified me enough!

Browser other questions tagged

You are not signed in. Login or sign up in order to post.