Date manipulation using dplyr and lubridate

I have a data frame whose columns hold dates (%Y/%m/%d), times and hourly averages over 4 months (01/01/2020 - 01/04/2020). How can I calculate the daily average of these hourly values using the pipe operator (%>%), or some faster approach? See my code below:

library(tidyverse)
library(lubridate)

head(dados)
       Data  Hora              Nome.Parâmetro Unidade.Medida Média.Horária
1 2020-01-03 01:00 MP10 (Partículas Inaláveis)          µg/m3            12
2 2020-01-03 02:00 MP10 (Partículas Inaláveis)          µg/m3            13
3 2020-01-03 03:00 MP10 (Partículas Inaláveis)          µg/m3             4
4 2020-01-03 04:00 MP10 (Partículas Inaláveis)          µg/m3             7
5 2020-01-03 05:00 MP10 (Partículas Inaláveis)          µg/m3            16
6 2020-01-03 06:00 MP10 (Partículas Inaláveis)          µg/m3            11   

I executed the following command:

head(dados %>% 
  group_by(Data) %>% 
  summarise(med_dia = mean(dados$Média.Horária))
)

Data           med_dia
<date>          <dbl>
1 2020-01-03    22.8
2 2020-01-04    22.8
3 2020-01-05    22.8
4 2020-01-06    22.8
5 2020-01-07    22.8
6 2020-01-14    22.8

After running the code above, I expected one average of the hourly values per day. Instead, the command averages the whole column indiscriminately and repeats that value on every row.

  • Instead of mean(dados$Média.Horária), try dropping the data frame name: mean(Média.Horária).

  • To make the data easier to copy into an R session, can you please edit the question to include the output of dput(dados) or, if the data frame is too large, dput(head(dados, 20))?

1 answer

# Example data
set.seed(4)
dados <- data.frame(Data = rep(paste0('2020-01-0', 1:3), each = 4),
                    Média.Horária = sample(1:20, 12, TRUE))

Using dplyr

As pointed out by @Rui-Arradas, writing mean(dados$Média.Horária) after group_by() averages the entire vector, ignoring the groups, and assigns that single value to every group. Refer to the column by its bare name instead:

library(dplyr)

> dados %>% group_by(Data) %>% summarise(med_dia = mean(Média.Horária))
# A tibble: 3 x 2
  Data       med_dia
  <fct>        <dbl>
1 2020-01-01   14.2 
2 2020-01-02    9.75
3 2020-01-03   15.5 
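
To make the difference concrete, here is a minimal sketch that runs both forms side by side. The data frame exemplo and the result names errado and certo are hypothetical, invented only for this illustration; the column names follow the question.

```r
library(dplyr)

# Hypothetical toy data: two days, two hourly readings each
exemplo <- data.frame(
  Data = rep(c("2020-01-01", "2020-01-02"), each = 2),
  Média.Horária = c(10, 20, 30, 50)
)

# exemplo$Média.Horária pulls the full column from the original data frame,
# so the grouping is bypassed and every group receives the overall mean
errado <- exemplo %>%
  group_by(Data) %>%
  summarise(med_dia = mean(exemplo$Média.Horária))

# The bare column name is evaluated separately within each group
certo <- exemplo %>%
  group_by(Data) %>%
  summarise(med_dia = mean(Média.Horária))

errado$med_dia  # 27.5 27.5
certo$med_dia   # 15 40
```

The first result reproduces the symptom from the question (one repeated value), while the second gives one mean per day.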

Using data.table

library(data.table)

setDT(dados)

> dados[, .(med_dia = mean(Média.Horária)), by = Data]
         Data med_dia
1: 2020-01-01   14.25
2: 2020-01-02    9.75
3: 2020-01-03   15.50
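
Since the question also loads lubridate, the sketch below shows one way the date column could be parsed before grouping. It assumes the dates arrive as text and that some hourly readings may be missing; neither is stated in the question, and the names exemplo and resumo are hypothetical.

```r
library(dplyr)
library(lubridate)

# Hypothetical data: dates stored as text, with one missing hourly reading
exemplo <- data.frame(
  Data = rep(c("2020-01-01", "2020-01-02"), each = 3),
  Média.Horária = c(10, 20, NA, 30, 50, 40),
  stringsAsFactors = FALSE
)

resumo <- exemplo %>%
  mutate(Data = ymd(Data)) %>%   # parse the text into Date objects
  group_by(Data) %>%
  # na.rm = TRUE skips missing hours (an assumption about the real data)
  summarise(med_dia = mean(Média.Horária, na.rm = TRUE))

resumo
```

If the real column is already of class Date, as the <date> header in the question's output suggests, the mutate() step can be dropped.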
