3
I have a set of rain data measured every hour and I need to add this data throughout the day. For this, I am using the commands below:
library(dplyr)
df %>% group_by(data) %>%
summarise_all(funs(Media=sum(., na.rm=TRUE)))
However, my df has three characteristics: i) days without NA, ii) days with some Nas, and iii) days with only NA, as an example below:
With that, if I consider na.rm=TRUE
, days that have only NA, as day 03/01/10, returns with a value of 0 after the sum. Which is wrong, because I don’t know if it really rained, since I don’t have the die. On the contrary, na.rm=FALSE
, disregard both the days that have only NA and the days that have some Nas and some measures, which is also bad.
In the output shown below, the day 02/01/2010 should present a value different from 0.0 and the day 03/01/2010 should present NA. When I calculate the average Media=mean
the result is correct, but to Media=sum
I can’t fix that.
data "rain_mm_ToT"
01/01/2010 19.7
02/01/2010 0.0
03/01/2010 0.0
04/01/2010 0.5
05/01/2010 0.0
06/01/2010 0.0
07/01/2010 6.3
08/01/2010 1.9
09/01/2010 1.4
10/01/2010 0.0
11/01/2010 0.0
Since the series is very long, I can’t make a thorough assessment of where the mistakes will be. Thus, I would like to know if there is an alternative to consider "partially" the Nas values, that is, return with NA only on days that have no measurement and perform the sum on those that have both NA and measurement?
Thank you!
Can you please, edit the question with the departure of
dput(df)
or, if the base is too large,dput(head(df, 20))
?– Rui Barradas