Equivalent to Excel SOMASES, in R

Question

Equivalent to Excel SOMASES, in R

Asked 8 years, 10 months ago

Viewed 1,339 times

12

I am deepening in R and looking for a function that is equivalent to the sums of excel.

I have this following DF:

Day.of.Week Hour    Quantidade
5            21         5044
3            12         5024
1            11         4725
0            16         4643
1            15         4601
4            14         4566
2            20         4509
1            10         4502
3            13         4477
6            11         4449

I need him to do the equivalent of Somases to mount something like

Hour 0  1  2  3  4  5  6
0    -  -  -  -  -  -  -
1    -  -  -  -  -  -  -
2    -  -  -  -  -  -  -
3    -  -  -  -  -  -  -
4    -  -  -  -  -  -  -
5    -  -  -  -  -  -  -
6    -  -  -  -  -  -  -
7    -  -  -  -  -  -  -
8    -  -  -  -  -  -  -
9    -  -  -  -  -  -  -
10   -  -  -  -  -  -  -
...

It would add up the amount according to the day of the week and the time.

4 answers

9

This can be done in many ways, as can be seen by the various responses. A way without using add packages is to use tapply:

> tapply(df$Quantidade, list(df$Day.of.Week, df$Hour), sum)
    10   11   12   13   14   15   16   20   21
0   NA   NA   NA   NA   NA   NA 4643   NA   NA
1 4502 4725   NA   NA   NA 4601   NA   NA   NA
2   NA   NA   NA   NA   NA   NA   NA 4509   NA
3   NA   NA 5024 4477   NA   NA   NA   NA   NA
4   NA   NA   NA   NA 4566   NA   NA   NA   NA
5   NA   NA   NA   NA   NA   NA   NA   NA 5044
6   NA 4449   NA   NA   NA   NA   NA   NA   NA

What looks like rows or columns depends on the order of the list passed as the second argument. If the order is reversed, the result is transposed:

> tapply(df$Quantidade, list(df$Hour, df$Day.of.Week), sum)
      0    1    2    3    4    5    6
10   NA 4502   NA   NA   NA   NA   NA
11   NA 4725   NA   NA   NA   NA 4449
12   NA   NA   NA 5024   NA   NA   NA
13   NA   NA   NA 4477   NA   NA   NA
14   NA   NA   NA   NA 4566   NA   NA
15   NA 4601   NA   NA   NA   NA   NA
16 4643   NA   NA   NA   NA   NA   NA
20   NA   NA 4509   NA   NA   NA   NA
21   NA   NA   NA   NA   NA 5044   NA

2

Very elegant code.

– André Muta

2016/03/30 at 01:10
Sensational guy, thank you so much!

– Felipe Amaral Rodrigues

2016/03/31 at 20:44

Browser other questions tagged r

You are not signed in. Login or sign up in order to post.

by André Muta • **308** points · Answer 1 · 2016-03-29T21:58:29+00:00

To do this you can use two R functions: aggregate to add the "duplicates", so that you have only one day/hour case for each quantity and the function acast library reshape2 to create the matrix the way you’re asking.

Your date.frame:

df <- data.frame(Day.of.Week = c(5,3,1,0,1,4,2,1,3,6), 
                 Hour = c(21,12,11,16,15,14,20,10,13,11), 
                 Quantidade = c(5044,5024,4725,4643,4601,4566,4509,4502,4477,4449))

Code:

require(reshape2)

df <- aggregate(Quantidade ~ ., df, sum)
somases <- acast(df, Hour ~ Day.of.Week, value.var = "Quantidade")

Output:

> somases
      0    1    2    3    4    5    6
10   NA 4502   NA   NA   NA   NA   NA
11   NA 4725   NA   NA   NA   NA 4449
12   NA   NA   NA 5024   NA   NA   NA
13   NA   NA   NA 4477   NA   NA   NA
14   NA   NA   NA   NA 4566   NA   NA
15   NA 4601   NA   NA   NA   NA   NA
16 4643   NA   NA   NA   NA   NA   NA
20   NA   NA 4509   NA   NA   NA   NA
21   NA   NA   NA   NA   NA 5044   NA

In this case Als represent cases where there is no value.

A question related in English.

by Daniel Falbel • **12,504** points · Answer 2 · 2016-03-30T18:21:13+00:00

The way I like it the most is the following, using the packages dplyrand tidyr that are specialized for transformations in data.frames:

library(dplyr)
library(tidyr)
df %>% group_by(Day.of.Week, Hour) %>%
  summarise(Quantidade = sum(Quantidade)) %>%
  spread(Day.of.Week, Quantidade, fill = 0)

Source: local data frame [9 x 8]

   Hour     0     1     2     3     4     5     6
  (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl)
1    10     0  4502     0     0     0     0     0
2    11     0  4725     0     0     0     0  4449
3    12     0     0     0  5024     0     0     0
4    13     0     0     0  4477     0     0     0
5    14     0     0     0     0  4566     0     0
6    15     0  4601     0     0     0     0     0
7    16  4643     0     0     0     0     0     0
8    20     0     0  4509     0     0     0     0
9    21     0     0     0     0     0  5044     0

If you do not have installed packages you will need to use: install.packages(c("dplyr", "tidyr")) to install.

For me the advantages of doing so are:

the result remains a data.frame. In the replies of @Molx and @André Mutathe result is an unnamed matrix of the dimensions..
there is no need to specify a line for each day of the week. In the @carlosfigueira reply it is necessary to do "hard code" of the days of the week.
it is possible to use the argument fill of function spread to automatically fill the empty boxes with some value. in this case, I believe it should not be NA and yes 0.

by carlosfigueira • **7,785** points · Answer 3 · 2016-03-29T21:16:43+00:00

You can use a value manipulation library such as dplyr to sum the quantities based on day of the week / hour, and then manipulate the result to get the format you want. The code below shows an example of how this can be done.

dow <- sample(0:6, 1000, replace = TRUE)
h <- sample(0:23, 1000, replace = TRUE)
q <- floor(runif(1000, 100, 10000))
df <- data.frame(Day.of.Week = dow, Hour = h, Quantidade = q)

library(dplyr)
somas <- df %>%
    group_by(Day.of.Week, Hour) %>%
    summarize(soma = sum(Quantidade))

perDay <- split(somas, somas$Day.of.Week)
result <- data.frame(Hour = perDay[[1]]$Hour,
                     Su = perDay[[1]]$soma,
                     Mo = perDay[[2]]$soma,
                     Tu = perDay[[3]]$soma,
                     We = perDay[[4]]$soma,
                     Th = perDay[[5]]$soma,
                     Fr = perDay[[6]]$soma,
                     Sa = perDay[[7]]$soma)