Equivalent to Excel SOMASES, in R

Asked

Viewed 1,339 times

12

I am deepening in R and looking for a function that is equivalent to the sums of excel.

I have this following DF:

Day.of.Week Hour    Quantidade
5            21         5044
3            12         5024
1            11         4725
0            16         4643
1            15         4601
4            14         4566
2            20         4509
1            10         4502
3            13         4477
6            11         4449

I need him to do the equivalent of Somases to mount something like

Hour 0  1  2  3  4  5  6
0    -  -  -  -  -  -  -
1    -  -  -  -  -  -  -
2    -  -  -  -  -  -  -
3    -  -  -  -  -  -  -
4    -  -  -  -  -  -  -
5    -  -  -  -  -  -  -
6    -  -  -  -  -  -  -
7    -  -  -  -  -  -  -
8    -  -  -  -  -  -  -
9    -  -  -  -  -  -  -
10   -  -  -  -  -  -  -
...

It would add up the amount according to the day of the week and the time.

4 answers

9


This can be done in many ways, as can be seen by the various responses. A way without using add packages is to use tapply:

> tapply(df$Quantidade, list(df$Day.of.Week, df$Hour), sum)
    10   11   12   13   14   15   16   20   21
0   NA   NA   NA   NA   NA   NA 4643   NA   NA
1 4502 4725   NA   NA   NA 4601   NA   NA   NA
2   NA   NA   NA   NA   NA   NA   NA 4509   NA
3   NA   NA 5024 4477   NA   NA   NA   NA   NA
4   NA   NA   NA   NA 4566   NA   NA   NA   NA
5   NA   NA   NA   NA   NA   NA   NA   NA 5044
6   NA 4449   NA   NA   NA   NA   NA   NA   NA

What looks like rows or columns depends on the order of the list passed as the second argument. If the order is reversed, the result is transposed:

> tapply(df$Quantidade, list(df$Hour, df$Day.of.Week), sum)
      0    1    2    3    4    5    6
10   NA 4502   NA   NA   NA   NA   NA
11   NA 4725   NA   NA   NA   NA 4449
12   NA   NA   NA 5024   NA   NA   NA
13   NA   NA   NA 4477   NA   NA   NA
14   NA   NA   NA   NA 4566   NA   NA
15   NA 4601   NA   NA   NA   NA   NA
16 4643   NA   NA   NA   NA   NA   NA
20   NA   NA 4509   NA   NA   NA   NA
21   NA   NA   NA   NA   NA 5044   NA

6

To do this you can use two R functions: aggregate to add the "duplicates", so that you have only one day/hour case for each quantity and the function acast library reshape2 to create the matrix the way you’re asking.

Your date.frame:

df <- data.frame(Day.of.Week = c(5,3,1,0,1,4,2,1,3,6), 
                 Hour = c(21,12,11,16,15,14,20,10,13,11), 
                 Quantidade = c(5044,5024,4725,4643,4601,4566,4509,4502,4477,4449))

Code:

require(reshape2)

df <- aggregate(Quantidade ~ ., df, sum)
somases <- acast(df, Hour ~ Day.of.Week, value.var = "Quantidade")

Output:

> somases
      0    1    2    3    4    5    6
10   NA 4502   NA   NA   NA   NA   NA
11   NA 4725   NA   NA   NA   NA 4449
12   NA   NA   NA 5024   NA   NA   NA
13   NA   NA   NA 4477   NA   NA   NA
14   NA   NA   NA   NA 4566   NA   NA
15   NA 4601   NA   NA   NA   NA   NA
16 4643   NA   NA   NA   NA   NA   NA
20   NA   NA 4509   NA   NA   NA   NA
21   NA   NA   NA   NA   NA 5044   NA

In this case Als represent cases where there is no value.

A question related in English.

  • I think the aggregate it is not necessary in this your code, it only changes the order of the data frame lines and resets the Row Names.

  • It is not really necessary for this case, but if there is more than one occurrence on the same day and time the acast would not work without Aggregate.

  • Oh yes, it makes sense. It’s just that I’ve never actually used acast, had not stopped to observe what the function does.

  • Thank you very much for your help!!

5

The way I like it the most is the following, using the packages dplyrand tidyr that are specialized for transformations in data.frames:

library(dplyr)
library(tidyr)
df %>% group_by(Day.of.Week, Hour) %>%
  summarise(Quantidade = sum(Quantidade)) %>%
  spread(Day.of.Week, Quantidade, fill = 0)

Source: local data frame [9 x 8]

   Hour     0     1     2     3     4     5     6
  (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl)
1    10     0  4502     0     0     0     0     0
2    11     0  4725     0     0     0     0  4449
3    12     0     0     0  5024     0     0     0
4    13     0     0     0  4477     0     0     0
5    14     0     0     0     0  4566     0     0
6    15     0  4601     0     0     0     0     0
7    16  4643     0     0     0     0     0     0
8    20     0     0  4509     0     0     0     0
9    21     0     0     0     0     0  5044     0

If you do not have installed packages you will need to use: install.packages(c("dplyr", "tidyr")) to install.

For me the advantages of doing so are:

  • the result remains a data.frame. In the replies of @Molx and @André Mutathe result is an unnamed matrix of the dimensions..

  • there is no need to specify a line for each day of the week. In the @carlosfigueira reply it is necessary to do "hard code" of the days of the week.

  • it is possible to use the argument fill of function spread to automatically fill the empty boxes with some value. in this case, I believe it should not be NA and yes 0.

4

You can use a value manipulation library such as dplyr to sum the quantities based on day of the week / hour, and then manipulate the result to get the format you want. The code below shows an example of how this can be done.

dow <- sample(0:6, 1000, replace = TRUE)
h <- sample(0:23, 1000, replace = TRUE)
q <- floor(runif(1000, 100, 10000))
df <- data.frame(Day.of.Week = dow, Hour = h, Quantidade = q)

library(dplyr)
somas <- df %>%
    group_by(Day.of.Week, Hour) %>%
    summarize(soma = sum(Quantidade))

perDay <- split(somas, somas$Day.of.Week)
result <- data.frame(Hour = perDay[[1]]$Hour,
                     Su = perDay[[1]]$soma,
                     Mo = perDay[[2]]$soma,
                     Tu = perDay[[3]]$soma,
                     We = perDay[[4]]$soma,
                     Th = perDay[[5]]$soma,
                     Fr = perDay[[6]]$soma,
                     Sa = perDay[[7]]$soma)

Browser other questions tagged

You are not signed in. Login or sign up in order to post.