Aggregate function in R

4

Good afternoon. I'm using the aggregate function to group some data, but I'm only summing one variable. I would like to sum more than one. Is this possible? I am using the following example:

TESTE = aggregate(VALOR ~ REFERENCIA + GRUPO_COPA + CIDADE, data=DADOS,FUN=sum)

I would like to sum the variable QTDE alongside VALOR, that is, add one more column, so that the result has the following columns:

REFERENCIA, GRUPO_COPA, CIDADE, VALOR, QTDE

Is this possible with aggregate or another function? Thank you.

Edit

Here is my example data, produced with dput(DADOS):

structure(list(REFERENCIA = c("JAN_2017", "JAN_2017", "JAN_2017", "JAN_2017", "FEV_2017", "FEV_2017", "FEV_2017", "FEV_2017", "FEV_2017" ), GRUPO_COPA = c("AZUL", "AZUL", "AMARELO", "AMARELO", "VERDE", "VERDE", "VERDE", "AZUL", "AZUL"), CIDADE = c("SP", "SP", "SP", "SP", "RJ", "BSB", "BSB", "BSB", "SP"), VALOR = c(1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000), QTDE = c(1, 3, 5, 7, 9, 11, 13, 15, 17)), .Names = c("REFERENCIA", "GRUPO_COPA", "CIDADE", "VALOR", "QTDE"), row.names = c(NA, 9L), class = "data.frame")

I would like to group this dataset (with aggregate or something similar), summing the columns VALOR and QTDE.

  • A tip: it is much easier to get help here if you provide a data set. It does not need to be your complete original data; a part of it is enough. To share it, run dput(DADOS) and paste the result into the body of the question.

3 answers

6

I suggest you use the package dplyr to do this kind of operation. Here’s an example of usage that would solve your problem:

library(dplyr)

x <- mtcars %>%
  group_by(cyl, vs, am) %>%
  summarise(
    valor = sum(mpg),
    qtd = n()
  )

In group_by you list the variables by which you want to aggregate (in your case REFERENCIA, GRUPO_COPA and CIDADE). In summarise you specify the calculation to perform for each group. In particular, the function n() returns the row count, which is what you wanted to calculate.
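Applied to the data posted in the question, the same pattern might look like the sketch below. It assumes dplyr 1.0.0 or later (for the .groups argument) and sums QTDE rather than counting rows with n(), since the question asks for QTDE to be added up. The name TESTE is taken from the question.

```r
library(dplyr)

# Data rebuilt from the dput() output in the question
DADOS <- data.frame(
  REFERENCIA = c("JAN_2017", "JAN_2017", "JAN_2017", "JAN_2017", "FEV_2017",
                 "FEV_2017", "FEV_2017", "FEV_2017", "FEV_2017"),
  GRUPO_COPA = c("AZUL", "AZUL", "AMARELO", "AMARELO", "VERDE",
                 "VERDE", "VERDE", "AZUL", "AZUL"),
  CIDADE     = c("SP", "SP", "SP", "SP", "RJ", "BSB", "BSB", "BSB", "SP"),
  VALOR      = c(1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000),
  QTDE       = c(1, 3, 5, 7, 9, 11, 13, 15, 17),
  stringsAsFactors = FALSE
)

# One output row per REFERENCIA / GRUPO_COPA / CIDADE combination,
# with both numeric columns summed within each group
TESTE <- DADOS %>%
  group_by(REFERENCIA, GRUPO_COPA, CIDADE) %>%
  summarise(
    VALOR = sum(VALOR),
    QTDE  = sum(QTDE),
    .groups = "drop"  # return an ungrouped result (assumes dplyr >= 1.0.0)
  )

print(TESTE)
```

Any other summary (mean, max, and so on) can be added as a further argument to summarise in the same way.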

A good reference to learn more about dplyr is the book R for Data Science, chiefly this chapter.

  • Good morning. Unfortunately it didn't work. A message appears saying that the number of lines should not exceed 32. Do you have another suggestion?

  • Maybe because the dplyr syntax is not REFERENCIA + GRUPO_COPA + CIDADE, but group_by(REFERENCIA, GRUPO_COPA, CIDADE). As @Marcus Nunes said, dput(DADOS) would help us help you.

  • And what about subtracting instead of adding up?

2


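A solution using aggregate() is to put . on the left-hand side of the formula. The dot stands for every column of the data that does not appear on the right-hand side, here VALOR and QTDE: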

dados <- structure(list(REFERENCIA = c("JAN_2017", "JAN_2017", "JAN_2017", "JAN_2017", "FEV_2017", "FEV_2017", "FEV_2017", "FEV_2017", "FEV_2017" ), GRUPO_COPA = c("AZUL", "AZUL", "AMARELO", "AMARELO", "VERDE", "VERDE", "VERDE", "AZUL", "AZUL"), CIDADE = c("SP", "SP", "SP", "SP", "RJ", "BSB", "BSB", "BSB", "SP"), VALOR = c(1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000), QTDE = c(1, 3, 5, 7, 9, 11, 13, 15, 17)), .Names = c("REFERENCIA", "GRUPO_COPA", "CIDADE", "VALOR", "QTDE"), row.names = c(NA, 9L), class = "data.frame")
aggregate( . ~ REFERENCIA + GRUPO_COPA + CIDADE, FUN = sum, data = dados)
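An equivalent sketch names the columns explicitly with cbind() on the left-hand side instead of relying on the dot, which may be safer when the data frame has other columns that should not be summed. The object name res_cbind is only illustrative.

```r
# Data rebuilt from the dput() output in the question
dados <- data.frame(
  REFERENCIA = c("JAN_2017", "JAN_2017", "JAN_2017", "JAN_2017", "FEV_2017",
                 "FEV_2017", "FEV_2017", "FEV_2017", "FEV_2017"),
  GRUPO_COPA = c("AZUL", "AZUL", "AMARELO", "AMARELO", "VERDE",
                 "VERDE", "VERDE", "AZUL", "AZUL"),
  CIDADE     = c("SP", "SP", "SP", "SP", "RJ", "BSB", "BSB", "BSB", "SP"),
  VALOR      = c(1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000),
  QTDE       = c(1, 3, 5, 7, 9, 11, 13, 15, 17),
  stringsAsFactors = FALSE
)

# cbind() on the left-hand side lists exactly the columns to be summed;
# the right-hand side lists the grouping variables
res_cbind <- aggregate(cbind(VALOR, QTDE) ~ REFERENCIA + GRUPO_COPA + CIDADE,
                       data = dados, FUN = sum)

print(res_cbind)
```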

0

It was unclear what result you wanted, so I wrote two possible solutions for the quantity column.

Summing values weighted by quantity (VALOR * QTDE):

# DADOS is the data frame from the question
aggregate((VALOR * QTDE) ~ REFERENCIA + GRUPO_COPA + CIDADE,
          data = DADOS, FUN = sum)

  REFERENCIA GRUPO_COPA CIDADE (VALOR * QTDE)
1   FEV_2017       AZUL    BSB         120000
2   FEV_2017      VERDE    BSB         157000
3   FEV_2017      VERDE     RJ          45000
4   JAN_2017    AMARELO     SP          43000
5   FEV_2017       AZUL     SP         153000
6   JAN_2017       AZUL     SP           7000

Summing values weighted by quantity and also reporting the total quantity per group:

# DADOS is the data frame from the question
aggregate(cbind(valor = VALOR * QTDE, QTDE) ~ REFERENCIA + GRUPO_COPA + CIDADE,
          data = DADOS, FUN = sum)

  REFERENCIA GRUPO_COPA CIDADE  valor QTDE
1   FEV_2017       AZUL    BSB 120000   15
2   FEV_2017      VERDE    BSB 157000   24
3   FEV_2017      VERDE     RJ  45000    9
4   JAN_2017    AMARELO     SP  43000   12
5   FEV_2017       AZUL     SP 153000   17
6   JAN_2017       AZUL     SP   7000    4
