Apply function in data groups

Question

Asked 9 years, 9 months ago

Viewed 175 times

4

I need to split the data into groups and run calculations across two or three grouping dimensions.

I found the tapply function, which solves the problem: with it I get what I need using functions such as mean, sum, and so on.

But now I have realized that I need to homogenize the data within the selected groups. So instead of a function like mean or sum, I need to write a homogenization function and pass it to tapply. I think my homogenization function is broken, but I can't identify what is wrong.

I also tried dplyr, data.table, and aggregate, following the ideas in the linked question "How to consolidate (aggregate or group) values in a database?", but they all give errors.

Below follows the code I have:

    bairro <- c("B_FLORESTA", "B_PINHEIRAO", "B_PINHEIRAO", "B_PINHEIRINHO",
                "B_LUTHER KING", "B_LUTHER KING", "B_VILA NOVA", "B_VILA NOVA",
                "B_NOVA PETROPOLIS", "B_VILA NOVA", "B_INTERIOR", "B_ALVORADA",
                "B_SADIA", "B_SADIA", "B_SADIA", "B_SADIA", "B_SADIA", "B_SADIA",
                "B_SADIA", "B_JUPTER", "B_JUPTER", "B_FLORESTA", "B_ITALIA",
                "B_ITALIA", "B_ITALIA", "B_ITALIA")

    tipo <-   c("CASA", "CASA", "COMERCIAIS", "CASA", "CASA", "COMERCIAIS",
                "APARTAMENTO", "APARTAMENTO", "APARTAMENTO", "APARTAMENTO",
                "SITIO", "APARTAMENTO", "CASA", "CASA", "CASA", "CASA",
                "TERRENO", "TERRENO", "CASA", "CASA", "CASA", "CASA",
                "CASA", "CASA", "CASA", "CASA")

    valor <-  c(1167, 2500, 1125, 2286, 400, 400, 1500, 1500, 300, 1500, 555,
                973, 2500, 2556, 2500, 2556, 600, 850, 2338, 1857, 1857, 2000,
                2000, 2063, 2000, 2063)

    data <-   c("2015_07", "2015_07", "2015_07", "2015_07", "2015_07", "2015_07",
                "2015_07", "2015_07", "2015_08", "2015_08", "2015_08", "2015_08",
                "2015_08", "2015_08", "2015_08", "2015_08", "2015_08", "2015_08",
                "2015_09", "2015_09", "2015_09", "2015_09", "2015_09", "2015_09",
                "2015_09", "2015_09")

    dados <- data.frame(bairro, tipo, valor, data)

    x <- tapply(dados$valor, list(dados$tipo, dados$data, dados$bairro), median)

## OK, this is final result 1.
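For readers less familiar with tapply, here is a quick illustration of the shape of that result. With three grouping vectors, tapply returns a three-dimensional array with one cell per combination, and combinations that have no data come back as NA. This is a minimal sketch on a made-up six-row sample; the `_s` names and values are illustrative, not the question's data.

```r
# Hypothetical toy sample, only to show the shape of the tapply() result
tipo_s   <- c("CASA", "CASA", "CASA", "APARTAMENTO", "APARTAMENTO", "TERRENO")
data_s   <- c("2015_07", "2015_07", "2015_08", "2015_07", "2015_08", "2015_08")
bairro_s <- c("B_SADIA", "B_SADIA", "B_SADIA", "B_ITALIA", "B_ITALIA", "B_SADIA")
valor_s  <- c(2500, 2556, 2338, 1500, 1500, 600)

res <- tapply(valor_s, list(tipo_s, data_s, bairro_s), median)

dim(res)        # one dimension per grouping vector: tipo x data x bairro
dimnames(res)   # level names along each dimension
res["CASA", "2015_07", "B_SADIA"]      # median of 2500 and 2556
res["TERRENO", "2015_07", "B_ITALIA"]  # empty combination, so NA
```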

So far so good, but now I need to homogenize, and this is where my problem is! Below is one of the functions for this:

homo <- function (a){
        a <- a[order(a$valor),] # sort by valor
        n <- nrow(a)
        a
        for(i in 1:n){
          a$sobra[i] = round(((a$valor[i+1] / a$valor[i])*100)-100, dig = 2)
        }

        a <- subset (a, a$sobra < 50)   # cutoff: < 50
        return (a)
      }

When I pass the "homo" function to tapply, it raises an error.

tapply(dados$valor, list(dados$tipo, dados$data, dados$bairro), homo)

Could someone help me?

2 answers

1

With the help of @Pierre Lafortune, here is the answer:

  library(dplyr)
  # `valor` is the column name used in the question's example data
  dados %>% group_by(tipo, data, bairro) %>%
            arrange(valor) %>%
            mutate(sobra = round(((lead(valor) / valor)*100)-100, digits = 2)) %>%
            filter(sobra < 50) %>%
            summarise(valor = mean(valor))
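For anyone who prefers to stay in base R, roughly the same idea can be sketched with split() and lapply(), which hand each group to the function as a whole data.frame rather than as a bare vector. This is only a sketch under assumptions: `dados_s` is a made-up sample with the question's column names, and `homo_df` is an illustrative helper name, not part of any package.

```r
# Hypothetical small sample with the question's column names
dados_s <- data.frame(
  bairro = c("B_SADIA", "B_SADIA", "B_SADIA", "B_SADIA", "B_ITALIA", "B_ITALIA"),
  tipo   = c("CASA", "CASA", "TERRENO", "TERRENO", "CASA", "CASA"),
  valor  = c(2500, 2556, 600, 850, 2000, 2063),
  data   = "2015_08"
)

# homo_df() is an illustrative helper: it receives one group's rows
homo_df <- function(g) {
  g <- g[order(g$valor), ]
  # percentage jump to the next value; NA for the last row of the group
  g$sobra <- round((c(g$valor[-1], NA) / g$valor) * 100 - 100, digits = 2)
  g[which(g$sobra < 50), ]
}

# split() passes whole data.frames; drop = TRUE skips empty combinations
grupos   <- split(dados_s, dados_s[c("tipo", "data", "bairro")], drop = TRUE)
filtrado <- do.call(rbind, lapply(grupos, homo_df))

# Group means of the rows that survived the cutoff
aggregate(valor ~ tipo + data + bairro, data = filtrado, FUN = mean)
```

As in the dplyr pipeline, the largest value in each group has no successor, so its sobra is NA and that row is dropped by the cutoff.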

by Rcoster • **1,779** points · Answer 1 · 2015-10-09T18:10:49+00:00

1

The problem is that a vector (dados$valor) is being passed to homo(), while inside the function you treat it as a data.frame/list (calling a$valor, among other things).
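One way to see this for yourself is to ask tapply what each group looks like when it reaches the function. A minimal sketch on a made-up three-row sample (`dados_s` is an illustrative name):

```r
# Small made-up sample; column names mirror the question's
dados_s <- data.frame(
  tipo  = c("CASA", "CASA", "TERRENO"),
  valor = c(2500, 2556, 600)
)

# Ask each group what kind of object FUN actually receives
tapply(dados_s$valor, dados_s$tipo, class)          # "numeric" for every group
tapply(dados_s$valor, dados_s$tipo, is.data.frame)  # FALSE for every group

# `$` on a plain numeric (atomic) vector is an error, which is what
# a$valor inside the original homo() runs into
v   <- dados_s$valor[dados_s$tipo == "CASA"]
msg <- tryCatch(v$valor, error = function(e) conditionMessage(e))
msg
```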

Below is a version of homo() that works, though I don't know if it gives the result you wanted (I couldn't understand what you mean by homogenizing):

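homo <- function (a){
        a <- sort(a) # sort the values (order() would return indices, not values)
        n <- length(a)
        sobra <- rep(NA, n) # the last value has no successor, so it stays NA
        for(i in seq_len(n - 1)){ # stop at n - 1; no iterations when n is 1
          sobra[i] <- round(((a[i+1] / a[i])*100)-100, digits = 2)
        }

        a <- subset(a, sobra < 50)   # cutoff: < 50 (NA entries are dropped)
        return(a)
      }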

Besides the error of treating the input as a list, I also fixed an error in the for(i in 1:n) loop, where you would try to access a nonexistent position (n+1).
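To illustrate the n+1 point: in R, indexing a vector one past its end does not stop with an error, it quietly returns NA, which is why this kind of bug is easy to miss. The same ratios can also be computed without a loop by pairing the vector with a shifted copy of itself. A small sketch with made-up values:

```r
a <- sort(c(600, 850, 2500, 2556))
n <- length(a)

a[n + 1]              # indexing past the end gives NA, not an error

# Percentage increase from each value to the next, without a loop:
# a[-1] drops the first element, a[-n] drops the last
sobra <- round((a[-1] / a[-n]) * 100 - 100, digits = 2)
sobra                 # length n - 1: one entry per adjacent pair

# Pad with NA for the last value (it has no successor) and apply the cutoff
keep <- c(sobra, NA) < 50
a[which(keep)]        # which() drops the NA, as subset() would
```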

Rcoster, thank you for answering my question. Could you explain in more detail what you meant by: "The problem is that a vector is being passed to the homo() function (dados$valor) and within it you are treating it as a data.frame/list (trying to call a$valor, among others)"? Your function gives a different result from mine. I used your function in the tapply call and it went wrong; that is the problem. Whenever the function is used with tapply, it gives an error. How should the function be written so that it works together with tapply?

– Woldinei Meier

2015/10/11 at 23:39
1

The first argument of tapply() is what gets passed to homo(). The problem is that you are supplying a vector (dados$valor, not dados), while the function expects a data.frame (the first line of your function tries to sort the rows of a, and it also accesses the valor column through a$valor). I just tested it here, and my function works inside the tapply call given in the question. But, as I said, the result is probably not what you expected, since it was never stated what homo() should do; all I did was fix it so that it no longer raises an error.

– Rcoster

2015/10/13 at 19:04
Thanks for the answer, what can I do to resolve this?

– Woldinei Meier

2015/10/13 at 20:30
1

Make the corrections I pointed out, and also test on a simpler example (perhaps with only one categorical variable). That makes it easier to see how the function behaves, or fails to, and to check the resulting value.

– Rcoster

2015/10/14 at 14:22
Thanks, man, I managed to make it work with dplyr!

– Woldinei Meier

2015/10/14 at 18:50