I need to split the data into groups and run calculations across two or three grouping dimensions.
I found the tapply function, which solves the problem: it gives me what I need with functions such as mean, sum, etc.
But now I realize that I need to homogenize the data within the selected groups. So instead of mean, sum and so on, I need to write a function that homogenizes the data and then pass it to tapply. I think my homogenization function has a problem, but I can't identify what it is.
I tried dplyr, data.table and aggregate, following the idea in the linked question "How to consolidate (aggregate or group) values in a database?", but they all give errors.
Here is the code I have:
bairro <- c("B_FLORESTA", "B_PINHEIRAO", "B_PINHEIRAO", "B_PINHEIRINHO",
"B_LUTHER KING", "B_LUTHER KING", "B_VILA NOVA", "B_VILA NOVA",
"B_NOVA PETROPOLIS", "B_VILA NOVA", "B_INTERIOR", "B_ALVORADA",
"B_SADIA", "B_SADIA", "B_SADIA", "B_SADIA", "B_SADIA", "B_SADIA",
"B_SADIA", "B_JUPTER", "B_JUPTER", "B_FLORESTA", "B_ITALIA",
"B_ITALIA", "B_ITALIA", "B_ITALIA")
tipo <- c("CASA", "CASA", "COMERCIAIS", "CASA", "CASA", "COMERCIAIS",
"APARTAMENTO", "APARTAMENTO", "APARTAMENTO", "APARTAMENTO",
"SITIO", "APARTAMENTO", "CASA", "CASA", "CASA", "CASA",
"TERRENO", "TERRENO", "CASA", "CASA", "CASA", "CASA",
"CASA", "CASA", "CASA", "CASA")
valor <- c(1167, 2500, 1125, 2286, 400, 400, 1500, 1500, 300, 1500, 555,
973, 2500, 2556, 2500, 2556, 600, 850, 2338, 1857, 1857, 2000,
2000, 2063, 2000, 2063)
data <- c("2015_07", "2015_07", "2015_07", "2015_07", "2015_07", "2015_07",
"2015_07", "2015_07", "2015_08", "2015_08", "2015_08", "2015_08",
"2015_08", "2015_08", "2015_08", "2015_08", "2015_08", "2015_08",
"2015_09", "2015_09", "2015_09", "2015_09", "2015_09", "2015_09",
"2015_09", "2015_09")
dados <- data.frame(bairro, tipo, valor, data)
x <- tapply(dados$valor, list(dados$tipo, dados$data, dados$bairro), median)
## ok, this is final result 1.
So far so good, but now I need to homogenize, and this is where my problem is! Below is one of the functions for this:
homo <- function (a){
a <- a[order(a$valor),] # sort by valor
n <- nrow(a)
a
for(i in 1:n){
a$sobra[i] = round(((a$valor[i+1] / a$valor[i])*100)-100, dig = 2)
}
a <- subset (a, a$sobra < 50) # cutoff: < 50
return (a)
}
When I pass the "homo" function to tapply, it gives an error.
tapply(dados$valor, list(dados$tipo, dados$data, dados$bairro), homo)
Could someone help me?
Rcoster, thank you for answering my question. Could you explain in more detail what you meant by: "The problem is that a vector (dados$valor) is being passed to the homo() function, and inside it you are treating it as a data.frame/list (trying to call $valor, among other things)"? Your function gives a different result from mine. I applied your function in the tapply call and it went wrong, and that is the problem: when the function is used with tapply, it gives an error. How should the function be written so that it works together with tapply?
– Woldinei Meier
The first parameter of tapply() is passed to the function homo(). The problem is that you are supplying a vector (dados$valor, instead of dados), while the function expects a data.frame (the first line of your function tries to sort the rows of a, and it also accesses the valor column through a$valor). I just tested it here, and my function works inside the tapply call given in the question. But, as I said, the result is probably not what you expected, since it was never stated what the function homo() should do; all I did was fix it so that it no longer throws an error.
– Rcoster
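To illustrate the point about vectors, here is a minimal sketch of a vector-based variant of homo(). It is not Rcoster's actual function (that code is not shown in this thread). The name homo_vec, the cutoff argument, and the choice to always keep the largest value (which has no successor to compare against) are assumptions made for illustration only.

```r
# Hypothetical vector-based variant of homo(); NOT the function from the thread.
homo_vec <- function(x, cutoff = 50) {
  x <- sort(x)                 # ascending order (sort() also drops NA values)
  n <- length(x)
  if (n < 2) return(x)         # a single value has nothing to be compared with
  # percentage increase from each value to the next larger one
  sobra <- round((x[-1] / x[-n]) * 100 - 100, digits = 2)
  # assumption: the largest value has no successor, so it is always kept
  x[c(sobra < cutoff, TRUE)]
}

# Small subset in the spirit of the question's data
valor <- c(1167, 2500, 2286, 400, 600, 850)
tipo  <- c("CASA", "CASA", "CASA", "CASA", "TERRENO", "TERRENO")

# FUN returns vectors of varying length, so tapply() gives a list-like array
res <- tapply(valor, tipo, homo_vec)

# To get one number per cell, summarise the retained values inside FUN
med <- tapply(valor, tipo, function(v) median(homo_vec(v)))
```

Because FUN only ever receives the values of one cell as a plain vector, everything inside it has to be written in terms of x, never a$valor. Whether dropping or keeping the last value is right depends on what the homogenization is meant to do, which the question does not state.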
Thanks for the answer. What can I do to resolve this?
– Woldinei Meier
Make the corrections I pointed out, and also test on a simpler example (perhaps with only one categorical variable). That makes it easier to understand how the function works (or fails), and to check the resulting value.
– Rcoster
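A simpler test along these lines could look like the sketch below. It uses a single grouping variable and a four-value subset (both chosen here for illustration) to check what tapply() actually hands to FUN, and why a$valor cannot work inside it.

```r
# Tiny example with one categorical variable
valor <- c(1167, 2500, 1125, 400)
tipo  <- c("CASA", "CASA", "COMERCIAIS", "CASA")

# Ask tapply() to report the class of the piece it passes to FUN for each group
classes <- tapply(valor, tipo, class)

# `$` is not defined for plain numeric vectors, so this raises an error;
# tryCatch() captures the message instead of stopping the script
msg <- tryCatch(valor$valor, error = function(e) conditionMessage(e))

print(classes)
print(msg)
```

Every group shows up as "numeric", not "data.frame", so any function passed to tapply() has to operate on a bare vector.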
Thanks man, I managed to make it work with a dplyr function!
– Woldinei Meier