Function tapply - arguments must have the same length

Hello, good evening!

I have a data frame with thousands of rows and 58 columns containing, for example, supplier, material, quantity of material and total value of material. The example below shows only the columns I need for now.

Fornecedor  Material  Qtde  Valor_Total
A           A         1     100
A           B         2     150
A           E         5     26
B           B         6     76
C           A         5     126
C           C         1     58
D           D         10    108
E           E         9     99
E           A         7     30
E           E         8     80
E           E         1     54
F           G         1     0

First, I created a column with the average unit value for each row (total value divided by quantity):

dados$valor_medio <- round(dados$Valor_Total/dados$Qtde,2)

Now I need to calculate the mean, the median and a second mean that excludes outliers, all of dados$valor_medio grouped by material. However, when I apply the function tapply, the following error occurs:

dados<-tapply(dados$valor_medio, dados$Material, mean, na.rm = TRUE)

Error in tapply(dados$valor_medio, dados$Material, mean, na.rm = TRUE) : arguments must have the same length

Could someone help me with this error and show how to calculate the mean of dados$valor_medio for each material with the outliers excluded?

PS: The Material column is of type chr (character).

  • Probably your column dados$Material is a list or something similar. What is the result of str(dados$Material) or class(dados$Material)?

  • With your example, I was able to run the code without any error.

  • > class(base$Material) [1] "character"

  • In fact, the Material values have the following format: 00.000.000. My example was poorly chosen, sorry.
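
One way to narrow this error down is to compare the lengths of the two vectors passed to tapply. The sketch below is only an illustration: the data frame d and its values are made up, and the misspelled column name is an assumed cause, not something confirmed by the question. It shows that a column that does not exist comes back as NULL (length 0), which is enough to trigger a length-mismatch error.

```r
# Hypothetical toy data; names and values are illustrative only.
d <- data.frame(Material = c("A", "B", "A"),
                valor_medio = c(100, 75, 25.2),
                stringsAsFactors = FALSE)

# Both vectors have one element per row, so tapply works.
length(d$valor_medio)  # 3
length(d$Material)     # 3
ok <- tapply(d$valor_medio, d$Material, mean, na.rm = TRUE)

# A column that does not exist is returned as NULL, which has length 0,
# so tapply stops with a length-mismatch error.
length(d$valor_media)  # 0 (note the misspelling: media instead of medio)
err <- tryCatch(
  tapply(d$valor_media, d$Material, mean, na.rm = TRUE),
  error = function(e) conditionMessage(e)
)
err
```

If both lengths match on the real data, the next things to check would be str(dados) and whether dados was overwritten by an earlier assignment.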

1 answer

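When you get this error, most of the time it is because you need ave (see ?ave) and not tapply: tapply returns one value per group, while ave returns a vector with the same length as its input, so the result can be stored as a new column.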

dados$valor_medio <- round(dados$Valor_Total/dados$Qtde,2)

dados$media <- ave(dados$valor_medio, dados$Material, FUN = mean)
dados$mediana <- ave(dados$valor_medio, dados$Material, FUN = median)
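
To make the difference concrete, here is a minimal sketch on made-up data (the data frame d, its values and the name por_material are assumptions for illustration, loosely based on the example in the question). It compares the length of what tapply and ave return.

```r
# Hypothetical toy data, loosely based on the example in the question.
d <- data.frame(Material = c("A", "B", "E", "B", "A"),
                valor_medio = c(100, 75, 5.2, 12.67, 25.2),
                stringsAsFactors = FALSE)

# tapply: one value per group (a named array with 3 elements here).
por_material <- tapply(d$valor_medio, d$Material, mean, na.rm = TRUE)
length(por_material)  # 3

# ave: one value per row (5 elements); each row receives the mean of its
# own group, so the result fits directly into a new column.
d$media <- ave(d$valor_medio, d$Material, FUN = mean)
length(d$media)       # 5
```

If a per-material summary table is what you want, tapply is still fine, but it is probably safer to store its result under a new name (as with por_material above) instead of assigning it back to dados, since that replaces the data frame.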

As for the other mean, the one without outliers, it depends on how outliers are defined. Your definition is the one used by the function boxplot.stats, so I will call that function to compute this second mean.

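media_sem_out <- function(x){
    # $stats holds: lower whisker, lower hinge, median, upper hinge, upper whisker
    s <- boxplot.stats(x)$stats
    # keep only the values between the two whisker ends, i.e. drop the outliers
    x <- x[s[1] <= x & x <= s[5]]
    mean(x)
}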

dados$media2 <- ave(dados$valor_medio, dados$Material, FUN = media_sem_out)

  • I will take the mean without extremes to be the mean of the values between the lower limit and the upper limit of the box plot, that is, the values between (first quartile - 1.5 * (third quartile - first quartile)) and (third quartile + 1.5 * (third quartile - first quartile)).
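
The rule in the comment above can also be written out directly. This is only a sketch under stated assumptions: the function name media_entre_limites and the parameter k are hypothetical, and the quartiles come from quantile(), which may differ slightly from the hinges that boxplot.stats uses, so the two approaches can disagree on small groups.

```r
# Sketch of the fence rule written out explicitly.
# media_entre_limites and k are illustrative names, not from the thread.
media_entre_limites <- function(x, k = 1.5) {
  # first and third quartiles (default quantile type; an assumption)
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  # TRUE for values inside [Q1 - k * IQR, Q3 + k * IQR]
  dentro <- x >= q[1] - k * iqr & x <= q[2] + k * iqr
  mean(x[dentro], na.rm = TRUE)
}

x <- c(10, 11, 12, 13, 14, 100)  # 100 lies far outside the fences
media_entre_limites(x)           # 12
mean(x)                          # about 26.67
```

It can be passed to ave in the same way as media_sem_out, for example ave(dados$valor_medio, dados$Material, FUN = media_entre_limites).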
