Error when applying the boxcox() function from the MASS package


I'm getting the following error when using the boxcox() function from the MASS package:

Error in as.data.frame.default(data, optional = TRUE) : 
  cannot coerce class ‘"function"’ to a data.frame

I don't understand how to fix it. Here is my script:

df <- read.table("https://raw.githack.com/fsbmat/StackOverflow/master/teste.txt",header = TRUE)
names(df)[1:4] <- c("a","b","rep","y")
str(df)
df$a <- as.factor(df$a)
df$b <- as.factor(df$b)
# analysis of variance
m0 <- aov(y~a*b, data=df)
summary(m0)
# checking the assumptions
par(mfrow=c(2,2)); plot(m0); layout(1)
#
#------------------------------------------------------------------------------------------
# tests
shapiro.test(residuals(m0))
bartlett.test(residuals(m0)~interaction(df$b,df$a)) 
car::leveneTest(m0) 
car::leveneTest(m0, center="mean")
#------------------------------------------------------------------------------------------
# a transformation is needed for normality and homoscedasticity
require(MASS)
boxcox(m0)
  • Before that, there may be another problem: min(df$y) returns -0.02980628. See https://stats.stackexchange.com/a/47297/205888

1 answer



In addition to the model fitted to the data, you need to tell MASS::boxcox where those data are stored:

boxcox(m0, data = df)
Error in boxcox.default(m0, data = df) : 
  response variable must be positive

But notice that R still returns an error, this time a different one. This is because the Box-Cox transformation is not defined for zero or negative values, and your response variable takes a value below zero:

summary(df$y)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-0.02981 14.55843 21.65648 18.96480 25.39923 28.93967
sum(df$y < 0)
[1] 1
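Since boxcox() stops whenever any response value is zero or negative (not only negative), it may be worth checking with `<= 0` rather than `< 0` and looking at the offending rows before deciding what to do. A minimal sketch on a small hypothetical data frame `dat` with made-up values (not the question's data):

```r
# Hypothetical toy data; boxcox() rejects any response value <= 0
dat <- data.frame(a = factor(c(1, 1, 2, 2)),
                  y = c(14.6, -0.03, 21.7, 0))

nonpos <- which(dat$y <= 0)   # row indices that would block the Box-Cox fit
length(nonpos)                # how many such rows there are
dat[nonpos, ]                 # inspect them before deciding what to do
```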

Given that, I would ask whether this value can actually occur in this context. After all, there are 259 observations and only one is negative. I know nothing about where these data come from, but I find it strange. It may be that this observation does make sense, in which case a different transformation should be applied to the data. I suggest looking into the Yeo-Johnson transformation.
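For reference, the Yeo-Johnson transformation can be written in a few lines of base R. This is only a sketch: `yeo_johnson()` is a hypothetical helper name (not part of MASS), it assumes `y` has no missing values, and it applies a given lambda rather than estimating one.

```r
# Yeo-Johnson transformation in base R; unlike Box-Cox it is defined
# for zero and negative values. Hypothetical helper, assumes no NAs in y.
yeo_johnson <- function(y, lambda) {
  out <- numeric(length(y))
  pos <- y >= 0
  # branch for y >= 0
  if (abs(lambda) < 1e-8) {
    out[pos] <- log1p(y[pos])
  } else {
    out[pos] <- ((y[pos] + 1)^lambda - 1) / lambda
  }
  # branch for y < 0
  if (abs(lambda - 2) < 1e-8) {
    out[!pos] <- -log1p(-y[!pos])
  } else {
    out[!pos] <- -((1 - y[!pos])^(2 - lambda) - 1) / (2 - lambda)
  }
  out
}

yeo_johnson(c(-0.0298, 0, 14.6, 28.9), lambda = 0.6)
```

With lambda = 1 the function returns the data unchanged, which is a quick sanity check. The car package, already used in the question for leveneTest(), also provides a yjPower() function for the same transformation.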

  • Marcus, taking this opportunity: I started working on the exploratory analysis. I filtered the values by df$a to find out when the observed data did not meet the normality assumption. Can you tell me whether this is right, or should I consider all of y in the test? As a result, for levels 1 to 7 and 9 of a, the p-value was <= 0.05.

  • You have to consider all of y. Otherwise, one could eliminate "problematic" observations from the data being analyzed and turn any data set into a variable with a normal distribution.

  • Thanks for the info!

  • Thanks, Marcus and everyone else! I'll look into why there is a negative value.

  • After removing the row with the negative value and calling boxcox() correctly, I got lambda = 0.6. However, after transforming with either Box-Cox or Yeo-Johnson, the data still depart from normality and remain heteroscedastic.

  • Perhaps ANOVA is not the appropriate model for these data, with or without a transformation of the response variable. These data may never satisfy the assumptions about the residuals. Try using a non-parametric method.
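The workflow described in these comments (estimate lambda, transform, refit, re-check the residuals) can be sketched as follows. It uses simulated positive data in a hypothetical data frame `dat` and the formula interface of boxcox(), which fits the linear model itself instead of refitting an existing one. Using a name other than `df` also avoids any confusion with stats::df (the F density function), which is a plausible source of the original 'cannot coerce class "function"' error. `box_cox()` is a hypothetical helper; to reuse the value from the comment, replace `lambda_hat` with 0.6.

```r
library(MASS)

# Simulated two-factor data with a strictly positive, right-skewed response
# (a hypothetical stand-in for the data in the question)
set.seed(2)
dat <- expand.grid(a = factor(1:3), b = factor(1:2), rep = 1:5)
dat$y <- rgamma(nrow(dat), shape = 4, rate = 0.2)

# Profile log-likelihood over a grid of lambdas; with plotit = FALSE,
# boxcox() returns a list with x = lambdas and y = log-likelihoods
bc <- boxcox(y ~ a * b, data = dat,
             lambda = seq(-2, 2, by = 0.05), plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]

# Hypothetical helper: Box-Cox transform for a given lambda
box_cox <- function(y, lambda) {
  stopifnot(all(y > 0))
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Refit the two-way ANOVA on the transformed response
dat$y_bc <- box_cox(dat$y, lambda_hat)
m1 <- aov(y_bc ~ a * b, data = dat)

# Re-check the assumptions on the residuals of the transformed model
shapiro.test(residuals(m1))
bartlett.test(residuals(m1) ~ interaction(dat$b, dat$a))
```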

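As one example of a non-parametric method (chosen here for illustration, not necessarily what the commenter had in mind), base R's Kruskal-Wallis test can compare the a:b cells as groups. This collapses the two-way design into a single factor, so it does not separate main effects from the interaction the way the two-way ANOVA does. The data frame `dat` below is simulated and hypothetical.

```r
# Simulated skewed response in a 3 x 2 design with 5 replicates per cell
set.seed(3)
dat <- expand.grid(a = factor(1:3), b = factor(1:2), rep = 1:5)
dat$y <- rexp(nrow(dat), rate = 0.05)

# Treat each a:b combination as one group and compare them by ranks
dat$cell <- interaction(dat$a, dat$b)
kw <- kruskal.test(y ~ cell, data = dat)
kw
```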
