Sum of the regression square of R models

The two models differ in only one additional coefficient (f), which multiplies the independent variable (x). This makes it possible to compute the increase in the regression sum of squares obtained by including a coefficient f with a value other than zero, and thus to test the mean square associated with that inclusion, with one degree of freedom.

F = (SS_BC.4 - SS_LL.3) / MSE

The larger this mean square relative to the mean square of the other model, the more significant the improvement in fit, and the hypothesis f = 0 is rejected.

I have the idea, but I cannot carry it out in an "easy and simple" way in R. Can someone help me?

# dose-response curve package
library("drc")
# "hormesis" dose-response, BC.4 model: f(x) = 0 + (d - 0 + f*x) / (1 + exp(b*(log(x) - log(e))))
lett.BC4 <- drm(weight ~ conc, data = lettuce, fct = BC.4())
# "common" dose-response, LL.3 model: f(x) = 0 + (d - 0) / (1 + exp(b*(log(x) - log(e))))
lett.LL3 <- drm(weight ~ conc, data = lettuce, fct = LL.3())

plot(lett.BC4, col = 2, lty = 2)
plot(lett.LL3, add = TRUE)

1 answer

Once, right here on Stack Overflow, I commented on variable selection (link to the publication). The variable selection problem is similar to the model selection problem: we are trying to choose the simplest model that explains our data (in statistics, we always want the simplest possible model that describes our data).

But to run a test like the one you want, based on sums of squares, the models being compared must be nested. The problem is that your models are not nested. It makes no sense to perform a hypothesis test such as

  • H_0: the models lett.LL3 and lett.BC4 are the same

  • H_1: the models lett.LL3 and lett.BC4 are not equal

because they are not simpler and more complex versions of the same model. The nonlinear functions defined by the arguments fct = BC.4() and fct = LL.3() are different. Therefore, from the theoretical point of view of Nonlinear Models (see Bates and Watts, Nonlinear Regression Analysis (1988), pp. 103-104), the test you are trying to apply makes no sense. It can be carried out numerically, because it is possible to compute the sum of squares for each of the models, but such a test has no theoretical backing.
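For illustration, the residual sum of squares of each fit can indeed be extracted numerically, even though comparing these values across non-nested models has no theoretical backing. A sketch, assuming the lettuce data from drc and the two fits from the question:

```r
library("drc")

# Fit the two non-nested models from the question
lett.BC4 <- drm(weight ~ conc, data = lettuce, fct = BC.4())
lett.LL3 <- drm(weight ~ conc, data = lettuce, fct = LL.3())

# Residual sum of squares of each fit; the numbers can be
# computed, but an F test built from them is not valid here
sum(residuals(lett.BC4)^2)
sum(residuals(lett.LL3)^2)
```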

What can be done is to compare two nested models. For example,

lett.BC5 <- drm(weight ~ conc, data = lettuce, fct = BC.5())
lett.BC4 <- drm(weight ~ conc, data = lettuce, fct = BC.4())

The only difference between non-linear functions specified by fct = BC.5() and fct = BC.4() is that BC.5() has one more parameter:

summary(lett.BC5)

Model fitted: Brain-Cousens (hormesis) (5 parms)

Parameter estimates:

              Estimate Std. Error t-value   p-value    
b:(Intercept) 1.502065   0.352231  4.2644  0.002097 ** 
c:(Intercept) 0.280173   0.248569  1.1271  0.288836    
d:(Intercept) 0.963030   0.078186 12.3171 6.164e-07 ***
e:(Intercept) 1.120457   0.612908  1.8281  0.100799    
f:(Intercept) 0.988182   0.776136  1.2732  0.234846    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error:

 0.1149117 (9 degrees of freedom)

summary(lett.BC4)

Model fitted: Brain-Cousens (hormesis) with lower limit fixed at 0 (4 parms)

Parameter estimates:

              Estimate Std. Error t-value   p-value    
b:(Intercept) 1.282812   0.049346 25.9964 1.632e-10 ***
d:(Intercept) 0.967302   0.077123 12.5423 1.926e-07 ***
e:(Intercept) 0.847633   0.436093  1.9437   0.08059 .  
f:(Intercept) 1.620703   0.979711  1.6543   0.12908    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error:

 0.1117922 (10 degrees of freedom)

In this way, it is possible to compare the models lett.BC5 and lett.BC4 according to their sum of squares and the hypothesis test defined above:

anova(lett.BC5, lett.BC4)
1st model
 fct:      BC.4()
2nd model
 fct:      BC.5()

ANOVA table

          ModelDf     RSS Df F value p value
1st model      10 0.12498                   
2nd model       9 0.11884  1  0.4644  0.5127    

(see more information on ?anova.drc)

Since the p-value is greater than 0.05, we fail to reject the hypothesis that the two models are equivalent, and so we opt for lett.BC4, which is simpler.
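The F test that anova() runs can be replicated by hand from the printed table. A minimal sketch in base R, using the rounded RSS and degrees of freedom shown above (so the results are close to, but not exactly, the reported 0.4644 and 0.5127):

```r
# Values read off the ANOVA table above (rounded)
rss_reduced <- 0.12498  # lett.BC4, 10 residual df
rss_full    <- 0.11884  # lett.BC5,  9 residual df
df_reduced  <- 10
df_full     <- 9

# F = (drop in RSS per extra parameter) / (full-model residual mean square)
f_stat <- ((rss_reduced - rss_full) / (df_reduced - df_full)) /
          (rss_full / df_full)
p_val  <- pf(f_stat, df_reduced - df_full, df_full, lower.tail = FALSE)

f_stat  # close to the 0.4644 reported by anova()
p_val   # close to the reported 0.5127
```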


Note that I did not answer the main question. Perhaps your real interest is in comparing the function families LL and BC and deciding which family best fits your data. Unfortunately, I do not know of a statistical method, such as a hypothesis test, that solves this problem. I offer the following two suggestions on how to decide between LL and BC:

1) Choose the best possible model within each of the families LL and BC, using the methodology above. Then analyze the residuals of the two chosen models and, based on that residual analysis, see which model violates the fewest assumptions.

2) Make an informed choice. Check your field's literature to see which models from LL (log-logistic) and BC (Brain-Cousens modified log-logistic) are the most used, and why. Or, since you are making a parametric fit to the data, justify using one of these two options because of its interpretability or because your data behave in a way that resembles it. Or try some other function, such as the Weibull, because your results may be even better.
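As a starting point for these suggestions, drc provides the helper mselect(), which refits a list of candidate mean functions on the same data and reports log-likelihood, information criteria and lack-of-fit statistics side by side. A sketch, assuming the lettuce data from drc; note this ranks models, it is not a hypothesis test:

```r
library("drc")

# Fit one baseline model; mselect() then refits the alternatives
# on the same data and tabulates them for comparison
lett.LL3 <- drm(weight ~ conc, data = lettuce, fct = LL.3())

mselect(lett.LL3, fctList = list(BC.4(), BC.5(), W1.3(), W1.4()))
```

See ?mselect in drc for the sorting options and the information criterion used.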
