Find exponent of the data-fit equation, R


I would like to know how to fit the model below to my observed data (which are simple) in order to find the exponent p of this model:

y~x^(-p)

My data is:

y=c(1.1178329,1.0871448,1.0897010,1.0759255,1.0535190,0.8725332)
x=c(6,5,4,3,2,1)

I tried the following model, but the parameter value does not change and the iterations do not proceed.

library(minpack.lm)
mod <- nlsLM(y ~ x^(-p),
             start = c(p = 0.01),
             trace = TRUE, lower = c(0.01), upper = c(1))

iterations...

It.    0, RSS =  0.0671647, Par. =       0.01
It.    1, RSS =  0.0671647, Par. =       0.01

Thanks in advance to anyone who can help me with this question.

3 answers


There is no problem with this fit or code. See what happens when I change the initial guess to 0.9:

y=c(1.1178329,1.0871448,1.0897010,1.0759255,1.0535190,0.8725332)
x=c(6,5,4,3,2,1)

library(minpack.lm)
mod1 <- nlsLM(y ~ x^(-p),
              start = list(p = 0.9),
              trace = TRUE, lower = 0.1, upper = 1)
It.    0, RSS =    2.99354, Par. =        0.9
It.    1, RSS =   0.246237, Par. =        0.1
It.    2, RSS =   0.246237, Par. =        0.1    

More iterations occur, as expected. If I keep the starting value fixed at 0.1 and reduce the lower bound of the search interval to 0.00001, see what happens:

mod2 <- nlsLM(y ~ x^(-p),
              start = list(p = 0.1),
              trace = TRUE, lower = 0.00001, upper = 1)
It.    0, RSS =   0.246237, Par. =        0.1
It.    1, RSS =  0.0544138, Par. =      1e-05
It.    2, RSS =  0.0544138, Par. =      1e-05

Apparently, the ideal value of p for this data set is quite close to 0. Note that the RSS (residual sum of squares) decreases from mod1 to mod2, indicating that the second fit was better than the first.
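A direct way to check this is to compare the residual sums of squares with deviance(), which returns the RSS of an nls-type fit (assuming mod1 and mod2 above are still in the workspace):

deviance(mod1)  # RSS of the first fit
deviance(mod2)  # RSS of the second fit (smaller here)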

In short, there’s nothing wrong with the code. It’s just converging too fast.

It may also be that the value of p is not positive. The value of p that best fits the data may not belong to the range used in your original call, which is between 0.01 and 1. If that is the case, p will converge to a value at the boundary of the interval given in the function call. Have you considered this possibility?

See the code below, for example:

mod3 <- nlsLM(y ~ x^(-p),
              start = list(p = 0.1),
              trace = TRUE, lower = -1, upper = 1)
It.    0, RSS =   0.246237, Par. =        0.1
It.    1, RSS =  0.0218977, Par. = -0.0816128
It.    2, RSS =  0.0166599, Par. = -0.0607616
It.    3, RSS =  0.0166587, Par. = -0.0604426
It.    4, RSS =  0.0166587, Par. = -0.0604428

mod3 resulted in the lowest RSS of all, even though the fit was more expensive: it took longer to arrive at an answer, with more iterations and a smoother convergence of the parameter value.
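To extract the estimated exponent and its uncertainty from the fitted object, the usual accessors work (a small sketch; it assumes mod3 above converged):

coef(mod3)     # point estimate of p
summary(mod3)  # standard error and t statistic for p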


Either I’m completely wrong, or lm is enough to determine p.

y ~ x^(-p)  <=>  y ~ e^(-p*log(x))  <=>  log(y) ~ -p*log(x)

We then fit this last linear model:

fit <- lm(log(y) ~ 0 + log(x))
coef(fit)

p <- -coef(fit)
p
     log(x) 
-0.06054118

plot(x, y, log = "xy")
abline(fit)
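To view the fitted power law on the original (untransformed) scale as well, one option is the following sketch (p_hat is just a convenience name for the estimate above):

p_hat <- unname(-coef(fit))                  # estimated exponent
plot(x, y)
curve(x^(-p_hat), add = TRUE, col = "blue")  # fitted curve y = x^(-p_hat)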
  • You have to remove the regression intercept: fit <- lm(log(y) ~ 0 + log(x)); then I think it is equivalent. I think you have to be careful with the hypothesis tests as well. The p-value is not valid when you do it this way.

  • @Danielfalbel Right about the regression intercept, sorry for the rush. I will edit the answer with this modification. As for the p-value, I do not agree. The only thing that matters is whether or not the coefficient is significant; the p-value can and should vary.

  • Yes, but it assumes that log(y) has a normal distribution, which may not be the case!

  • @Danielfalbel In the linear model, what is assumed to have a normal distribution are the residuals, not the response. Of course it will be necessary to examine the residuals, with, for example, a histogram.

  • It is equivalent for the response or the residuals... if y = a + bx + e, with e ~ Normal(0, sigma), then y ~ Normal(a + bx, sigma)...

  • @Danielfalbel Graphically, I see no problem with the linear model. There is one outlier and nothing else. I will edit the answer again with the plotting code.



I would solve this using the function optim, which performs arbitrary optimizations given a loss function of some parameters.

Here is an example:

y=c(1.1178329,1.0871448,1.0897010,1.0759255,1.0535190,0.8725332)
x=c(6,5,4,3,2,1)

opt <- optim(runif(1), function(p){
  sum((y - x^(-p))^2)
}, method = "L-BFGS-B")

p <- opt$par

The function optim receives three arguments:

  • The initial value of p.
  • The loss function. In this case we are minimizing the sum of squared residuals of this model.
  • The optimization method.
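To inspect the result (assuming opt comes from the call above):

opt$par          # estimated exponent p
opt$value        # minimized sum of squared residuals
opt$convergence  # 0 indicates successful convergence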

The final value was -0.06044205. You can plot the fitted curve like this:

library(dplyr)
library(ggplot2)
data_frame(x = seq(1, 10, length.out = 100), y = x^(-p)) %>%
  ggplot(aes(x = x, y = y)) + 
  geom_line(colour = "blue")

[plot of the fitted curve y = x^(-p) for x from 1 to 10]

The advantage of this approach is that you can use different loss functions and different relationships between the variables. It is also easy to include regularization, etc.
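For example, swapping in a least-absolute-deviations loss only requires changing the function passed to optim (a sketch; opt_l1 is just an illustrative name):

opt_l1 <- optim(runif(1), function(p){
  sum(abs(y - x^(-p)))  # absolute error instead of squared error
}, method = "L-BFGS-B")
opt_l1$par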

  • Personally, I also prefer optim. I find it much more general, serving virtually any case we might want. I just find it less friendly to those who don’t have a strong statistical or mathematical background, but I take your point.

  • @Marcusnunes Yes, it is indeed more complicated, mainly for doing inference... The way I did it, there is no confidence interval or hypothesis test implemented! I don’t know minpack.lm, but I believe it must have that kind of thing.
