Gradient vector for use in the optim function


I have searched extensively, in several forums and on many pages about parameter estimation via R's function optim, and found nothing on how to add the gradient function to a script so that the derivatives are computed by some R facility, _deriv()_ or _deriv3()_, or by the _numDeriv_ or _Deriv_ packages. I need the gradient to try to improve my estimation process, and it is impossible to do the derivatives by hand because my log-likelihood function is very long.
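
For illustration of what I mean by "automated", here is a toy example with _deriv()_ (the function g and the expression are just placeholders, not my real likelihood):

g <- deriv(~ x^2 + log(x), "x", function.arg = TRUE)
g(2)  # value 4.6931..., with a "gradient" attribute equal to 2*2 + 1/2 = 4.5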

I can do the derivatives in _Maple 16_, but I would like to know how to obtain the gradient in _R_ in a more automated way. The code below is a script for estimating the two parameters of the univariate Birnbaum-Saunders distribution; if I manage to implement the gradient in this script, I can then implement it in my more complicated code!

When running the code below I have sometimes obtained negative parameter estimates, which is absurd, because _alpha_, _beta_ and _t_ are positive. I imagine that with the gradient this may no longer occur.

library(VGAM)
alpha<-2
beta <-1
truevalue <- c(alpha,beta)
n=1000
N=300
m=matrix(ncol=2,nrow=N)

for (i in 1:N){
x <- rnorm(n,mean = 0,sd=sqrt((alpha^2)/4))
t <- beta*(1+2*x^2+2*x*sqrt(1+x^2)) # t follows a Birnbaum-Saunders distribution with parameters alpha and beta
#t <- rbisa(n, alpha, beta)
#sum(1*(t<0))

#Density function of the Birnbaum-Saunders distribution

f <-function(theta){
  alpha <- abs(theta[1])
  beta  <- abs(theta[2])
  d <- (1/(2*alpha*beta*sqrt(2*pi)))*(sqrt(beta/t)+(beta/t)^(3/2))*exp(-(1/(2*alpha^2))*((t/beta)+(beta/t)-2))
  return(d)
}

#Form 1 of the log-likelihood

# log.ver <- function(theta){
#   alpha <- abs(theta[1])
#   beta  <- abs(theta[2])
#   l <- sum(log(f(theta)))
#   return(l)
# }

#Form 2 of the log-likelihood

log.ver <- function(theta){
  alpha <- abs(theta[1])
  beta  <- abs(theta[2])
  l <- sum(log((1/(2*alpha*beta*sqrt(2*pi)))*(sqrt(beta/t)+(beta/t)^(3/2))*exp(-(1/(2*alpha^2))*((t/beta)+(beta/t)-2))))
  return(l)
}

alpha_0 <- 3
beta_0 <- 4
start <- c(alpha_0,beta_0)
opt <- optim(start,log.ver,method="BFGS",hessian = F,control=list(fnscale=-1))
m[i,]=opt$par
}

#Calculating the average of each column of the array of parameters m
mest=colMeans(m)

#calculating the standard deviation of each column of the array of parameters m
dest=apply(m,2,sd)

#root mean square error in the calculation of each column of the array of parameters m in relation to the true value of the parameter
eqm=function(x,opt){ 
  N=length(x)
  sqrt(sum(((x-opt)^2))/N)}

#Estimated mean squared error of each parameter 
eqmest=c(eqm(x=m[,1],opt=alpha),
         eqm(x=m[,2],opt=beta))

# Table with the true values of the parameters and the average
# Standard deviation and mean square error of the estimated parameters
tab=data.frame(truevalue,mean=mest,sd=dest,eqm=eqmest)
tab

I really appreciate the help!

2 answers


I worked on the original code in your question and identified two problems:

  1. By default, optim performs minimization, which can be turned into maximization by changing the sign of the objective function. Therefore, the point your optimization returns is a minimum. Anyway, I see you solved that in your second answer. (A short sketch of both ways of maximizing with optim follows this list.)

  2. If your maximum likelihood estimator must satisfy constraints, you will have to use an algorithm that performs the optimization taking those constraints into account. Providing a gradient will not solve this problem directly; it will only make your optimization faster and more accurate. That may happen to make you find optima only in the region that interests you, but there are problems where unconstrained optimization, even with the gradient, leads to absurd results. In fact, I think it is a good strategy not to implement gradients until you run into convergence problems or inefficiency in the code. In your case, I don't think that is happening yet.
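
For reference, these are the two equivalent ways of maximizing with optim that appear in this thread (loglik here is just a placeholder for a function returning the log-likelihood):

opt1 <- optim(start, loglik, method = "BFGS", control = list(fnscale = -1))  # maximize loglik directly
opt2 <- optim(start, function(theta) -loglik(theta), method = "BFGS")        # minimize the negative log-likelihood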

That said, my solution to your problem uses the function constrOptim from the stats package, which solves minimization problems using the same algorithms as optim but ensures that the optimization respects the constraints.
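
In constrOptim the feasible region is ui %*% theta - ci >= 0; as a minimal sketch, the constraint used below is encoded as follows (the starting value must lie strictly inside this region):

ui <- diag(2)    # 2 x 2 identity: one inequality per parameter
ci <- rep(0, 2)  # lower bounds of 0, i.e. alpha >= 0 and beta >= 0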

The code below illustrates the use of constrOptim on your problem, where I force alpha and beta to be positive through the arguments ui and ci.

library(VGAM)
alpha<-2
beta <-1
truevalue <- c(alpha,beta)
n=1000
N=300
m=matrix(ncol=2,nrow=N)

for (i in 1:N){
  x <- rnorm(n,mean = 0,sd=sqrt((alpha^2)/4))
  t <- beta*(1+2*x^2+2*x*sqrt(1+x^2)) # t follows a Birnbaum-Saunders distribution with parameters alpha and beta
  #t <- rbisa(n, alpha, beta)
  #sum(1*(t<0))

  #Density function of the Birnbaum-Saunders distribution

  f <-function(theta){
    alpha <- abs(theta[1])
    beta  <- abs(theta[2])
    d <- (1/(2*alpha*beta*sqrt(2*pi)))*(sqrt(beta/t)+(beta/t)^(3/2))*exp(-(1/(2*alpha^2))*((t/beta)+(beta/t)-2))
    return(d)
  }

  #Form 1 of the log-likelihood

  # log.ver <- function(theta){
  #   alpha <- abs(theta[1])
  #   beta  <- abs(theta[2])
  #   l <- sum(log(f(theta)))
  #   return(l)
  # }

  #Form 2 of the log-likelihood

  log.ver <- function(theta){
    alpha <- abs(theta[1])
    beta  <- abs(theta[2])
    l <- -sum(log((1/(2*alpha*beta*sqrt(2*pi)))*(sqrt(beta/t)+(beta/t)^(3/2))*exp(-(1/(2*alpha^2))*((t/beta)+(beta/t)-2))))
    return(l)
  }

  alpha_0 <- 2
  beta_0 <- 3
  start <- c(alpha_0,beta_0)
  opt <- constrOptim(start, log.ver, grad = NULL, ui = diag(2), ci = rep(0,2), method = 'Nelder-Mead')

  m[i,]=opt$par
}

#Calculating the average of each column of the array of parameters m
mest=colMeans(m)

#calculating the standard deviation of each column of the array of parameters m
dest=apply(m,2,sd)

#root mean square error in the calculation of each column of the array of parameters m in relation to the true value of the parameter
eqm=function(x,opt){ 
  N=length(x)
  sqrt(sum(((x-opt)^2))/N)}

#Estimated mean squared error of each parameter 
eqmest=c(eqm(x=m[,1],opt=alpha),
         eqm(x=m[,2],opt=beta))

# Table with the true values of the parameters and the average
# Standard deviation and mean square error of the estimated parameters
tab=data.frame(truevalue,mean=mest,sd=dest,eqm=eqmest)
tab
  • Thank you @Fernandocorrêa. Regarding your point 1, note that in my code I pass control=list(fnscale=-1) to optim, which means I am doing a maximization. Regarding your point 2, I fully agree; however, as you can see, after some corrections that I posted in my own answer, the estimation process even without the gradient is not bad. That is not really my problem, because this code is only a test; I need the gradient, as I explained in my question, to handle the estimation in other models. But I appreciate your help!



I found a solution for the gradient and also added the Hessian. With the three commented lines of code in the middle of the script it is possible to check that the code is consistent!

rm(list=ls())
cat("\014")
library(VGAM)
library(Deriv)
n=100
N=300
m=matrix(ncol=2,nrow=N)
f <- quote((log((1/(2*alpha*beta*sqrt(2*pi)))*(sqrt(beta/t)+(beta/t)^(3/2))*exp(-(1/(2*alpha^2))*((t/beta)+(beta/t)-2)))))
fd1 <- Deriv(f,"alpha")
fd2 <- Deriv(f,"beta")

hd11 <- Deriv(fd1,"alpha")
hd12 <- Deriv(fd1,"beta")
hd21 <- Deriv(fd2,"alpha")
hd22 <- Deriv(fd2,"beta")

for (i in 1:N){
  alpha<-2
  beta <-1
x <- rnorm(n,mean = 0,sd=sqrt((alpha^2)/4))
t <- beta*(1+2*x^2+2*x*sqrt(1+x^2)) # t follows a Birnbaum-Saunders distribution with parameters alpha and beta

rm(alpha,beta)
log.ver <- function(theta){
  alpha <- theta[1]
  beta  <- theta[2]
  l <- sum(log((1/(2*alpha*beta*sqrt(2*pi)))*(sqrt(beta/t)+(beta/t)^(3/2))*exp(-(1/(2*alpha^2))*((t/beta)+(beta/t)-2))))
  return(-l)
}

grad<-function(theta){
  alpha <- theta[1]
  beta  <- theta[2]
  gd <- cbind(eval(fd1),eval(fd2))
  return(-colSums(gd, na.rm = FALSE))
}

Hess<-function(theta){
  alpha <- theta[1]
  beta  <- theta[2]
  matrix(c(-sum(eval(hd11)),-sum(eval(hd12)),
         -sum(eval(hd21)),-sum(eval(hd22))),nrow = 2,ncol = 2)
}

# theta <- c(opt$par[1],opt$par[2])
# Hess(theta)
# opt$hessian

alpha_0 <- 10
beta_0 <- 10
start <- c(alpha_0,beta_0)
opt <- optim(start,log.ver,method="BFGS",gr=grad,hessian = T)
#opt <- optim(start,log.ver,method="BFGS",hessian = F)

m[i,]=opt$par
}
alpha<-2
beta <-1
#Calculating the average of each column of the array of parameters m
mest=colMeans(m)

#calculating the standard deviation of each column of the array of parameters m
dest=apply(m,2,sd)

#root mean square error in the calculation of each column of the array of parameters m in relation to the true value of the parameter
eqm=function(x,opt){ 
  N=length(x)
  sqrt(sum(((x-opt)^2))/N)}

#Estimated mean squared error of each parameter 
eqmest=c(eqm(x=m[,1],opt=alpha),
         eqm(x=m[,2],opt=beta))

# Table with the true values of the parameters and the average
# Standard deviation and mean square error of the estimated parameters
#m
truevalue <- c(alpha,beta)
tab=data.frame(truevalue,mean=mest,sd=dest,eqm=eqmest)
tab
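
As an extra check (this snippet is an addition, using the numDeriv package mentioned in the question), the analytic gradient built with Deriv can be compared against a numerical gradient at the optimum of the last replication; both should be essentially zero:

# requires the numDeriv package to be installed
theta_hat <- opt$par
grad(theta_hat)                      # analytic gradient of the negative log-likelihood, defined above
numDeriv::grad(log.ver, theta_hat)   # numerical gradient of the same objective, for comparison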
