Bootstrap in linear regression model - Calculating the importance of variables

Question

Asked 5 years, 9 months ago

Viewed 101 times

I'm calculating variable importance for a multiple regression with the function varImp from the package caret. But when I do it through a function and bootstrap, I cannot recover the values the way I did for R².

How can I save the importance values of the coefficients to a .csv file, for example?

Replicable example:

library(boot)
library(caret)

imp_lm <- function(data, indices) {
  d <- data[indices,] 

  fit.all <- lm(d$mpg~.,data=d)
  return(varImp(fit.all, scale = FALSE))
}

results <- boot(data=mtcars, statistic=imp_lm, R=10)

Error in t.star[r, ] <- res[[r]] : incorrect number of subscripts on matrix

1 answer

by Marcus Nunes • **17,915** points · Answer 1 · 2019-11-03T22:29:03+00:00

Note that in the help page for boot, the argument statistic has the following description (emphasis added):

A function which when applied to data returns a **vector** containing the statistic(s) of interest.
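As a minimal illustration of that contract, here is a sketch of a statistic that does return a plain numeric vector (of length one): the R² of a fit on the resampled rows. The function name rsq, the model mpg ~ wt + hp, the seed and R = 50 are assumptions chosen for the example, not taken from the question.

```r
library(boot)

# Hypothetical statistic: takes the data and the resampled row indices,
# returns a numeric vector (here a single R-squared value).
rsq <- function(data, indices) {
  d <- data[indices, ]                    # bootstrap resample of the rows
  fit <- lm(mpg ~ wt + hp, data = d)      # illustrative model only
  summary(fit)$r.squared
}

set.seed(1)                               # arbitrary seed, for reproducibility
rsq_boot <- boot(data = mtcars, statistic = rsq, R = 50)

is.numeric(rsq_boot$t0)   # statistic on the original data is a numeric vector
dim(rsq_boot$t)           # one row per replicate, one column per statistic
```

Because the return value is a vector, boot can store each replicate as one row of its result matrix, which is exactly the step that fails when a data frame is returned.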

When we run the function varImp, we get the following:

fit.all <- lm(mpg ~ ., data=mtcars)
resultado <- varImp(fit.all, scale = FALSE)
is.data.frame(resultado)
## [1] TRUE

So your function imp_lm returns a data frame, because the function varImp returns a data frame, while boot expects a vector. One way around this is to change your function by placing return(varImp(fit.all, scale = FALSE)[, 1]) at the end, thus extracting the first column of the result, which holds the importance of the variables:

imp_lm <- function(data, indices) {
  d <- data[indices, ] 

  fit.all <- lm(mpg ~ ., data=d)
  return(varImp(fit.all, scale = FALSE)[, 1])
}

results <- boot(data=mtcars, statistic=imp_lm, R=10)
results

## ORDINARY NONPARAMETRIC BOOTSTRAP
## 
## 
## Call:
## boot(data = mtcars, statistic = imp_lm, R = 10)
## 
## 
## Bootstrap Statistics :
##       original     bias    std. error
## t1*  0.1066392  0.8799646   0.7188905
## t2*  0.7467585 -0.1206878   0.4692999
## t3*  0.9868407 -0.1654184   0.5865589
## t4*  0.4813036  0.5936594   0.7967201
## t5*  1.9611887 -0.8193792   0.5743548
## t6*  1.1234133  0.2350501   0.7057048
## t7*  0.1509915  0.9933979   0.8952965
## t8*  1.2254035  0.5388327   1.3083746
## t9*  0.4389142  0.6839702   0.8997836
## t10* 0.2406258  0.9177274   1.3346145
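As for saving the values to a .csv file, one possible sketch follows. It relies on two fields of the boot object: t0 (the statistic on the original data) and t (a matrix with one row per replicate). To keep the example runnable without caret, it computes the absolute t statistic of each coefficient directly, which is, to my understanding of the caret documentation, what varImp reports for lm fits with scale = FALSE; it reproduces the "original" column above. The file names, the seed and the NA padding for coefficients that a resample cannot estimate are assumptions of this sketch.

```r
library(boot)

predictors <- setdiff(names(mtcars), "mpg")

imp_lm <- function(data, indices) {
  d <- data[indices, ]
  fit <- lm(mpg ~ ., data = d)
  # Absolute t statistics, dropping the intercept row
  # (assumed equivalent to varImp(fit, scale = FALSE) for lm).
  tvals <- abs(coef(summary(fit))[-1, "t value"])
  # Always return a named vector of fixed length; a coefficient dropped
  # in a degenerate resample stays NA instead of changing the length.
  imp <- setNames(rep(NA_real_, length(predictors)), predictors)
  imp[names(tvals)] <- tvals
  imp
}

set.seed(123)  # arbitrary seed, for reproducibility
results <- boot(data = mtcars, statistic = imp_lm, R = 10)

# One row per bootstrap replicate, one column per variable.
replicates <- as.data.frame(results$t)
names(replicates) <- names(results$t0)

# One row per variable: original value, bootstrap mean and standard error.
summary_tab <- data.frame(
  variable  = names(results$t0),
  original  = unname(results$t0),
  boot_mean = colMeans(results$t, na.rm = TRUE),
  boot_se   = apply(results$t, 2, sd, na.rm = TRUE)
)

write.csv(replicates, "importance_replicates.csv", row.names = FALSE)
write.csv(summary_tab, "importance_summary.csv", row.names = FALSE)
```

Returning a named vector means the variable names survive in results$t0, so the columns of the saved file are labelled with cyl, disp and so on rather than t1* to t10*. With R = 10 the standard errors are very rough; a much larger R is usually advisable.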