For Loop in R - Linear Regression

Hi, how are you? I am a beginner, and I would like to ask about automating linear regressions. I have a data set with 520 observations (rows) and 67 variables (columns). I would like to run regressions of column 1 on column 2, column 1 on column 3, and so on, and also save the residuals of each regression in a new data set. Currently I do this manually in R, which slows the whole process down.

I don't yet know how to use for loops or functions of that kind. If someone could shed some light on this, it would help a lot.


library(urca)   # provides ur.df()

reg <- lm(BBAS3 ~ ITSA4, data = ativos)
residuos <- reg$residuals
summary(reg)

# Stationarity analysis of the residuals (ADF test)
estacionariedade_u <- ur.df(residuos, type = "none", selectlags = "AIC")
summary(estacionariedade_u)

1 answer

Running many regressions is not that difficult. The biggest hurdle for people starting to learn R is the *apply family of functions, which are for loops in disguise. Once you are comfortable with them, they greatly simplify things.

First I will create a data set with 520 observations (rows) and 7 variables (columns). The response is the first column and the regressors are the remaining columns.

set.seed(1234)    # Makes the results reproducible

m <- 520
n <- 6
BBAS3 <- 1:m + rnorm(m, 0, 0.1)
ativos <- data.frame(BBAS3)
regr <- matrix(BBAS3 + rnorm(m*n), nrow = m)
colnames(regr) <- paste0("ITSA", 1:n)
ativos <- cbind(ativos, regr)

Now the code for the problem itself.
The function lapply applies the same function to every element of its first argument. Here I apply an unnamed function defined ad hoc, a so-called anonymous function.
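A toy illustration of this, on a hypothetical data frame (`df`, `col_range`, `r_named` and `r_anon` are made-up names): a data frame is a list of columns, so lapply visits one column at a time, and an anonymous function is just a function written inline instead of being given a name first.

```r
# Hypothetical toy data frame; a data frame is a list of columns
df <- data.frame(u = c(1, 4, 9), v = c(2, 3, 12))

# Named function: defined first, then passed to lapply by name
col_range <- function(x) max(x) - min(x)
r_named <- lapply(df, col_range)

# Anonymous function: the same body, written inline in the call
r_anon <- lapply(df, function(x) max(x) - min(x))

print(r_named)                    # one list element per column, named u and v
print(identical(r_named, r_anon))
```

Since R 4.1.0 the same anonymous function can also be written with the shorthand `\(x) max(x) - min(x)`.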

library(urca)

model_list <- lapply(ativos[-1], function(x) 
  lm(BBAS3 ~ x, data = ativos))

That's it. All the regressions have already run, with each of the 6 regressor columns passed in turn as the argument x. That's why I removed the first column with ativos[-1]: to keep only the regressors.
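Since the question mentions for loops, here is a sketch of the same regressions written as an explicit for loop, next to the lapply version. It rebuilds the simulated data so it runs on its own; `regressors`, `model_loop` and `v` are illustrative names, not part of the answer.

```r
# Same simulated data as above
set.seed(1234)
m <- 520
n <- 6
BBAS3 <- 1:m + rnorm(m, 0, 0.1)
ativos <- data.frame(BBAS3)
regr <- matrix(BBAS3 + rnorm(m * n), nrow = m)
colnames(regr) <- paste0("ITSA", 1:n)
ativos <- cbind(ativos, regr)

# lapply version, as in the answer
model_list <- lapply(ativos[-1], function(x)
  lm(BBAS3 ~ x, data = ativos))

# Equivalent explicit for loop: preallocate a named list, then fill it
regressors <- names(ativos)[-1]            # "ITSA1" ... "ITSA6"
model_loop <- vector("list", length(regressors))
names(model_loop) <- regressors
for (v in regressors) {
  x <- ativos[[v]]                         # current regressor column
  model_loop[[v]] <- lm(BBAS3 ~ x, data = ativos)
}

# Both approaches should give the same coefficients
all.equal(lapply(model_list, coef), lapply(model_loop, coef))
```

The loop needs the preallocation and naming bookkeeping by hand and leaves `x` and `v` behind in the workspace, whereas with lapply `x` only exists inside the anonymous function.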

Here is the result of the first regression.

model_list[[1]]    # One way
model_list$ITSA1   # Equivalent
#
#Call:
#lm(formula = BBAS3 ~ x, data = ativos)
#
#Coefficients:
#(Intercept)            x  
#    0.09342      0.99991  
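Note that the slope is labelled `x` in every model. If you prefer each model to carry the real regressor name, one possible variant (my own suggestion, not part of the answer) builds the formula from the column name with `reformulate()`; `model_list2` is a hypothetical name.

```r
# Same simulated data as above
set.seed(1234)
m <- 520
n <- 6
BBAS3 <- 1:m + rnorm(m, 0, 0.1)
ativos <- data.frame(BBAS3)
regr <- matrix(BBAS3 + rnorm(m * n), nrow = m)
colnames(regr) <- paste0("ITSA", 1:n)
ativos <- cbind(ativos, regr)

# Character vector of regressor names, named by itself so lapply keeps the names
regressors <- setNames(nm = names(ativos)[-1])

# reformulate("ITSA1", response = "BBAS3") builds the formula BBAS3 ~ ITSA1
model_list2 <- lapply(regressors, function(v)
  lm(reformulate(v, response = "BBAS3"), data = ativos))

coef(model_list2$ITSA1)   # slope now labelled ITSA1 instead of x
```

The fitted values are the same as with `BBAS3 ~ x`; only the labels change. The printed `Call:` will show the `reformulate(...)` expression rather than the expanded formula.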

Now we can use this list of models to extract whatever we want, again with lapply.

resid_list <- lapply(model_list, residuals)
lapply(resid_list, summary)
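The question also asked to save the residuals of each regression in a new data set. A minimal sketch, assuming there are no missing values so that every residual vector has 520 elements; `residuos_df` and the file name `residuos.csv` are hypothetical.

```r
# Same simulated data and models as above
set.seed(1234)
m <- 520
n <- 6
BBAS3 <- 1:m + rnorm(m, 0, 0.1)
ativos <- data.frame(BBAS3)
regr <- matrix(BBAS3 + rnorm(m * n), nrow = m)
colnames(regr) <- paste0("ITSA", 1:n)
ativos <- cbind(ativos, regr)

model_list <- lapply(ativos[-1], function(x) lm(BBAS3 ~ x, data = ativos))
resid_list <- lapply(model_list, residuals)

# One column of residuals per regression: 520 rows x 6 columns (ITSA1 ... ITSA6)
residuos_df <- as.data.frame(resid_list)

# Optionally save it to disk (hypothetical file name, written to a temp folder here)
out_file <- file.path(tempdir(), "residuos.csv")
write.csv(residuos_df, out_file, row.names = FALSE)
```

If the real data has missing values, `lm()` drops those rows by default and the residual vectors can end up with different lengths; fitting with `na.action = na.exclude` makes `residuals()` pad them with `NA` so the columns still line up.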

estacionariedade_list <- lapply(resid_list, ur.df, type = "none", selectlags = "AIC")
estac_smry <- lapply(estacionariedade_list, summary)

Each element of this last list, estac_smry, holds the values of interest.

estac_pval <- lapply(estac_smry, function(x)
  x@testreg$coefficients[, 4])

estac_r.squared <- sapply(estac_smry, function(x)
  x@testreg$r.squared)

estac_adj.r.squared <- sapply(estac_smry, function(x)
  x@testreg$adj.r.squared)

The last two statements use sapply to get vectors as output, since lapply always returns lists.
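To make the lapply/sapply difference concrete, here is a sketch using only base R (the `summary.lm` objects of the regressions instead of the ur.df summaries, so urca is not needed). `smry_list`, `r2`, `coef_mat` and `adj_r2` are illustrative names.

```r
# Same simulated data and models as above
set.seed(1234)
m <- 520
n <- 6
BBAS3 <- 1:m + rnorm(m, 0, 0.1)
ativos <- data.frame(BBAS3)
regr <- matrix(BBAS3 + rnorm(m * n), nrow = m)
colnames(regr) <- paste0("ITSA", 1:n)
ativos <- cbind(ativos, regr)
model_list <- lapply(ativos[-1], function(x) lm(BBAS3 ~ x, data = ativos))

# lapply always returns a list: here, a list of 6 summary.lm objects
smry_list <- lapply(model_list, summary)

# One number per model: sapply simplifies to a named numeric vector
r2 <- sapply(smry_list, function(s) s$r.squared)

# Two numbers per model: sapply gives a 2 x 6 matrix; t() puts one model per row
coef_mat <- t(sapply(model_list, coef))

# vapply is like sapply but checks that every result has the declared type and length
adj_r2 <- vapply(smry_list, function(s) s$adj.r.squared, numeric(1))
```

When the results have different lengths, sapply cannot simplify and quietly returns a list; vapply raises an error instead, which makes it the safer choice when you expect a fixed shape.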

    Rui Barradas, thank you so much for your help, I'm extremely grateful. It really simplified things a lot. I'm going to focus more on the apply family.
