How to select all data.frame variables at once for a regression?


Suppose the following data.frame:

set.seed(1)    
dados <- data.frame(y=rnorm(100), x1=rnorm(100), x2=rnorm(100), x3=rnorm(100), x4=rnorm(100))

If I want to regress y on x1, ..., xn, I can do it this way:

modelo <- lm(y~x1+x2+x3+x4, data=dados)

Since there are only 4 variables here, writing them all out is not tedious. But suppose there were 100 variables, from x1 to x100. How can I select all of them easily for the regression?



In this context (the formula argument of the lm function), the operator . means "all columns that do not already appear in the formula".

Thus the regression of y on all other columns of the data.frame can be obtained as follows:

modelo <- lm(y~., data=dados)

Reference: ?formula
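A quick sketch, reusing the dados frame from the question, that checks the dot expands to the same model as the explicit formula. The - x4 variant (removing one column from the expansion) is standard formula syntax described in ?formula; the object names fit_explicit, fit_dot and fit_without_x4 are just illustrative.

```r
set.seed(1)
dados <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100),
                    x3 = rnorm(100), x4 = rnorm(100))

# Explicit formula versus the "." shorthand (illustrative object names)
fit_explicit <- lm(y ~ x1 + x2 + x3 + x4, data = dados)
fit_dot      <- lm(y ~ ., data = dados)

# Both fits should have the same coefficients
print(all.equal(coef(fit_explicit), coef(fit_dot)))

# "." can be combined with "-" to leave a column out of the expansion
fit_without_x4 <- lm(y ~ . - x4, data = dados)
print(names(coef(fit_without_x4)))
```

With 100 predictors the same y ~ . call works unchanged, which is the point of the shorthand.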

The dot is particularly useful when you want to include interaction effects. For example, suppose you want to fit a model with all variables and all interactions of up to 2 variables. This can be done as follows:

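## Example data set (random values, so that the coefficients can be estimated)
set.seed(1)
exemplo <- data.frame(x1 = rnorm(50), x2 = rnorm(50), x3 = rnorm(50), x4 = rnorm(50))

## Model with all interactions of up to 2 variables
lm(data = exemplo, formula = x1 ~ (.)^2)

## Model with all interactions of up to 3 variables
lm(data = exemplo, formula = x1 ~ (.)^3)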
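To see which terms (.)^2 and (.)^3 actually produce, one option is to expand the formula with terms() against the same data and list the term labels. This is a small sketch assuming an exemplo frame like the one above; termos2 and termos3 are illustrative names.

```r
set.seed(1)
exemplo <- data.frame(x1 = rnorm(50), x2 = rnorm(50), x3 = rnorm(50), x4 = rnorm(50))

# terms() expands "." using the columns of `data`, excluding the response x1
termos2 <- attr(terms(x1 ~ (.)^2, data = exemplo), "term.labels")
termos3 <- attr(terms(x1 ~ (.)^3, data = exemplo), "term.labels")

print(termos2)  # main effects plus every pairwise interaction
print(termos3)  # the same, plus the three-way interaction
```

With 3 predictors, (.)^2 yields 3 main effects and 3 pairwise interactions, and (.)^3 adds the single three-way interaction.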

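Alternatively, if dados is your data frame and its first column is the response (y, as in your case),

modelo <- lm(formula=dados)

also works: lm then treats the first column as the response and all remaining columns as predictors.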
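A minimal check, assuming the dados frame from the question with y as its first column, that passing the data frame itself as the formula gives the same coefficients as y ~ .; fit_frame and fit_dot are illustrative names.

```r
set.seed(1)
dados <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100),
                    x3 = rnorm(100), x4 = rnorm(100))

# First column (y) is taken as the response, the rest as predictors
fit_frame <- lm(formula = dados)
fit_dot   <- lm(y ~ ., data = dados)

print(all.equal(coef(fit_frame), coef(fit_dot)))
```

Because this relies on column order, it only works as intended when the response really is the first column.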
