I have a table whose columns are factors that vary over time. With multiple regression I can evaluate the influence of a group of factors on the variation of one of them. How can I do this in R?

  • The question is not clear. Do you just want to know how to run a regression in R?

  • Yes, I would like to know how to do multiple regression in R and, if possible, simple linear regression too.

  • Do you have any code that you have tried or are working on? Paste it into the question; it will help people understand the problem and give better answers.

  • Just follow the step-by-step answer below by Carlos Cinelli. I followed the steps, ran the calculations without problems, and got the data I needed.

  • We don’t do your work for nothing. What have you tried?

1 answer


You can run a regression in R using the function lm. Using the dataset mtcars, which already comes with R, as an example:

regressao <- lm(mpg ~ cyl, data = mtcars)

First we pass the function lm the regression formula mpg ~ cyl and then the dataset, data = mtcars. The formula mpg ~ cyl means that we are regressing the variable mpg (miles per gallon) on the variable cyl (number of cylinders). This is equivalent to the equation mpg = B0 + B1*cyl + e, where you are estimating the parameters B0 (the intercept) and B1 (the slope). The regression result is saved in the object regressao.
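To make the link between the equation and the fitted object concrete, here is a minimal sketch that pulls out B0 and B1 and computes one fitted value by hand. The choice of a 6-cylinder car and the variable names b, manual and via_predict are illustrative assumptions, not part of the original answer.

```r
# Fit the simple regression from the answer (mtcars ships with R)
regressao <- lm(mpg ~ cyl, data = mtcars)

# Named vector of estimates: "(Intercept)" is B0, "cyl" is B1
b <- coef(regressao)

# Fitted mpg for a hypothetical 6-cylinder car, computed from the equation
manual <- unname(b["(Intercept)"] + b["cyl"] * 6)

# The same value obtained through predict()
via_predict <- unname(predict(regressao, newdata = data.frame(cyl = 6)))

print(b)
print(c(manual = manual, via_predict = via_predict))
```

Both numbers should agree, since predict() applies the same B0 + B1*cyl formula to the new data.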

Calling summary(regressao) shows the main regression results:


Call:
lm(formula = mpg ~ cyl, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.9814 -2.1185  0.2217  1.0717  7.5186 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  37.8846     2.0738   18.27  < 2e-16 ***
cyl          -2.8758     0.3224   -8.92 6.11e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.206 on 30 degrees of freedom
Multiple R-squared:  0.7262,    Adjusted R-squared:  0.7171 
F-statistic: 79.56 on 1 and 30 DF,  p-value: 6.113e-10
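If you need these numbers in later calculations rather than just printed, the object returned by summary stores them as components. A small sketch, assuming the same model as above; the names s, tab, r2, adj_r2 and sigma are only illustrative.

```r
regressao <- lm(mpg ~ cyl, data = mtcars)
s <- summary(regressao)

# Coefficient table as a numeric matrix
# (columns: Estimate, Std. Error, t value, Pr(>|t|))
tab <- s$coefficients

# Goodness-of-fit figures shown at the bottom of the printed summary
r2     <- s$r.squared      # Multiple R-squared
adj_r2 <- s$adj.r.squared  # Adjusted R-squared
sigma  <- s$sigma          # Residual standard error

print(tab["cyl", "Estimate"])
print(c(r2 = r2, adj_r2 = adj_r2, sigma = sigma))
```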

To do multiple regression, just include more variables after the ~, separated by +. More specifically, the element to the left of the ~ is the dependent variable (your y) and all variables to the right of the ~ are explanatory variables (the Xs). For example:

regressao_multipla <- lm(mpg ~ cyl + disp + wt + hp , data = mtcars)

Here we run a regression with 4 explanatory variables: cyl, disp, wt and hp, all in the data.frame mtcars. To see the main results, use summary again:

Call:
lm(formula = mpg ~ cyl + disp + wt + hp, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.0562 -1.4636 -0.4281  1.2854  5.8269 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 40.82854    2.75747  14.807 1.76e-14 ***
cyl         -1.29332    0.65588  -1.972 0.058947 .  
disp         0.01160    0.01173   0.989 0.331386    
wt          -3.85390    1.01547  -3.795 0.000759 ***
hp          -0.02054    0.01215  -1.691 0.102379    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.513 on 27 degrees of freedom
Multiple R-squared:  0.8486,    Adjusted R-squared:  0.8262 
F-statistic: 37.84 on 4 and 27 DF,  p-value: 1.061e-10
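Since the question is about the influence of a group of factors, one possible way to judge whether the extra variables help as a group is to compare the two nested models with anova. This is only a sketch of that idea, not something the original answer prescribes; the names comparacao and adj are hypothetical.

```r
regressao          <- lm(mpg ~ cyl, data = mtcars)
regressao_multipla <- lm(mpg ~ cyl + disp + wt + hp, data = mtcars)

# F-test of the larger model against the smaller one:
# do disp, wt and hp jointly reduce the residual sum of squares?
comparacao <- anova(regressao, regressao_multipla)
print(comparacao)

# Adjusted R-squared of each model, side by side
adj <- c(simple   = summary(regressao)$adj.r.squared,
         multiple = summary(regressao_multipla)$adj.r.squared)
print(adj)
```

The comparison is only meaningful when one model's variables are a subset of the other's and both are fitted to the same rows.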

There are several other functions for working with regressions in R. The object that lm returns is of class lm; to get an idea of the methods available for that class, run methods(class = "lm").
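As a rough illustration of what some of those methods do, the sketch below applies a few of them to the multiple regression above. Which ones you need depends on your analysis; the names ci, res and fit are illustrative.

```r
regressao_multipla <- lm(mpg ~ cyl + disp + wt + hp, data = mtcars)

# List the methods registered for objects of class "lm"
print(methods(class = "lm"))

# 95% confidence intervals for each coefficient
ci <- confint(regressao_multipla, level = 0.95)

# Residuals and fitted values, one per row of mtcars
res <- residuals(regressao_multipla)
fit <- fitted(regressao_multipla)

print(ci)
print(head(res))
```

By construction, fitted values plus residuals give back the observed mpg column.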

  • Wow, thank you so much for the explanation! It helped me a lot!

  • I tested the function with a dataset I am working on and it worked very well.

