There are several ways to run multiple regressions per category in R. I’ll show how to do it with base R functions and with dplyr. As an example, let’s use the mtcars dataset.
Suppose you want to run the regression mpg ~ disp + hp for each level of the variable cyl in mtcars (there are 3 categories: 4, 6 and 8).
First, you can use the function split() to build a list of three separate data.frames, one for each category:
data.frame_por_categoria <- split(mtcars, mtcars$cyl)
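As a quick sanity check before fitting anything, you can look at what split() returned. This is only a small sketch; the name linhas is mine and not part of the original answer.

```r
# Split mtcars by cyl, as in the step above
data.frame_por_categoria <- split(mtcars, mtcars$cyl)

# The names of the list elements are the levels of cyl
names(data.frame_por_categoria)

# Number of rows (cars) in each piece; "linhas" is a hypothetical name
linhas <- sapply(data.frame_por_categoria, nrow)
linhas
```

Each element of the list is an ordinary data.frame, so anything you would do with mtcars you can do with one of the pieces.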
Now just use lapply() to fit the regression on each data.frame:
modelos <- lapply(data.frame_por_categoria, function(x) lm(mpg ~ disp + hp, data = x))
The result, modelos, is a list with the three regressions. To access the first model:
modelos[[1]]
Call:
lm(formula = mpg ~ disp + hp, data = x)
Coefficients:
(Intercept) disp hp
43.04006 -0.11954 -0.04609
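If you want to compare the groups rather than inspect one model at a time, one option is to pull the same quantity out of every model with sapply(). The sketch below rebuilds modelos so that it runs on its own; the names coeficientes and r2 are my own and only illustrative.

```r
# Rebuild the list of models from the base R steps
data.frame_por_categoria <- split(mtcars, mtcars$cyl)
modelos <- lapply(data.frame_por_categoria,
                  function(x) lm(mpg ~ disp + hp, data = x))

# Matrix of coefficients: one row per term, one column per level of cyl
coeficientes <- sapply(modelos, coef)
coeficientes

# R-squared of each regression, named by level of cyl
r2 <- sapply(modelos, function(m) summary(m)$r.squared)
r2
```

The column "4" of coeficientes holds the same coefficients as modelos[[1]].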
It is also possible to do the same thing with the package dplyr.
You group by category and then use the function do() to run the regression, placing a dot . where the data.frame should go:
library(dplyr)
resultado <- mtcars %>% group_by(cyl) %>% do(modelo = lm(mpg ~ disp + hp, data = .))
The result, resultado, is a data.frame with a column called modelo, and each element of this column is a regression. To access the first model:
resultado$modelo[[1]]
Call:
lm(formula = mpg ~ disp + hp, data = .)
Coefficients:
(Intercept) disp hp
43.04006 -0.11954 -0.04609
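As far as I know, do() is marked as superseded in dplyr 1.0.0 and later, although it still works. A possible alternative, assuming you have one of those versions installed, is nest_by() followed by mutate(). This is only a sketch and the name resultado_nest is mine.

```r
library(dplyr)

# nest_by() creates one row per level of cyl, with that group's rows
# stored in a list-column called "data"; the result is row-wise,
# so mutate() fits one regression per row
resultado_nest <- mtcars %>%
  nest_by(cyl) %>%
  mutate(modelo = list(lm(mpg ~ disp + hp, data = data)))

# The first model corresponds to the first level of cyl
resultado_nest$modelo[[1]]
```

The modelo column is used the same way as in the do() version, so the rest of your code should not need to change.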
Are the categories of X named X1, X2, X3 and X4, or 1, 2, 3 and 4? – Marcus Nunes
The categories are X1, X2, X3 and X4. – Naomi