There are several ways to run multiple regressions per category in R. I'll show you how to do it with base R functions and with the dplyr package. As an example, let's use the mtcars dataset.
Suppose you want to run the regression mpg ~ disp + hp for each level of the variable cyl in mtcars (there are 3 categories).
First, you can use the function split() to build a list of three different data.frames, one for each category:
data.frame_por_categoria <- split(mtcars, mtcars$cyl)
Now just use lapply() to fit the regression on each data.frame:
modelos <- lapply(data.frame_por_categoria, function(x) lm(mpg ~ disp + hp, data = x))
The result, modelos, is a list of the three regressions. To access the first model:
modelos[[1]]
Call:
lm(formula = mpg ~ disp + hp, data = x)
Coefficients:
(Intercept)         disp           hp
   43.04006     -0.11954     -0.04609
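Since the result is an ordinary list, the usual apply functions work on it. As a minimal sketch, the coefficients of all three fits can be collected into a single matrix instead of inspecting the models one by one. The object names `por_cyl`, `fits` and `coefs` are made up for this illustration and are not part of the code above.

```r
# Rebuild the list of models so the example is self-contained
por_cyl <- split(mtcars, mtcars$cyl)
fits <- lapply(por_cyl, function(x) lm(mpg ~ disp + hp, data = x))

# One row per cyl level, one column per coefficient
coefs <- t(sapply(fits, coef))
coefs

# R-squared of each regression, named by cyl level
r2 <- sapply(fits, function(m) summary(m)$r.squared)
r2
```

The row names of `coefs` come from the names that split() gives to the list, that is, the levels of cyl.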
It is also possible to do the same thing with the dplyr package.
You have to group by category and then use the function do() to run the regression, placing a dot (.) where the data.frame would need to go:
library(dplyr)
resultado <- mtcars %>% group_by(cyl) %>% do(modelo = lm(mpg ~ disp + hp, data = .))
The resultado of the operation is a data.frame with a column called modelo, and each element of this column is a regression. To access the first model:
resultado$modelo[[1]]
Call:
lm(formula = mpg ~ disp + hp, data = .)
Coefficients:
(Intercept)         disp           hp
   43.04006     -0.11954     -0.04609
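The modelo column is a list-column, so it can be handled like the list from the base R approach. The sketch below pulls the coefficients of every model into one matrix and labels each row with its cyl value. The name `coefs_dplyr` is invented for this example, and the code assumes dplyr is installed and that do() is still available in your version.

```r
library(dplyr)

# Same pipeline as above, repeated so the example runs on its own
resultado <- mtcars %>%
  group_by(cyl) %>%
  do(modelo = lm(mpg ~ disp + hp, data = .))

# One row per model, one column per coefficient
coefs_dplyr <- t(sapply(resultado$modelo, coef))

# Label the rows with the cyl level stored next to each model
rownames(coefs_dplyr) <- as.character(resultado$cyl)
coefs_dplyr
```

Keeping the cyl column next to the models is what makes this approach convenient: the group label travels with each fit.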
Are the categories of X the values X1, X2, X3 and X4, or 1, 2, 3 and 4? – Marcus Nunes
Categories are X1, X2, X3 and X4
– Naomi