How to stratify/divide a data.frame into categories of a variable in R?

I am running a linear regression model in R and would like to perform a stratified analysis by the categories of a variable X that has 4 categories (X1, X2, X3 and X4).

I thought I would split the data.frame by the categories of X, so that I would have 4 data.frames and could run the same model on each category.

I tried this:

X1=data.frame[which(data.frame$X==1), ]

but it resulted in a data.frame X1 with 0 observations (rows), although the column names appear.

What do you suggest to fix this? Thank you.

  • Are the categories of X X1, X2, X3 and X4, or 1, 2, 3 and 4?

  • Categories are X1, X2, X3 and X4
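Since the values of X are the labels "X1" to "X4", the condition `X == 1` never matches any row, which would explain the empty data.frame. A minimal sketch of the fix, assuming a hypothetical data frame `dados` with a factor column `X` and a numeric response `y` (these names are illustrative, not taken from the question):

```r
# Hypothetical example data: 'dados', 'X' and 'y' are illustrative names
set.seed(1)
dados <- data.frame(
  X = factor(rep(c("X1", "X2", "X3", "X4"), each = 5)),
  y = rnorm(20)
)

# X holds the labels "X1".."X4", so comparing with the number 1 matches nothing
vazio <- dados[which(dados$X == 1), ]
nrow(vazio)   # 0 rows, as in the question

# Compare against the label instead
X1 <- dados[dados$X == "X1", ]
nrow(X1)      # 5 rows

# subset() does the same selection with less typing
X1_alt <- subset(dados, X == "X1")
```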

1 answer

There are several ways to run one regression per category in R. I’ll show how to do it with base R functions and with the dplyr package. As an example, let’s use the mtcars dataset.

Suppose you want to run the regression mpg ~ disp + hp for each level of the variable cyl in mtcars (which has 3 categories: 4, 6 and 8 cylinders).

First, you can use the function split() to build a list of three different data.frames, one for each category:

data.frame_por_categoria <- split(mtcars, mtcars$cyl)
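To check what split() produced, a quick sketch: the list elements are named after the levels of cyl, and each one holds only the rows of that category.

```r
data.frame_por_categoria <- split(mtcars, mtcars$cyl)

# One element per cylinder count, named after the levels of cyl
names(data.frame_por_categoria)          # "4" "6" "8"

# Number of cars (rows) in each subset
tamanhos <- sapply(data.frame_por_categoria, nrow)
tamanhos                                 # 4: 11, 6: 7, 8: 14
```

For the data in the question, the equivalent call would be something like split(dados, dados$X), where `dados` is a hypothetical name for the data frame; the list elements would then be named X1 to X4.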

Now just use lapply() to run the regression on each data.frame:

modelos <- lapply(data.frame_por_categoria, function(x) lm(mpg ~ disp + hp, data = x))

The result, modelos, is a list with the three regressions. To access the first model:

modelos[[1]]
Call:
lm(formula = mpg ~ disp + hp, data = x)

Coefficients:
(Intercept)         disp           hp  
   43.04006     -0.11954     -0.04609  
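Because modelos is a named list (the names come from split()), each model can also be retrieved by its category, and the results of all fits can be collected in one step. A short sketch using only base R; the variable names coeficientes and resumos are illustrative:

```r
data.frame_por_categoria <- split(mtcars, mtcars$cyl)
modelos <- lapply(data.frame_por_categoria,
                  function(x) lm(mpg ~ disp + hp, data = x))

# Access by category name instead of by position (same model as modelos[[1]])
modelos[["4"]]

# Coefficient matrix: one column per cyl level, one row per term
coeficientes <- sapply(modelos, coef)
round(coeficientes, 5)

# Full summaries (standard errors, p-values, R^2) for each category
resumos <- lapply(modelos, summary)
sapply(resumos, function(s) s$r.squared)
```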

It is also possible to do the same thing with the package dplyr.

You group by category and then use the function do() to run the regression, placing a dot (.) where the data.frame would go:

library(dplyr)
resultado <- mtcars %>% group_by(cyl) %>% do(modelo = lm(mpg ~ disp + hp, data = .))

The result of the operation, resultado, is a data.frame with a column called modelo, and each element of this column is a regression. To access the first model:

resultado$modelo[[1]]
Call:
lm(formula = mpg ~ disp + hp, data = .)

Coefficients:
(Intercept)         disp           hp  
   43.04006     -0.11954     -0.04609  
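In dplyr 1.0.0 and later, do() is marked as superseded. A sketch of the same per-category regression with nest_by(), which returns a rowwise data frame with one row per value of cyl and a list column data holding each subset; this assumes dplyr 1.0.0 or later is installed:

```r
library(dplyr)

resultado <- mtcars %>%
  nest_by(cyl) %>%   # one row per cyl; each subset is stored in the 'data' column
  mutate(modelo = list(lm(mpg ~ disp + hp, data = data)))  # rowwise: one model per row

resultado$modelo[[1]]   # model for cyl == 4
```

The model is wrapped in list() because, in a rowwise data frame, mutate() expects each row to produce a single value, and a list column is how one stores an lm object per row.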
  • Thank you so much, Carlos! It worked great!

  • You're welcome, @Naomi! If the answer solved your problem, you can also accept it! Cheers
