Selecting logistic regression variables

I’m new to RStudio and I’m having trouble simplifying my code. I’m running a logistic regression and would like to keep only the variables with p < 0.20. To do that, I did the following:

imcbi <- glm(data2$desfechouti ~ data2$imc, family=binomial())

localbi <- glm(data2$desfechouti ~ as.factor(data2$local), family=binomial())

readmbi <- glm(data2$desfechouti ~ as.factor(data2$readm), family=binomial())

I run each model separately and then call:

summary(imcbi)
summary(localbi)
summary(readmbi)

Is there a way to write the code so that the whole bivariate analysis is done together and I only need ONE summary command afterwards?

Keep in mind that I am selecting variables to build my logistic regression model.

  • Using LASSO Logistic Regression to select variables may be an alternative.

1 answer

Try fitting a logistic regression model with all the predictor variables at once:

modelo <- glm(desfechouti ~ imc + as.factor(local) + as.factor(readm),
  data = data2, family = binomial())
summary(modelo)

When you use the syntax glm(formula, data = data2), there is no need to put the name of the data frame in front of each variable in the formula.
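To illustrate that point, here is a minimal sketch on simulated data. The data frame below is a made-up stand-in for `data2`; only the column names come from the question. Both spellings should fit the same model, and only the coefficient names differ.

```r
# Simulated stand-in for the asker's data2 (all values are made up)
set.seed(1)
n <- 200
data2 <- data.frame(
  imc   = rnorm(n, mean = 27, sd = 4),
  local = sample(1:3, n, replace = TRUE),
  readm = sample(0:1, n, replace = TRUE)
)
data2$desfechouti <- rbinom(n, 1, plogis(-3 + 0.1 * data2$imc))

# Verbose form used in the question
m1 <- glm(data2$desfechouti ~ data2$imc, family = binomial())

# Same model written with the data argument
m2 <- glm(desfechouti ~ imc, data = data2, family = binomial())

# Compare the estimates, ignoring the differing coefficient names
same_fit <- isTRUE(all.equal(unname(coef(m1)), unname(coef(m2))))
same_fit
```

The `data =` form also keeps the coefficient names short (`imc` instead of `data2$imc`), which makes the `summary()` output easier to read.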

  • I get it, Marcus. The problem is that I run the risk of including variables that are not relevant even in the bivariate analysis. My table has more than 30 factors, so I wanted a simple way to write this code. I’ll keep trying. Thanks for the suggestion.

  • Don’t take this the wrong way, but the information "My table has more than 30 factors" was not in the original question. There are methods that handle automatic variable selection even with a large number of variables. Doing it as proposed, with more than 30 independent regressions, is not only laborious but also wrong, because it inflates the false-positive rate. Put more detail in the question so that you get the help you need for your specific problem.

  • I apologize for the way I expressed myself. What I meant was that I haven’t reached the GLM model yet; I was still trying to select the variables. I know there are methods for building a logistic regression (stepwise forward, backward, etc.), but even so you need to select the variables whose bivariate analysis gives p < 0.2 (or whatever your cutoff is). At least that’s how most medical articles do it. I’m still looking for more user-friendly code (the current one runs to around 1000 lines). Thanks for the help.
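For reference, the screening step described in the last comment can be written as a short loop instead of one `glm()` call per variable. This is only a sketch under assumptions: the data are simulated, `idade` is a hypothetical extra predictor added for illustration, and a likelihood-ratio test is used (rather than the Wald p-values printed by `summary()`) so that each factor gets a single p-value. The caveat raised in the comments still applies: running many separate tests inflates the false-positive rate.

```r
# Simulated stand-in for data2 (made-up values; 'idade' is hypothetical)
set.seed(42)
n <- 300
data2 <- data.frame(
  imc   = rnorm(n, mean = 27, sd = 4),
  idade = rnorm(n, mean = 60, sd = 12),
  local = factor(sample(c("a", "b", "c"), n, replace = TRUE)),
  readm = factor(sample(0:1, n, replace = TRUE))
)
data2$desfechouti <- rbinom(n, 1, plogis(-4 + 0.12 * data2$imc))

outcome    <- "desfechouti"
predictors <- setdiff(names(data2), outcome)

# Fit one single-predictor model per variable and test it against the
# intercept-only model with a likelihood-ratio (deviance) test
p_values <- sapply(predictors, function(v) {
  fit <- glm(reformulate(v, response = outcome),
             data = data2, family = binomial())
  lr_stat <- fit$null.deviance - fit$deviance
  lr_df   <- fit$df.null - fit$df.residual
  pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
})

# One table with every bivariate p-value, smallest first
screen <- data.frame(variable = predictors, p_value = p_values,
                     row.names = NULL)
screen <- screen[order(screen$p_value), ]
print(screen)

# Variables passing the p < 0.20 cutoff from the question
selected <- screen$variable[screen$p_value < 0.20]
selected
```

With a real data set, `predictors` would simply be the names of the 30 or so candidate columns, and `selected` could be passed to `reformulate()` again to build the multivariable model.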
