Validating a model with cross-validation (logistic regression). What is Grid Search for?

from sklearn.model_selection import cross_val_score
import numpy as np

scores = cross_val_score(logmodel, y_test.astype(float).reshape(-1, 1),
                         predictions.reshape(-1, 1),
                         scoring="neg_mean_squared_error", cv=10)

log_rmse_scores = np.sqrt(-scores)

def display_scores(scores):
    print("Scores: ", scores)
    print("Mean: ", scores.mean())
    print("Standard Deviation: ", scores.std())

display_scores(log_rmse_scores)

Output:

Scores:  [0.02972702 0.02972702 0.02972702 0.02972702 0.02972702 0.02972702
 0.02972775 0.02972775 0.02972775 0.02972775]
Mean:  0.029727313479823336
Standard Deviation:  3.574977450912211e-07

From what I understand, this is still not the best model: I need to tune the model by searching for the best hyperparameters with Grid Search, is that right?

What are these hyperparameters in logistic regression? At the end of the process, what will the model be?

I don’t understand what I get in the end.

1 answer


In logistic regression there are no hyperparameters to tune, unless it is a logistic regression with regularization.
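If you do use a regularized logistic regression in scikit-learn, the main hyperparameters are the regularization strength C (the inverse of the penalty strength) and the penalty type, and that is what Grid Search is for: it tries every combination in a grid of candidate values, cross-validates each one, and keeps the best. Below is a minimal sketch; X_train and y_train are placeholder names for your feature table and target, not variables from your code, and the scoring metric is just an example.

# Minimal Grid Search sketch for a regularized logistic regression.
# X_train / y_train are placeholders for your own feature table and target.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

param_grid = {
    "C": [0.01, 0.1, 1.0, 10.0, 100.0],  # inverse of regularization strength
    "penalty": ["l2"],                   # type of penalty
}

grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    scoring="accuracy",  # choose a classification metric suited to your problem
    cv=10,
)
grid.fit(X_train, y_train)

print("Best hyperparameters:", grid.best_params_)
print("Best cross-validated score:", grid.best_score_)
best_model = grid.best_estimator_  # the refitted estimator with the winning hyperparameters

The model you end up with at the end of the process is grid.best_estimator_: a LogisticRegression refitted on the whole training set with the best combination found in the grid.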

Another thing that might be tuned in logistic regression is the set of variables (columns of table X) that enter the model. From what I understand from your code, table X has only 1 column, so you wouldn't have anything to tune in this case.

  • I have 47 columns/features. What would the regularization be?

  • Regularization is when you add to your loss function a penalty on the magnitude of the estimated weights. In general, the strength of this penalty is chosen via cross-validation. You can also do what is called RFE (Recursive Feature Elimination) via cross-validation to find a model that uses fewer variables (see the sketch below).
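A minimal sketch of the RFE-with-cross-validation idea from the comment above; again, X_train and y_train are placeholder names, not variables from the original code.

# Minimal RFE-with-cross-validation sketch.
# X_train / y_train are placeholders for your 47-column feature table and target.
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

selector = RFECV(
    estimator=LogisticRegression(max_iter=1000),
    step=1,            # remove one feature per elimination round
    cv=10,
    scoring="accuracy",
)
selector.fit(X_train, y_train)

print("Number of features kept:", selector.n_features_)
print("Mask of selected columns:", selector.support_)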
