How do I do k-fold validation when I change the cut-off of the model?

When I report metrics from a machine-learning model, I always use k-fold validation. Here's an example implementation:

import pandas as pd
import numpy as np
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.neural_network import MLPClassifier

# binary target `admit` in the first column; the remaining columns are predictors
data = pd.read_csv("https://stats.idre.ucla.edu/stat/data/binary.csv")

X = data.iloc[:, 1:]
y = data['admit']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

mlp = MLPClassifier(solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(5, 2))
mlp.fit(X_train, y_train)
pred = mlp.predict(X_test)

With the model ready, I define a function to evaluate it:

def evaluate_model(X, y, model, metric):
    # 10-fold stratified CV, repeated 3 times; report the mean score
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
    scores = cross_val_score(model, X, y, scoring=metric, cv=cv, n_jobs=-1)
    return np.mean(scores)

Assessing the accuracy:

evaluate_model(X_train, y_train, mlp,'accuracy')

Returns: 0.6829 ...

It turns out that I would like to change the acceptance threshold of the model in order to choose the combination of sensitivity and specificity best suited to my specific case. Following the procedure described in this question, I can generate the new prediction vector, but with it I can no longer use the function cross_val_score.
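For reference, a minimal sketch of that manual approach, assuming a fitted binary classifier with predict_proba (the 0.3 cutoff is only an illustrative value, not from the original question):

from sklearn.metrics import confusion_matrix

# Threshold the positive-class probability directly instead of calling predict()
proba = mlp.predict_proba(X_test)[:, 1]
pred_cutoff = (proba > 0.3).astype(int)

# The confusion matrix shows how sensitivity and specificity shift with the cutoff
print(confusion_matrix(y_test, pred_cutoff))

This produces a prediction vector, but since the thresholding happens outside the estimator, cross_val_score has no way to apply it inside each fold.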

Is there a way to calculate metrics with k-fold validation when changing the cut-off of the model?

1 answer

To attach a different threshold to MLPClassifier while keeping the estimator interface required for cross-validation, you can wrap it in a class that inherits from BaseEstimator and ClassifierMixin:

from sklearn.base import BaseEstimator, ClassifierMixin

class CustomCutoffClassifier(BaseEstimator, ClassifierMixin):
    def __init__(self, model, cutoff):
        # store __init__ arguments under the same names so that
        # get_params()/clone() keep working during cross-validation
        self.model = model
        self.cutoff = cutoff

    def fit(self, *args, **kwargs):
        self.model.fit(*args, **kwargs)
        return self

    def predict(self, X):
        # predict the positive class whenever its probability exceeds the cutoff
        return self.model.predict_proba(X)[:, 1] > self.cutoff

mlp = MLPClassifier(solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(5, 2))
model = CustomCutoffClassifier(mlp, 0.3)
evaluate_model(X_train, y_train, model, 'accuracy')
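Since the goal is to pick a sensitivity/specificity trade-off, one possible follow-up (not part of the original answer) is to sweep several cutoffs and compare a cross-validated metric such as recall, reusing evaluate_model from the question; the cutoff values below are illustrative:

# Compare cross-validated recall (sensitivity) across candidate cutoffs
for cutoff in [0.2, 0.3, 0.4, 0.5]:
    model = CustomCutoffClassifier(
        MLPClassifier(solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(5, 2)),
        cutoff,
    )
    print(cutoff, evaluate_model(X_train, y_train, model, 'recall'))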

Alternatively, you can hard-code the cutoff in a class that inherits directly from MLPClassifier:

class MyMLPClassifier(MLPClassifier):
    def predict(self, X):
        # hard-coded 0.3 cutoff on the positive-class probability
        return self.predict_proba(X)[:, 1] > 0.3

mlp = MyMLPClassifier(solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(5, 2))
evaluate_model(X_train, y_train, mlp, 'accuracy')
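A note on the design choice: because cutoff is an __init__ parameter of CustomCutoffClassifier, sklearn treats it like any other hyperparameter, so in principle it can also be tuned with GridSearchCV. A hypothetical sketch (not from the original answer; the grid values are illustrative):

from sklearn.model_selection import GridSearchCV

# Search over the cutoff exactly like any other hyperparameter
grid = GridSearchCV(
    CustomCutoffClassifier(MLPClassifier(solver='lbfgs', alpha=1e-5,
                                         hidden_layer_sizes=(5, 2)), 0.5),
    param_grid={'cutoff': [0.2, 0.3, 0.4, 0.5]},
    scoring='recall',
    cv=5,
)
grid.fit(X_train, y_train)
print(grid.best_params_)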
  • Thank you. Creating a new class is a truly creative solution. Unfortunately, however, cross_val_score is now returning an array of nan.

  • Apparently sklearn complains about the private attribute notation (_attr). I added an alternative implementation in case the first one doesn't work.
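If you hit the nan scores, one way to surface the underlying exception is to run cross_val_score on a single worker with error_score='raise'; a minimal sketch, assuming the model variable from above:

# With n_jobs=-1, failures inside workers are silently reported as nan;
# error_score='raise' re-raises the exception so you can see the cause
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
scores = cross_val_score(model, X_train, y_train, scoring='accuracy',
                         cv=cv, n_jobs=1, error_score='raise')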
