When I report metrics from a machine-learning model I always use k-fold validation. Here's an example implementation:
import pandas as pd
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# UCLA admissions dataset: binary target 'admit' plus the remaining predictors
data = pd.read_csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
X = data.iloc[:, 1:]
y = data['admit']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

mlp = MLPClassifier(solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(5, 2))
mlp.fit(X_train, y_train)
pred = mlp.predict(X_test)
With the model ready, I define a function to evaluate the model:
def evaluate_model(X, y, model, metric):
    # Repeated stratified k-fold: 10 splits, repeated 3 times
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
    scores = cross_val_score(model, X, y, scoring=metric, cv=cv, n_jobs=-1)
    return np.mean(scores)
Assessing the accuracy of the model with:
evaluate_model(X_train, y_train, mlp, 'accuracy')
Returns: 0.6829 ...
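(For context on the cut-off discussion that follows: with a binary target, mlp.predict is effectively predict_proba thresholded at 0.5. A quick check on the held-out set, added here for illustration and not part of the original post:)

# mlp.predict corresponds to a 0.5 cut-off on the positive-class probability
proba = mlp.predict_proba(X_test)[:, 1]
manual_pred = (proba >= 0.5).astype(int)  # matches pred, up to ties at exactly 0.5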
It turns out that I would like to change the acceptance threshold of the model in order to choose the combination of sensitivity and specificity best suited to my specific case. Following the procedure described in this question, I can generate the new prediction vector, but with it, it is not possible to use the function cross_val_score.
Is there a way to calculate metrics by k-fold validation when changing the cut-off of the model?
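The answer being discussed in the comments below reportedly creates a new class whose predict() applies the chosen cut-off, so the wrapper can be passed straight to cross_val_score. A minimal sketch of that idea, assuming a binary target whose positive class is column 1 of predict_proba; the name ThresholdMLP and its parameter list are illustrative, not from the original post:

from sklearn.base import BaseEstimator, ClassifierMixin

class ThresholdMLP(BaseEstimator, ClassifierMixin):
    # Wraps MLPClassifier and applies a custom probability cut-off in predict()
    def __init__(self, threshold=0.5, solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(5, 2)):
        self.threshold = threshold
        self.solver = solver
        self.alpha = alpha
        self.hidden_layer_sizes = hidden_layer_sizes

    def fit(self, X, y):
        self.model_ = MLPClassifier(solver=self.solver, alpha=self.alpha,
                                    hidden_layer_sizes=self.hidden_layer_sizes)
        self.model_.fit(X, y)
        self.classes_ = self.model_.classes_
        return self

    def predict_proba(self, X):
        return self.model_.predict_proba(X)

    def predict(self, X):
        # Positive-class probability, thresholded at the chosen cut-off
        return (self.model_.predict_proba(X)[:, 1] >= self.threshold).astype(int)

# e.g. a 0.3 cut-off instead of the default 0.5
evaluate_model(X_train, y_train, ThresholdMLP(threshold=0.3), 'accuracy')

Because every constructor argument is stored under its own name (no leading underscore), clone() inside cross_val_score can rebuild the wrapper for each fold, which is exactly the pitfall the second comment below warns about.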
Thank you. Truly creative solution of creating a new class. Unfortunately, however, the function cross_val_score is now returning an array of nan. – Lucas
Apparently sklearn complains about the private attribute notation (_attr). I put up an alternative implementation in case the first one doesn't work. – amiasato
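For reference, one alternative that avoids the wrapper class entirely (and hence the cloning issue with underscore-prefixed attributes) is to run the k-fold loop manually and apply the cut-off to predict_proba in each fold. A minimal sketch, assuming the same pandas X_train/y_train as above; evaluate_with_threshold is my naming, not from the thread:

from sklearn.base import clone
from sklearn.metrics import accuracy_score

def evaluate_with_threshold(X, y, model, threshold=0.5):
    # Manual repeated stratified k-fold; applies the cut-off fold by fold
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
    scores = []
    for train_idx, test_idx in cv.split(X, y):
        m = clone(model)  # fresh, unfitted copy for each fold
        m.fit(X.iloc[train_idx], y.iloc[train_idx])
        proba = m.predict_proba(X.iloc[test_idx])[:, 1]
        fold_pred = (proba >= threshold).astype(int)
        scores.append(accuracy_score(y.iloc[test_idx], fold_pred))
    return np.mean(scores)

evaluate_with_threshold(X_train, y_train, mlp, threshold=0.3)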