I am working on a regression problem. I built a multilayer perceptron (MLP) using scikit-learn and made two sets of predictions. For the first, I trained the MLP on 70% of the data and held out 30% for validation. Computing the root mean squared error (RMSE; EMQ in Portuguese) between the MLP's predictions and the validation targets gave 0.583 (the erro variable). For the second, I took the same MLP and made predictions using cross-validation with cross_val_predict. I expected the cross-validated RMSE to be lower than the hold-out RMSE, but the value (errocv) came out as 3.584. Did I do something wrong in the cross-validation part of my code, or is this normal?
# Importing the modules (only the ones the script actually uses)
import numpy as np
import pandas as pd
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.neural_network import MLPRegressor
# Loading the dataset
base = pd.read_excel('Aval_ASF_Alt_Elip.xlsx')
# Defining x and y (one feature column and one target column, both kept 2-D)
x = base.iloc[:, 6:7].values
y = base.iloc[:, 5:6].values
# Splitting the data into training (70%) and test (30%) sets
X_treinamento, X_teste, y_treinamento, y_teste = train_test_split(
    x, y, test_size=0.30, random_state=0)
# Creating the MLP
model = MLPRegressor(hidden_layer_sizes=(10, 10, 10, 10, 10, 10, 10),
                     activation='identity', solver='adam',
                     learning_rate='constant', batch_size='auto',
                     max_iter=1000, verbose=False, random_state=0)
# Training the MLP
model.fit(X_treinamento, y_treinamento.ravel())
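# Generating the MLP predictions on the held-out features
# (predict takes the feature matrix X_teste, not the targets y_teste)
ypred_skl = model.predict(X_teste)
ypred_skl = ypred_skl.reshape(-1, 1)
# Computing the MLP error (RMSE on the 30% hold-out set)
erro = np.round(np.sqrt(np.mean((y_teste - ypred_skl) ** 2)), 3)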
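# Cross-validation (15 folds; each sample is predicted by a model that did not train on it)
ypredcv = cross_val_predict(model, x, y.ravel(), cv=15, verbose=0)
ypredcv = ypredcv.reshape(-1, 1)
# Computing the cross-validation error (RMSE pooled over all out-of-fold predictions)
errocv = np.round(np.sqrt(np.mean((y - ypredcv) ** 2)), 3)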
Thank you for your attention.
You validated your model using test_size = 0.30, that is, 30% of your dataset is used for testing, while in cross-validation you used 15-fold validation, which divides your dataset into 15 groups of roughly equal size and tests on each group in turn. As a result, the training/test set sizes differ between the simple hold-out validation and the cross-validation, which may explain the difference in error. I also recommend using sklearn.metrics to calculate your validation metrics.
– João Victor
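A minimal sketch of the comparison described in this comment, computing both errors with sklearn.metrics. The data here is synthetic (an assumption, since the spreadsheet is not available), and LinearRegression stands in for the MLP because an MLPRegressor with activation='identity' can only represent a linear function. One further difference that may matter: train_test_split shuffles by default, while cv=15 with a regressor uses unshuffled consecutive folds, so the sketch passes an explicitly shuffled KFold.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, cross_val_predict, train_test_split

# Synthetic one-feature regression data (hypothetical stand-in for the spreadsheet).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, size=(300, 1))
y = 2.0 * X.ravel() + 1.0 + rng.normal(0.0, 0.5, size=300)

# Hold-out validation: fit on 70% of the rows, score on the remaining 30%.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0)
model = LinearRegression().fit(X_train, y_train)
rmse_holdout = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))

# 15-fold cross-validation: every row is predicted by a model trained on
# the other 14 folds, and the RMSE is pooled over all 300 predictions.
cv = KFold(n_splits=15, shuffle=True, random_state=0)
y_pred_cv = cross_val_predict(LinearRegression(), X, y, cv=cv)
rmse_cv = np.sqrt(mean_squared_error(y, y_pred_cv))

# Training-set size seen by each cross-validation model (14/15 of the data).
train_sizes = [len(train_idx) for train_idx, _ in cv.split(X)]

print(f"hold-out: train={len(X_train)}, test={len(X_test)}, RMSE={rmse_holdout:.3f}")
print(f"15-fold:  train per fold={train_sizes[0]}, RMSE={rmse_cv:.3f}")
```

On well-behaved, shuffled data the two RMSE values should be of similar size, so a gap as large as 0.583 versus 3.584 suggests checking the fold ordering and the inputs passed to predict, not only the split sizes.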
Thank you. I’ll do that.
– Rodrigo Ferraz
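As a follow-up to the thread, one way to check whether a few folds dominate the cross-validated error is to look at the RMSE of each fold separately. The sketch below assumes synthetic data and uses LinearRegression as a stand-in for the identity-activation MLP; the scoring string "neg_root_mean_squared_error" is scikit-learn's built-in scorer, which returns negated values.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic one-feature regression data (hypothetical stand-in for the real dataset).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, size=(300, 1))
y = 2.0 * X.ravel() + 1.0 + rng.normal(0.0, 0.5, size=300)

# Shuffled 15-fold split so each fold covers the whole range of X.
cv = KFold(n_splits=15, shuffle=True, random_state=0)

# The scorer returns negative RMSE (higher is better), so flip the sign.
scores = cross_val_score(LinearRegression(), X, y, cv=cv,
                         scoring="neg_root_mean_squared_error")
fold_rmse = -scores

# With equal-sized folds, the pooled RMSE is the root of the mean fold MSE.
pooled_rmse = np.sqrt(np.mean(fold_rmse ** 2))

for i, value in enumerate(fold_rmse, start=1):
    print(f"fold {i:2d}: RMSE = {value:.3f}")
print(f"pooled RMSE = {pooled_rmse:.3f}")
```

If one or two folds show a much larger RMSE than the rest, the model is probably extrapolating on those folds, which can happen when the rows are ordered and the folds are not shuffled.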