Predictions from Cross Validation

Asked 5 years, 2 months ago

Viewed 98 times

I am working on a regression problem. I built a multilayer perceptron (MLP) with scikit-learn and evaluated it in two ways. First, I trained the MLP on 70% of the data and held out 30% for validation. Computing the root mean squared error (RMSE, which I abbreviate as EMQ) between the MLP's predictions and the validation data, I got 0.583 (variable erro). Second, I took the same MLP and generated predictions through cross-validation with cross_val_predict. I expected this error to be lower than the hold-out error, but it came out as 3.584 (variable errocv). Did I do something wrong in the cross-validation part of my code, or is this normal?

# Import the modules
import pandas as pd
import keras
import numpy as np
from sklearn.model_selection import cross_validate 
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import cross_val_predict
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import train_test_split

# Load the dataset
base = pd.read_excel('Aval_ASF_Alt_Elip.xlsx')  # Dataset

# Define the x and y variables
x = base.iloc[:, 6:7].values
y = base.iloc[:, 5:6].values

# Split the data into training and test sets
X_treinamento, X_teste, y_treinamento, y_teste = train_test_split(x,y, 
                                                                  test_size=0.30,random_state=0)

# Create the MLP
model = MLPRegressor(hidden_layer_sizes=(10,10,10,10,10,10,10),activation='identity',
                     solver='adam',learning_rate='constant',batch_size='auto', 
                     max_iter=1000,verbose=False,random_state=0)

# Train the MLP
model.fit(X_treinamento, y_treinamento.ravel())

# Generate the MLP predictions
ypred_skl = model.predict(y_teste)
ypred_skl = ypred_skl.reshape(-1,1)

# Compute the MLP error
erro = np.round(np.sqrt(np.mean((y_teste-ypred_skl)**2)),3)

# Cross-validation
ypredcv = cross_val_predict(model, x, y.ravel(), cv=15, verbose=0)
ypredcv = ypredcv.reshape(-1,1)

# Compute the cross-validation error
errocv = np.round(np.sqrt(np.mean((y-ypredcv)**2)),3)

Thank you for your attention.
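
One detail in the code above may explain part of the gap: `model.predict(y_teste)` passes the held-out targets to the model instead of the held-out features `X_teste`, so `erro` is not a hold-out error on the inputs. Below is a minimal sketch of the hold-out evaluation with that call changed. The synthetic data stands in for `Aval_ASF_Alt_Elip.xlsx`, and the data, noise level, and smaller network are assumptions made only for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in for the spreadsheet (assumption): one feature, noisy linear target.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(200, 1))
y = 3.0 * x + 1.0 + rng.normal(0.0, 0.1, size=(200, 1))

# Same 70/30 split as in the question.
X_train, X_test, y_train, y_test = train_test_split(
    x, y, test_size=0.30, random_state=0)

# Smaller network than in the question (assumption, to keep the sketch fast).
model = MLPRegressor(hidden_layer_sizes=(10, 10), activation='identity',
                     solver='adam', max_iter=1000, random_state=0)
model.fit(X_train, y_train.ravel())

# Predict from the held-out features, not from the held-out targets.
y_pred = model.predict(X_test).reshape(-1, 1)

# Root mean squared error on the hold-out set.
rmse_holdout = np.sqrt(np.mean((y_test - y_pred) ** 2))
print(round(float(rmse_holdout), 3))
```

With the real data, the resulting value would be the one to compare against `errocv`.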

You validated your model with test_size=0.30, so 30% of your dataset is used for testing, whereas in cross-validation you used 15-fold validation, which divides the dataset into 15 groups of equal size. As a result, the training/test sizes differ between the simple validation and the cross-validation, which may have caused the difference in error. I recommend using sklearn.metrics to calculate your validation metrics.

– João Victor

2020/05/31 at 03:08

Thank you. I’ll do that.

– Rodrigo Ferraz

2020/05/31 at 20:11
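
The suggestion in the comments can be sketched roughly as follows: compute the RMSE with `sklearn.metrics.mean_squared_error` for both the hold-out split and the out-of-fold predictions from `cross_val_predict`, so both numbers use the same metric. The synthetic data, the smaller network, and the shuffled `KFold` are assumptions and not part of the original post. Note that passing `cv=15` as an integer to a regressor uses consecutive, unshuffled folds, which can matter if the spreadsheet rows are sorted.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, cross_val_predict, train_test_split
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in for the real dataset (assumption).
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(150, 1))
y = (3.0 * x + 1.0 + rng.normal(0.0, 0.1, size=(150, 1))).ravel()

model = MLPRegressor(hidden_layer_sizes=(10, 10), activation='identity',
                     solver='adam', max_iter=1000, random_state=0)

# Hold-out estimate: a single 70/30 split.
X_train, X_test, y_train, y_test = train_test_split(
    x, y, test_size=0.30, random_state=0)
model.fit(X_train, y_train)
rmse_holdout = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))

# Cross-validated estimate: every sample is predicted by a model that
# did not see it during training. Shuffling is an assumption here.
cv = KFold(n_splits=15, shuffle=True, random_state=0)
y_pred_cv = cross_val_predict(model, x, y, cv=cv)
rmse_cv = np.sqrt(mean_squared_error(y, y_pred_cv))

print(f"hold-out RMSE: {rmse_holdout:.3f}, 15-fold RMSE: {rmse_cv:.3f}")
```

If the two values still differ widely on the real data, comparing the shuffled and unshuffled fold results would be a reasonable next check.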

No answers
