I was putting together some machine learning models for data prediction and got stuck on an error when trying to make predictions with the linear regression and XGBRegressor algorithms. Here is the code:
```python
from datetime import date

# Assumption: `cal` is a workalendar calendar; the country (Brazil) is a guess
from workalendar.america import Brazil

cal = Brazil()

# Months to forecast: May/2019 to June/2020
pMeses = [5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6]
pAnos = [2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2020, 2020, 2020, 2020, 2020, 2020]
pQtdDias = [31, 30, 31, 31, 30, 31, 30, 31, 31, 29, 31, 30, 31, 30]  # days in each month
pMovS = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0]

pDiasUteis19 = []
pDiasUteis20 = []
pDiasUteis21 = []
pDiasUteis22 = []
for i in range(len(pMeses)):
    # Working days between the first and the last day of the month
    dias_uteis = cal.get_working_days_delta(
        date(pAnos[i], pMeses[i], 1), date(pAnos[i], pMeses[i], pQtdDias[i]))
    # One-hot flags: 1 when the month has exactly 19, 20, 21 or 22 working days
    pDiasUteis19.append(1 if dias_uteis == 19 else 0)
    pDiasUteis20.append(1 if dias_uteis == 20 else 0)
    pDiasUteis21.append(1 if dias_uteis == 21 else 0)
    pDiasUteis22.append(1 if dias_uteis == 22 else 0)
# print(pDiasUteis19, pDiasUteis20, pDiasUteis21, pDiasUteis22)

# lr_model2 and xgb_model2 were trained earlier (not shown); per the error below,
# the XGBoost model stores the feature names mes, ano, qtdDias, sMov, diasUteis19..22
entrada2 = []
lr_predict2 = []
xgb_predict2 = []
for i in range(len(pMeses)):
    # One input row per month, wrapped in a list so predict() receives a 2-D input
    entrada2.append([[pMeses[i], pAnos[i], pQtdDias[i], pMovS[i],
                      pDiasUteis19[i], pDiasUteis20[i], pDiasUteis21[i], pDiasUteis22[i]]])
    lr_predict2.append(int(lr_model2.predict(entrada2[i])[0]))
    xgb_predict2.append(int(xgb_model2.predict(entrada2[i])[0]))  # raises the ValueError below
    print('Mes: {:02d} LRegressor: {}'.format(pMeses[i], lr_predict2[i]))
    print('Mes: {:02d} XGBRegressor: {}'.format(pMeses[i], xgb_predict2[i]))
```
And this is the error that was raised:

```
ValueError: feature_names mismatch: ['mes', 'ano', 'qtdDias', 'sMov', 'diasUteis19', 'diasUteis20', 'diasUteis21', 'diasUteis22'] ['f0', 'f1', 'f2', 'f3', 'f4', 'f5', 'f6', 'f7']
expected diasUteis22, diasUteis21, diasUteis19, ano, mes, diasUteis20, sMov, qtdDias in input data
training data did not have the following fields: f3, f2, f4, f7, f6, f0, f1, f5
```
Could someone with machine learning experience explain why XGBRegressor apparently shuffles the DataFrame columns when making the forecast?
I have already searched the documentation and found nothing like it. And if that really is the case, I believe it does not suit my model, because it is a time series.
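Judging from the error message, the columns are probably not being shuffled. The first list in the message appears to be the feature names stored in the trained XGBoost model (mes, ano, ...), and the second list is the names of the prediction input, which received the default names f0 to f7 because it was passed as a plain nested list rather than a DataFrame. The "expected ..." and "training data did not have ..." lines are the differences between those two sets of names, so their order is arbitrary. A minimal sketch of a fix, assuming the models were fitted on a pandas DataFrame with these column names in this order, is to build the prediction input as a DataFrame with the same columns. `build_prediction_frame`, `FEATURES` and `dias_uteis` are hypothetical names, not part of the original code:

```python
import pandas as pd

# Column names in the order reported by the model in the error message
# (assumption: this matches the columns of the training DataFrame).
FEATURES = ['mes', 'ano', 'qtdDias', 'sMov',
            'diasUteis19', 'diasUteis20', 'diasUteis21', 'diasUteis22']


def build_prediction_frame(meses, anos, qtd_dias, mov_s, dias_uteis):
    """Build one row per month with the same column names used in training.

    dias_uteis holds the number of working days of each month, e.g. the
    value returned by cal.get_working_days_delta in the question.
    """
    return pd.DataFrame({
        'mes': meses,
        'ano': anos,
        'qtdDias': qtd_dias,
        'sMov': mov_s,
        # One-hot flags: 1 when the month has exactly N working days
        'diasUteis19': [int(d == 19) for d in dias_uteis],
        'diasUteis20': [int(d == 20) for d in dias_uteis],
        'diasUteis21': [int(d == 21) for d in dias_uteis],
        'diasUteis22': [int(d == 22) for d in dias_uteis],
    }, columns=FEATURES)  # enforce the training column order
```

With it, each model could predict all months in a single call, e.g. `xgb_model2.predict(build_prediction_frame(pMeses, pAnos, pQtdDias, pMovS, dias_uteis))`, so the input carries the same feature names the model was trained with.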
Hello @ramonfsk, how are you? Do you want to build a time series forecasting model? If so, an approach with ARIMA or LSTM models may work better. Otherwise, with XGBoost, as I understand it, you need to extract features from the time series to use in model training.
– Anderson Chaves
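The suggestion of extracting features from the time series so that XGBoost can use it can be sketched as follows, assuming a monthly target series indexed by date. The function name, the lags (1 to 3 months) and the 3-month rolling window are hypothetical choices for illustration: each row keeps the target `y` plus features computed only from earlier months, which turns the series into an ordinary table for a regressor such as XGBRegressor.

```python
import pandas as pd


def make_lag_features(series, lags=(1, 2, 3), window=3):
    """Turn a monthly series into a supervised-learning table.

    Every feature uses only past observations, so the model never sees
    the value it has to predict.
    """
    df = pd.DataFrame({'y': series})
    for lag in lags:
        # Value observed `lag` months earlier
        df[f'lag_{lag}'] = df['y'].shift(lag)
    # Mean of the previous `window` months (shift(1) excludes the current month)
    df[f'rolling_mean_{window}'] = df['y'].shift(1).rolling(window).mean()
    # Calendar feature, available when the series has a DatetimeIndex
    if isinstance(df.index, pd.DatetimeIndex):
        df['month'] = df.index.month
    # The first rows do not have enough history to fill every lag
    return df.dropna()
```

When training on such a table, the rows would normally be split in time order (fit on earlier months, validate on later ones) rather than shuffled, so the evaluation does not use future data.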
Hi Anderson, how's it going? Indeed, ARIMA and LSTM models are better suited for forecasting. When I asked the question I hadn't looked into it yet, but now I know it doesn't make much sense to use this algorithm for these predictions. In any case, thank you for your reply.
– ramonfsk
Excellent! Thank you, Ramon, and good luck with your work! All the best!
– Anderson Chaves