I am developing a project using scikit-learn (with pandas to handle the data) to predict the results of football matches from previous results. For each row of the dataset used in the prediction, the result is determined from the last three games of the teams, which I store in a NumPy array. I get an error when I try to add the array with the last three games to the pandas object that will be used in the classification. The code:
import pandas as pd
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
ds_resultados = pd.read_csv("E0_new2.csv")
ds_resultados["Date"] = pd.to_datetime(ds_resultados.Date)
data = ds_resultados[["FTHG", "FTAG", "HTHG", "HTAG", "HS", "AS", "HST", "AST"]]
target = ds_resultados["FTR"]
# Get the last three results for each match
cont = 0
for index, row in ds_resultados.iterrows():
    # All matches in which the current home team played, at home or away
    auxHome = ds_resultados[((ds_resultados["HomeTeam"] == row["HomeTeam"])
                             | (ds_resultados["AwayTeam"] == row["HomeTeam"]))]
    auxHome = auxHome[auxHome["Date"] < row["Date"]].sort_values(by="Date", ascending=True).head(n = 3)
    listaTarget = np.array([]).astype('int64')
    if auxHome.shape[0] == 3:
        for i in range(0, 3):
            linha = auxHome.iloc[i]
            listaTarget = np.append(listaTarget, linha["FTR"])
        data = data.append(row[["FTHG", "FTAG", "HTHG", "HTAG", "HS", "AS", "HST", "AST"]])
        target = target.append(pd.Series([]), ignore_index = True)
        target.at[cont] = listaTarget  # this is the line that raises the error
        cont = cont + 1
When the line
target.at[cont] = listaTarget
runs, I get the following error:
ValueError: setting an array element with a sequence.
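The message itself comes from NumPy. The following is only a minimal sketch, independent of the dataset, of when NumPy raises it: an array with a numeric dtype holds one scalar per position, while an array with dtype object can hold any Python object, including another array. That pandas keeps the target in a numeric array here is an assumption about the cause, and the names int_slots, obj_slots and last_three are made up for the illustration.

```python
import numpy as np

last_three = np.array([1, 0, -1])  # e.g. three FTR values

# A numeric array stores exactly one scalar per position,
# so putting a 3-element array into one slot fails.
int_slots = np.zeros(3, dtype="int64")
try:
    int_slots[0] = last_three
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# An object array stores arbitrary Python objects,
# so a whole array fits into a single slot.
obj_slots = np.empty(3, dtype=object)
obj_slots[0] = last_three
print(obj_slots[0])
```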
In case it helps, here is a sample of the dataset (E0_new2.csv), followed by a legend for the columns. The "FTR" column is the target variable; I included it in the printout for explanatory purposes only:
print(data.head(n=10))
   FTHG  FTAG  HTHG  HTAG  HS  AS  HST  AST  FTR
0     2     1     1     0   8  13    6    4    1
1     2     0     1     0  12  10    4    1    1
2     0     2     0     1  15  10    6    9   -1
3     0     3     0     2   6  13    1    4   -1
4     1     2     1     2  15  15    2    5   -1
5     2     0     1     0  19   6    5    0    1
6     2     2     1     1  11   6    4    5    0
7     0     2     0     1   9  17    3    8   -1
8     4     0     2     0  18   5    8    2    1
9     0     0     0     0  18  16    3    6    0
Legend of the data:
FTHG (Full-Time Home Team Goals): goals scored by the home team in the match
FTAG (Full-Time Away Team Goals): goals scored by the visiting team in the match
HTHG (Half-Time Home Team Goals): goals scored by the home team by half-time
HTAG (Half-Time Away Team Goals): goals scored by the visiting team by half-time
HS (Home Team Shots): shots taken by the home team
AS (Away Team Shots): shots taken by the visiting team
HST (Home Team Shots on Target): shots on target by the home team
AST (Away Team Shots on Target): shots on target by the visiting team
FTR (Full-Time Result): final result of the match, with 0 indicating a draw, 1 a home win and -1 an away win
How can I build a collection (for example, a pandas Series) in which each element is itself a collection, such as the array with the last three results?
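One direction that might work, shown only as a sketch under assumptions: collect the per-match histories in a plain Python list and build the pandas object once at the end, instead of assigning an array into a Series cell inside the loop. The toy DataFrame below stands in for E0_new2.csv (team names, dates and results are invented), and last_three_results, target_lists and target_table are hypothetical names. The sketch takes tail(3) after an ascending sort to get the three most recent games; head(n = 3) on the same sort, as in the code above, returns the three earliest ones.

```python
import pandas as pd

# Toy stand-in for E0_new2.csv; all values are made up.
matches = pd.DataFrame({
    "Date": pd.to_datetime(["2019-08-01", "2019-08-08", "2019-08-15",
                            "2019-08-22", "2019-08-29", "2019-09-05"]),
    "HomeTeam": ["A", "B", "A", "C", "A", "B"],
    "AwayTeam": ["B", "A", "C", "A", "B", "A"],
    "FTR": [1, 0, -1, 1, 0, -1],
})

def last_three_results(df, team, date):
    """FTR of the three most recent matches `team` played before `date`, oldest first."""
    played = df[(df["HomeTeam"] == team) | (df["AwayTeam"] == team)]
    earlier = played[played["Date"] < date].sort_values("Date")
    return earlier["FTR"].tail(3).tolist()

kept_index = []  # rows that have a full three-game history
history = []     # one list of three FTR values per kept row
for idx, row in matches.iterrows():
    results = last_three_results(matches, row["HomeTeam"], row["Date"])
    if len(results) == 3:
        kept_index.append(idx)
        history.append(results)

# Option 1: a Series whose elements are lists (dtype object).
target_lists = pd.Series(history, index=kept_index)

# Option 2: one column per past game, a purely numeric layout.
target_table = pd.DataFrame(history, index=kept_index,
                            columns=["FTR_1", "FTR_2", "FTR_3"])

print(target_lists)
print(target_table)
```

The second layout is probably the more convenient one for scikit-learn, since its estimators expect two-dimensional numeric input rather than a column of nested lists.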