1
Hello.
I have two data frames:
Times with 629 lines.
rank prev_rank name league off def spi
0 1 1 Manchester City Barclays Premier League 3.34 0.24 95.24
1 2 2 Liverpool Barclays Premier League 2.95 0.28 92.96
2 3 3 Bayern Munich German Bundesliga 3.29 0.46 92.43
3 4 4 Paris Saint-Germain French Ligue 1 2.88 0.47 89.55
4 5 7 Real Madrid Spanish Primera Division 2.79 0.46 88.98
**E partidas** com 27122 linhas.
date league_id league team1 team2 spi1 spi2 prob1 prob2 probtie ... importance1 importance2 score1 score2 xg1 xg2 nsxg1 nsxg2 adj_score1 adj_score2
0 2016-08-12 1843 French Ligue 1 Bastia Paris Saint-Germain 51.16 85.68 0.0463 0.8380 0.1157 ... 32.4 67.7 0.0 1.0 0.97 0.63 0.43 0.45 0.00 1.05
1 2016-08-12 1843 French Ligue 1 AS Monaco Guingamp 68.85 56.48 0.5714 0.1669 0.2617 ... 53.7 22.9 2.0 2.0 2.45 0.77 1.75 0.42 2.10 2.10
2 2016-08-13 2411 Barclays Premier League Hull City Leicester City 53.57 66.81 0.3459 0.3621 0.2921 ... 38.1 22.2 2.0 1.0 0.85 2.77 0.17 1.25 2.10 1.05
3 2016-08-13 2411 Barclays Premier League Crystal Palace West Bromwich Albion 55.19 58.66 0.4214 0.2939 0.2847 ... 43.6 34.6 0.0 1.0 1.11 0.68 0.84 1.60 0.00 1.05
4 2016-08-13 2411 Barclays Premier League Everton Tottenham Hotspur 68.02 73.25 0.3910 0.3401 0.2689 ... 31.9 48.0 1.0 1.0 0.73 1.11 0.88 1.81 1.05 1.05
I need to compare the team name (team1) of df2(matches) to the name df1(times) and when I find the corresponding add in a list team_id[] the value contained in "rank" in df1(times) and then add that list in df2 (matches). Including the new variable "team1_id" in data frame 2.
And then do the same pro team2 process generating "team2_id".
I tried a few ways and the last one was this:
team1_id = []
for i in range(0,27121):
for n in range(0,628):
if data['team1'].values[i] == times['name'].values[n]:
team1_id.append(times['rank'][(n)])
But it returned only 24647 values even though there were no blank values in team1 and all exist in the other df. And when checking the values are not sorted correctly, it seems that the first line was not included.
team2_id = []
for i in range(0,27121):
for n in range(0,628):
if data['team2'].values[i] == times['name'].values[n]:
team2_id.append(times['rank'][(n)])
This is correct the first records checked more also with a smaller number of records than expected.
Tete use merge: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html
– Hugo Salvador