Hello.
I have two data frames:
**times** (teams), with 629 rows:
   rank  prev_rank  name                 league                    off   def   spi
0  1     1          Manchester City      Barclays Premier League   3.34  0.24  95.24
1  2     2          Liverpool            Barclays Premier League   2.95  0.28  92.96
2  3     3          Bayern Munich        German Bundesliga         3.29  0.46  92.43
3  4     4          Paris Saint-Germain  French Ligue 1            2.88  0.47  89.55
4  5     7          Real Madrid          Spanish Primera Division  2.79  0.46  88.98
And **matches** (called data in the code below), with 27122 rows:
   date        league_id  league                   team1           team2                 spi1   spi2   prob1   prob2   probtie  ...  importance1  importance2  score1  score2  xg1   xg2   nsxg1  nsxg2  adj_score1  adj_score2
0  2016-08-12  1843       French Ligue 1           Bastia          Paris Saint-Germain   51.16  85.68  0.0463  0.8380  0.1157   ...  32.4         67.7         0.0     1.0     0.97  0.63  0.43   0.45   0.00        1.05
1  2016-08-12  1843       French Ligue 1           AS Monaco       Guingamp              68.85  56.48  0.5714  0.1669  0.2617   ...  53.7         22.9         2.0     2.0     2.45  0.77  1.75   0.42   2.10        2.10
2  2016-08-13  2411       Barclays Premier League  Hull City       Leicester City        53.57  66.81  0.3459  0.3621  0.2921   ...  38.1         22.2         2.0     1.0     0.85  2.77  0.17   1.25   2.10        1.05
3  2016-08-13  2411       Barclays Premier League  Crystal Palace  West Bromwich Albion  55.19  58.66  0.4214  0.2939  0.2847   ...  43.6         34.6         0.0     1.0     1.11  0.68  0.84   1.60   0.00        1.05
4  2016-08-13  2411       Barclays Premier League  Everton         Tottenham Hotspur     68.02  73.25  0.3910  0.3401  0.2689   ...  31.9         48.0         1.0     1.0     0.73  1.11  0.88   1.81   1.05        1.05
I need to compare the team name (team1) in df2 (matches) with the name in df1 (times) and, when they match, append the corresponding "rank" value from df1 to a list team_id[]. That list then becomes a new column "team1_id" in df2.
Then I need to repeat the same process for team2, generating "team2_id".
I tried a few ways and the last one was this:
team1_id = []
for i in range(0,27121):
    for n in range(0,628):
        if data['team1'].values[i] == times['name'].values[n]:
            team1_id.append(times['rank'][(n)])
But it returned only 24647 values, even though there are no blank values in team1 and, as far as I can tell, every name exists in the other data frame. When I check the values, they are also not lined up with the right rows; it seems the first row was not included.
team2_id = []
for i in range(0,27121):
    for n in range(0,628):
        if data['team2'].values[i] == times['name'].values[n]:
            team2_id.append(times['rank'][(n)])
For this one, the first records I checked are correct, but it also returns fewer records than expected.
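A possible explanation, offered as a guess rather than a diagnosis: both loops only append when a name matches exactly, so any row whose name has no exact counterpart in times['name'] (different spelling, stray whitespace) is skipped silently and the list ends up shorter than the data frame and shifted relative to its rows. In addition, range(0, 27121) stops at index 27120, so the last row is never visited. The sketch below lists the names that have no exact match; the two small frames are illustrative stand-ins for the real data, not actual values from it.

```python
import pandas as pd

# Toy stand-ins for the real frames (illustrative values only).
times = pd.DataFrame({
    "rank": [1, 2, 3],
    "name": ["Manchester City", "Liverpool", "Bayern Munich"],
})
data = pd.DataFrame({
    "team1": ["Liverpool", "Bastia", "Bayern Munich"],
    "team2": ["Manchester City", "Liverpool", "Guingamp"],
})

# Names that appear in each team column but have no exact match in times['name'].
known_names = set(times["name"])
missing_team1 = sorted(set(data["team1"]) - known_names)
missing_team2 = sorted(set(data["team2"]) - known_names)

print("team1 names without a match:", missing_team1)
print("team2 names without a match:", missing_team2)
```

If both lists come back empty on the real data, the mismatch has to come from somewhere else, for example duplicated names in times.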
Try using merge: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html
– Hugo Salvador
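A minimal sketch of what the merge suggestion could look like, assuming each team name appears only once in times (the validate argument raises an error if that assumption does not hold). The toy frames are illustrative stand-ins for the real data. A left merge keeps every row of data, and names without a match get NaN in the new column instead of being dropped.

```python
import pandas as pd

# Toy stand-ins for the real frames (illustrative values only).
times = pd.DataFrame({
    "rank": [1, 2, 3],
    "name": ["Manchester City", "Liverpool", "Bayern Munich"],
})
data = pd.DataFrame({
    "team1": ["Liverpool", "Bastia", "Bayern Munich"],
    "team2": ["Manchester City", "Liverpool", "Guingamp"],
})

# Lookup table with just the two columns needed.
lookup = times[["name", "rank"]]

# Left merge once per team column; the lookup columns are renamed so that
# the join key matches and the rank lands in team1_id / team2_id.
data = data.merge(
    lookup.rename(columns={"name": "team1", "rank": "team1_id"}),
    on="team1", how="left", validate="many_to_one",
)
data = data.merge(
    lookup.rename(columns={"name": "team2", "rank": "team2_id"}),
    on="team2", how="left", validate="many_to_one",
)

print(data)
```

Because unmatched names produce NaN, the id columns come back as floats whenever at least one name is missing; data['team1_id'].isna().sum() then counts the rows that still need attention.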