How to get averages of past occurrences relative to the line in question in a dataframe (Pandas)

Asked

Viewed 38 times

1

I’m building a predictive model to rank football matches, the model uses goal averages in the separate teams' season between bosses and visitors, I’m averaging the whole year, but in early season games I need to consider only the goals made until that round. Therefore it is necessary to make the calculation of the average considering only previous lines referring to the season in question. the code is thus:

avgMandanteTemporada = df_entradas.groupby(['Mandante','Ano'])['MandanteGols'].mean().reset_index()
avgVisitanteTemporada = df_entradas.groupby(['Visitante','Ano'])['VisitanteGols'].mean().reset_index()
avgGSMandanteTemporada = df_entradas.groupby(['Ano','Mandante'])['VisitanteGols'].mean().reset_index()
avgGSVisitanteTemporada = df_entradas.groupby(['Ano','Visitante'])['MandanteGols'].mean().reset_index()

But this code returns me the average in relation to the whole year, I need an alternative that returns me the averages calculated until each round. The dataset is like this:

dataset

I needed the average calculated separately per round, considering the goals made so far, previous lines belonging to the same year. I tried using . shift() but could not get the expected result.

I tried to represent through the image following the result I’m looking for:

resultado esperado

o Average result for each row must be the sum of all previous rows (for the season in question) divided by the number of rounds.

the dataset is available through the link: Link to dataset

Bs: Club 1 = Principal ; Club 2 = Visitor

  • 1

    You can provide a sample of the data (small csv) or a way to copy/paste the dataframe to make it easier to test and help?

  • 1

    I have no way to test, but if it works, tell me to update an answer tomorrow, try: df['AvgGF_Mandante'] = df['Mandante'].expanding().mean()

  • Hello Miguel, thank you very much, I used . expanding() and I think I got the expected results, I wrote the code like this: avgMandanteTemporada = df_entradas.groupby(['Mandante','Ano'])['MandanteGols'].expanding().mean().reset_index() I edited the publication and made available the dataset, I am validating the results.

  • The results are according to what I expected except for a small detail, the average is being calculated considering the line in question beyond all previous ones, I needed to be considered only the previous lines.

No answers

Browser other questions tagged

You are not signed in. Login or sign up in order to post.