Changing the date of a monthly average in pandas

0

Hello. Using the code below I compute a monthly average, but each result is labeled with the last day of its month. I would like to know whether it can be labeled with a fixed day of the month instead, such as the 1st or the 15th.

In the code, 123456789.csv is a generic file with a few years of daily data, from which the monthly averages are computed, and index_col sets the date column as the index.

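import pandas as pd

# Read the daily data, using the first column (the dates) as the index
file0 = pd.read_csv('123456789.csv', sep=',', index_col=0)
file0.index = pd.to_datetime(file0.index)
monthly_mean = file0.resample('M').mean()  # 'M' labels each bin with the month end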

This would be an example of the original output:

                  Dados
Data                      
2006-01-31        4.206452
2006-02-28        3.878571
2006-03-31        4.038710
2006-04-30        4.113333
2006-05-31        4.306452
...                    ...
2014-08-31        4.312903
2014-09-30        4.456667
2014-10-31        3.958065
2014-11-30        3.950000
2014-12-31        3.661290

And this would be an example of the desired result:

                  Dados
Data                      
2006-01-15        4.206452
2006-02-15        3.878571
2006-03-15        4.038710
2006-04-15        4.113333
2006-05-15        4.306452
...                    ...
2014-08-15        4.312903
2014-09-15        4.456667
2014-10-15        3.958065
2014-11-15        3.950000
2014-12-15        3.661290
  • Can you make the file 123456789.csv available, or an equivalent example?

  • This is a very pertinent question. Since pandas 1.1, DataFrame.resample has an offset parameter and an origin parameter, but neither seems to work for monthly frequencies. The solution I could think of is to subtract 15 days from the dates, resample, and then correct the dates in the resample output. I also posted a question on the English-language Stack Overflow to see if anyone has a better suggestion.

2 answers

0

Okay, there are two solutions to your problem, but I found only one of them suitable. It consists of applying a false offset to your DataFrame: subtract 14 days from all dates, calculate the monthly average, and then add the 14 days back to the dates of the averages:

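# Shift every date back 14 days, so the 15th of a month lands on the 1st
df['offset'] = df.index - pd.Timedelta('14D')
# Average by calendar month of the shifted dates
monthly_mean = df.resample('M', on='offset').mean()
# Shift the resulting labels forward again
monthly_mean.index = monthly_mean.index + pd.Timedelta('14D')
print(monthly_mean)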

Note that the program prints the 14th rather than the 15th, because each label is the date on which its interval ends: every interval runs from the 15th of the previous month to the 14th of the current month.
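To make the interval boundaries concrete, here is a minimal self-contained sketch of the same shift-and-resample idea. It assumes synthetic daily data in place of 123456789.csv (the values are made up), shifts the index directly instead of adding an 'offset' column, and passes pd.offsets.MonthEnd() rather than the 'M' alias, since the alias spelling differs between pandas versions.

```python
import numpy as np
import pandas as pd

# Synthetic daily data standing in for 123456789.csv (values are made up)
idx = pd.date_range("2006-01-01", "2006-12-31", freq="D")
df = pd.DataFrame({"Dados": np.arange(len(idx), dtype=float)}, index=idx)
df.index.name = "Data"

shift = pd.Timedelta(days=14)

# Shift the dates back so that the 15th of each month lands on the 1st
shifted = df.copy()
shifted.index = shifted.index - shift

# Month-end resampling of the shifted dates
monthly_mean = shifted.resample(pd.offsets.MonthEnd()).mean()

# Move the labels forward again: every bin is now labeled with a 14th
monthly_mean.index = monthly_mean.index + shift

# Check one bin by hand: 15 January to 14 February, both inclusive
manual = df.loc["2006-01-15":"2006-02-14", "Dados"].mean()
print(monthly_mean.loc[pd.Timestamp("2006-02-14"), "Dados"], manual)
```

The first and last bins only cover part of an interval (the data starts on 1 January and ends on 31 December), so their averages are based on fewer days than the others.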

The other solution is to divide the period into half-month periods using 'SM' instead of 'M', but then you need a series of manipulations to add up the values of each half-period and calculate the average. It sounds simple, but it has to be done carefully so that you do not get the range wrong, because with 'SM' each label marks the beginning of the period instead of the end.

0

If the goal is only to change the day without changing the average, you can use apply with the replace method:

df['data'] = df['data'].apply(lambda d: d.replace(day = 15))

Input:

          data     media
0   2006-01-31  4.206452
1   2006-02-28  3.878571
2   2006-03-31  4.038710
3   2006-04-30  4.113333
4   2006-05-31  4.306452
5   2014-08-31  4.312903
6   2014-09-30  4.456667
7   2014-10-31  3.958065
8   2014-11-30  3.950000
9   2014-12-31  3.661290

Output:

          data     media
0   2006-01-15  4.206452
1   2006-02-15  3.878571
2   2006-03-15  4.038710
3   2006-04-15  4.113333
4   2006-05-15  4.306452
5   2014-08-15  4.312903
6   2014-09-15  4.456667
7   2014-10-15  3.958065
8   2014-11-15  3.950000
9   2014-12-15  3.661290
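In the question the dates are the index of the resample output rather than a 'data' column, so the same replace idea can be applied to the index instead. The sketch below is one way to do that, assuming synthetic daily data in place of 123456789.csv and using pd.offsets.MonthEnd() in place of the 'M' alias. The averages are still calendar-month averages; only the labels change.

```python
import numpy as np
import pandas as pd

# Synthetic daily data standing in for 123456789.csv (values are made up)
idx = pd.date_range("2006-01-01", "2006-03-31", freq="D")
file0 = pd.DataFrame({"Dados": np.ones(len(idx))}, index=idx)

# Calendar-month averages, labeled with the last day of each month
monthly_mean = file0.resample(pd.offsets.MonthEnd()).mean()

# Relabel each month with its 15th; the values are left untouched
monthly_mean.index = monthly_mean.index.map(lambda d: d.replace(day=15))
print(monthly_mean)
```

Using day=1 instead of day=15 would label each average with the first day of its month.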
