(Pandas) - Group and summarise by date

Hello, I'm a beginner in pandas and I've run into a problem that I couldn't find, or couldn't understand, how to solve from the documentation or other topics. Briefly, I need to group the observation dates in my dataset into five-day intervals and, for each interval, calculate the average number of accidents. I have been trying, unsuccessfully, something like:

df = df.groupby(pd.TimeGrouper('5D'))['Acidentes'].mean()
     Data       Hora    Acidentes    Vítimas ...
0  12/02/2017    00          0          0
1  12/02/2017    01          2          1
...
24 13/02/2017    00          1          0
25 13/02/2017    01          0          0 
...
95 30/04/2017    23          3          2

The occurrences are recorded per day and per hour, but the intention is to group them into intervals of days and then average the accidents for each interval.

  • It would help if you gave an example (5-10 lines) of your dataset, or some content from your source (maybe a csv)

  • I edited the question to explain the problem.

1 answer

Given this DataFrame:

# -*- coding: utf-8 -*-
import pandas as pd

d = {'Data': ['01/02/2017','06/02/2017','03/02/2017','02/02/2017','01/02/2017'],
     'Acidentes': [0,2,1,0,1],
     'Vitimas': [0,1,0,0,2]}
df = pd.DataFrame(data=d)
df['Data'] = pd.to_datetime(df['Data'], format='%d/%m/%Y')  # convert the strings to datetime
df = df.sort_values(['Data'])  # sort to make the data easier to read
>>> print(df)
   Acidentes       Data  Vitimas
0          0 2017-02-01        0
4          1 2017-02-01        2
3          0 2017-02-02        0
2          1 2017-02-03        0
1          2 2017-02-06        1

We can use resample:

df = df.set_index('Data').resample('5D').mean()
>>> print(df)
            Acidentes  Vitimas
Data                          
2017-02-01        0.5      0.5
2017-02-06        2.0      1.0
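
As a side note, the `pd.TimeGrouper` call from the question was deprecated and later removed in newer pandas releases; `pd.Grouper` is its replacement. A minimal sketch, assuming the same sample data as above, that groups directly on the `Data` column without setting it as the index (the variable name `media` is just for illustration):

```python
import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame({
    'Data': ['01/02/2017', '06/02/2017', '03/02/2017', '02/02/2017', '01/02/2017'],
    'Acidentes': [0, 2, 1, 0, 1],
    'Vitimas': [0, 1, 0, 0, 2],
})
df['Data'] = pd.to_datetime(df['Data'], format='%d/%m/%Y')

# pd.Grouper with key= bins the 'Data' column into 5-day windows,
# so there is no need to call set_index first.
media = df.groupby(pd.Grouper(key='Data', freq='5D'))['Acidentes'].mean()
print(media)
```

This should give the same `Acidentes` column as the resample version, as a Series indexed by the start date of each five-day window.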

[Edit]

Returning the dates to the original pattern:

df = df.reset_index()
df['Data'] = df['Data'].dt.strftime('%d/%m/%Y')  # format the dates back as dd/mm/yyyy strings
>>> print(df)
         Data  Acidentes  Vitimas
0  01/02/2017        0.5      0.5
1  06/02/2017        2.0      1.0
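
One thing to watch with hourly records like those in the question: `resample('5D').mean()` averages over every row in the window, that is, per hourly record. If what is wanted is the average of daily totals, one option is to sum per day first and then average. A rough sketch, assuming the question's `Data`/`Hora`/`Acidentes` layout; the values below are invented purely for illustration and the names `per_record` and `per_day` are hypothetical:

```python
import pandas as pd

# Hypothetical hourly records in the question's layout (values are made up).
df = pd.DataFrame({
    'Data': ['12/02/2017', '12/02/2017', '13/02/2017',
             '13/02/2017', '17/02/2017', '17/02/2017'],
    'Hora': [0, 1, 0, 1, 0, 1],
    'Acidentes': [0, 2, 1, 0, 3, 1],
})
df['Data'] = pd.to_datetime(df['Data'], format='%d/%m/%Y')
acidentes = df.set_index('Data')['Acidentes']

# Average per hourly record within each 5-day window.
per_record = acidentes.resample('5D').mean()

# Average of daily totals within each 5-day window:
# sum per day first (days without records become 0), then average.
per_day = acidentes.resample('D').sum().resample('5D').mean()

print(per_record)
print(per_day)
```

Note that in the second variant days with no records count as zero accidents, which may or may not be what you want for your data.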
  • Thank you very much! This answer is perfect for me, but out of curiosity, how can I return the dates displayed in df to the Brazilian format?

  • @Viniciusoliveira, by changing the format with a format string. I edited the answer with this addendum.
