Grouping by several variables

I'm working with the dataset below and need to get the close value for the last day of each week, grouped by month and year. The full dataset covers 2005 to 2017; below is just a sample.

date        close   year   month  day    quarter      week
2016-12-23   961    2016    12    23      4        51
2016-12-22   928    2016    12    22      4        51
2016-12-21   926    2016    12    21      4        51
2016-12-20   914    2016    12    20      4        51
2016-12-19   927    2016    12    19      4        51
2016-12-16   946    2016    12    16      4        50
2016-12-15   966    2016    12    15      4        50
2016-12-14   1003   2016    12    14      4        50
2016-12-13   1052   2016    12    13      4        50
2016-12-12   1069   2016    12    12      4        50
2017-12-23   934    2017    12    23      4        51
2017-12-22   928    2017    12    22      4        51
2017-12-21   926    2017    12    21      4        51
2017-12-20   914    2017    12    20      4        51
2017-12-19   927    2017    12    19      4        51
2017-12-16   933    2017    12    16      4        50
2017-12-15   966    2017    12    15      4        50
2017-12-14   1003   2017    12    14      4        50
2017-12-13   1052   2017    12    13      4        50
2017-12-12   1069   2017    12    12      4        50

Grouping should look like this:

date       close    year    month     day     quarter   week
2016-12-23  961     2016      12      23       4         51
2016-12-16  946     2016      12      16       4         50
2017-12-23  934     2017      12      23       4         51
2017-12-16  933     2017      12      16       4         50

Can someone help me with that?

  • What have you tried?

  • I tried the code below, but it isn't right: it takes the last value for each day, whereas I want the last date of each week, by month and year. df_bdiy = df_bdiy.groupby(['day', 'week', 'month', 'year'])['close'].last().reset_index()

  • Edit the question by pressing the [Edit] button and append the code of your attempt to the question.

1 answer

I got it working with the code below:

df = df.sort_values(['year', 'month', 'quarter', 'week', 'day']).drop_duplicates(['year', 'month', 'quarter', 'week'], keep='last')
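For anyone who wants to check this, here is a minimal sketch of that approach run on a few rows from the sample in the question. The names rows, keys, last_per_week and alt are just illustrative, and the date column is left out because year, month and day carry the same information. The second variant, using groupby with idxmax, is an alternative I would expect to give the same rows; it is not part of the original code.

```python
import pandas as pd

# A subset of the sample rows from the question.
rows = [
    # (year, month, day, quarter, week, close)
    (2016, 12, 23, 4, 51, 961),
    (2016, 12, 22, 4, 51, 928),
    (2016, 12, 19, 4, 51, 927),
    (2016, 12, 16, 4, 50, 946),
    (2016, 12, 15, 4, 50, 966),
    (2016, 12, 12, 4, 50, 1069),
    (2017, 12, 23, 4, 51, 934),
    (2017, 12, 22, 4, 51, 928),
    (2017, 12, 16, 4, 50, 933),
    (2017, 12, 12, 4, 50, 1069),
]
df = pd.DataFrame(
    rows, columns=["year", "month", "day", "quarter", "week", "close"]
)

# One group per week within a given year, month and quarter.
keys = ["year", "month", "quarter", "week"]

# Sort so the latest day comes last within each group,
# then keep only that last row per group.
last_per_week = (
    df.sort_values(keys + ["day"])
      .drop_duplicates(keys, keep="last")
      .reset_index(drop=True)
)

# Alternative sketch: look up the row holding the largest day in each group.
alt = (
    df.loc[df.groupby(keys)["day"].idxmax()]
      .sort_values(keys)
      .reset_index(drop=True)
)

print(last_per_week)
```

Because month is one of the keys, a week that straddles two months ends up as two groups, one per month, which seems to match "by month and year" in the question.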
