0
I have 2 dataframes in Pandas where:
- Data Frame A - contains information of the available time of announcements in each program
DFTmpDisp
      sinal   cod_programa      data              tempo_disponivel
0      SP1         XXXX         25/07/2020         10
1      BH          XXXX         01/08/2020        300
2      NAC         XXXX         08/08/2020        118
3      SP1         XXXX         15/08/2020        106
4      SP1         XXXX         22/08/2020          6
2287    BH         ZZZZ         24/08/2020        218
2288    BH         ZZZZ         25/08/2020         66
2289    BH         ZZZZ         26/08/2020         44
2290    BH         ZZZZ         27/08/2020        190
2291    BH         ZZZZ         28/08/2020        202
> Para o Dataframe A, converti a data em dias, ficando portanto conforme
DFTmpDisp['dia_semana'] = pd.to_datetime(DFTmpDisp['dia_semana'], format='%d/%m/%Y').dt.dayofweek
 sinal   cod_programa      data              tempo_disponivel
0      SP1         XXXX           5                10
1      BH          XXXX           5               300
2      NAC         XXXX           5               118
3      SP1         XXXX           5               106
4      SP1         XXXX           5                 6
2287    BH         ZZZZ           0               218
2288    BH         ZZZZ           1                66
2289    BH         ZZZZ           2                44
2290    BH         ZZZZ           3               190
2291    BH         ZZZZ           4               202
- DataFrame B - informações de audiência dos programas que já foram
   exibidos
DFTmpAud
     sinal cod_programa data_exibicao data_inicio_programa  media_audiencia
2278   SP1         XXXX    23/05/2020  02/05/2020 19:01:00        325104.42
2280   SP1         XXXX    09/05/2020  02/05/2020 19:01:00        255104.42
2275    BH         XXXX    02/05/2020  02/05/2020 19:01:00        455104.42
6220   SP1         XXXX    16/05/2020  02/05/2020 19:05:00        118375.44
3345    NAC        XXXX    16/05/2020  02/05/2020 19:05:00       1053999.80
2296   SP1         XXXX    30/05/2020  02/05/2020 19:05:00       1299650.50
8873   SP1         XXXX    02/05/2020  02/05/2020 19:05:00      14091402.82
276     BH         ZZZZ    01/05/2020  01/05/2020 11:30:00        150998.68
8536   NAC         ZZZZ    01/05/2020  01/05/2020 11:30:00       7846891.62
5871   SP1         ZZZZ    01/05/2020  01/05/2020 11:30:00        841810.51
6329    DF         ZZZZ    01/05/2020  01/05/2020 11:30:00        123071.38
237     RJ         ZZZZ    01/05/2020  01/05/2020 11:30:00        581696.70
For Dataframe B I ordered date display by increasing date
DFTmpAud = DFTmpAud.sort_values('data_exibicao') 
I converted the dates to datetime and put in the format day/month/year, a time they were in the format Year/Month/day
DFTmpAud['data_exibicao'] = pd.to_datetime(DFTmpAud['data_exibicao'], format='%Y-%m-%d')
DFTmpAud['data_inicio_programa'] =  pd.to_datetime(DFTmpAud['data_inicio_programa'], format='%Y-%m-%d %H:%M:%S') 
Problem: I need to calculate the median ratings in the last four shows of the show by default, cod_program and day of the week. Data for calculation are in Dataframe B (Dftmpaud). After creating a new column in dataframe A it will contain these median values for each day. Remembering that I only need the last 4 dates, I thought of using groupby, grouping by the way, program and date, but I did not find a way to get only the last 4 dates. Filter then would be sign, program and 4 last display dates
Observation to calculate the median for SP1: Filter by signal = SP1 cod_program = XXXX Day 5 (Saturday) -> all display dates fall on Saturday which is equal to 5. Pick from highest to lowest date
As datasetB data should be found 5 records for day 5
2296 SP1 XXXX 30/05/2020 02/05/2020 19:05:00 1299650.50
2278 SP1 XXXX 23/05/2020 02/05/2020 19:01:00 325104.42
6220 SP1 XXXX 16/05/2020 02/05/2020 19:05:00 118375.44
2280 SP1 XXXX 09/05/2020 02/05/2020 19:01:00 255104.42
8873 SP1 XXXX 02/05/2020 02/05/2020 19:05:00 14091402.82
Take the last 4 dates (from highest to lowest) of the above result
The result should be sinal cod_programa data_exibicao data_inicio_programa media_audiencia
2296 SP1 XXXX 30/05/2020 02/05/2020 19:05:00
1299650.50
2278 SP1 XXXX 23/05/2020 02/05/2020 19:01:00 325104.42
6220 SP1 XXXX 16/05/2020 02/05/2020 19:05:00 118375.44
2280 SP1 XXXX 09/05/2020 02/05/2020 19:01:00 255104.42
Add the values contained in media_audiencia which will be 1998234,78 divided by 4 (as there are 4 lines) - 1998234,78/4 Result = 499558,695
For the BH sign cod_program = XXXX Day 5 (Saturday) Should be found only 1 record, then the media is the own media_audience for this signal and code program on 05.
2275 BH XXXX 02/05/2020 02/05/2020 19:01:00 455104.42
Dataframe Expected Result
       sinal   cod_programa      data        tempo_disponivel    mediana
0      SP1         XXXX           5           10        499558,695
2275    BH         XXXX           5           300        455104.42
I will read your post calmly later. But, as it has date filters list, see if that answer helping
– Paulo Marques
Based on the two samples of Dataframes you posted, what would be the expected output? If you want to calculate the average of the last 4 events of each program it can be interesting to put a sample of data larger than 5 lines so that it is possible to reproduce exactly what you are going through
– Terry
Hi Terry, I edited the question. I think you can understand better.
– Lisnete Miranda
Ola Paulo, I edited the question with the result I hope. I think I can better understand. Thank you
– Lisnete Miranda
Hi. If any of the questions helped you or solved your problem, please consider accepted as correct
– Terry