0
I have 2 dataframes in Pandas where:
- Data Frame A - contains information of the available time of announcements in each program
DFTmpDisp
sinal cod_programa data tempo_disponivel
0 SP1 XXXX 25/07/2020 10
1 BH XXXX 01/08/2020 300
2 NAC XXXX 08/08/2020 118
3 SP1 XXXX 15/08/2020 106
4 SP1 XXXX 22/08/2020 6
2287 BH ZZZZ 24/08/2020 218
2288 BH ZZZZ 25/08/2020 66
2289 BH ZZZZ 26/08/2020 44
2290 BH ZZZZ 27/08/2020 190
2291 BH ZZZZ 28/08/2020 202
> Para o Dataframe A, converti a data em dias, ficando portanto conforme
DFTmpDisp['dia_semana'] = pd.to_datetime(DFTmpDisp['dia_semana'], format='%d/%m/%Y').dt.dayofweek
sinal cod_programa data tempo_disponivel
0 SP1 XXXX 5 10
1 BH XXXX 5 300
2 NAC XXXX 5 118
3 SP1 XXXX 5 106
4 SP1 XXXX 5 6
2287 BH ZZZZ 0 218
2288 BH ZZZZ 1 66
2289 BH ZZZZ 2 44
2290 BH ZZZZ 3 190
2291 BH ZZZZ 4 202
- DataFrame B - informações de audiência dos programas que já foram
exibidos
DFTmpAud
sinal cod_programa data_exibicao data_inicio_programa media_audiencia
2278 SP1 XXXX 23/05/2020 02/05/2020 19:01:00 325104.42
2280 SP1 XXXX 09/05/2020 02/05/2020 19:01:00 255104.42
2275 BH XXXX 02/05/2020 02/05/2020 19:01:00 455104.42
6220 SP1 XXXX 16/05/2020 02/05/2020 19:05:00 118375.44
3345 NAC XXXX 16/05/2020 02/05/2020 19:05:00 1053999.80
2296 SP1 XXXX 30/05/2020 02/05/2020 19:05:00 1299650.50
8873 SP1 XXXX 02/05/2020 02/05/2020 19:05:00 14091402.82
276 BH ZZZZ 01/05/2020 01/05/2020 11:30:00 150998.68
8536 NAC ZZZZ 01/05/2020 01/05/2020 11:30:00 7846891.62
5871 SP1 ZZZZ 01/05/2020 01/05/2020 11:30:00 841810.51
6329 DF ZZZZ 01/05/2020 01/05/2020 11:30:00 123071.38
237 RJ ZZZZ 01/05/2020 01/05/2020 11:30:00 581696.70
For Dataframe B I ordered date display by increasing date
DFTmpAud = DFTmpAud.sort_values('data_exibicao')
I converted the dates to datetime and put in the format day/month/year, a time they were in the format Year/Month/day
DFTmpAud['data_exibicao'] = pd.to_datetime(DFTmpAud['data_exibicao'], format='%Y-%m-%d')
DFTmpAud['data_inicio_programa'] = pd.to_datetime(DFTmpAud['data_inicio_programa'], format='%Y-%m-%d %H:%M:%S')
Problem: I need to calculate the median ratings in the last four shows of the show by default, cod_program and day of the week. Data for calculation are in Dataframe B (Dftmpaud). After creating a new column in dataframe A it will contain these median values for each day. Remembering that I only need the last 4 dates, I thought of using groupby, grouping by the way, program and date, but I did not find a way to get only the last 4 dates. Filter then would be sign, program and 4 last display dates
Observation to calculate the median for SP1: Filter by signal = SP1 cod_program = XXXX Day 5 (Saturday) -> all display dates fall on Saturday which is equal to 5. Pick from highest to lowest date
As datasetB data should be found 5 records for day 5
2296 SP1 XXXX 30/05/2020 02/05/2020 19:05:00 1299650.50
2278 SP1 XXXX 23/05/2020 02/05/2020 19:01:00 325104.42
6220 SP1 XXXX 16/05/2020 02/05/2020 19:05:00 118375.44
2280 SP1 XXXX 09/05/2020 02/05/2020 19:01:00 255104.42
8873 SP1 XXXX 02/05/2020 02/05/2020 19:05:00 14091402.82
Take the last 4 dates (from highest to lowest) of the above result
The result should be sinal cod_programa data_exibicao data_inicio_programa media_audiencia
2296 SP1 XXXX 30/05/2020 02/05/2020 19:05:00
1299650.50
2278 SP1 XXXX 23/05/2020 02/05/2020 19:01:00 325104.42
6220 SP1 XXXX 16/05/2020 02/05/2020 19:05:00 118375.44
2280 SP1 XXXX 09/05/2020 02/05/2020 19:01:00 255104.42
Add the values contained in media_audiencia which will be 1998234,78 divided by 4 (as there are 4 lines) - 1998234,78/4 Result = 499558,695
For the BH sign cod_program = XXXX Day 5 (Saturday) Should be found only 1 record, then the media is the own media_audience for this signal and code program on 05.
2275 BH XXXX 02/05/2020 02/05/2020 19:01:00 455104.42
Dataframe Expected Result
sinal cod_programa data tempo_disponivel mediana
0 SP1 XXXX 5 10 499558,695
2275 BH XXXX 5 300 455104.42
I will read your post calmly later. But, as it has date filters list, see if that answer helping
– Paulo Marques
Based on the two samples of Dataframes you posted, what would be the expected output? If you want to calculate the average of the last 4 events of each program it can be interesting to put a sample of data larger than 5 lines so that it is possible to reproduce exactly what you are going through
– Terry
Hi Terry, I edited the question. I think you can understand better.
– Lisnete Miranda
Ola Paulo, I edited the question with the result I hope. I think I can better understand. Thank you
– Lisnete Miranda
Hi. If any of the questions helped you or solved your problem, please consider accepted as correct
– Terry