I have a date/time start and end frame, how do I separate for dates / start time / end time on pandas?

Asked

Viewed 46 times

0

In the database I have a report containing:

INICIO              FINAL
20/07/2021 09:17    20/07/2021 09:25
20/07/2021 09:17    20/07/2021 09:25
20/07/2021 09:17    20/07/2021 09:25
20/07/2021 09:28    20/07/2021 09:33

I need to turn into this:

DATA       HORA INICIO  HORA FINAL
20/07/2021  09:17:10    09:25:17
20/07/2021  09:17:10    09:25:17
20/07/2021  09:17:10    09:25:17
...

I made a code trying two functions

 import pandas as pd
 
 relatorio = pd.read_csv('relatorio.csv', encoding='latin1', sep=';')

 datas_df = relatorio[['INICIO','FINAL']]

 for dhi in datas_df['INICIO']:
    data, hora_inicio = dhi.split(' ')
    datas_df['DATA'] = data
    datas_df['HORA INICIO'] = hora_inicio
    relatorio.append(datas_df['DATA'])
    relatorio.append(datas_df['HORA INICIO'])

 for dhf in datas_df['FINAL']:
    data_f, hora_final = dhf.split(' ')
    datas_df['HORA FINAL'] = hora_final
    relatorio.insert(loc=2, column='HORA FINAL', value=datas_df['HORA FINAL'])

That way the 1st is the error because there is no 'DATA', and in the 2nd is not the error, but only takes the last value of the date and time.

I did with these 2 for to cite examples of methods I have tried.

Can you help? I think it’s something in the loop for right?

2 answers

3

Just use the constructor of pandas.Dataframe() combined with the properties pandas.Series.dt to access the datetime values of the series:

import pandas as pd

df = pd.DataFrame([
  ["20/07/2021 09:17:10", "20/07/2021 09:25:17"],
  ["20/07/2021 09:17:10", "20/07/2021 09:25:17"],
  ["20/07/2021 09:17:10", "20/07/2021 09:25:17"],
  ["20/07/2021 09:28:16", "20/07/2021 09:33:27"],
], columns=['INICIO', 'FIM'], dtype='datetime64[ns]');

df2 = pd.DataFrame({
  "DATA": df['INICIO'].dt.date,
  "HORA INICIO": df['INICIO'].dt.time,
  "HORA FIM": df['FIM'].dt.time,  
})

print(df2)

Test the example on Repl.it

         DATA HORA INICIO  HORA FIM
0  2021-07-20    09:17:10  09:25:17
1  2021-07-20    09:17:10  09:25:17
2  2021-07-20    09:17:10  09:25:17
3  2021-07-20    09:28:16  09:33:27

In case it had been used pandas.Series.dt.date and pandas.Series.dt.time

0

I did it that way:

datas_df = relatorio[['INICIO','FINAL']]
datas = []
horas_i = []
horas_f = []
for dhi in datas_df['INICIO']:
    data, hora_inicio = dhi.split(' ')
    datas.append(data)
    horas_i.append(hora_inicio)
for dhf in datas_df['FINAL']:
    data_f, hora_final = dhf.split(' ')
    horas_f.append(hora_final)
datas_df['DATA'] = datas
datas_df['HORA INICIO'] = horas_i
datas_df['HORA FINAL'] = horas_f
datas_df = datas_df.drop(columns=['INICIO', 'FINAL'])
display(datas_df)

    DATA    HORA INICIO HORA FINAL
0   20/07/2021  09:17:10    09:25:17
1   20/07/2021  09:17:10    09:25:17
2   20/07/2021  09:17:10    09:25:17
3   20/07/2021  09:28:16    09:33:27
4   20/07/2021  09:28:16    09:33:27

If you want to give me some tips. Always welcome!

Browser other questions tagged

You are not signed in. Login or sign up in order to post.