Assign a period of the day to each time

I have the following df:

df = pd.DataFrame({'hora completa':['21:35:00', '22:16:00', '00:50:00', '09:30:00', '14:30:00']})
print(df)
  hora completa
0      21:35:00
1      22:16:00
2      00:50:00
3      09:30:00
4      14:30:00

I need to create a 'periodo' column based on these criteria:

- Madrugada (early morning): from 00:00 to 05:59
- Manhã (morning): from 06:00 to 11:59
- Tarde (afternoon): from 12:00 to 17:59
- Noite (night): from 18:00 to 23:59

I tried to solve it with numpy, like this:

mask1 = df['hora completa'].between('06:00:00', '11:59:00')
mask2 = df['hora completa'].between('12:00:00', '17:59:00')
mask3 = df['hora completa'].between('18:00:00', '23:59:00')
mask4 = df['hora completa'].between('00:00:00', '05:59:00')

df['periodo'] = np.where(mask1, 'Manhã', 
                         mask2, 'Tarde', 
                         mask3, 'Noite', 
                         mask4, 'Madrugada')

print (df)

But I get the error below:

TypeError: where() takes at most 3 arguments (8 given)

What am I missing? Is it possible to write a function (def) with if and else instead?

1 answer

The message says that the function where takes at most 3 arguments, and you are passing 8. You can use apply instead, passing as parameters a function that will be executed for each row, and the axis:

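import pandas as pd

def periodo(linha):
  # Zero-padded 'HH:MM:SS' strings sort chronologically, so string comparison works
  horario = linha['hora completa']
  # Half-open ranges, so boundary times such as '06:00:00' or '11:59:30' are not missed
  if '06:00:00' <= horario < '12:00:00': return 'Manha'
  elif '12:00:00' <= horario < '18:00:00': return 'Tarde'
  elif '18:00:00' <= horario <= '23:59:59': return 'Noite'
  elif '00:00:00' <= horario < '06:00:00': return 'Madrugada'
  return ''  # anything that is not a valid time string

df = pd.DataFrame({'hora completa':['21:35:00', '22:16:00', '00:50:00', '09:30:00', '14:30:00']})
# axis=1 passes each row to periodo
df['periodo'] = df.apply(periodo, axis=1)
print(df)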

Output:

  hora completa    periodo
0      21:35:00      Noite
1      22:16:00      Noite
2      00:50:00  Madrugada
3      09:30:00      Manha
4      14:30:00      Tarde
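
For a vectorized alternative that stays close to the original np.where attempt, np.select accepts a list of conditions and a matching list of choices. The following is only a sketch on the sample data from the question; it assumes the column holds zero-padded 'HH:MM:SS' strings, and it replaces the between masks with a cascade of upper bounds so that no time falls in a gap between ranges.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'hora completa': ['21:35:00', '22:16:00', '00:50:00', '09:30:00', '14:30:00']})

hora = df['hora completa']

# Zero-padded 'HH:MM:SS' strings compare in chronological order.
# Conditions are checked in order; the first one that is True wins.
conditions = [
    hora < '06:00:00',
    hora < '12:00:00',
    hora < '18:00:00',
]
choices = ['Madrugada', 'Manhã', 'Tarde']

# Rows matching none of the conditions (18:00:00 onwards) get the default.
df['periodo'] = np.select(conditions, choices, default='Noite')
print(df)
```

On a frame with many rows this avoids calling a Python function once per row, which is what apply with axis=1 does.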

See it working on repl.it. I also created a Gist on GitHub.

  • Thanks Noobsaibot, it worked well, but only when I create a df with pd.DataFrame (as I did here for the example). When I used it on my original dataframe, it didn't work. It gave the following error: TypeError: ("'<' not supported between instances of 'str' and 'datetime.time'", 'occurred at index 0'). The column type is the same (non-null object) for the original df and for the df created with pd.DataFrame. I tried changing a few things and created another df from the time column, but got the same error. What could it be? Thanks.

  • @Ricardostorck Is the example DataFrame you posted based on the DataFrame you actually have?

  • Yes, it is based on it. It's the same column, called 'hora completa'. The original df has more than 98,000 rows and 16 columns. When I tried to apply the code, I imagined it was a problem with the cell type. But it is the same.

  • See: https://repl.it/repls/AmusingMasculineMolecule

  • It worked, thank you very much. I also got it working another way:

      periodo = []
      for row in df['full time']:
          if row >= 0 and row <= 5:
              periodo.append('Dawn')
          elif row >= 6 and row <= 11:
              periodo.append('Morning')
          elif row >= 12 and row <= 17:
              periodo.append('Afternoon')
          elif row >= 18 and row <= 23:
              periodo.append('Night')
      df['periodo'] = periodo
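
The TypeError quoted in the comments suggests that the original column held datetime.time objects rather than strings; both show up as dtype object, which would explain why the types looked identical. A minimal sketch under that assumption: extract the hour from each time object and bin it with pd.cut. The sample frame below is made up for illustration and is not the asker's data.

```python
import datetime as dt

import pandas as pd

# Hypothetical sample: the column holds datetime.time objects, not strings.
df = pd.DataFrame({'hora completa': [dt.time(21, 35), dt.time(22, 16), dt.time(0, 50),
                                     dt.time(9, 30), dt.time(14, 30)]})

# Pull the hour (0-23) out of each time object.
hora = df['hora completa'].map(lambda t: t.hour)

# Bins are [0, 6), [6, 12), [12, 18), [18, 24); right=False makes the left edge inclusive.
df['periodo'] = pd.cut(
    hora,
    bins=[0, 6, 12, 18, 24],
    right=False,
    labels=['Madrugada', 'Manhã', 'Tarde', 'Noite'],
).astype(str)
print(df)
```

Working on the integer hour sidesteps the str versus datetime.time comparison entirely, and it is the same idea as the hour-based loop in the last comment.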
