Separating a dataframe by some criteria - Python Pandas

Asked

Viewed 6,918 times

3

I have a csv file with little more than 500 thousand lines that I would like to separate into two dataframes.

The format of each line: date, time, value of interest variable

dd-mm-yy,hh:mm:00,float

The lines are recorded every minute. So in a year are: 526.600 lines.

I would like to do a 'split' separating the day from the night, ie based on the time intervals. 08:00:00 until 20:00:00 have one df and another for the night.

How to do this using pandas?

1 answer

3


Assuming the following data for this reply, saved in the file 'dados.csv':

data,hora,numero
01/01/2018,10:30,0.1
01/01/2018,20:05,0.2
01/01/2018,07:00,0.3
02/01/2018,11:10,0.4
02/01/2018,22:35,0.5
03/01/2018,03:10,0.6
03/01/2018,20:45,0.7
03/01/2018,12:20,0.8
04/01/2018,15:15,0.9
04/01/2018,23:59,0.95

When loading the Dataframe, specify the type of columns data and hora as dates, specifying the column name in the parameter parse_dates of command read_csv:

import pandas as pd
import datetime

df = pd.read_csv('dados.csv', parse_dates=['data','hora'])

Exit:

Out[4]:
        data                hora  numero
0 2018-01-01 2018-06-02 10:30:00    0.10
1 2018-01-01 2018-06-02 20:05:00    0.20
2 2018-01-01 2018-06-02 07:00:00    0.30
3 2018-02-01 2018-06-02 11:10:00    0.40
4 2018-02-01 2018-06-02 22:35:00    0.50
5 2018-03-01 2018-06-02 03:10:00    0.60
6 2018-03-01 2018-06-02 20:45:00    0.70
7 2018-03-01 2018-06-02 12:20:00    0.80
8 2018-04-01 2018-06-02 15:15:00    0.90
9 2018-04-01 2018-06-02 23:59:00    0.95

To transform the column hora in time format:

df['hora'] = pd.to_datetime(df['hora']).dt.time

Exit:

Out[6]:
        data      hora  numero
0 2018-01-01  10:30:00    0.10
1 2018-01-01  20:05:00    0.20
2 2018-01-01  07:00:00    0.30
3 2018-02-01  11:10:00    0.40
4 2018-02-01  22:35:00    0.50
5 2018-03-01  03:10:00    0.60
6 2018-03-01  20:45:00    0.70
7 2018-03-01  12:20:00    0.80
8 2018-04-01  15:15:00    0.90
9 2018-04-01  23:59:00    0.95

To create the Dataframe with the data of the "day" period, enter the 'hour' condition between 8 and 20 in the query filter:

df_dia = df.loc[(df['hora'] >= datetime.time(hour=8)) & (df['hora'] <= datetime.time(hour=20))]

Upshot:

In [9]: df_dia
Out[9]:
        data      hora  numero
0 2018-01-01  10:30:00     0.1
3 2018-02-01  11:10:00     0.4
7 2018-03-01  12:20:00     0.8
8 2018-04-01  15:15:00     0.9

To create the Dataframe night, just inform the contrary condition:

df_noite = df.loc[(df['hora'] < datetime.time(hour=8)) | (df['hora'] > datetime.time(hour=20))]

Upshot:

In [12]: df_noite
Out[12]:
        data      hora  numero
1 2018-01-01  20:05:00    0.20
2 2018-01-01  07:00:00    0.30
4 2018-02-01  22:35:00    0.50
5 2018-03-01  03:10:00    0.60
6 2018-03-01  20:45:00    0.70
9 2018-04-01  23:59:00    0.95

Browser other questions tagged

You are not signed in. Login or sign up in order to post.