how to get the minimum average and maximum time of a timeseries with pandas, based on a column where the value is boleano?

Question

how to get the minimum average and maximum time of a timeseries with pandas, based on a column where the value is boleano?

Asked 6 years, 2 months ago

Viewed 292 times

0

I’m analyzing a timeseries returned with pandas ex:

index----valor------timestamp <br>
----0  -------0--------       2019-04-23 16:14:34.142540+00:00<br>
----1  -------0--------       2019-04-23 16:15:34.142540+00:00<br>
----2  -------1--------       2019-04-23 16:16:34.142540+00:00<br>
----3  -------0--------       2019-04-23 16:17:34.142540+00:00<br>
----4  -------1--------       2019-04-23 16:18:34.142540+00:00<br>
----5  -------1--------       2019-04-23 16:19:34.142540+00:00<br>
----6  -------0--------       2019-04-23 16:20:34.142540+00:00<br>
.<br>.<br>.<br>

Based on these data I need to calculate the total time the value was in '0' and '1',

The min would be the shorter time that the value kept at 0 and 1,

The maximum would be the longest the value kept at 0 and 1 .

and the average times...

1 answer

Browser other questions tagged python-3.x pandas timestamp

You are not signed in. Login or sign up in order to post.

by AlexCiuffa • **2,402** points · Answer 1 · 2019-04-24T01:15:47+00:00

From that your column timestamp are dates. If they are not, you can transform them like this: df['timestamp'] = pd.to_datetime(df['timestamp'], format='%Y-%m-%d %H:%M:%S')

One solution is to create another Data Frame df_intervalos, with the columns data_fim, data_inicio, valor and go through the original dataframe line by line, seeing when the sequence changes and filling the df_intervalos.

It can be filled so:

# crio o df_intervalos 
df_intervalos =  pd.DataFrame(data = {'data_inicio':[], 'data_fim':[], 'valor':[]})

data_inicio = df['timestamp'].values[0]
data_fim = df['timestamp'].values[0]
sequencia = df['valor'].values[0]

for index, linha in df[1:].iterrows(): # percorro linha a linha
    if linha['valor'] != sequencia: # Se o valor for diferente, encerra a sequencia
        df_intervalos = df_intervalos.append({'data_inicio' : data_inicio,
                                              'data_fim' : data_fim,
                                              'valor': sequencia} , ignore_index=True)
        data_inicio = linha['timestamp']
        sequencia = linha['valor']

    data_fim = linha['timestamp']

# Adiciona a ultima sequência
df_intervalos = df_intervalos.append({'data_inicio' : data_inicio,
                                              'data_fim' : data_fim,
                                              'valor': sequencia} , ignore_index=True)

Then we take the time of each sequence by subtracting the start and end dates of each sequence:

df_intervalos['tempo_calculado'] = (df_intervalos['data_fim'] - df_intervalos['data_inicio'])

The result is something like:

                    data_fim                   data_inicio   valor   tempo_calculado
0 2019-04-23 16:15:34.142540    2019-04-23 16:14:34.142540    0.0        00:01:00
1 2019-04-23 16:16:34.142540    2019-04-23 16:16:34.142540    1.0        00:00:00
2 2019-04-23 16:17:34.142540    2019-04-23 16:17:34.142540    0.0        00:00:00
3 2019-04-23 16:19:34.142540    2019-04-23 16:18:34.142540    1.0        00:01:00
4 2019-04-23 16:20:34.142540    2019-04-23 16:20:34.142540    0.0        00:00:00

Now, to know the maximum, minimum and average sequence duration 1, for example:

>>> df_intervalos[df_intervalos['valor']==1]['tempo_calculado'].max()
    Timedelta('0 days 00:01:00')

>>> df_intervalos[df_intervalos['valor']==1]['tempo_calculado'].min()
    Timedelta('0 days 00:00:00')

>>> df_intervalos[df_intervalos['valor']==1]['tempo_calculado'].mean()
    Timedelta('0 days 00:00:30')

Or even if you want the whole line:

>>> df_intervalos[df_intervalos['valor']==1][df_intervalos['tempo_calculado'] ==df_intervalos['tempo_calculado'].max()]

                    data_fim                   data_inicio   valor   tempo_calculado
3 2019-04-23 16:19:34.142540    2019-04-23 16:18:34.142540    1.0        00:01:00