Group closer values in Postgresql

Asked

Viewed 82 times

1

Good morning, you guys. I have a question regarding a grouping of data in postgresql and I would like to know if you can help me.

I have a database containing a table with entry times in an establishment, and another table with exit times. Both have in common a ticket number, which would be my id. I need to group the values so that I can match the nearest entry and exit times, according to your ticket. Isolating a specific ticket, I got the following data:

entrada              ticket  
-------------------  --------  
2020-12-02 07:53:02       802  
2020-12-03 09:00:07       802  

saida                ticket  
-------------------  --------  
2020-12-02 16:42:12       802 
2020-12-03 17:22:21       802  

To try to join this data, I have run the following SQL command:

select e.data as data_entrada, e.ticket as num_ticket, s.data as data_saida from entrada e
join saida s on e.ticket = s.ticket
group by data_entrada, data_saida, num_ticket
having e.ticket = '802' 
order by data_entrada, data_saida

Result obtained:

    data_entrada         num_ticket  data_saida
--  -------------------  --------    -------------------
 0  2020-12-02 07:53:02       802    2020-12-02 16:42:12
 1  2020-12-02 07:53:02       802    2020-12-03 17:22:21
 2  2020-12-03 09:00:07       802    2020-12-02 16:42:12
 3  2020-12-03 09:00:07       802    2020-12-03 17:22:21

Expected result:

    data_entrada         num_ticket  data_saida
--  -------------------  --------    -------------------
 0  2020-12-02 07:53:02       802    2020-12-02 16:42:12
 3  2020-12-03 09:00:07       802    2020-12-03 17:22:21

Would anyone have a solution proposal, either by SQL or even by Python? Apparently it’s simple, but I got lost in the query and I couldn’t find a solution yet. Thanks from now.

  • One possibility is to subselect the lowest exit date that is greater than or equal to the ticket entry date.

  • I believe it’s enough to do the on e.ticket = s.ticket and e.data_entrada = s.data_entrada

  • The same num_ticket may have more than one data_entrada or data_saida same day? If so, what is the expected result?

  • @Terry yes, you can. In this case you would do the combination with the nearest ordered record. But having a solution for this sample I can try to derive the rest of the solution later.

2 answers

1

I believe that what you seek is possible using the command .merge_asof() in pandas. It allows it to be possible to join Dataframes by approximation in the data (in your case, by approximate dates).
To documentation of the function is quite complete and full of examples, but if you have any questions, you can use the comment field and if possible I will complement the answer.

#importando o pandas
import pandas as pd

#transformando as colunas para o tipo datetime
entrada['entrada'] = pd.to_datetime(entrada['entrada'])
saida['saida'] = pd.to_datetime(saida['saida'])

#aplicando o merge por aproximação pelas datas dentro de cada 'ticket'
df = pd.merge_asof(entrada, saida, 
          left_on = 'entrada', 
          right_on = 'saida',  
          by='ticket', 
          direction = 'forward')

df.head()
#saida
    entrada             ticket  saida
0   2020-12-02 07:53:02 802     2020-12-02 16:42:12
1   2020-12-03 09:00:07 802     2020-12-03 17:22:21
  • Perfect, it worked correctly. I didn’t know merge_asof, just the same standard merge. Thank you very much.

0

With the use of a subselect:

SELECT e.data as data_entrada, 
       e.ticket as num_ticket, 
       (SELECT min(s.data) FROM saida s WHERE s.ticket = e.ticket AND s.data > e.data) as data_saida 
FROM entrada e 
WHERE e.ticket = '802'
ORDER BY data_entrada;
  • Using query also worked. Another alternative noted here. Thanks a lot, thanks a lot.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.