How to calculate the time from point A to B?

Asked

Viewed 1,080 times

2

I need to do a time check that a truck takes from a factory X until a resale Y, in addition to calculating how long he was in this resale.

I have GPS data and removed information that was relevant because it is a lot of data and would be very slow to use it. My table is as follows:

+----+------------------+-----------+---------------+
| ID |   DATA E HORA    |   LOCAL   | ENTRADA/SAIDA |
+----+------------------+-----------+---------------+
|  1 | 28/01/2017 07:27 | FABRICA   | Entrada       |
|  2 | 28/01/2017 09:00 | FABRICA   | Saida         |
|  3 | 29/01/2017 00:02 | REVENDA 1 | Entrada       |
|  4 | 29/01/2017 04:00 | REVENDA 1 | Saida         |
|  5 | 29/01/2017 08:00 | REVENDA 2 | Entrada       |
|  6 | 29/01/2017 10:00 | REVENDA 2 | Saida         |
|  7 | 29/01/2017 20:00 | FABRICA   | Entrada       |
|  8 | 29/01/2017 23:00 | FABRICA   | Saida         |
+----+------------------+-----------+---------------+

I have to take a day/hour break when he was at a resale and make these calculations (know how long he stayed at the resale and how long it took him to get there from the X factory).

Reselling time calculation seems to be easier (Output - Input) The biggest problem is the time from factory to resale. I would like to know how I can do this (can be by own SQL QUERY if possible)

PS: The table template can be changed, I did it this way because I found it easier to use; I am using Sqlite but I can use any BD that is like it

Thank you.

  • The table has a key ?

  • Yes, I already added in the example.

  • See if you can help http://forum.imasters.com.br/topic/555701-somar-diferenca-entra-varias-datas-e-hours/#entry2216634

2 answers

6


Observation 1: Your question has the tag , then I will respond with a hint of how to do this using Python. But if your data is maintained in an SQL database, it is certainly much better you already bring the filtered results (that is, only the sub-table with the course) or, better yet, already calculate it directly there (with a stored Procedure, for example). If not knows how to assemble a query thus, open another question and turned to SQL.

Observation 2: To facilitate I put your data in a local CSV file, and read them with the function read_csv of the Pandas. No your case, just switch to the function read_sql. This is the file CSV used:

ID,DATA E HORA,LOCAL,ENTRADA/SAIDA
1,28/01/2017 07:27,FABRICA,Entrada
2,28/01/2017 09:00,FABRICA,Saida
3,29/01/2017 00:02,REVENDA 1,Entrada
4,29/01/2017 04:00,REVENDA 1,Saida
5,29/01/2017 08:00,REVENDA 2,Entrada
6,29/01/2017 10:00,REVENDA 2,Saida
7,29/01/2017 20:00,FABRICA,Entrada
8,29/01/2017 23:00,FABRICA,Saida

One way to get what you want quite easily is to use the library Pandas. It’s a fantastic library for data manipulation and analysis. It has a considerable learning curve, but it’s worth it. Note that it depends on another very useful library, called Numpy.

Using this library, I have prepared a very simple example that does the following:

  1. It reads your complete data (from a CSV in my case, but it will be from the SQL in yours) in a table (which in Pandas is a DataFrame).
  2. Filter the table rows to get the route (leaving the factory and arriving at the desired resale).
  3. Calculates the time elapsed on this route by simply subtracting the time of departure (value of the "DATE AND TIME" column of the last row of the filtered sub-table) from the time of arrival (value of the "DATE AND TIME" column of the first row of the filtered sub-table).
  4. Repeats steps 2 and 3, but this time for the stay (the time that stood still at the resale, considering the arrival and exit from there).

Note that there are several simplifications in this code, as it is only illustrative. For example, it assumes that there are only two resale records (one input and one output). In more complex cases, you’ll have to work with filters to make sure you get the records in the right order (use tabela.index to obtain the indexes and tabela.iloc to access an index line).

Finally, the result is given in a Pandas structure used to manipulate elapsed time (called Timedelta). It stores the elapsed time as the sum of days + seconds. You have the seconds, so to have the partial times in hours or minutes just do the divisions by 3600 or 60 appropriately.

Here’s the code:

import sys
import pandas as pd
import numpy as np

# ---------------------------------------------------------------------------
def calculaTempos(tabela, nomeRevenda):

    ######################################
    # Cálculo do tempo de percurso
    ######################################

    # Debug!!!!!!
    print('Tabela completa com todos os todos os dados: \n{}\n\n'.format(tabela))
    # Debug!!!!!!

    # Busca TODAS as ocorrências de FABRICA + Saida
    filtro = (tabela['LOCAL'] == 'FABRICA') & (tabela['ENTRADA/SAIDA'] == 'Saida')
    resp = tabela[filtro]
    if resp.empty:
        return None, None

    # Por simplificação, considera como partida a primeira delas
    partida = resp.index[0]

    # Busca TODAS as ocorrências de nomeRevenda + Entrada
    filtro = (tabela['LOCAL'] == nomeRevenda) & (tabela['ENTRADA/SAIDA'] == 'Entrada')
    resp = tabela[filtro]
    if resp.empty:
        return None, None

    # Por simplificação, considera como chegada a última delas
    chegada = resp.index[len(resp)-1]

    # Filtra da tabela original os dados entre a partida e a chegada calculados
    tTrajeto = tabela[partida:chegada+1]

    # Debug!!!!!!
    print('Tabela parcial com o trajeto Fabrica->Revenda: \n{}\n\n'.format(tTrajeto))
    # Debug!!!!!!

    # O tempo de percurso é o horário de chegada (a última linha da tabela)
    # menos o horário de partida (a primeira linha da tabela)
    tempoPercurso = tTrajeto.iloc[len(tTrajeto)-1]['DATA E HORA'] - tTrajeto.iloc[0]['DATA E HORA']

    ######################################
    # Cálculo do tempo de parada
    ######################################

    # Busca TODAS as ocorrências de nomeRevenda + Saida
    filtro = (tabela['LOCAL'] == nomeRevenda) & (tabela['ENTRADA/SAIDA'] == 'Saida')
    resp = tabela[filtro]
    if resp.empty:
        return tempoPercurso, None

    # Por simplificação, considera como saída a última delas
    saida = resp.index[len(resp)-1]

    # Filtra da tabela original os dados entre a chegada e a saída calculados
    tParada = tabela[chegada:saida+1]

    # Debug!!!!!!
    print('Tabela parcial com a parada na Revenda: \n{}\n\n'.format(tParada))
    # Debug!!!!!!

    # O tempo de percurso é o horário de chegada (a última linha da tabela)
    # menos o horário de partida (a primeira linha da tabela)
    tempoParada = tParada.iloc[len(tParada)-1]['DATA E HORA'] - tParada.iloc[0]['DATA E HORA']

    return tempoPercurso, tempoParada

# ---------------------------------------------------------------------------
def main(args):
    tabela = pd.read_csv('teste.csv')
    tabela['DATA E HORA'] = pd.to_datetime(tabela['DATA E HORA'])

    tempoPercurso, tempoParada = calculaTempos(tabela, 'REVENDA 1')

    # A estrutura Timedelta do pandas armazena tempo decorrido em dias + segundos,
    # então precisa fazer uma conversãozinha pra ter os valores separadamente.
    dias, segundos = tempoPercurso.days, tempoPercurso.seconds
    horas = segundos // 3600
    minutos = (segundos % 3600) // 60
    segundos = segundos % 60
    print('Tempo de percurso entre fábrica e revenda: {:02d} dias e {:02d}:{:02d}:{:02d} horas'.format(dias, horas, minutos, segundos))

    dias, segundos = tempoParada.days, tempoParada.seconds
    horas = segundos // 3600
    minutos = (segundos % 3600) // 60
    segundos = segundos % 60
    print('Tempo parado na revenda: {:02d} dias e {:02d}:{:02d}:{:02d} horas'.format(dias, horas, minutos, segundos))

    return 0

# ---------------------------------------------------------------------------
if __name__ == '__main__':
    sys.exit(main(sys.argv[1:]))

And his way out:

Tabela completa com todos os todos os dados:
   ID         DATA E HORA      LOCAL ENTRADA/SAIDA
0   1 2017-01-28 07:27:00    FABRICA       Entrada
1   2 2017-01-28 09:00:00    FABRICA         Saida
2   3 2017-01-29 00:02:00  REVENDA 1       Entrada
3   4 2017-01-29 04:00:00  REVENDA 1         Saida
4   5 2017-01-29 08:00:00  REVENDA 2       Entrada
5   6 2017-01-29 10:00:00  REVENDA 2         Saida
6   7 2017-01-29 20:00:00    FABRICA       Entrada
7   8 2017-01-29 23:00:00    FABRICA         Saida


Tabela parcial com o trajeto Fabrica->Revenda:
   ID         DATA E HORA      LOCAL ENTRADA/SAIDA
1   2 2017-01-28 09:00:00    FABRICA         Saida
2   3 2017-01-29 00:02:00  REVENDA 1       Entrada


Tabela parcial com a parada na Revenda:
   ID         DATA E HORA      LOCAL ENTRADA/SAIDA
2   3 2017-01-29 00:02:00  REVENDA 1       Entrada
3   4 2017-01-29 04:00:00  REVENDA 1         Saida


Tempo de percurso entre fábrica e revenda: 00 dias e 15:02:00 horas
Tempo parado na revenda: 00 dias e 03:58:00 horas
  • Thanks Luiz! In a very large amount of this data, use functions of SQL itself I gain performance?

  • Not at all. He almost certainly does. First of all because, as good as Pandas is, the database is already prepared to do this kind of query in tables. Second because you don’t need convey between the database and the process the data of the entire journey. You basically only transmit the result. This makes a lot of difference if the seat is not on the same machine and if the path is long. Third, this type of "logic" is usually independent of the use that will be made of the result, so from a modeling point of view it can make a lot of sense to stay close to the data.

  • continuing... After all, doing in the database you can reuse the same logic in different applications, built in different languages. Good luck! :)

  • Thanks again, could you give me the "path of stones" on how to do? I know very little about SQL, some article or some function I can search to learn more please

  • I don’t know much about SQL because I haven’t touched a relational database for a long time. As I said in my first observation, your question was opened to Python. Maybe that’s why it didn’t attract the attention of the people who know SQL. Another thing is: what exactly is your difficulty with SQL? Is it the query that brings the path, or is it the subtraction of dates? I suggest you open another question, specific to SQL, asking for help. There are a lot of people around here who can help you, but the question has got to be one big one.

0

With Mysql you can use the TIMESTAMPDIFF function

With this command SELECT below you can know the amount of minutes between the two points. To select the ID of each point, in its interface you place a filter for the user to select the Location and the Entry/Exit Point. Each option you put for the user will have an associated ID that you will use from input for consultation SQL.

In the example, I am admitting as point of origin the exit from the factory and as destination the entry point in the resale 2.

SELECT TIMESTAMPDIFF ( 
    MINUTE,
    (select DATA from Tablela WHERE ID=2),
    (select DATA from Tablela WHERE ID=5)
    )

Upshot

1380 minutos

You can change the parameter MINUTE for HOUR, for example, but with minutes will have a greater precision that can convert to decimal hours if you want.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.