SELECT MAX With more than three columns

Asked

Viewed 201 times

2

I have the following table:

jobname     aplicacao   start           end             tempo_de_execucao
PROCESSO_C  APLICACAO_C 3/4/2020 3:01   3/4/2020 5:39   2:37:26             
PROCESSO_E  APLICACAO_D 3/4/2020 6:00   3/4/2020 6:41   0:40:51             
PROCESSO_D  APLICACAO_B 3/4/2020 15:29  3/4/2020 17:25  1:56:18             
PROCESSO_C  APLICACAO_C 3/5/2020 3:01   3/5/2020 5:40   2:38:51             
PROCESSO_E  APLICACAO_D 3/5/2020 6:00   3/5/2020 6:42   0:42:15             
PROCESSO_D  APLICACAO_B 3/5/2020 12:27  3/5/2020 12:28  0:01:37             
PROCESSO_B  APLICACAO_B 3/5/2020 22:59  3/6/2020 18:56  19:57:05            
PROCESSO_C  APLICACAO_C 3/6/2020 3:01   3/6/2020 5:52   2:50:54             
PROCESSO_E  APLICACAO_D 3/6/2020 6:00   3/6/2020 6:40   0:39:51             
PROCESSO_D  APLICACAO_B 3/6/2020 16:24  3/6/2020 16:26  0:01:36             
PROCESSO_B  APLICACAO_B 3/6/2020 23:01  3/7/2020 13:27  14:25:07            
PROCESSO_C  APLICACAO_C 3/7/2020 3:01   3/7/2020 6:45   3:43:53             
PROCESSO_A  APLICACAO_A 3/7/2020 5:00   3/8/2020 5:04   24:04:14    

I am trying to execute a query to bring me the 05 most time-consuming processes, without repeating the name, that is I would need this result:

jobname     aplicacao   start           end             tempo_de_execucao
PROCESSO_A  APLICACAO_A 3/7/2020 5:00   3/8/2020 5:04   24:04:14            
PROCESSO_B  APLICACAO_B 3/5/2020 22:59  3/6/2020 18:56  19:57:05            
PROCESSO_C  APLICACAO_C 3/7/2020 3:01   3/7/2020 6:45   3:43:53             
PROCESSO_D  APLICACAO_B 3/4/2020 15:29  3/4/2020 17:25  1:56:18             
PROCESSO_E  APLICACAO_D 3/5/2020 6:00   3/5/2020 6:42   0:42:15             

Only when I run the query:

select A.job_mem_name AS JOBNAME,
A.application AS APLICACAO,
max (A.start_time) AS START,
max (A.end_time) AS END,
max (A.run_time_sec::real /60 /60 / 24) AS TEMPO_DE_EXECUCAO
from runinfo_history A
where A.ended_status = '16' and A.job_id IS NOT NULL and A.job_mem_name in 
(SELECT B.job_name
from def_ver_job B
where B.memname <> 'AFT')
group by A.job_mem_name, A.application
order by max (A.run_time_sec) desc limit 05;

brings me the following value, start time is displayed the last execution date and not the actual date.

jobname     aplicacao   start           end             tempo_de_execucao
PROCESSO_A  APLICACAO_A 3/7/2020 5:00   3/8/2020 5:04   24:04:14
PROCESSO_B  APLICACAO_B 3/6/2020 23:01  3/7/2020 13:27  19:57:05
PROCESSO_C  APLICACAO_C 3/7/2020 3:01   3/7/2020 6:45   3:43:53
PROCESSO_D  APLICACAO_B 3/6/2020 16:24  3/6/2020 16:26  1:56:18
PROCESSO_E  APLICACAO_D 3/6/2020 6:00   3/6/2020 6:40   0:42:15

If I don’t put the expression max on (A.start_time and A.end_time), comes the message:

ERROR:  column "a.start_time" must appear in the GROUP BY clause or be used in an aggregate function
LINE 3: A.start_time AS START,

If I put in the group query by the fields A.start_time and A.end_time the result is this, or repeats the name of the application.

jobname     aplicacao   start           end             tempo_de_execucao   
PROCESSO_A  APLICACAO_A 3/7/2020 5:00   3/8/2020 5:04   24:04:14            
PROCESSO_B  APLICACAO_B 3/5/2020 22:59  3/6/2020 18:56  19:57:05            
PROCESSO_B  APLICACAO_B 3/6/2020 23:01  3/7/2020 13:27  14:25:07            
PROCESSO_C  APLICACAO_C 3/7/2020 3:01   3/7/2020 6:45   3:43:53             
PROCESSO_C  APLICACAO_C 3/6/2020 3:01   3/6/2020 5:52   2:50:54             

Remembering that the result I needed would be:

  • Assess the use of the clause DISTINCT ON (expr, ...) preceding the selection list (not simple DISTINCT). https://www.postgresql.org/docs/current/queries-select-lists.html#QUERIES-DISTINCT

  • 1 get the longest run time per jobname with a subselect with max , 2 with this sql get the biggest X’s

  • The problem is that my table contains more than 19,000 records, and if I put DISTINCT ON, I cannot put the expression: order by max (A.run_time_sec) desc limit 10; as it comes the message: ERROR: SELECT DISTINCT ON Expressions must match initial ORDER BY Expressions LINE 1: SELECT DISTINCT ON (A.job_mem_name)

  • Try to sort just by A.jobname, A.run_time_sec, do not use max function.

1 answer

3

Use a Window Function

Use partitioning by jobname, and the decreasing order of duration:

select
    t.*,
    rank() over w as i
from t
WINDOW w AS (PARTITION BY jobname ORDER BY duracao DESC)

Upshot:

Note the last column, i which is the rank of each record:

inserir a descrição da imagem aqui

Then you select by filtering only the rank records = 1:

SELECT x.jobname,
       x.aplicacao,
       x.start,
       x."end",
       x.duracao
FROM
  (SELECT t.*,
          rank() OVER w AS i
   FROM t WINDOW w AS (PARTITION BY jobname
                       ORDER BY duracao DESC)) x
WHERE x.i = 1

Upshot:

inserir a descrição da imagem aqui

http://sqlfiddle.com/#! 15/1bf94/3

  • Rovann, thank you very much worked out using partitioning. My query looks like this: select job_mem_name AS JOBNAME, application AS APPLICATION, start_time AS STAR, end_time AS END, run_time_sec::real /60 /60 / 24 AS TEMPO_DEXECAO from (select job_mem_name,&#xA;application, start_time, end_time,run_time_sec,&#xA;rank() over w as i&#xA;from runinfo_history&#xA;where runinfo_history.start_time between '2020-04-03 00:00:01' and '2020-04-10 23:00:01'&#xA;WINDOW w AS (PARTITION BY job_mem_name ORDER BY run_time_sec DESC)) A Where A.i = 1

  • Only thing I could not do was to use a second table in the first Select, ie it would be the average running time that is in the table avg_run_info, something like this:

  • select&#xA;job_mem_name AS JOBNAME,&#xA;application AS APLICACAO,&#xA;start_time AS STAR,&#xA;end_time AS END,&#xA;avg_run_info.avg_run_time::real /60 /60 / 24,&#xA;run_time_sec::real /60 /60 / 24 AS TEMPO_DE_EXECUCAO&#xA;from&#xA;(select job_mem_name,&#xA;application, start_time, end_time,avg_run_info.avg_run_time,run_time_sec,&#xA;rank() over w as i&#xA;from runinfo_history&#xA;inner join avg_run_info on A.job_mem_name = avg_run_info.job_mem_name&#xA;where runinfo_history.start_time between '2020-04-03 00:00:01' and '2020-04-10 23:00:01' WINDOW w AS (PARTITION BY job_mem_name ORDER BY run_time_sec DESC)) A Where A.i = 1

  • Hello @Tulionorbertopadilharomanin, hard to see by the comments, but make Join out of sub-select. If the answer solves your question, please mark it as the answer. If necessary, change SQL Fiddle and send the link we try to continue to help you.

  • @Rovannn Linhalis, I got, doing as you said, thank you so much for your help

  • @Tulionorbertopadilharomanin do not forget to mark the answer. I saw that you are new in the community, take a look at the contents of how the community works... have the [tour] and https://pt.meta.stackoverflow.com/questions/8045/guia-de-survivor%C3%aancia-do-stack-overflow-em-portugu%C3%aas? cb=1

Show 1 more comment

Browser other questions tagged

You are not signed in. Login or sign up in order to post.