Query Performance - Postgresql

Asked

Viewed 599 times

0

I have the following situation, I need to bring data from a vehicle table of the system in which I work and another with integrated data of a partner system, for being integration the data are not very consistent and the way to connect the tables can be by chassis, plate or id:

select COUNT(*) from syo_veiculo = 290518
select COUNT(*) from syo_oficina = 945336

select vei.id_veiculo, vei.id_cliente, vei.id_empresa, ofi.no_os, ofi.dt_emissao, ofi.id_empresa as empresaofi

from syo_veiculo vei
 join syo_oficina ofi ON vei.id_veiculo = ofi.id_produto OR vei.ds_placa = ofi.ds_placa OR vei.no_chassi = ofi.no_chassi

where vei.dt_venda BETWEEN 1435100000000 AND 1435900000000
group by vei.id_veiculo, vei.id_cliente, vei.id_empresa, ofi.no_os, ofi.dt_emissao, empresaofi

= 879 linhas em 30622 ms, 30337 ms, 30275 ms

Note that the query result was obtained in about 30 seconds, without grouping, would bring 1078 lines at the same time.

Alternative query:

select *
from (
    select vei.id_veiculo, vei.id_cliente, vei.id_empresa, ofi.no_os, ofi.dt_emissao, ofi.id_empresa as empresaofi

    from syo_veiculo vei
     join syo_oficina ofi ON vei.id_veiculo = ofi.id_produto

    where vei.dt_venda BETWEEN 1435100000000 AND 1435900000000

    union all

    select vei.id_veiculo, vei.id_cliente, vei.id_empresa, ofi.no_os, ofi.dt_emissao, ofi.id_empresa as empresaofi

    from syo_veiculo vei
     join syo_oficina ofi ON vei.ds_placa = ofi.ds_placa

    where vei.dt_venda BETWEEN 1435100000000 AND 1435900000000

    union all

    select vei.id_veiculo, vei.id_cliente, vei.id_empresa, ofi.no_os, ofi.dt_emissao, ofi.id_empresa as empresaofi

    from syo_veiculo vei
     join syo_oficina ofi ON vei.no_chassi = ofi.no_chassi

    where vei.dt_venda BETWEEN 1435100000000 AND 1435900000000
) as tbaux
group by id_veiculo, no_os, id_cliente, id_empresa, dt_emissao, empresaofi

= 879 linhas em 1969 ms, 791 ms, 724 ms

Note that the return came much faster, the same 879 lines without the grouping would bring 2172 lines at the same time.

  • I want to understand why the query with the union all where has much more validations takes much less time, does not make sense to me, even more that the result was the same

  • Is there any exception in these cases union all X join com OR, for example an FK in the workshop table could make some difference in favor of the join ?

I want to understand why this is to better plan my next queries and seek the best performance.

  • 1

    you have contents for these three columns in the two banks?

  • @Tobymosque No index between tables.

  • 1

    You can’t expect much when performing a Join between two tables when none of the tables has indices, still think 2s long for 800-line return.

  • @Tobymosque 2 seconds only in the first execution what is common, the database stores information of the query excutada to expedite in later queries, 790 ms is not bad being that the tables have 290 thousand and 940 thousand lines each, independent of the number of rows in return each row of each table was checked.

  • @Tobymosque you think the indices are solely responsible or have some more factor ?

  • 1

    I believe it is only the indices, performing a tablescan in a table with thousands of records is quite costly.

  • @Tobymosque This situation of the indices is out of my control because I can not change the bank of this system, I will do some tests with a test bank and return the result when possible, thanks in advance for the help

  • I understand, I’ve been through a similar situation, in my case I ended up having to use a table in memory

  • Run an EXPLAIN ANALYZE [YOUR QUERY]; on each of your question queries and add the results in the question so I can analyze and help you.

  • @Tobymosque There were already indices in board, tables chassis and vehicle is a primary key, included an index for ofi.id_product and the time of the first query dropped to 111 milliseconds, thanks for the help

  • the index - create index oficina_id_produto_idx on syo_oficina using hash(id_produto); solved the problem.

  • You would need to see the execution plan of the two queries using the EXPLAIN ANALYZE command

  • 1

    Related : http://answall.com/questions/55118/como-aplica-indexes-para-melhorar-a-performance-das-queries

Show 8 more comments

1 answer

0


Try the following:

    WITH tbaux  AS
    ( select vei.id_veiculo, vei.id_cliente, vei.id_empresa, ofi.no_os, ofi.dt_emissao, ofi.id_empresa as empresaofi
        from syo_veiculo vei
        join syo_oficina ofi ON vei.id_veiculo = ofi.id_produto
        where vei.dt_venda BETWEEN 1435100000000 AND 1435900000000

        union all

        select vei.id_veiculo, vei.id_cliente, vei.id_empresa, ofi.no_os, ofi.dt_emissao, ofi.id_empresa as empresaofi

        from syo_veiculo vei
        join syo_oficina ofi ON vei.ds_placa = ofi.ds_placa
        where vei.dt_venda BETWEEN 1435100000000 AND 1435900000000

        union all

        select vei.id_veiculo, vei.id_cliente, vei.id_empresa, ofi.no_os, ofi.dt_emissao, ofi.id_empresa as empresaofi
        from syo_veiculo vei
        join syo_oficina ofi ON vei.no_chassi = ofi.no_chassi

        where vei.dt_venda BETWEEN 1435100000000 AND 1435900000000)
SELECT * FROM tbaux group by id_veiculo, no_os, id_cliente, id_empresa, dt_emissao, empresaofi;

It is interesting also you check the indexes of the tables. You can also run some bank maintenance routines (Vacuum Analysis, REIDEX and CLUSTER).

However, it is worth remembering that Vacuum is recommended to be done at a time when the bank is not being used..

  • I was reviewing these questions I asked in the past, even to put an end to them, the problem of your query is that in the software I use for reporting, it does not allow starting the query with, the problem was even an index that was missing in ds_placa.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.