0
I have the following situation, I need to bring data from a vehicle table of the system in which I work and another with integrated data of a partner system, for being integration the data are not very consistent and the way to connect the tables can be by chassis, plate or id:
select COUNT(*) from syo_veiculo = 290518
select COUNT(*) from syo_oficina = 945336
select vei.id_veiculo, vei.id_cliente, vei.id_empresa, ofi.no_os, ofi.dt_emissao, ofi.id_empresa as empresaofi
from syo_veiculo vei
join syo_oficina ofi ON vei.id_veiculo = ofi.id_produto OR vei.ds_placa = ofi.ds_placa OR vei.no_chassi = ofi.no_chassi
where vei.dt_venda BETWEEN 1435100000000 AND 1435900000000
group by vei.id_veiculo, vei.id_cliente, vei.id_empresa, ofi.no_os, ofi.dt_emissao, empresaofi
= 879 linhas em 30622 ms, 30337 ms, 30275 ms
Note that the query result was obtained in about 30 seconds, without grouping, would bring 1078 lines at the same time.
Alternative query:
select *
from (
select vei.id_veiculo, vei.id_cliente, vei.id_empresa, ofi.no_os, ofi.dt_emissao, ofi.id_empresa as empresaofi
from syo_veiculo vei
join syo_oficina ofi ON vei.id_veiculo = ofi.id_produto
where vei.dt_venda BETWEEN 1435100000000 AND 1435900000000
union all
select vei.id_veiculo, vei.id_cliente, vei.id_empresa, ofi.no_os, ofi.dt_emissao, ofi.id_empresa as empresaofi
from syo_veiculo vei
join syo_oficina ofi ON vei.ds_placa = ofi.ds_placa
where vei.dt_venda BETWEEN 1435100000000 AND 1435900000000
union all
select vei.id_veiculo, vei.id_cliente, vei.id_empresa, ofi.no_os, ofi.dt_emissao, ofi.id_empresa as empresaofi
from syo_veiculo vei
join syo_oficina ofi ON vei.no_chassi = ofi.no_chassi
where vei.dt_venda BETWEEN 1435100000000 AND 1435900000000
) as tbaux
group by id_veiculo, no_os, id_cliente, id_empresa, dt_emissao, empresaofi
= 879 linhas em 1969 ms, 791 ms, 724 ms
Note that the return came much faster, the same 879 lines without the grouping would bring 2172 lines at the same time.
I want to understand why the query with the
union all
where has much more validations takes much less time, does not make sense to me, even more that the result was the sameIs there any exception in these cases
union all X join com OR
, for example an FK in the workshop table could make some difference in favor of thejoin
?
I want to understand why this is to better plan my next queries and seek the best performance.
you have contents for these three columns in the two banks?
– Tobias Mesquita
@Tobymosque No index between tables.
– Tiago Oliveira de Freitas
You can’t expect much when performing a Join between two tables when none of the tables has indices, still think 2s long for 800-line return.
– Tobias Mesquita
@Tobymosque 2 seconds only in the first execution what is common, the database stores information of the query excutada to expedite in later queries, 790 ms is not bad being that the tables have 290 thousand and 940 thousand lines each, independent of the number of rows in return each row of each table was checked.
– Tiago Oliveira de Freitas
@Tobymosque you think the indices are solely responsible or have some more factor ?
– Tiago Oliveira de Freitas
I believe it is only the indices, performing a tablescan in a table with thousands of records is quite costly.
– Tobias Mesquita
@Tobymosque This situation of the indices is out of my control because I can not change the bank of this system, I will do some tests with a test bank and return the result when possible, thanks in advance for the help
– Tiago Oliveira de Freitas
I understand, I’ve been through a similar situation, in my case I ended up having to use a table in memory
– Tobias Mesquita
Run an EXPLAIN ANALYZE [YOUR QUERY]; on each of your question queries and add the results in the question so I can analyze and help you.
– AlfredBaudisch
@Tobymosque There were already indices in board, tables chassis and vehicle is a primary key, included an index for ofi.id_product and the time of the first query dropped to 111 milliseconds, thanks for the help
– Tiago Oliveira de Freitas
the index -
create index oficina_id_produto_idx on syo_oficina using hash(id_produto);
solved the problem.– Tiago Oliveira de Freitas
You would need to see the execution plan of the two queries using the EXPLAIN ANALYZE command
– user35493
Related : http://answall.com/questions/55118/como-aplica-indexes-para-melhorar-a-performance-das-queries
– Guilherme Lautert