48
Could someone help me understand why Postgresql missed several cost estimates in an experiment I did.
I’m doing an experiment with 22 darlings of Benchmark TPCH[1], to verify index performance in queries.
From 22 consultations, only 5 consultations were verified that the optimizer used secondary indexes. In another experiment, the 5 queries cited in a database without the presence of indexes were executed, aiming to identify if the execution time would increase due to the absence of indexes.
But to my surprise, the experiment without the presence of indexes in the database was faster than using the indexes (for the 22 queries) .
I would like to understand why the total cost parameter was wrong in all cases, that is to say, all queries that spent the most time, indicated a lower cost - in all 5 cases, that is incorrect in my opinion.
See below, that the first line refers to query 6 you used
index, the cost was 3335809, lower than the cost 5255959, of the same
query
6 that did not use índice
.
Also look at the time spent. A query
that used index spent 7 minutes while without the use of indexes took 55 seconds. This behavior extended to other cases.
My question is: Why the total cost (execution plan
) incorrectly calculates the cost, for all cases where there are indexes?
Índices |Query |Tempo_Gasto |Custo_Total
===============================================
Sim 6 00:07:56 3335809.61
Nao 6 00:00:55 5255959.00
Sim 7 00:09:16 5847359.97
Nao 7 00:02:08 6793148.45
Sim 10 00:07:04 40743017.17
Nao 10 00:02:14 41341406.62
Sim 15 00:10:03 6431359.90
Nao 15 00:01:56 9608659.87
Sim 20 00:12:48 8412159.69
Nao 20 00:05:49 9537835.93
=============================================
For reasons of use of a Patch
specific to my research project, I am using an old version 9.0.1 (year 2012).
I didn’t modify the standard parameters, only the random_page_cost
to 1, because I am using a disc SSD, where the random access cost is less than in a HDD. Follow what’s in mine postgresql.conf
max_connections = 100
effective_io_concurrency = 5
#seq_page_cost = 1.0
random_page_cost = 1.0
#cpu_tuple_cost = 0.01
#cpu_index_tuple_cost = 0.005
#cpu_operator_cost = 0.0025
#effective_cache_size = 128MB
Did any of these parameters interfere to make this error in the cost statistics?
Follow query 6 and its Explain Analyse with and without indexes.
------------- QUERY 6 ---------------------------------------------
select
sum(l_extendedprice * l_discount) as revenue
from
lineitem
where
l_shipdate >= date '1995-01-01'
and l_shipdate < date '1995-01-01' + interval '1' year
and l_discount between 0.09 - 0.01 and 0.09 + 0.01
and l_quantity < 24;
---- COM ÍNDICE (idx_l_shipdatelineitem000)-------------------
Plano Execucao:Aggregate (cost=3335809.59..3335809.61 rows=1
width=16) (actual time=476033.847..476033.847 rows=1 loops=1)
-> Bitmap Heap Scan on lineitem (cost=376416.20..3330284.29
rows=2210122 width=16) (actual time=375293.183..471695.391
rows=2282333 loops=1)
Recheck Cond: ((l_shipdate >= _1995-01-01_::date) AND
(l_shipdate < _1996-01-01 00:00:00_::timestamp without time zone))
Filter: ((l_discount >= 0.08) AND (l_discount <= 0.10) AND
(l_quantity < 24::numeric))
-> Bitmap Index Scan on idx_l_shipdatelineitem000
(cost=0.00..375863.67 rows=17925026 width=0) (actual
time=375289.456..375289.456 rows=18211743 loops=1)
Index Cond: ((l_shipdate >= _1995-01-01_::date) AND
(l_shipdate < _1996-01-01 00:00:00_::timestamp without time
zone))Total runtime: 476034.306 ms
------------------ SEM USO DE ÍNDICE ----------------------------
Plano Execucao:Aggregate (cost=5255958.99..5255959.00 rows=1
width=16) (actual time=55051.051..55051.051 rows=1 loops=1)
-> Seq Scan on lineitem (cost=0.00..5250433.68 rows=2210122
width=16) (actual time=0.394..52236.276 rows=2282333 loops=1)
Filter: ((l_shipdate >= _1995-01-01_::date) AND (l_shipdate <
_1996-01-01 00:00:00_::timestamp without time zone)
AND (l_discount >= 0.08) AND (l_discount <= 0.10) AND
(l_quantity < 24::numeric))Total runtime: 55051.380 ms
Any hint is welcome!!
What is the structure of the index? By explain, apparently the index is applied only to the column
l_shipdate
, but its filter conditions depend on access to record values. There is difference in times if a composite index is created with the fields[l_shipdate, l_quantity, l_discount]
?– Bruno Coimbra
Who index type are you using to index the fields ? All fields that are part of your
WHERE
have indexes ? What is the volume of data contained in your tables ? How about a practical example in Sqlfiddle ?– Lacobus
For us to answer this question it will be necessary to pass the data contained in the tables. postgre makes the planning based on estimates of the data (including that is why you need to update these estimates periodically, to always relay the current state of the database).
– Gabriel