First, a scenario similar to the one you described was created: identical structure and a random mass of data:
--
-- MODEL OF THE app_event TABLE
--
create table app_event
(
    ae_id           serial primary key,
    ae_driver_id    integer,
    ae_trip_item_id integer,
    ae_date         timestamp not null
);
--
-- MODEL OF THE trip_item TABLE
--
create table trip_item
(
    ti_id serial primary key
);
--
-- INSERT 3 MILLION ROWS INTO THE app_event TABLE
--
INSERT INTO app_event( ae_id, ae_date, ae_driver_id, ae_trip_item_id )
SELECT
    generate_series( 1, 3000000 ),
    now() - ((random() * 3650) * '1 day'::interval),
    (random() * 100)::int,
    (random() * 100)::int;
--
-- INSERT 1 MILLION ROWS INTO THE trip_item TABLE
--
INSERT INTO trip_item( ti_id )
SELECT
    generate_series( 1, 1000000 );
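After bulk loads like these, it is worth refreshing the planner statistics so the cost estimates shown below are meaningful. This step was not spelled out in the original test, but it is the standard ANALYZE command:
--
-- REFRESH PLANNER STATISTICS (recommended after bulk loading)
--
ANALYZE app_event;
ANALYZE trip_item;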
Query under analysis:
SELECT
    ae_id
FROM
    app_event
JOIN
    trip_item ON (ti_id = ae_trip_item_id)
WHERE
    ae_driver_id = 63
ORDER BY
    ae_date DESC
LIMIT
    10;
Your query plan looks something like this:
Limit  (cost=87050.55..87050.58 rows=10 width=12)
  ->  Sort  (cost=87050.55..87089.49 rows=15574 width=12)
        Sort Key: app_event.ae_date
        ->  Hash Join  (cost=28320.00..86714.00 rows=15574 width=12)
              Hash Cond: (app_event.ae_trip_item_id = trip_item.ti_id)
              ->  Seq Scan on app_event  (cost=0.00..58043.59 rows=15574 width=16)
                    Filter: (ae_driver_id = 63)
              ->  Hash  (cost=15045.00..15045.00 rows=1062000 width=4)
                    ->  Seq Scan on trip_item  (cost=0.00..15045.00 rows=1062000 width=4)
Note that the plan shows two points of attention:
1) a Sequential Scan with a very high cost on the app_event table, caused by the filter on the ae_driver_id field;
2) another Sequential Scan with a very high cost on the trip_item table, caused by the join condition on the ae_trip_item_id field.
A sequential scan over a table with a large volume of data tends to be slow for read operations. This is why creating indexes is crucial for good performance.
Creating Indexes:
CREATE INDEX idx_ae_trip_item_id ON app_event (ae_trip_item_id);
CREATE INDEX idx_ae_driver_id ON app_event (ae_driver_id);
New query plan:
Limit  (cost=24050.07..24050.09 rows=10 width=12)
  ->  Sort  (cost=24050.07..24120.51 rows=28178 width=12)
        Sort Key: app_event.ae_date
        ->  Merge Join  (cost=23021.55..23441.15 rows=28178 width=12)
              Merge Cond: (trip_item.ti_id = app_event.ae_trip_item_id)
              ->  Index Only Scan using trip_item_pkey on trip_item  (cost=0.42..30408.42 rows=1000000 width=4)
              ->  Sort  (cost=23014.70..23085.14 rows=28178 width=16)
                    Sort Key: app_event.ae_trip_item_id
                    ->  Bitmap Heap Scan on app_event  (cost=530.81..20932.02 rows=28178 width=16)
                          Recheck Cond: (ae_driver_id = 63)
                          ->  Bitmap Index Scan on idx_ae_driver_id  (cost=0.00..523.76 rows=28178 width=0)
                                Index Cond: (ae_driver_id = 63)
The effect of creating the indexes is the total absence of Sequential Scans in the query plan and a drastic reduction in the cost of the operations.
In practice, in my tests, the query took 450 ms to execute before the indexes were created; after creating them, the execution time is around 12 ms.
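As a possible further refinement (not measured in the tests above), a composite index matching both the WHERE filter and the ORDER BY could let the planner read rows already sorted by ae_date and skip the Sort step entirely. A sketch, with an illustrative index name:
--
-- HYPOTHETICAL REFINEMENT: one index covering both the filter and the sort
--
CREATE INDEX idx_ae_driver_id_date ON app_event (ae_driver_id, ae_date DESC);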
I hope I’ve helped!
Have you run ANALYZE to see the time estimates and why it's taking so long?
– Matheus E. Mordorst
I think you can also add indexes to the columns used in the join. Run the select with an EXPLAIN first:
explain select ae_id from app_event
join trip_item on ti_id = ae_trip_item_id
where ae_driver_id = 63 order by ae_date desc limit 10;
and see at which stage the time is longest. – Rovann Linhalis
@Matheusmordorst I did not use ANALYZE, but I saw from the EXPLAIN output that the ORDER BY took the longest to run.
– Jonathan
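(For reference, EXPLAIN ANALYZE actually executes the query and reports the real time spent in each plan node, which would show exactly which step is slow. A minimal sketch using the same query:)
-- EXPLAIN ANALYZE runs the query and prints actual per-node timings,
-- unlike plain EXPLAIN, which only shows estimates.
EXPLAIN ANALYZE
SELECT ae_id
FROM app_event
JOIN trip_item ON (ti_id = ae_trip_item_id)
WHERE ae_driver_id = 63
ORDER BY ae_date DESC
LIMIT 10;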
@Rovannlinhalis those columns already have indexes; they are all PKs and FKs (except ae_date, so I added an index to it).
– Jonathan
It is normal for an ORDER BY to take that long with this amount of data. Could you try using query caching, or is that not feasible for you?
– FabianoLothor
Have a look at this article: https://imasters.com.br/artigo/4406/postgresql/optimizando-bancos-postgresql-parte-01/?trace=1519021197&source=single and check out the work_mem option.
– Rovann Linhalis
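(For illustration: work_mem controls the memory available to each individual sort or hash operation, and can be raised per session before running the query. The 64MB value below is just an example, not a recommendation from the article:)
-- Example only: raise the per-operation sort/hash memory for this session.
SET work_mem = '64MB';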
@Fabianolothor I have never used caching. I have several drivers who send events to the database every day, so the data in the tables tends to grow. And the people responsible for these drivers run these queries daily. I don't know whether I could use caching in this case. – Jonathan
Have you considered putting this query in a view, if you have permission to do so?
– Sidon
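(A plain view only stores the query text, not its results, so by itself it would not speed anything up; a materialized view, on the other hand, does cache the result set, at the cost of having to refresh it to pick up new events. A minimal sketch with illustrative names:)
-- Hypothetical materialized view caching the query result.
CREATE MATERIALIZED VIEW mv_driver_63_events AS
SELECT ae_id
FROM app_event
JOIN trip_item ON (ti_id = ae_trip_item_id)
WHERE ae_driver_id = 63
ORDER BY ae_date DESC
LIMIT 10;

-- Re-run periodically (or after bulk loads) to update the cached rows.
REFRESH MATERIALIZED VIEW mv_driver_63_events;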