How to improve SQL performance with IN clause?

Question

Asked 7 years, 7 months ago

Viewed 662 times

8

I have this SQL here:

SELECT id, nome, url FROM categorias WHERE status = 1 AND id_ls IN 
(SELECT id_categoria FROM cliente_categorias) GROUP BY url

It returns only the categories that have customers assigned to them.

My categorias table has 1,477 records and cliente_categorias has 23,616.

It works. The problem is that it loads very slowly: the query takes about 17 seconds. Is there some way I can improve it?

How many records are there? Does cliente_categorias need a DISTINCT? Please post the structure of these two tables.

– novic

2017/12/15 at 19:37
Are you using an index or a cache?

– Valdeir Psr

2017/12/15 at 19:37
I don’t understand why the GROUP BY is there.

– Victor Stafusa

2017/12/15 at 19:44
@Virgilionovic Hi Virgilio! Thanks for contributing, I edited the question with more information :)

– Aryana Valcanaia

2017/12/15 at 19:47
Try using EXISTS(select top 1 1 from ... ) instead of IN

– Zorkind

2017/12/15 at 19:50
@Victorstafusa I need to group some results because of an issue external to the project: this database pulls data from an offline database that has some structural problems. But removing the GROUP BY does not change the load time at all.

– Aryana Valcanaia

2017/12/15 at 19:57
If that GROUP BY does not change the outcome at all, I recommend removing it. I see it as something that can inhibit optimizations the database would otherwise apply when you try the suggestions in the answers below.

– Victor Stafusa

2017/12/15 at 19:58
How long did my query take? :-P

– Zorkind

2017/12/15 at 20:00
What is the difference between the columns id and id_ls in the table categorias? In the table categorias the primary key is the column id, but in the code you posted the join is made on the column id_ls. I missed that.

– José Diz

2017/12/16 at 10:40
1

@Aryanavalcanaia Which database management system are you using (MariaDB, SQL Server, Oracle Database, etc.)?

– José Diz

2017/12/16 at 11:03
Can a customer be in the same category multiple times?

– Victor Stafusa

2017/12/16 at 13:46

4 answers

7

Try this:

SELECT c.id, c.nome, c.url
FROM categorias c
INNER JOIN cliente_categorias d ON d.id_categoria = c.id_ls
WHERE c.status = 1
GROUP BY c.url

Also try creating an index to make this type of query faster:

ALTER TABLE cliente_categorias ADD INDEX idx_categoria_categorias_cliente (id_categoria);
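
A minimal sketch of how the rewrite and the index could be checked on a toy dataset. It uses Python's built-in sqlite3 module as a stand-in, since the thread never confirms which database is in use. The sample rows and the id_cliente column are invented for illustration, and SQLite spells the index statement as CREATE INDEX rather than ALTER TABLE ... ADD INDEX. It also shows that the JOIN yields one row per matching cliente_categorias row, so the GROUP BY is what collapses the result back to what the IN query returns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categorias (
        id INTEGER PRIMARY KEY, id_ls INTEGER, nome TEXT, url TEXT, status INTEGER
    );
    -- id_cliente is a hypothetical column; the thread only names id_categoria.
    CREATE TABLE cliente_categorias (id_cliente INTEGER, id_categoria INTEGER);

    INSERT INTO categorias VALUES
        (1, 10, 'A', 'a', 1),   -- two customers
        (2, 20, 'B', 'b', 1),   -- one customer
        (3, 30, 'C', 'c', 0),   -- has a customer, but status = 0
        (4, 40, 'D', 'd', 1);   -- no customers
    INSERT INTO cliente_categorias VALUES (100, 10), (101, 10), (102, 20), (103, 30);

    -- SQLite spelling of the index suggested in the answer.
    CREATE INDEX idx_categoria_categorias_cliente ON cliente_categorias (id_categoria);
""")

# Original query from the question (ORDER BY added only to make output stable).
in_rows = conn.execute("""
    SELECT id, nome, url FROM categorias
    WHERE status = 1 AND id_ls IN (SELECT id_categoria FROM cliente_categorias)
    GROUP BY url ORDER BY url
""").fetchall()

# JOIN rewrite from the answer.
join_rows = conn.execute("""
    SELECT c.id, c.nome, c.url
    FROM categorias c
    INNER JOIN cliente_categorias d ON d.id_categoria = c.id_ls
    WHERE c.status = 1
    GROUP BY c.url ORDER BY c.url
""").fetchall()

# Same JOIN without GROUP BY: category 1 appears once per customer.
ungrouped = conn.execute("""
    SELECT c.id FROM categorias c
    INNER JOIN cliente_categorias d ON d.id_categoria = c.id_ls
    WHERE c.status = 1
""").fetchall()

# Ask SQLite how it would look up a single id_categoria value.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT 1 FROM cliente_categorias WHERE id_categoria = 10"
).fetchall()
plan_text = " ".join(row[-1] for row in plan)

print(in_rows == join_rows, len(ungrouped), plan_text)
```

On the real database, the engine's own EXPLAIN output is the place to confirm that the index is actually picked up by the full query.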

Hi Victor :) Thanks for contributing. It loads better, but it is still slow (9 sec). I will edit the question to add more information.

– Aryana Valcanaia

2017/12/15 at 19:44
@Aryanavalcanaia Answer edited. See what happens now.

– Victor Stafusa

2017/12/15 at 19:54
Why is the index on id_client? Shouldn’t it be on id_categoria?

– Zorkind

2017/12/15 at 19:55
@Zorkind Oops. Indeed, you are correct. It was an oversight. Thank you for pointing it out.

– Victor Stafusa

2017/12/15 at 19:55
@Victorstafusa No problem :-)

– Zorkind

2017/12/15 at 19:56

by Aryana Valcanaia • **427** points · Answer 1 · 2017-12-16T20:34:31+00:00

I managed to solve it with everyone’s help. Creating the indexes brought the load time down to 4 seconds (still a little slow), but I believe that is due to the number of records.

Some observations on what everyone posted:

The GROUP BY really is necessary.
The DISTINCT made no difference to the load time.
The EXISTS made it even slower.

The query stayed the same; I only added indexes to the tables involved.

by novic • **35,673** points · Answer 2 · 2017-12-15T19:50:23+00:00

3

Test it using DISTINCT, for example:

SELECT id, nome, url FROM categorias WHERE status = 1 AND id_ls IN 
(SELECT distinct id_categoria FROM cliente_categorias) GROUP BY url
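
Because IN only tests membership, duplicates in the subquery cannot change which categories come back; DISTINCT can at most change how the engine evaluates the subquery. A small sketch on invented rows, using Python's sqlite3 module as a stand-in for the real database (an assumption, since the DBMS is not stated in the thread):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categorias (
        id INTEGER PRIMARY KEY, id_ls INTEGER, nome TEXT, url TEXT, status INTEGER
    );
    CREATE TABLE cliente_categorias (id_categoria INTEGER);

    -- Invented sample data: category 10 is assigned three times, 30 never.
    INSERT INTO categorias VALUES
        (1, 10, 'A', 'a', 1), (2, 20, 'B', 'b', 1), (3, 30, 'C', 'c', 1);
    INSERT INTO cliente_categorias VALUES (10), (10), (10), (20);
""")

# The same query with and without DISTINCT in the subquery.
QUERY = """
    SELECT id, nome, url FROM categorias
    WHERE status = 1
      AND id_ls IN (SELECT {distinct} id_categoria FROM cliente_categorias)
    GROUP BY url ORDER BY url
"""
plain_rows = conn.execute(QUERY.format(distinct="")).fetchall()
distinct_rows = conn.execute(QUERY.format(distinct="DISTINCT")).fetchall()

print(plain_rows == distinct_rows)
```

Any speed difference between the two forms depends on the engine's optimizer, so it has to be measured on the actual database.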

Hi Virgilio, thanks for contributing, but the load time does not change. It stays the same.

– Aryana Valcanaia

2017/12/15 at 19:53
@Aryanavalcanaia Have you created an index on the id_categoria field of the cliente_categorias table?

– novic

2017/12/15 at 19:55
@Aryanavalcanaia It could also be the GROUP BY url. How large is the url field? Does it have an index? Is the GROUP BY necessary?

– novic

2017/12/15 at 19:58

by Zorkind • **361** points · Answer 3 · 2017-12-15T19:52:23+00:00

Try with EXISTS

SELECT id, nome, url FROM categorias WHERE status = 1 
AND EXISTS(select top 1 1 FROM cliente_categorias where id_categoria = id_ls) GROUP BY url
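
Note that TOP 1 is SQL Server syntax, while the ALTER TABLE ... ADD INDEX statement in another answer is MySQL-style, and the thread never says which engine is in use. EXISTS only checks whether at least one row matches, so a plain SELECT 1 with qualified column names is a more portable way to write the same test. A sketch on invented rows with Python's sqlite3 module (chosen only so the example runs anywhere; the aliases c and d are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categorias (
        id INTEGER PRIMARY KEY, id_ls INTEGER, nome TEXT, url TEXT, status INTEGER
    );
    CREATE TABLE cliente_categorias (id_categoria INTEGER);

    -- Invented sample data: category 30 has no customers.
    INSERT INTO categorias VALUES
        (1, 10, 'A', 'a', 1), (2, 20, 'B', 'b', 1), (3, 30, 'C', 'c', 1);
    INSERT INTO cliente_categorias VALUES (10), (10), (20);
""")

# Portable EXISTS form: no TOP 1, columns qualified with table aliases.
exists_rows = conn.execute("""
    SELECT c.id, c.nome, c.url
    FROM categorias c
    WHERE c.status = 1
      AND EXISTS (SELECT 1 FROM cliente_categorias d
                  WHERE d.id_categoria = c.id_ls)
    GROUP BY c.url ORDER BY c.url
""").fetchall()

# Original IN form from the question, for comparison.
in_rows = conn.execute("""
    SELECT id, nome, url FROM categorias
    WHERE status = 1 AND id_ls IN (SELECT id_categoria FROM cliente_categorias)
    GROUP BY url ORDER BY url
""").fetchall()

print(exists_rows == in_rows)
```

Both forms return the same categories here; which one runs faster depends on the engine and on whether cliente_categorias.id_categoria is indexed.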