How to improve SQL performance with IN clause?

Asked

Viewed 662 times

8

I have this SQL here:

SELECT id, nome, url FROM categorias WHERE status = 1 AND id_ls IN 
(SELECT id_categoria FROM cliente_categorias) GROUP BY url

What he does is seek only categories that have customers assigned to them.

My table categories has 1,477 records and cliente_categories 23,616:

inserir a descrição da imagem aqui

It works. The problem is that the load is very slow. The query takes about 17 seconds. There is some way I can improve?

  • How many records are there? Do cliente_categories need a distinct ? Place the diagram of these two tables!

  • You are using index or cache ?

  • I don’t understand why GROUP BY.

  • @Virgilionovic Hi Virgilio! Thanks for contributing, I edited the question with more :)

  • Try using EXISTS(select top 1 1 from ... ) instead of IN

  • @Victorstafusa needs to group some results because of an external issue to the project. this database pulls data from an offline database that has some structure problems. But removing the group by in no way changes the load.

  • If that GROUP BY do not change the outcome at all, I recommend taking. I see it as something that can inhibit optimizations that the database would do if you try something that is in the answers below.

  • How long was my query? -P

  • What is the difference between the columns id and id_ls on the table categorias? On the table categorias the primary key is the column id, but in the code that posted Join is established with the column id_ls. I missed that.

  • 1

    @Aryanavalcanaia What is the database manager? (mariaDB, SQL Server, Oracle Database etc)

  • A customer can be in the same category multiple times?

Show 6 more comments

4 answers

7


Try this:

SELECT c.id, c.nome, c.url
FROM categorias c
INNER JOIN cliente_categorias d ON d.id_categoria = c.id_ls
WHERE c.status = 1
GROUP BY c.url

Also try creating an index to make this type of query faster:

ALTER TABLE cliente_categorias ADD INDEX idx_categoria_categorias_cliente (id_categoria);
  • Hi Victor :) Thanks for contributing. Loading better, but still slow (9 sec). I will edit the question to add more information.

  • @Aryanavalcanaia Response edited. See what happens now.

  • Why the Indice in the id_client? wouldn’t be in the id_category?

  • @Zorkind Oops. Indeed, you are correct. It was an oversight. Thank you for warning.

  • @Victorstafusa No problem :-)

3

I managed to solve it with everyone’s help. In fact the creation of indexes improved the load to 4 seconds (still a little slow) but I believe this happens by the amount of records.

Some observations according to what everyone posted there:

  • The group by is really necessary.
  • The distinct has changed nothing the shipment.
  • The exists slowed down even more.

The query remained the same, I only added the indexes in the tables involved.

  • It may be that you have slower Inserts, but I don’t think it should impact much, I’m glad you solved, I believe the answer that deserves to be taken for granted is Victor’s :-)

3

Make the proof use distinct, example:

SELECT id, nome, url FROM categorias WHERE status = 1 AND id_ls IN 
(SELECT distinct id_categoria FROM cliente_categorias) GROUP BY url
  • Hi Virgilio, thanks for contributing but the charging time does not change. It follows the same.

  • @Aryanavalcanaia has the indices created in the table in the id_category fields of the cliente_categories table?

  • @Aryanavalcanaia can also be the GROUP BY url, this url field is of what size, has Indice in it! , it is necessary?

3

Try with EXISTS

SELECT id, nome, url FROM categorias WHERE status = 1 
AND EXISTS(select top 1 1 FROM cliente_categorias where id_categoria = id_ls) GROUP BY url

Browser other questions tagged

You are not signed in. Login or sign up in order to post.