Can subqueries decrease performance? Myth or truth?

14

I usually work with MVC frameworks, and I usually use ORMs to query the database.

In my case I use Laravel, but I have used other frameworks, and I recently had the opportunity to get to know Entity Framework.

Some experienced Entity Framework users showed me that its ORM often generates SQL that differs from what we would normally write by hand (something a stricter person might say hurts performance).

I have noticed the same thing in Laravel.

What I mean is that, in some cases, these frameworks generate queries with subqueries instead of using a JOIN.

For example, I have seen a query that checks whether a user has any posts being generated like this:

select * from 
    usuarios 
where 
    (select count(*) from `posts` where `usuarios`.`id` = `posts`.`usuario_id`) >= 1

But I believe the same query could be written with a JOIN.

Now let me explain why I talked about frameworks before getting to the point. I have heard several comments from "programmers" who say that subqueries cause performance problems. What I can't understand is why frameworks would then insist on doing something harmful (harmful in the sense of reducing performance).

However, when I see subqueries used in frameworks, it comes to mind that there is "no problem at all" in using them. If there were a problem, I would not understand why framework developers, who in theory have enough experience to know what they are doing, would generate them.

In an issue on GitHub where someone reported a performance problem caused by the generated subqueries, I noticed that the library developer asked, "But do your tables have their indexes set up correctly?" That led me to understand that the problem can sometimes lie more in incorrect usage than in the subquery itself.

So, based on all my arguments above, I ask:

  • Do subqueries always cause performance problems, or does the problem depend on the specific context?

  • Is it true that setting indexes on a table can improve the performance of a subquery, or is that nonsense?

  • Would the query shown in the example above perform better if it were written another way (for example, using a JOIN)?

  • 1

    Just a detail that does not answer your question but helps you improve it: the SQL shown is not a valid subquery example. There is no way to do this in SQL, at least in the DBMSs I know.

  • Thanks @Cantoni, good observation

  • I was surprised to learn that Laravel generates a "crazy" query like this one. If I wanted to count a user's posts, I would simply do SELECT COUNT(POST_ID) FROM POSTS WHERE PUBLISHER_ID = 100, for example. It would count a single field, reducing the number of bytes per tuple by eliminating the *. It wouldn't even need a JOIN: if I want the number of posts of active and inactive users, I just pass an extra attribute in the WHERE and it is all solved. Amazing how they complicate what should be so simple!

  • 1

    Regarding frameworks (and some projects that make heavy use of OO): sometimes following the standard is not the best case for performance. Writing object-oriented code that performs well is in itself a difficult task. Often an experienced programmer writes code that does not perform well because of design or culture. An important point about frameworks is that they can't predict every way they will be used; they try to generate code based on patterns of behavior and good design practices, so they can end up generating low-performance code.

3 answers

13


Do subqueries always cause performance problems, or does the problem depend on the specific context?

"Always" is a very strong word; it depends a lot on the case. It depends not only on the generated code but also on the database optimizer, on how execution takes place, and of course on the data set and model you are working with, plus one more factor that I discuss below.

You have to analyze each case, and you always have to measure.
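As a rough illustration of what "measuring" can look like, here is a minimal sketch that times the subquery from the question against a JOIN rewrite. It uses Python's built-in sqlite3 module purely so that it runs anywhere; the table layout, the index name and the generated data are assumptions, and the numbers it prints say nothing about MySQL, SQL Server or any other engine.

```python
import sqlite3
import timeit

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema matching the names used in the question.
    CREATE TABLE usuarios (id INTEGER PRIMARY KEY, nome TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, usuario_id INTEGER, titulo TEXT);
    CREATE INDEX idx_posts_usuario_id ON posts (usuario_id);
""")

# Made-up data: 500 users, every second user has 3 posts.
conn.executemany("INSERT INTO usuarios (id, nome) VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(1, 501)])
conn.executemany("INSERT INTO posts (usuario_id, titulo) VALUES (?, ?)",
                 [(i, f"post{n}") for i in range(2, 501, 2) for n in range(3)])

SUBQUERY = """
    SELECT id FROM usuarios
    WHERE (SELECT count(*) FROM posts WHERE usuarios.id = posts.usuario_id) >= 1
    ORDER BY id
"""
JOIN = """
    SELECT DISTINCT usuarios.id FROM usuarios
    JOIN posts ON posts.usuario_id = usuarios.id
    ORDER BY usuarios.id
"""

def run(sql):
    return conn.execute(sql).fetchall()

# First confirm both forms answer the same question...
subquery_rows = run(SUBQUERY)
join_rows = run(JOIN)

# ...then time each one over several runs.
subquery_secs = timeit.timeit(lambda: run(SUBQUERY), number=20)
join_secs = timeit.timeit(lambda: run(JOIN), number=20)
print(f"subquery: {subquery_secs:.4f}s  join: {join_secs:.4f}s")
```

The point is the procedure rather than the result: check that both forms return the same rows, then time them on your own engine with realistic data volumes.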

Is it true that setting indexes on a table can improve the performance of a subquery, or is that nonsense?

True, indexes can be a lifesaver if they are defined correctly. This may be the factor that actually makes the difference. They cannot fix everything and can bring some minor harm too (updates get slower, more disk and memory are consumed, etc.). Again, it depends on what you are doing and on the database vendor you use.
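To make the effect of an index visible, the sketch below asks SQLite for its plan for the question's query before and after creating an index on posts.usuario_id. SQLite is used only because it ships with Python; the schema and the index name idx_posts_usuario_id are assumptions, and other databases expose their plans through different commands and may choose differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema matching the names used in the question.
    CREATE TABLE usuarios (id INTEGER PRIMARY KEY, nome TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, usuario_id INTEGER, titulo TEXT);
""")

QUERY = """
    SELECT * FROM usuarios
    WHERE (SELECT count(*) FROM posts WHERE usuarios.id = posts.usuario_id) >= 1
"""

def plan(sql):
    # Each EXPLAIN QUERY PLAN row is (id, parent, notused, detail);
    # only the human-readable detail column is kept.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

plan_before = plan(QUERY)
conn.execute("CREATE INDEX idx_posts_usuario_id ON posts (usuario_id)")
plan_after = plan(QUERY)

print("before:", *plan_before, sep="\n  ")
print("after:", *plan_after, sep="\n  ")
```

Without the index the subquery has to look through posts for every user; with it, the plan should show the lookup on posts going through idx_posts_usuario_id. The exact wording of the plan lines varies between SQLite versions.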

Would the query shown in the example above perform better if it were written another way (for example, using a JOIN)?

I can't speak about this example specifically, but for more common examples I can say that it depends on the database and on how the JOIN would be written. In some databases there may be some optimization (this is quite common). In others the JOIN is only syntactic sugar for the ordinary expression.

6

Can subqueries decrease performance?

In short, they can. But that is not always the case.

Do subqueries always cause performance problems, or does the problem depend on the specific context?

No. It all depends on the database query engine and on the statistics it has compiled to estimate the cost of the query. Indexes may harm performance if poorly designed, but the biggest performance penalty usually comes from a poorly designed database schema.

If the database engine supports running subqueries in parallel, the result may even be faster than with joins. There is a lot of debate about this, even in the O.R. The proper way to answer the question is to study the query plan. Each database has its own way of producing that plan.

Is it true that setting indexes on a table can improve the performance of a subquery, or is that nonsense?

It is true, depending on the database we are talking about. In Oracle, for example, a TABLE SCAN may be faster than an index lookup.

Again, it is worth applying the index and studying the statement.

Would the query shown in the example above perform better if it were written another way (for example, using a JOIN)?

And again, it depends. On SQL Server, for example, JOINs are usually faster than subqueries, but this may vary.

Here is an article on how to get your query execution plan in SQL Server.

5

Wallace, this will depend a lot on the database engine. Take the two queries below as an example:

--Query A
SELECT 
    A.TabelaAID,
    A.Descricao as DescricaoA,
    (SELECT Descricao FROM TabelaB B WHERE A.TabelaBID = B.TabelaBID) as DescricaoB,
    (SELECT Descricao FROM TabelaC C WHERE A.TabelaBID = C.TabelaBID) as DescricaoC,
    (SELECT Descricao FROM TabelaD D WHERE A.TabelaBID = D.TabelaBID) as DescricaoD
FROM TabelaA A

--Query B
SELECT 
    A.TabelaAID,
    A.Descricao as DescricaoA,
    B.Descricao as DescricaoB,
    C.Descricao as DescricaoC,
    D.Descricao as DescricaoD
FROM TabelaA A
LEFT JOIN TabelaB B ON A.TabelaBID = B.TabelaBID
LEFT JOIN TabelaC C ON A.TabelaBID = C.TabelaBID
LEFT JOIN TabelaD D ON A.TabelaBID = D.TabelaBID

In SQL Server 2005, Query A was far slower than Query B, because the engine could not translate the subqueries into joins. As a result, if TabelaA had 1000 records, Query A would run 3001 queries (1 for TabelaA plus 3 for each of its 1000 rows) and Query B only 1.

In SQL Server 2014 there are still some performance differences, but nothing extraordinary; depending on the query, the subquery may even be faster than the join.
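Query A and Query B are only interchangeable when each lookup table has at most one row per TabelaBID; with duplicates, the LEFT JOIN form returns extra rows while SQL Server rejects a scalar subquery that yields more than one value. Under that assumption, the sketch below checks that both forms return the same rows. It runs on SQLite through Python rather than on SQL Server, and the schema and sample rows are invented from the column names in the queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema inferred from the column names in the queries.
    CREATE TABLE TabelaB (TabelaBID INTEGER PRIMARY KEY, Descricao TEXT);
    CREATE TABLE TabelaC (TabelaBID INTEGER PRIMARY KEY, Descricao TEXT);
    CREATE TABLE TabelaD (TabelaBID INTEGER PRIMARY KEY, Descricao TEXT);
    CREATE TABLE TabelaA (TabelaAID INTEGER PRIMARY KEY, Descricao TEXT,
                          TabelaBID INTEGER);

    INSERT INTO TabelaB VALUES (1, 'b1'), (2, 'b2');
    INSERT INTO TabelaC VALUES (1, 'c1');
    INSERT INTO TabelaD VALUES (2, 'd2');
    INSERT INTO TabelaA VALUES (10, 'a10', 1), (11, 'a11', 2), (12, 'a12', 3);
""")

QUERY_A = """
    SELECT
        A.TabelaAID,
        A.Descricao AS DescricaoA,
        (SELECT Descricao FROM TabelaB B WHERE A.TabelaBID = B.TabelaBID) AS DescricaoB,
        (SELECT Descricao FROM TabelaC C WHERE A.TabelaBID = C.TabelaBID) AS DescricaoC,
        (SELECT Descricao FROM TabelaD D WHERE A.TabelaBID = D.TabelaBID) AS DescricaoD
    FROM TabelaA A
    ORDER BY A.TabelaAID
"""
QUERY_B = """
    SELECT
        A.TabelaAID,
        A.Descricao AS DescricaoA,
        B.Descricao AS DescricaoB,
        C.Descricao AS DescricaoC,
        D.Descricao AS DescricaoD
    FROM TabelaA A
    LEFT JOIN TabelaB B ON A.TabelaBID = B.TabelaBID
    LEFT JOIN TabelaC C ON A.TabelaBID = C.TabelaBID
    LEFT JOIN TabelaD D ON A.TabelaBID = D.TabelaBID
    ORDER BY A.TabelaAID
"""

rows_a = conn.execute(QUERY_A).fetchall()
rows_b = conn.execute(QUERY_B).fetchall()
for row in rows_a:
    print(row)
```

Rows in TabelaA without a match keep NULL (None in Python) in the corresponding description column in both forms, which is why the JOIN version needs LEFT JOIN rather than an inner join.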

Now, as for your example, I believe it would perform better if you used EXISTS(SELECT 1 FROM posts WHERE posts.usuario_id = usuarios.id) instead of a COUNT(*).
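A small sketch of that rewrite, again using SQLite through Python only because it is easy to run, with made-up users and posts. It shows that the COUNT form and the EXISTS form select the same users; it does not measure which one is faster, which depends on the engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema and data matching the names used in the question.
    CREATE TABLE usuarios (id INTEGER PRIMARY KEY, nome TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, usuario_id INTEGER, titulo TEXT);
    INSERT INTO usuarios VALUES (1, 'ana'), (2, 'bruno'), (3, 'carla');
    INSERT INTO posts (usuario_id, titulo) VALUES (1, 'p1'), (1, 'p2'), (3, 'p3');
""")

# Form from the question: count every matching post, then compare with 1.
COUNT_FORM = """
    SELECT * FROM usuarios
    WHERE (SELECT count(*) FROM posts WHERE usuarios.id = posts.usuario_id) >= 1
    ORDER BY id
"""
# Suggested form: only ask whether at least one matching post exists.
EXISTS_FORM = """
    SELECT * FROM usuarios
    WHERE EXISTS (SELECT 1 FROM posts WHERE posts.usuario_id = usuarios.id)
    ORDER BY id
"""

count_rows = conn.execute(COUNT_FORM).fetchall()
exists_rows = conn.execute(EXISTS_FORM).fetchall()
print(count_rows)
print(exists_rows)
```

EXISTS lets the engine stop at the first matching post, while the COUNT form asks for a full count that is then only compared with 1.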

The following article has a great comparison involving a particular type of subquery, which is quite similar to your example:

Should I use NOT IN, OUTER APPLY, LEFT OUTER JOIN, EXCEPT, or NOT EXISTS

  • Toby, your comment about swapping COUNT for EXISTS is interesting. Recently I noticed that the Laravel framework replaced the SELECT from my example with one similar to yours (using EXISTS).
