SQL Server JOINs vs Subqueries

I was writing some queries and ran into the need to aggregate a table that was joined within the query. To illustrate:

Sales table (vendas):

+----+---------+------------+----------+
| Id | Cliente |    Data    | Vendedor |
+----+---------+------------+----------+
|  1 |     123 | 2018-03-20 |       12 |
|  2 |     456 | 2018-03-20 |       34 |
+----+---------+------------+----------+

Key: (Id)

Sale items table (vendas_itens):

+-------+---------+------------+----------------+
| Venda | Produto | Quantidade | Valor_Unitario |
+-------+---------+------------+----------------+
|     1 |     123 |          3 |           5.50 |
|     1 |     456 |          9 |           5.00 |
|     2 |     789 |          5 |           7.00 |
|     2 |     101 |          7 |           7.00 |
+-------+---------+------------+----------------+

Key: (Venda, Produto)
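
To make the structure concrete, the schema is roughly the DDL below (the data types are my assumption; only the columns and keys were given above):

CREATE TABLE vendas (
    Id       INT  NOT NULL PRIMARY KEY, -- sale id
    Cliente  INT  NOT NULL,             -- customer id
    Data     DATE NOT NULL,             -- sale date
    Vendedor INT  NOT NULL              -- salesperson id
);

CREATE TABLE vendas_itens (
    Venda          INT           NOT NULL, -- references vendas.Id
    Produto        INT           NOT NULL, -- product id
    Quantidade     INT           NOT NULL, -- quantity sold
    Valor_Unitario DECIMAL(10,2) NOT NULL, -- unit price
    PRIMARY KEY (Venda, Produto)
);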

The query I was writing was a simple JOIN.

SELECT * FROM vendas v INNER JOIN vendas_itens vi ON vi.Venda = v.Id

However, at one point I wanted to know the total value of each sale together with the sale's own information. To get that data, I wrote the two queries below.

SELECT V.*, 
       SUM(Quantidade * Valor_Unitario) AS [Total]
FROM vendas v 
LEFT JOIN vendas_itens vi 
ON vi.Venda = v.Id 
GROUP BY V.Id, 
         V.Cliente, 
         V.Data, 
         V.Vendedor

And

SELECT V.*,
       (SELECT SUM(Quantidade * Valor_Unitario) 
        FROM vendas_itens vi 
        WHERE vi.Venda = v.Id ) AS [Total]
FROM vendas v 
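
(For completeness, there is also a third formulation I have seen suggested, pre-aggregating the items in a derived table; I have not compared its plan against the two above:)

SELECT V.*, T.Total
FROM vendas v
LEFT JOIN (SELECT Venda,
                  SUM(Quantidade * Valor_Unitario) AS Total
           FROM vendas_itens
           GROUP BY Venda) AS T
    ON T.Venda = v.Id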

I checked the execution plan of both queries, and there was only one difference.

Execution plan (screenshot)

I would like to know the following:

- What is the difference in performance between the two queries? Is there an advantage to one over the other (apart from the fact that the second one takes less typing)?
- (I'm still a junior.) Which one is more "professional"? Or is it just a matter of taste?

I did some tests on a table with 15k records and saw no difference in performance.
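
For the timing, something along these lines can be used in SSMS; SET STATISTICS TIME/IO are standard SQL Server options that print elapsed time and logical reads per statement:

SET STATISTICS TIME ON;
SET STATISTICS IO ON;

-- run each of the two queries here and compare the output, e.g.:
SELECT V.*, SUM(Quantidade * Valor_Unitario) AS [Total]
FROM vendas v
LEFT JOIN vendas_itens vi ON vi.Venda = v.Id
GROUP BY V.Id, V.Cliente, V.Data, V.Vendedor;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;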

If the question can be improved in any way, please comment.

EDIT1: As José rightly pointed out, the first query should use LEFT JOIN, because the subquery does not limit the scope of the first table. And, answering his question, there are no indexes on the tables, only the keys.
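
To make that point concrete (a hypothetical row, not part of the original data): a sale with no items is dropped by an INNER JOIN but kept by the LEFT JOIN and by the correlated subquery.

-- hypothetical sale with no matching rows in vendas_itens
INSERT INTO vendas (Id, Cliente, Data, Vendedor)
VALUES (3, 999, '2018-03-21', 12);

-- INNER JOIN version: sale 3 disappears from the result
-- LEFT JOIN / correlated subquery versions: sale 3 appears with Total = NULL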

  • 3

    Congratulations on taking the care to analyze the execution plan of each query. // The similarity between the two execution plans is due to the SQL Server query optimizer. // To compare the two queries, I suggest changing the first one to LEFT JOIN; that is closer to the correlated subquery you used, which is internally transformed into an OUTER JOIN. // Could you add to the question description which indexes exist on each table?

  • @Josédiz, done, I made the change in the question. Thanks for pointing me to the SQL Server query optimizer; I will read more about it.

1 Answer

2


I would use the first one. I know it's not a rule, and it depends a lot on the number of records among other things, but in many cases using subqueries makes the query a little less performant.

  • 2

    Look at the execution plans attached to the question. How can you tell, in that case, that performance was affected?

  • 1

    Many of the Compute Scalar operators are shown at 0%, but that is a rounded approximation; there is no way they cost exactly 0.000%. The second query has one extra Compute Scalar, which in my opinion is unnecessary; there is no reason to take the path with an extra step. Granted, in this case the difference will not be perceptible, and either query can be used without affecting the final result, but I would choose the "shorter" path. Full marks for being so careful.

  • 2

    Well, you made an important point. There really is one extra step, but, like other steps in this process, it may not represent a significant share of the cost. In fact, the single step could well be bigger than the other two. Why? Precision. Perhaps the two 0% steps are 0.0001% each, while the single step is 0.0003%. We'll never know for sure, because the software isn't precise enough.

  • 1

    The ideal would be precision software that shows the total running time in milliseconds with several decimal places. The company where I work has one, but it is specific to the technologies the company uses, so it could not be used elsewhere; still, it is extremely useful. (A built-in alternative is sketched below.)
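
For what it's worth, SQL Server itself records elapsed times in microseconds per cached statement in the sys.dm_exec_query_stats DMV. A minimal sketch (the LIKE filter is just an assumed, illustrative way to locate the statements):

SELECT TOP (10)
       qs.total_elapsed_time / 1000.0 AS total_elapsed_ms, -- DMV stores microseconds
       qs.execution_count,
       st.text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
WHERE st.text LIKE '%vendas%' -- illustrative filter on the statement text
ORDER BY qs.total_elapsed_time DESC;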
