Is a subquery in SELECT calculated for each result or just once?

Viewed 8,471 times

17

Based on this example, where the result will be used to calculate the percentage of occurrence of each 'tipo' (type), which approach is more efficient/faster?

Use a subquery in the SELECT to calculate the total number of records:

SELECT tipo, COUNT(*) AS Parcial, (SELECT COUNT(*) FROM tabela) AS Total FROM tabela
GROUP BY tipo;  

Or use two queries, one to calculate the total number of records:

SELECT COUNT(*) AS Total FROM tabela;  

and another to calculate the count for each 'tipo':

SELECT tipo, COUNT(*) AS Parcial FROM tabela
GROUP BY tipo;

Are there other, more efficient ways to do this?

  • 1

    It depends on how you use the subquery. In your example, as already explained in the answer, it runs only once. But if your subquery depends on a value from the current record, it runs N times, where N is the number of records. E.g.: SELECT Codigo, (SELECT T2.Descricao FROM TABELA2 T2 WHERE T2.Codigo = T1.Codigo_T2) FROM TABELA1 T1

  • Hello ramaral, I created the test table. Just one thing: why are you using two semicolons? The semicolon is for terminating the statement. Note that the third option is wrong: the correct form has no semicolon at the end of the first line, before the GROUP BY. Try select tipo, count(*) as parcial from tabela group by tipo;

  • @Kingrider You're right about the semicolon; it was extra, a typing error.

3 answers

9


I believe this question deserves a closer look. There is no general answer, as it depends on the implementation of each database engine. SQL is declarative: you say what you want, not how to get it. The how is up to the engine. In some cases it is possible to give the engine a hint, but not to radically change the way it works.

So, what I'm going to show here are tests I ran on SQL Server 2005.

My tests were based on two queries. The first is the one in the question. The second (cross join) is the one in that answer. See below:

Query 1

SELECT
    NUMBER,
    (SELECT COUNT(*) FROM NUMBERS)
FROM
    NUMBERS

Query 2

SELECT
    NUMBER,
    TOTAL.T
FROM
    (SELECT COUNT(*) T FROM NUMBERS) TOTAL,
    NUMBERS

Table Numbers

The creation and population of the Numbers table (999999 records) can be seen below.

CREATE TABLE [dbo].[Numbers](
    [Number] [int] NOT NULL,
PRIMARY KEY CLUSTERED
(
    [Number] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

-- 1 million records are added
insert into Numbers(Number)
select top 1000000 row_number() over(order by t1.number) as N
from   master..spt_values t1
       cross join master..spt_values t2

Hypothesis

The hypothesis is that, in Query 1, SELECT COUNT(*) is not executed once per record.

It is a simple optimization that the SQL Server developers would not pass up. Note that (SELECT COUNT(*) FROM NUMBERS) is completely independent of the outer query. Its value can be calculated once, stored, and simply placed in the result (as if it were a constant).
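To make "independent of the outer query" concrete, here is a minimal sketch in Python using the built-in sqlite3 module. It is not the SQL Server test described in this answer, and it says nothing about how any engine executes the queries; it only shows that an uncorrelated subquery yields the same value for every row, while a correlated one depends on the outer row. The five-row numbers table and the running column are assumptions made up for illustration.

```python
import sqlite3

# Hypothetical stand-in for the Numbers table: 5 rows instead of 999999.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE numbers (number INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO numbers (number) VALUES (?)",
    [(n,) for n in range(1, 6)],
)

# Uncorrelated: the subquery references nothing from the outer row,
# so its value is the same for every row returned.
uncorrelated = conn.execute(
    "SELECT number, (SELECT COUNT(*) FROM numbers) AS total "
    "FROM numbers ORDER BY number"
).fetchall()

# Correlated: the subquery references n1.number from the outer row,
# so its value has to be derived per row.
correlated = conn.execute(
    "SELECT n1.number, "
    "       (SELECT COUNT(*) FROM numbers n2 "
    "         WHERE n2.number <= n1.number) AS running "
    "FROM numbers n1 ORDER BY n1.number"
).fetchall()

print(uncorrelated)
print(correlated)
```

Only the first form can be treated as a constant by an engine; whether a given engine actually does so is what the execution plan has to confirm.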

Analysis

The image below shows the execution plan of Query 1:

Execution plan of Query 1

This image shows the execution plan of Query 2:

Execution plan of Query 2

The only difference is an operator called Compute Scalar. The remaining operators are identical, both in their position in the tree and in the values computed/estimated by the SQL Server query optimizer. Compute Scalar has an estimated cost of 2% in this case.

I went a little further and analyzed the queries with SQL Server Profiler. See the result:

SQL Server Profiler trace for Queries 1 and 2

The most important thing to note here is that the number of Reads is the same for both queries. The duration of Query 1 is slightly higher, precisely because of its CPU consumption (also slightly higher). This is almost certainly the Compute Scalar operator shown above.

Still in the Profiler, I checked how much it costs to run only the statement below:

SELECT COUNT(*) FROM NUMBERS

The result can be seen below:

Profiler trace for the count only

Conclusion

Given the above, it is possible to conclude that the hypothesis is correct and that the SQL Server 2005 query optimizer, in the context presented, did not perform a COUNT operation for each returned row. Quite possibly other database engines also optimize queries like this one, to avoid unnecessary processing.

Note that, even so, Query 2 performs slightly better and may be preferable for that reason. However, it should be clear that the goal here was not to compare the performance of the two queries, but to show that Query 1 does not execute a COUNT(*) per row.

  • I just don't know whether you can say that the CPU cost equals 78*1000000, since that figure also includes the cost of preparing the query, which in the case of Query 1 happens once, whereas here you are counting it 78 times.

  • I ran a test executing 20 SELECT COUNT(*) statements at once, and the total cost was ~1500 CPU. Dividing that by 20 gives ~75. I believe that, in this case, the plan is reused. Anyway, I agree that the statement is a bit strong and lacks support. I will withdraw it.

  • @ramaral, just to clarify: when I said total cost, it wasn't something I added up one by one. SQL Server Profiler showed a single event for the 20 queries executed. I took that value and divided it by 20.

  • Downvoter, feel free to point out the problem in my answer; thanks in advance.

8

I found Cantoni's considerations important, and I believe the performance question goes beyond how the query is structured. The structure certainly matters, and a lot, but there comes a point where the DBMS itself decides, so I am posting an answer based on MySQL.

I tested with the SQL from Ramaral and from Rafael Guerreiro (adjusted, because it raises an error as posted); the EXPLAIN output follows below.

Below is the query based on Ramaral's technique.

mysql> explain SELECT comp_pago, COUNT(*) AS Parcial, (SELECT COUNT(*) FROM tab_controle_compras_item) AS Total FROM tab_controle_compras_item GROUP BY comp_pago;
+----+-------------+---------------------------+-------+---------------+----------------+---------+------+------+---------------------------------+
| id | select_type | table                     | type  | possible_keys | key            | key_len | ref  | rows | Extra                           |
+----+-------------+---------------------------+-------+---------------+----------------+---------+------+------+---------------------------------+
|  1 | PRIMARY     | tab_controle_compras_item | ALL   | NULL          | NULL           | NULL    | NULL | 8780 | Using temporary; Using filesort |
|  2 | SUBQUERY    | tab_controle_compras_item | index | NULL          | fk_comp_id_idx | 4       | NULL | 8780 | Using index                     |
+----+-------------+---------------------------+-------+---------------+----------------+---------+------+------+---------------------------------+

Below is the query based on Rafael Guerreiro's technique.

mysql> explain SELECT tab.comp_pago, COUNT(1) AS Parcial, tot.total FROM tab_controle_compras_item tab, (SELECT COUNT(*) as total FROM tab_controle_compras_item) as tot GROUP BY tab.comp_pago;
+----+-------------+---------------------------+--------+---------------+----------------+---------+------+------+---------------------------------+
| id | select_type | table                     | type   | possible_keys | key            | key_len | ref  | rows | Extra                           |
+----+-------------+---------------------------+--------+---------------+----------------+---------+------+------+---------------------------------+
|  1 | PRIMARY     | <derived2>                | system | NULL          | NULL           | NULL    | NULL |    1 | Using temporary; Using filesort |
|  1 | PRIMARY     | tab                       | ALL    | NULL          | NULL           | NULL    | NULL | 8780 | NULL                            |
|  2 | DERIVED     | tab_controle_compras_item | index  | NULL          | fk_comp_id_idx | 4       | NULL | 8780 | Using index                     |
+----+-------------+---------------------------+--------+---------------+----------------+---------+------+------+---------------------------------+

In my view, the queries presented so far are just different ways of getting the same performance. I will keep following this question because I am also interested in the subject; I hope someone can demonstrate something more efficient, if possible.

I reached a satisfactory result in which only a single full scan is performed; it follows below:

    mysql> explain SELECT COUNT(IF(comp_pago=1,1, NULL)) 'pagas', COUNT(IF(comp_pago=0,1, NULL)) 'nao pagas' FROM tab_controle_compras_item;
+----+-------------+---------------------------+------+---------------+------+---------+------+------+-------+
| id | select_type | table                     | type | possible_keys | key  | key_len | ref  | rows | Extra |
+----+-------------+---------------------------+------+---------------+------+---------+------+------+-------+
|  1 | SIMPLE      | tab_controle_compras_item | ALL  | NULL          | NULL | NULL    | NULL | 8780 | NULL  |
+----+-------------+---------------------------+------+---------------+------+---------+------+------+-------+
1 row in set (0.00 sec)

Reference source:

http://www.mysqltutorial.org/mysql-count/
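The single-scan idea above can be tried without a MySQL server. The following is a minimal sketch in Python with the built-in sqlite3 module, assuming a hypothetical items table with four made-up rows; it uses the portable CASE form instead of MySQL's IF(), and adds COUNT(*) to the same query so the percentages can be derived from one pass. It illustrates the technique only and is not the benchmark from this answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and data, loosely modelled on tab_controle_compras_item.
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, comp_pago INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO items (comp_pago) VALUES (?)",
    [(1,), (1,), (1,), (0,)],
)

# COUNT ignores NULLs, and CASE without ELSE yields NULL, so each COUNT
# only counts the rows matching its condition. COUNT(*) gives the total
# in the same statement.
paid, unpaid, total = conn.execute(
    "SELECT COUNT(CASE WHEN comp_pago = 1 THEN 1 END), "
    "       COUNT(CASE WHEN comp_pago = 0 THEN 1 END), "
    "       COUNT(*) "
    "FROM items"
).fetchone()

# Percentages computed on the client from the single result row.
paid_pct = 100.0 * paid / total
unpaid_pct = 100.0 * unpaid / total
print(paid, unpaid, total, paid_pct, unpaid_pct)
```

This form works when the set of values is known in advance; with an open-ended set of types, the GROUP BY queries discussed in the other answers are the better fit.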

  • 1

    +1 Ninja, so your analysis corroborates what I found for SQL Server 2005, that is, both engines perform an optimization so they don't have to execute a SELECT COUNT(*) for each returned record. This is the expected behavior of sophisticated engines like these.

7

As written, the COUNT will run for each row of the SELECT. The same would happen if you used the subquery in the WHERE clause.

To execute this second COUNT only once, just use a Cartesian product:

SELECT tab.tipo, COUNT(1) AS Parcial, tot.total
FROM tabela tab,
    (SELECT COUNT(*) total FROM tabela) tot
GROUP BY tab.tipo;

  • Rafael, I tried to run your SQL and it returned the error "Unknown column tot.total" (I ran it in MySQL).

  • I forgot to give an alias to Count(*): (SELECT COUNT(*) total FROM tabela) tot
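For anyone who wants to try the cross-join form from this answer, here is a minimal sketch in Python with the built-in sqlite3 module. The four-row tabela is made up for illustration, and tot.total is added to the GROUP BY as an assumption about what stricter engines may require. The sketch only checks that the cross join and the scalar subquery from the question return the same rows, then derives the percentages; it makes no claim about which one is faster.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: three rows of type 'a' and one of type 'b'.
conn.execute("CREATE TABLE tabela (id INTEGER PRIMARY KEY, tipo TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO tabela (tipo) VALUES (?)",
    [("a",), ("a",), ("a",), ("b",)],
)

# Cross join against a one-row derived table holding the total.
cross_join = conn.execute(
    "SELECT tab.tipo, COUNT(*) AS parcial, tot.total "
    "FROM tabela tab, (SELECT COUNT(*) AS total FROM tabela) tot "
    "GROUP BY tab.tipo, tot.total "
    "ORDER BY tab.tipo"
).fetchall()

# Scalar subquery in the SELECT list, as in the question.
scalar_subquery = conn.execute(
    "SELECT tipo, COUNT(*) AS parcial, "
    "       (SELECT COUNT(*) FROM tabela) AS total "
    "FROM tabela GROUP BY tipo ORDER BY tipo"
).fetchall()

# Percentage of occurrence of each tipo, computed from either result.
percentages = {
    tipo: 100.0 * parcial / total for tipo, parcial, total in cross_join
}
print(cross_join)
print(percentages)
```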
