CURSOR vs table variable type

I’m facing one of those situations where I need to perform an action for each row returned by a query.

That leaves me with two options, a Cursor or a Table Variable, but the two seem very similar to me (semantically).

I wonder whether using a Table Variable would give me some performance improvement over a Cursor.

I believe the difference between the two is that the Table Variable performs only one query and then walks through the records in memory, while the Cursor performs a fetch for each row, but I have no way to confirm this (it’s just my guess).

So which of the two is better, and why?

EDIT

I decided to add a complete example and statistics.

TABLE

CREATE TABLE [dbo].[CursorTeste](
    [CursorTesteID] [int] IDENTITY(1,1) NOT NULL,
    [Coluna1] [uniqueidentifier] NOT NULL,
    [Coluna2] [uniqueidentifier] NOT NULL,
    [Coluna3] [uniqueidentifier] NOT NULL,
 CONSTRAINT [PK_CursorTeste] PRIMARY KEY CLUSTERED 
(
    [CursorTesteID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

INSERT

DECLARE @count int;
SET @count = 0;

WHILE (@count < 10000)
BEGIN
    INSERT INTO CursorTeste VALUES (NEWID(), NEWID(), NEWID());
    SET @count = @count + 1;
END

CURSOR

DECLARE @coluna1 uniqueidentifier;
DECLARE @coluna2 uniqueidentifier;
DECLARE @coluna3 uniqueidentifier;
DECLARE @CURSOR_teste CURSOR;

SET @CURSOR_teste = CURSOR LOCAL FAST_FORWARD FOR
SELECT Coluna1, Coluna2, Coluna3 FROM CursorTeste

OPEN @CURSOR_teste
WHILE (1 = 1)
BEGIN
    FETCH NEXT FROM @CURSOR_teste INTO @coluna1, @coluna2, @coluna3;
    IF (@@FETCH_STATUS <> 0)
    BEGIN
        BREAK;
    END

    PRINT '{ ' + cast(@coluna1 as varchar(50)) + ' } - { ' + cast(@coluna2 as varchar(50)) + ' } - { ' + cast(@coluna3 as varchar(50)) + ' }';
END

CLOSE @CURSOR_teste   
DEALLOCATE @CURSOR_teste

Table Variable

DECLARE @coluna1 uniqueidentifier;
DECLARE @coluna2 uniqueidentifier;
DECLARE @coluna3 uniqueidentifier;
DECLARE @indice int
DECLARE @count int
DECLARE @tabela table(
    RowNumber int identity,
    Coluna1 uniqueidentifier not null,
    Coluna2 uniqueidentifier not null,
    Coluna3 uniqueidentifier not null,
    PRIMARY KEY (RowNumber)
);

INSERT INTO @tabela
SELECT Coluna1, Coluna2, Coluna3 FROM CursorTeste

SET @count = (SELECT COUNT(RowNumber) FROM @tabela)
SET @indice = 1;
WHILE (@indice <= @count)
BEGIN
    SELECT 
        @indice = RowNumber + 1,
        @coluna1 = Coluna1, 
        @coluna2 = Coluna2, 
        @coluna3 = Coluna3
    FROM @tabela 
    WHERE RowNumber = @indice

    PRINT '{ ' + cast(@coluna1 as varchar(50)) + ' } - { ' + cast(@coluna2 as varchar(50)) + ' } - { ' + cast(@coluna3 as varchar(50)) + ' }';
END

CLIENT STATISTICS

  • Trial 1: CURSOR
  • Trial 2: CURSOR FAST_FORWARD
  • Trial 3: CURSOR LOCAL FAST_FORWARD
  • Trial 4: WHILE LOOP WITH @TABLE

2 answers


As many people have said here many times: you should only try to solve known performance problems. And potential performance issues (not yet observed but suspected) can be simulated in advance, turning them into known problems you can then solve.

Anyway, in your example the table variable executes one query to fill the variable and then one more query for each row loaded into it, while the cursor performs a single query and then traverses its result row by row.

Considering your examples, and given the number of operations involved, the cursor will have at least the same performance, tending to be cheaper and to respond faster than the table variable.

But note that there are other options besides the cursor you proposed. For example: if the original query returns a column with unique values, such as an id, or a set of columns that together identify each record uniquely, you can repeat the original query inside the loop, filtering on that unique identifier, instead of filling a table variable beforehand.
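Using the table from the question, such a loop might look like the sketch below (my illustration, not part of the original answer): each iteration re-queries the source table for the next row after the last key seen, so nothing is materialized beforehand.

```sql
-- Sketch: loop directly over the source table using its unique key
-- (CursorTesteID), instead of filling a table variable first.
DECLARE @id int = 0;
DECLARE @coluna1 uniqueidentifier, @coluna2 uniqueidentifier, @coluna3 uniqueidentifier;

WHILE (1 = 1)
BEGIN
    SELECT TOP (1)
        @id = CursorTesteID,
        @coluna1 = Coluna1,
        @coluna2 = Coluna2,
        @coluna3 = Coluna3
    FROM CursorTeste
    WHERE CursorTesteID > @id      -- next row after the last one processed
    ORDER BY CursorTesteID;

    IF (@@ROWCOUNT = 0) BREAK;     -- no more rows

    PRINT '{ ' + cast(@coluna1 as varchar(50)) + ' } - { '
               + cast(@coluna2 as varchar(50)) + ' } - { '
               + cast(@coluna3 as varchar(50)) + ' }';
END
```

Each seek hits the clustered primary key, so the per-iteration cost stays low even without holding a cursor or a table variable open.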

Moreover, only use loops when there is no alternative to processing the query result row by row. Otherwise, SQL Server specializes in joining tables and in reading and processing large sets of records at once. The server will always do that with better performance than multiple commands or cursors. And when a single command would demand impossible logic, you can still use temporary tables to split the processing into parts, with each part acting on a large mass of records at once.

If you need to update millions of rows at once, it may be necessary to process them in batches (say, 200,000 rows at a time), because the log file (which is used to keep transactions in progress) can exhaust the disk space. All of this can be verified in preliminary tests.
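A common shape for that kind of batching is the sketch below (my illustration; the table and column names are hypothetical): each pass affects at most 200,000 rows in its own implicit transaction, so the log can truncate between batches.

```sql
-- Sketch: delete old rows in batches of 200,000 to limit log growth.
-- dbo.ArchiveLog and LoggedAt are hypothetical names for illustration.
WHILE (1 = 1)
BEGIN
    DELETE TOP (200000) FROM dbo.ArchiveLog
    WHERE LoggedAt < '2020-01-01';

    IF (@@ROWCOUNT = 0) BREAK;  -- nothing left to delete
END
```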

  • I understand all the problems with cursors, that they should be avoided, and that rarely is that impossible. But the idea here is to compare these two approaches and see whether there is any gain in using a table instead of a CURSOR.

Cursors vs Table in memory

Cursors are almost always sub-optimal in terms of performance, but there are exceptions, and depending on the case the difference may not be significant for your problem.

In all the tests I have participated in that involved cursors, I never saw a situation where they performed better, but there are reports on the internet saying the opposite for very specific situations. Examples here and here.

I noticed that you avoided some of the problems by declaring it with FAST_FORWARD, which allows SQL Server to build a more optimized plan for reading the data.

The main problem with the in-memory table is its memory usage, which ends up limiting the number of records that can be processed per execution.

About your performance test

From your detailed performance test you can see that there is no real performance gain between the two solutions for this particular case. With so much detail, that is almost stating the obvious.

So use whichever form is easier for you to maintain. In that respect the cursor wins slightly, because the code is more compact and intuitive.

The difference in performance between the two solutions may become insignificant if the operation you perform on each row consumes a lot of resources. In such a context, both solutions may be inadequate.

When you can use a single query

Whenever you can, give preference to queries that operate on the data set rather than on each row individually.

For example, imagine we have tables A, B and C, and we want to insert a record into C for each relationship between A and B. We could then write a query like this:

INSERT INTO C (C1, C2)
SELECT A1, B2
FROM A 
JOIN B ON B.FK = A.PK
  • I added some information to the question; apparently the loop over a table is very similar to the CURSOR LOCAL FAST_FORWARD when analyzing the server round trips, the bytes received and the run time.

  • @Tobymosque I understand. Since I don’t have a SQL Server environment here, could you do me a favor: run the test again without the PRINT command and post the result again. Cheers.

  • updated the image.

  • @Tobymosque Very good. It seems that Microsoft has greatly optimized FAST_FORWARD cursors; it was even a little better than accessing the records in memory. One detail: you would not need to COUNT the data if you used @@ROWCOUNT, as exemplified here.
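Applied to the table-variable version from the question, that @@ROWCOUNT suggestion would look something like this sketch (my illustration of the comment's idea):

```sql
INSERT INTO @tabela
SELECT Coluna1, Coluna2, Coluna3 FROM CursorTeste;

-- @@ROWCOUNT already holds how many rows the INSERT affected,
-- so the extra SELECT COUNT(RowNumber) FROM @tabela is unnecessary.
SET @count = @@ROWCOUNT;
```

Note that @@ROWCOUNT must be read immediately after the INSERT, since almost any subsequent statement resets it.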
