Cirineu, from what you report, the problem is that, during the execution of the first T-SQL code described, all the available disk space is used up even before the code finishes running, aborting the process of transferring rows between the tables. And you mention the growth of the transaction log as a possible cause.
Judging by the names of the columns in tblVMDcarga, as well as by the table name itself, it seems to me to be a load (ETL) table for a data warehouse fact table, probably of DNIT traffic counts.
Additional information is needed to give a specific opinion; since it has not yet been provided, here are general comments.
Recovery models
As per the SQL Server documentation, "recovery models are designed to control transaction log maintenance".
Reading the documentation shows how the choice of recovery model directly affects the volume of log generated.
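Just as a reference, a minimal sketch for checking and, if appropriate, temporarily changing the recovery model (bd_contagem is the database cited in the question; evaluate the impact on your backup routine before changing anything):
-- sketch: check the recovery model of each database
SELECT name, recovery_model_desc
  from sys.databases;

-- sketch: if the database uses the FULL model, it can be switched to BULK_LOGGED
-- for the load window and switched back afterwards
ALTER DATABASE bd_contagem SET RECOVERY BULK_LOGGED;
-- ... perform the load ...
ALTER DATABASE bd_contagem SET RECOVERY FULL;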
minimal logging
Regarding the transaction log, there is an approach called minimal logging, which is described in several SQL Server documents.
There are some requirements for this approach to succeed, which are listed in those documents. Note that the INSERT ... SELECT construction is also considered a bulk import operation.
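Purely as an illustration, a minimal sketch of an INSERT ... SELECT written as a candidate for minimal logging; it uses the stageDADOS and DADOS tables defined further below, and the essential points are the heap destination, the TABLOCK hint and a recovery model other than full:
-- sketch: with a heap destination table, the TABLOCK hint and the SIMPLE or
-- BULK_LOGGED recovery model, this INSERT ... SELECT can be minimally logged
INSERT into DADOS with (TABLOCK) (idEvento, dataEvento, infoEvento)
  SELECT idEvento, dataEvento, infoEvento
    from stageDADOS;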
tempDB
You need to pay attention to the configuration of the tempdb database so that it does not become a bottleneck. I suggest reading the article How to Configure Tempdb for Microsoft SQL Server.
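As a starting point, a minimal sketch to inspect the current layout of the tempdb files (number of files, sizes in 8 KB pages and growth settings):
-- sketch: list the tempdb files with their sizes (in 8 KB pages) and growth settings
SELECT name, type_desc, size, growth, is_percent_growth
  from tempdb.sys.database_files;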
Parallel execution
Another factor that must be analyzed carefully is how the max degree of parallelism option is configured on the instance.
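To check how the instance is currently configured, something like the sketch below can be used (changing the value is a separate decision that should consider the whole workload, not only this load):
-- sketch: show the current configuration of "max degree of parallelism"
EXECUTE sp_configure 'show advanced options', 1;
RECONFIGURE;
EXECUTE sp_configure 'max degree of parallelism';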
VLT (Very Large Tables)
In the case of VLT tables, one of the options for transferring rows from one table to another is to copy them in blocks, that is, to transfer one chunk at a time. The definition of the content/size of each block depends on the context.
To use block transfer efficiently, the source table needs a unique identifier (unique key) to record what has already been transferred. In fact tables, a surrogate key is usually used for this. For performance, it is recommended that there be an index on this unique identifier, preferably clustered.
Suppose you want to transfer millions of rows from the stageDADOS table to the DADOS table, and that the primary key of stageDADOS is the idEvento column. You can implement block transfer using something similar to:
-- código #1
INSERT into DADOS (colunas)
SELECT colunas
from stageDADOS
where idEvento between ....
At each execution of the above code, one block of rows is transferred. You must keep track of the last row transferred in each run, so that the next run starts from the first row not yet transferred.
That is, with this approach it is possible to transfer block by block and monitor the growth of the log file.
Here is an example implementing this approach.
-- código #2
CREATE TABLE stageDADOS (
  idEvento int identity primary key,
  dataEvento date not null,
  infoEvento varchar(200) not null
);

CREATE TABLE DADOS (
  idEvento int not null,
  dataEvento date not null,
  infoEvento varchar(200) not null
);
go
Note that the DADOS table is a heap (it has no clustered index), which, together with the TABLOCK hint used below, is one of the conditions for minimal logging.
To control what has already been transferred, the ControleTransf table is created in the database. It records the transferred blocks and also allows statistics to be produced about the transfer process.
-- código #3
CREATE TABLE ControleTransf (
  idBloco int identity,
  inicioTransf smalldatetime not null,
  fimTransf smalldatetime default cast(current_timestamp as smalldatetime),
  nLinhas int,
  chaveInicial int,
  chaveFinal int
);
go
And finally, the procedure that carries out the transfer of one block of rows.
-- código #4
CREATE PROCEDURE TransfBloco @tamBloco int as
begin
  declare @ultimaChave int, @chaveInicial int, @chaveFinal int;
  declare @inicioTransf smalldatetime, @nLinhas int;

  -- find the last key already transferred
  SELECT top (1) @ultimaChave= chaveFinal
    from ControleTransf
    order by idBloco desc;
  set @ultimaChave= coalesce(@ultimaChave, 0);

  -- define the first and last keys of the block
  set @chaveInicial= @ultimaChave + 1;
  set @chaveFinal= @ultimaChave + @tamBloco;

  set @inicioTransf= cast(current_timestamp as smalldatetime);

  BEGIN TRANSACTION;
  BEGIN TRY
    -- transfer the block
    INSERT into DADOS with (TABLOCK) (idEvento, dataEvento, infoEvento)
      SELECT idEvento, dataEvento, infoEvento
        from stageDADOS
        where idEvento >= @chaveInicial
              and idEvento <= @chaveFinal;
    set @nLinhas= @@rowcount;

    -- record the transferred block
    IF @nLinhas > 0
      begin
      INSERT into ControleTransf (chaveInicial, chaveFinal, inicioTransf, nLinhas)
        values (@chaveInicial, @chaveFinal, @inicioTransf, @nLinhas);
      SELECT 'Transferidas ' + cast(@nLinhas as varchar(15)) + ' linhas.' as Status;
      end
    else
      SELECT 'Nenhuma linha transferida.' as Status;

    -- close the explicit transaction
    COMMIT;
  END TRY
  BEGIN CATCH
    SELECT 'erro na transferência' as Status;
    ROLLBACK TRANSACTION;
  END CATCH;
end;
go
Execution is simple. It is assumed that the database recovery model is set to bulk-logged. To transfer 100 million rows, simply call the TransfBloco procedure, passing the number of rows as a parameter.
-- código #5 v2
-- transfer one block of rows
EXECUTE dbo.TransfBloco 100000000

-- monitor physical space usage
EXECUTE sp_spaceused 'DADOS';
DBCC SQLPERF (logspace);
go
I suggest starting with a smaller block (10 million rows, for example). You will then have information about performance, elapsed time, and physical space allocation for that block size, and can then define the optimal block size for each run.
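If you prefer to automate the successive calls, a minimal sketch (the 10-million block size is only the suggestion above; adjust it after measuring) is to loop until every key of stageDADOS is covered by ControleTransf:
-- sketch: repeat the block transfer until all keys of stageDADOS are covered
WHILE (SELECT coalesce(max(chaveFinal), 0) from ControleTransf)
      < (SELECT coalesce(max(idEvento), 0) from stageDADOS)
  begin
  EXECUTE dbo.TransfBloco 10000000;
  DBCC SQLPERF (logspace);   -- monitor log usage between blocks
  end;
go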
The above approach can be modified if the primary key column is, for example, of the type uniqueidentifier.
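A minimal sketch of one such variation, assuming (hypothetically) that the key column were a uniqueidentifier instead of an int: since the key no longer grows by +1, the block is delimited with TOP and ORDER BY, and the last key transferred is stored in a control table whose chaveFinal column has the same type (ControleTransfGuid below is a hypothetical name):
-- sketch (assumptions: key of type uniqueidentifier; ControleTransfGuid is a
-- hypothetical control table with chaveFinal of the same type)
DECLARE @tamBloco int, @ultimaChave uniqueidentifier;
set @tamBloco= 1000000;

SELECT top (1) @ultimaChave= chaveFinal
  from ControleTransfGuid
  order by idBloco desc;

INSERT into DADOS with (TABLOCK) (idEvento, dataEvento, infoEvento)
  SELECT top (@tamBloco) idEvento, dataEvento, infoEvento
    from stageDADOS
    where @ultimaChave is null
          or idEvento > @ultimaChave
    order by idEvento;
-- recording the transferred block in ControleTransfGuid would mirror código #4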
RBAR
RBAR is an abbreviation of the expression Row-By-Agonizing-Row, coined by Jeff Moden for code that processes one row at a time. Whenever possible, we should write code that handles sets of rows (set-based), which SQL Server processes more efficiently. However, there are (few) situations in which using a cursor is more efficient than set-based constructions.
In the second code provided, which uses a cursor, an RBAR approach is evident. And, in this case, it seems to me extremely inefficient. Besides transferring rows one by one from the source table to the destination table, there is the aggravating factor that the content of the source table is also deleted row by row. That is, it can grow the log file even faster (see recovery models).
Optimization of code with cursor
(1) In terms of the log file, I consider it more efficient to manually empty the source table at the end (once the transfer has been verified), either with the DROP TABLE command (if the source table is no longer needed) or with the TRUNCATE TABLE command.
Deleting row by row from the source table does not mean that the physical space will be immediately released by the DBMS.
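For reference, a minimal sketch of this idea with the table from the question (to be run only after the copy has been validated):
-- sketch: release the contents of the source table at once, after validating the copy
TRUNCATE TABLE [bd_contagem].[dbo].[tblVMDcargaLote4];
-- or, if the staging table is no longer needed at all:
-- DROP TABLE [bd_contagem].[dbo].[tblVMDcargaLote4];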
But if you think you need to delete row by row, note that you can delete the cursor's current row without the DELETE statement having to perform a new search (which takes more processing time). Just use CURRENT OF:
-- código #6
DELETE [bd_contagem].[dbo].[tblVMDcargaLote4]
where current of crLerLote4;
(2) When declaring a cursor, there are options that can, in theory, optimize its execution. A careful reading of the DECLARE CURSOR documentation can indicate which options to use.
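As an illustration only, a sketch of a cursor declaration with options that usually reduce overhead; it assumes the cursor is only read (a FAST_FORWARD cursor is read-only, so it is not compatible with DELETE ... WHERE CURRENT OF), and colunas stands for the actual column list:
-- sketch: local, forward-only, read-only cursor with reduced overhead
-- (not compatible with DELETE ... WHERE CURRENT OF)
DECLARE crLerLote4 CURSOR LOCAL FAST_FORWARD
  for SELECT colunas
        from [bd_contagem].[dbo].[tblVMDcargaLote4];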
And on the hardware side, would there be no improvement, for example, from increasing its processing capacity?
– R.Santos
I advise you not to use a cursor for this, it is very slow; the ideal would be for you to do a SELECT INSERT, which would be much faster.
– Jeferson Almeida
You have an interesting problem on your hands. I would say that a CURSOR is a bad choice for your case, because there are a lot of records. What you can do is disable transaction log generation, do what needs to be done, and then turn it back on.
– cantoni
Just one important detail: using a CURSOR will not prevent the log from being generated. The disk has not filled up yet only because the CURSOR is orders of magnitude slower than the INSERT SELECT. Therefore, if you let the CURSOR run long enough, your hard drive will fill up as well.
– cantoni
@Cirineulacerdabraga: What is the recovery model? // What is the SQL Server version? // What is the primary key of the table [bd_contagem].[dbo].[tblVMDcargaLote4]? // Can the database be placed in single-user mode while the row load is performed? // Are all rows in table tblVMDcargaLote4 from the same lot/contract?
– José Diz
@Cirineulacerdabraga: Be careful with the use of float variables in this case. I am referring to the variables @lote_XXX, @sense and @track, which receive 1 character as content. Why not declare them as char(1) or, better yet, as declared in the destination table (tblVMDcarga)?
– José Diz
@Cirineulacerdabraga: To reduce the size of the log file, I suggest you remove the DELETE on tblVMDcargaLote4, leaving only the copy part (INSERT into tblVMDcarga). // To delete the contents of table tblVMDcargaLote4, it is more efficient (in terms of log reduction) to run TRUNCATE TABLE tblVMDcargaLote4.
– José Diz
@Josédiz With TRUNCATE he cannot use the WHERE clause the way he is doing with DELETE. The best option is to disable LOG generation and do an INSERT SELECT or maybe, though I have no way to test with this amount of data, use BULK INSERT.
– Reginaldo Rigo