Bulk NF-e XML import with C# and SQL Server

Good morning! I have an application with the following features:

  1. Windows Forms
  2. C#
  3. Visual Studio Community 2019
  4. SQL Server Express
  5. Windows 10 Pro

About: I created the application to import electronic tax invoices (NF-e) into SQL Server from XML files. The import is massive, on the order of hundreds of thousands of XML files, generating millions of records. As a reference, the goal is to import 3 million XML files in less than 8 hours.

Problem

The import starts at a very good speed but, as it advances, usually at around 20% of the way through, performance suffers abrupt drops and keeps degrading until the import practically stops.

What has already been done

The idea is to read the directory where the files are inside a loop, taking one file at a time and deserializing it based on the Fazenda (tax authority) XSD. This step is apparently fast and does not seem to be the root of the problem. Initially I used Entity Framework Core with the Repository Pattern for the inserts: for each imported file I instantiated the classes that represent my SQL Server tables, passing the deserialized values via constructor. The code below is an example:

var nFe_obsCont = new NFe_obsCont(nfeId, obsCont.xTexto, obsCont.xCampo);
new EmpresaRepository<NFe_obsCont>().Adicionar(nFe_obsCont);
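For context, a minimal sketch of that reading loop, assuming a root class generated from the official NF-e XSD (here called TNFe, a hypothetical name) and a hypothetical ProcessarNota method that does the per-file mapping:

using System.IO;
using System.Xml.Serialization;

// Minimal sketch of the directory read + per-file deserialization described above.
// TNFe stands in for the root class generated from the Fazenda XSD (e.g. via xsd.exe),
// and ProcessarNota is a hypothetical method that maps a note to the table classes.
var serializer = new XmlSerializer(typeof(TNFe));

foreach (var caminho in Directory.EnumerateFiles(@"C:\xmls", "*.xml"))
{
    using (var stream = File.OpenRead(caminho))
    {
        var nota = (TNFe)serializer.Deserialize(stream);
        ProcessarNota(nota); // instantiate the table classes and hand them to the repository
    }
}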

This was done for all the other classes. Keep in mind that a single NF-e file can contain up to 990 products, and the foreach that reads a file has two more loops inside it, one for products and one for taxes. I mention this because it is the kind of thing that affects performance, since there are many validations for fields that may not have values.
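For illustration, a rough sketch of that nested structure; the property and class names (det, prod, imposto, NFe_prod) are stand-ins loosely based on the XSD-generated ones, not the exact API:

// Rough sketch of the nested loops described above; property and class names
// are illustrative stand-ins for the XSD-generated ones.
foreach (var det in nota.infNFe.det)                 // up to 990 products per note
{
    if (det.prod != null)
    {
        var nFe_prod = new NFe_prod(nfeId, det.prod.cProd, det.prod.xProd);
        new EmpresaRepository<NFe_prod>().Adicionar(nFe_prod);
    }

    if (det.imposto != null)
    {
        foreach (var imposto in det.imposto.Items)   // one entry per tax group (ICMS, IPI, ...)
        {
            // many null checks like the one above, because optional fields
            // may be missing; the tax values are then mapped the same way
        }
    }
}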

Note that the model described above has been abandoned, because with EF Core the performance at high volume was very poor, even though EF Core has improved a lot compared to the previous version.

What I did to make it better

To improve performance I refactored the import and started using SqlBulkCopy. Now I load my classes (SQL Server tables) into a DataTable and, every 10 thousand records, call my method, which dumps the data into SQL Server, as in the code below:

using System;
using System.Data;
using System.Data.SqlClient;
using System.Windows.Forms;

public class SqlServerBulkCopy
{
    // LinhasAfetadasEventHandler is a custom delegate declared elsewhere in the project,
    // used to report how many rows were copied for a given table.
    public event LinhasAfetadasEventHandler LinhasAfetadasEvent;
    private string tableName;

    public void BulkData(DataTable dt)
    {
        // Nothing to do for an empty batch.
        if (dt.Rows.Count <= 0)
            return;

        tableName = dt.TableName;

        // _minhaStringConexao is the connection string, defined elsewhere in the class.
        using (var bulkCopy = new SqlBulkCopy(_minhaStringConexao))
        {
            bulkCopy.DestinationTableName = $"dbo.{dt.TableName}";

            // Map columns by name so the DataTable column order does not have to match the table.
            foreach (DataColumn column in dt.Columns)
                bulkCopy.ColumnMappings.Add(column.ColumnName, column.ColumnName);

            try
            {
                bulkCopy.NotifyAfter = dt.Rows.Count;
                bulkCopy.SqlRowsCopied += OnSqlRowsCopied;
                bulkCopy.BatchSize = 5000;
                bulkCopy.BulkCopyTimeout = 60;
                bulkCopy.WriteToServer(dt);
            }
            catch (Exception ex)
            {
                MessageBox.Show(ex.ToString());
            }
        }
    }

    private void OnSqlRowsCopied(object sender, SqlRowsCopiedEventArgs e)
    {
        LinhasAfetadasEvent?.Invoke(sender, e, tableName);
    }
}
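For completeness, a minimal sketch of the calling side implied by the description above: accumulate rows per table and flush every 10 thousand records through BulkData. The DataTable layout and column names here are illustrative, not the real schema:

using System.Data;
using System.IO;

// Sketch of the calling side: accumulate rows and flush every 10,000 records
// through the BulkData method above. Column names are illustrative.
const int tamanhoLote = 10_000;

var bulk = new SqlServerBulkCopy();
var dtObsCont = new DataTable("NFe_obsCont");
dtObsCont.Columns.Add("NFeId", typeof(long));
dtObsCont.Columns.Add("xTexto", typeof(string));
dtObsCont.Columns.Add("xCampo", typeof(string));

foreach (var caminho in Directory.EnumerateFiles(@"C:\xmls", "*.xml"))
{
    // ... deserialize the file (as in the earlier sketch) and, for each value
    // found, add a row with the deserialized data, e.g.:
    // dtObsCont.Rows.Add(nfeId, obsCont.xTexto, obsCont.xCampo);

    if (dtObsCont.Rows.Count >= tamanhoLote)
    {
        bulk.BulkData(dtObsCont); // dump this batch into SQL Server
        dtObsCont.Clear();        // drop the rows already copied before continuing
    }
}

// Flush whatever is left at the end of the run.
bulk.BulkData(dtObsCont);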

Apparently I had been successful: on a relatively large import the process behaved well and kept a steady pace throughout, but when trying to import just over 2 million files it imported only 10% in almost 11 hours. In other words, I am back to square one: with large volumes it reaches a point where it seems to lose import capacity exponentially.

After reviewing all my code to make sure I was releasing resources, I realize the issue is not in the application but in the database. I point out that the tables have no relationships among themselves. Anticipating this, I created clustered indexes of int type, because in the past my PK was of type Guid, which was very bad for this model, since even being clustered SQL Server needs to reorganize the pages when doing the inserts.
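As a reference for the index design mentioned above, an illustrative sketch of one of the tables with an int IDENTITY clustered primary key instead of a Guid one, created through a plain SqlCommand; the table name, columns and connection string are hypothetical placeholders:

using System.Data.SqlClient;

// Illustrative only: the kind of table definition described above, with an
// int IDENTITY clustered primary key instead of a Guid one. The schema and
// the connection string are placeholders, not the real ones.
var minhaStringConexao = "Server=.\\SQLEXPRESS;Database=NFe;Integrated Security=true";

const string ddl = @"
CREATE TABLE dbo.NFe_obsCont
(
    Id     int IDENTITY(1,1) NOT NULL,
    NFeId  bigint      NOT NULL,
    xTexto varchar(60) NULL,
    xCampo varchar(20) NULL,
    CONSTRAINT PK_NFe_obsCont PRIMARY KEY CLUSTERED (Id)
);";

using (var conexao = new SqlConnection(minhaStringConexao))
using (var comando = new SqlCommand(ddl, conexao))
{
    conexao.Open();
    comando.ExecuteNonQuery();
}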

What I am doing now

So far I am testing loading the entire DataTable and writing it only at the end of the reading. I am still testing this, but it worries me, because I will not always know the size of the load in advance and this can have a bad impact if the set is huge. After a lot of searching, I found people who write txt files and then import them via bulk copy, but I have not tried this approach and, honestly, it would not make sense to me.

It would be great if anyone who has tips about importing millions of records could share them!

Thank you
