What can cause the RU’s performance to fall in this scenario?

I’ve been building a data import for an ASP.NET MVC 5 project with EF 6.1 and SQL Server 2008, where the import data source was a .txt file.

The record lines of the file were not very large, but each one contained a fair amount of information to process. For example, a record line held the data of an employee, a position in the company, a neighborhood, a city, and so on, each of which becomes a table in my model.

So, line by line, I checked for the existence of each neighborhood, city, position, and all the other data, either to obtain the id of records that already existed or to insert those that did not exist yet.

Later I opted for caching: I simply loaded the data into generic lists in memory, and before going to the database I checked that information in memory.

The other entities did not generate a large number of records, but the employee register did: more than 27 thousand records.

Anyway, my question is about a performance problem I had while running the import through EF. After the first thousand records (a little more), the import performance drops a lot, saving one employee every 1 or 2 seconds.

Then I realized that by canceling the import and starting it again, the first thousand records were imported quickly, but after that the process slowed down again.

I solved it by destroying my context every thousand records and creating it again; with that, the import performed well from start to finish.

I’d like to know whether you’ve been through something similar, and why this occurs. Could it be some Entity Framework configuration? Mine uses the default settings.

Here is the import code snippet for anyone who wants to analyze it:

private List<int> Matriculas;
private List<Domain.CentroCusto> CacheCentrosCusto;
private List<Domain.Local> CacheLocais;
private List<Domain.Profissao> CacheProfissoes;
private List<Domain.Cidade> CacheCidades;
private List<Domain.Bairro> CacheBairros;
//Funcionarios
public ActionResult Funcionarios(string filePath)
{
    var fileInfo = new FileInfo(filePath);
    if (!fileInfo.Exists)
        throw new Exception("Arquivo não encontrado");

    Matriculas = contexto.Funcionarios
        .Select(x => x.Matricula)
        .ToList();

    CacheCentrosCusto = contexto.CentrosCusto.ToList();
    CacheLocais = contexto.Locais.ToList();
    CacheProfissoes = contexto.Profissoes.ToList();
    CacheCidades = contexto.Cidades.ToList();
    CacheBairros = contexto.Bairros.ToList();

    var file = new System.IO.StreamReader(filePath);
    try
    {
        var count = 0;
        string line = "";
        while ((line = file.ReadLine()) != null)
        {
            string[] campos = line.Split(';');

            int matricula;
            if (int.TryParse(campos[0].Trim(), out matricula))
            {
                if (Matriculas.Contains(matricula))
                    continue;
            }

            var funcionario = new Domain.Funcionario();

            funcionario.Matricula = matricula;
            // obtendo outros dados ...

            funcionario.CentroCustoId = GetCentroCustoId(campos[34].Trim());
            funcionario.LocalId = GetLocalId(campos[35].Trim());
            funcionario.ProfissaoId = GetProfissaoId(campos[36].Trim());

            //Cidade e Bairro
            funcionario.BairroId = null;

            var bai_desc = campos[11].Trim();
            if (!string.IsNullOrEmpty(bai_desc))
            {
                var cid_uf = "";
                var cid_desc = campos[12].Trim();

                // trabalho a string ...

                var cidade = new Domain.Cidade();
                cidade.Nome = cid_desc;
                cidade.Uf = cid_uf;

                var bairro = new Domain.Bairro();
                bairro.Nome = bai_desc;

                funcionario.BairroId = GetBairroId(bairro, cidade);
            }

            try
            {
                contexto.Funcionarios.Add(funcionario);
                contexto.SaveChanges();

                Matriculas.Add(matricula);

                count++;
                if (count == 1000)
                {
                    count = 0;
                    contexto.Dispose();
                    contexto = new DataAccess.Context();
                }
            }
            catch (DbEntityValidationException e) { ... }
            catch (Exception e) { ... }
        }
    }
    finally
    {
        file.Close();
    }

    return RedirectToAction("Index");
}

My context is created in the Controller’s constructor and disposed in its Dispose method.
Notice the counter: every thousand records I destroy and recreate my context.

An example of the other lookup methods (all the other methods used by the import code have the same structure as this one):

private int? GetProfissaoId(string descricao)
{
    int? profissao_id = null;
    if (!string.IsNullOrEmpty(descricao))
    {
        var profissaoTemp = CacheProfissoes
            .SingleOrDefault(x => x.Descricao.ToLower().Trim() == descricao.ToLower().Trim());

        if (profissaoTemp == null)
        {
            // The profession does not exist yet: create and persist it
            var profissao = new Domain.Profissao { Descricao = descricao.Trim() };
            try
            {
                contexto.Profissoes.Add(profissao);
                contexto.SaveChanges();

                CacheProfissoes.Add(profissao);

                profissao_id = profissao.Id;
            }
            catch (DbEntityValidationException e) { ... }
            catch (Exception e) { ... }
        }
        else
        {
            profissao_id = profissaoTemp.Id;
        }

    }
    return profissao_id;
}
1 answer

This performance problem is well known. By adding the objects one by one, you force Entity Framework to check the state of every object attached to the context, which is bad for performance.

There are a few ways to solve it. I’ll present them, and you can choose which one best fits your case:

1. Switching off the AutoDetectChangesEnabled

Just do the following:

context.Configuration.AutoDetectChangesEnabled = false;

This stops the change-detection scan over all entities attached to the context.
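A minimal sketch of this pattern, assuming the `contexto` field from the question; the flag is restored in a `finally` block so the context behaves normally afterwards:

```csharp
// Sketch: turn off automatic change detection during the import
// and restore it afterwards (assumes the "contexto" field from the question).
contexto.Configuration.AutoDetectChangesEnabled = false;
try
{
    foreach (var funcionario in funcionarios)
    {
        contexto.Funcionarios.Add(funcionario);
    }
    // With the flag off, SaveChanges() no longer calls DetectChanges().
    // Entities registered via Add() are tracked as Added anyway, but if
    // you modify already-attached entities, call it manually:
    contexto.ChangeTracker.DetectChanges();
    contexto.SaveChanges();
}
finally
{
    contexto.Configuration.AutoDetectChangesEnabled = true;
}
```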

2. Decrease the batch to 100 records

Remember that you are inserting not only employees, but also cities and neighborhoods, which can make the unit of work reach up to 3,000 records. Make the number of records per batch smaller and call SaveChanges() before the Dispose(), just to ensure that nothing is lost:

if (count == 100)
{
    count = 0;
    contexto.SaveChanges();
    contexto.Dispose();
    contexto = new DataAccess.Context();
}

3. Use the caches separately; AddRange() every batch

Create separate lists for your caches, fill them normally, and at the end of each batch insert all elements at once:

context.Funcionarios.AddRange(CacheFuncionarios);
context.Cidades.AddRange(CacheCidades);
...
context.SaveChanges();

You can even use all the alternatives presented at the same time, but I recommend implementing them one by one and measuring the performance, since each case can behave very differently.
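A sketch combining the three suggestions; the list `novosFuncionarios` and the batch size of 100 are illustrative, not part of the original code:

```csharp
const int batchSize = 100;
contexto.Configuration.AutoDetectChangesEnabled = false;

var batch = new List<Domain.Funcionario>();
foreach (var funcionario in novosFuncionarios)
{
    batch.Add(funcionario);
    if (batch.Count == batchSize)
    {
        contexto.Funcionarios.AddRange(batch);
        contexto.SaveChanges();
        batch.Clear();

        // Recreate the context so the change tracker stays small
        contexto.Dispose();
        contexto = new DataAccess.Context();
        contexto.Configuration.AutoDetectChangesEnabled = false;
    }
}

// Flush whatever is left over from the last partial batch
if (batch.Count > 0)
{
    contexto.Funcionarios.AddRange(batch);
    contexto.SaveChanges();
}
```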

See more at http://weblog.west-wind.com/posts/2013/Dec/22/Entity-Framework-and-slow-bulk-INSERTs


UPDATE

An update to this answer is in order, because new alternatives have appeared for solving the bulk-insert problem with Entity Framework.

One of them is EntityFramework.BulkInsert (which has a separate package for EF 5 and another for EF 4). There are also versions for other databases, such as Microsoft SQL Server Compact Edition.

The first part of the answer is still useful for other kinds of performance improvement, but it is no longer the best option for bulk inserts.
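Usage is roughly the following; this is a sketch based on the EntityFramework.BulkInsert package for EF 6, and `listaDeFuncionarios` is a hypothetical list of entities:

```csharp
using EntityFramework.BulkInsert.Extensions;

// BulkInsert is an extension method on DbContext; it writes the
// whole list in a single bulk operation (SqlBulkCopy under the
// hood), bypassing the change tracker entirely.
using (var contexto = new DataAccess.Context())
{
    contexto.BulkInsert(listaDeFuncionarios);
}
```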
