Why is AddRange so much faster than Add?


16

I’m working on a data integration between two databases, and I’m using Entity Framework for that purpose.

I wrote the following code, which iterates over each record of the Situacoes table in the external database (dbExterno) and inserts it into my own database (db):

    foreach (var item in dbExterno.Situacoes)
    {
        StatusRecursos statusNew = new StatusRecursos();
        statusNew.Id = item.CodSit;
        statusNew.Nome = item.DesSit;

        db.tTipStatusRecursos.Add(statusNew); // This turned out to be very slow!
    }
    db.SaveChanges();

However, I noticed that the code above was very slow, taking minutes to complete an iteration over about 3,000 records.

I then changed it to the code below, and the process took seconds. In this second version, instead of adding each item to the context with Add(), I first fill a generic list of StatusRecursos and then add the whole list to the context with AddRange().

    List<StatusRecursos> listStatus = new List<StatusRecursos>();
    foreach (var item in dbExterno.Situacoes)
    {
        StatusRecursos statusNew = new StatusRecursos();
        statusNew.Id = item.CodSit;
        statusNew.Nome = item.DesSit;
        listStatus.Add(statusNew);  // Not slow like the previous code.
    }
    db.tTipStatusRecursos.AddRange(listStatus); 
    db.SaveChanges();

I know it got faster, but I don’t understand why collecting the items in a list first and then adding them to the context with AddRange() is so much faster.

What is the explanation for this?

2 answers

15


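Assuming you’re using Entity Framework 6:

What happens is that AddRange() runs automatic change detection (DetectChanges) only once for the whole batch, whereas Add() triggers it on every call. Each detection pass walks all the entities the context is already tracking, so calling Add() 3,000 times means 3,000 passes over an ever-growing set. Try disabling automatic detection and redo your test using Add():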

    context.Configuration.AutoDetectChangesEnabled = false;
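
If you go this route, it is probably safer to restore the setting afterwards. Below is a minimal sketch of that pattern, assuming Entity Framework 6. The class name BulkAddSketch and the method AddWithoutAutoDetect are hypothetical names chosen for illustration; only the Configuration, Set<T>() and Add() calls come from EF6 itself. The caller still has to call SaveChanges() afterwards.

```csharp
using System.Collections.Generic;
using System.Data.Entity; // Entity Framework 6 (assumed to be referenced)

public static class BulkAddSketch
{
    // Hypothetical helper: adds entities one by one with automatic
    // change detection switched off, then restores the previous setting.
    public static void AddWithoutAutoDetect<T>(DbContext db, IEnumerable<T> entities)
        where T : class
    {
        bool previous = db.Configuration.AutoDetectChangesEnabled;
        try
        {
            db.Configuration.AutoDetectChangesEnabled = false;
            DbSet<T> set = db.Set<T>();
            foreach (T entity in entities)
            {
                set.Add(entity); // no detection pass per call while the flag is off
            }
        }
        finally
        {
            // Restore the flag even if Add() throws.
            db.Configuration.AutoDetectChangesEnabled = previous;
        }
    }
}
```

The try/finally is the design point here: leaving detection off for the rest of the context’s lifetime could hide changes made to other tracked entities.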

You can find more details in this MSDN article.
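
To make the cost difference concrete, here is a toy model of the behaviour described above. It is not Entity Framework’s real implementation: ToyTracker and ToyTrackerDemo are made-up names, and a "detection pass" is reduced to counting how many tracked entities it would have to visit.

```csharp
using System;
using System.Collections.Generic;

// Toy stand-in for a change tracker; NOT Entity Framework's real code.
public class ToyTracker
{
    private readonly List<object> _tracked = new List<object>();

    // Total number of entities visited by all detection passes so far.
    public long EntitiesScanned { get; private set; }

    // One detection pass visits every entity tracked so far.
    private void DetectChanges()
    {
        EntitiesScanned += _tracked.Count;
    }

    // Mimics Add(): one detection pass per call.
    public void Add(object entity)
    {
        DetectChanges();
        _tracked.Add(entity);
    }

    // Mimics AddRange(): a single detection pass for the whole batch.
    public void AddRange(IEnumerable<object> entities)
    {
        DetectChanges();
        _tracked.AddRange(entities);
    }
}

public static class ToyTrackerDemo
{
    // Adds 'count' entities one at a time and reports the scan total.
    public static long ScansWithAdd(int count)
    {
        var tracker = new ToyTracker();
        for (int i = 0; i < count; i++) tracker.Add(new object());
        return tracker.EntitiesScanned;
    }

    // Adds 'count' entities in one batch and reports the scan total.
    public static long ScansWithAddRange(int count)
    {
        var tracker = new ToyTracker();
        var batch = new List<object>();
        for (int i = 0; i < count; i++) batch.Add(new object());
        tracker.AddRange(batch);
        return tracker.EntitiesScanned;
    }
}
```

In this model, ScansWithAdd(3000) returns 4498500 (0 + 1 + ... + 2999), while ScansWithAddRange(3000) returns 0 because the single pass runs before anything is tracked. The work grows quadratically with Add() and stays flat with AddRange().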

-1

I don’t have specific knowledge of Entity Framework, but from what I can tell, using a single AddRange operation instead of many Add operations performs much better because of the cost of synchronous disk operations.

Because of the way hard drives work, a write to disk takes much longer than a write to RAM. (A hard disk operates on the millisecond time scale, while RAM operates on the nanosecond time scale, about 1 million times faster.)

This time can be thought of as a kind of latency, because (roughly) it is independent of the amount of data you are writing. It comes from the time needed to put the data on the bus, send the commands to the disk, position the head, actually write the data, and then receive confirmation that the data was written successfully.

The Add and AddRange functions probably wait for the data to be written to disk before returning, as they are probably synchronous. The time difference between writing 1 record and 3,000 records to disk is probably not very significant, so it pays off far more to write all the records at once than to perform 3,000 separate operations.

An interesting analogy is the following: suppose you have to send 100 people from Rio de Janeiro to São Paulo. It makes much more sense to wait for everyone to board the same bus and make the journey together than to send them one at a time, dispatching the next person only after a phone call confirms that the previous one has arrived.

The performance improvement is even more dramatic when network communication with a remote database server is involved.

I recommend experimenting with different sizes of List<StatusRecursos> to find the right balance between memory use and performance. It is possible that going from 1 to 100 records at a time makes a dramatic difference, while going from 100 to 3,000 records at a time does not. In that case it’s best to save memory, especially if your app runs on mobile platforms. Another alternative is to check whether the library has asynchronous write functions.
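
One way to run that experiment is to split the source into fixed-size batches. The sketch below is plain C# with no Entity Framework dependency; BatchHelper and InBatches are hypothetical names, and the batch size to pass in is whatever you want to try.

```csharp
using System;
using System.Collections.Generic;

public static class BatchHelper
{
    // Splits a sequence into consecutive batches of at most batchSize items.
    public static IEnumerable<List<T>> InBatches<T>(IEnumerable<T> source, int batchSize)
    {
        // Validate eagerly, before the lazy iterator starts.
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
        return Iterate(source, batchSize);
    }

    private static IEnumerable<List<T>> Iterate<T>(IEnumerable<T> source, int batchSize)
    {
        var batch = new List<T>(batchSize);
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize); // start a fresh batch
            }
        }
        if (batch.Count > 0) yield return batch; // final, possibly partial batch
    }
}
```

With the code from the question, each batch could then be handed to db.tTipStatusRecursos.AddRange(batch) followed by db.SaveChanges(). That pairing is an assumption on my part; whether it helps in practice is something only a measurement will tell.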

  • I get your point. What I meant is, I don’t have specific knowledge of Entity Framework, but I’ve used other ORMs. Also, the question seemed to me to be a general computer science one, about why storing data in a buffer before writing it to disk improves performance.

  • I get it. I really don’t have much experience with Stack Exchange, except for a few questions on Stack Overflow in English. (Most of the time I’m a lurker.) Sorry for the confusion.

  • I liked your answer. A pity it doesn’t fit the context of the question.
