File import and reading


I have an ASP.NET MVC project with Entity Framework 4 and SQL Server, where an action receives a CSV file via POST.

Taking into account memory usage, processing time, resources, and any database-related concerns, which of these two code options is the more recommended?

The file can contain approximately 15,000 lines.

Option 1:

var postedFile = Request.Files[0];

using (var reader = new StreamReader(postedFile.InputStream))
{
    using (TransactionScope scope = new TransactionScope())
    {
        while (!reader.EndOfStream)
        {
            var line = reader.ReadLine();
            var values = line.Split(';');

            // Validations

            db.Produtos.Add(new Produto { Descricao = values[0], Valor = values[1] });
        }

        db.SaveChanges();
        scope.Complete();
    }
}

Option 2:

var postedFile = Request.Files[0];

using (var reader = new StreamReader(postedFile.InputStream))
{
    var produtos = new List<Produto>();
    while (!reader.EndOfStream)
    {
        var line = reader.ReadLine();
        var values = line.Split(';');

        // Validations

        var produto = new Produto
        {
            Descricao = values[0],
            Valor = values[1]
        };

        produtos.Add(produto);
    }

    using (TransactionScope scope = new TransactionScope())
    {
        produtos.ForEach(p => db.Produtos.Add(p));

        db.SaveChanges();
        scope.Complete();
    }
}

Is there a better Option 3?

  • Are you saving the .csv file in your database as an array of bytes?

  • No, no, I want to do everything in memory...

  • Good, I think I almost got it: are you reading the file and adding each line to your database? A template of the .csv file would help a lot!

  • @John it's just that, and the layout is Descrição;Valor, per the data in the question.

  • @John, my question is not how to do it, but how best to do it. The above is just an example; in this case we would have two columns, a description and a value, separated by a semicolon. The idea is to read the file without saving it to disk, but this raises some questions: which solution requires less memory? Can creating a TransactionScope with 15,000 items hurt the performance of the application or the database?

  • So @Jedaiasrodrigues, you didn't mention this in your question; a 15,000-line scenario is complicated even for plain SqlClient. I understood the in-memory reading, but prerequisites were missing that you only mention now; for example, 15,000 lines is a worrisome factor for Entity Framework. That said, the first option is the best, wrapped in a try/catch with rollback.

  • @Jedaiasrodrigues, which version of ASP.NET MVC?

  • @João understood, I'll edit the question. I'm using MVC 4; the project is old.

  • @Jedaiasrodrigues, I advise you not to use List<T>.ForEach: it makes the code harder to debug, is slower, and gives the list a responsibility that isn't its own. When reading the CSV, watch out for escape characters and its other parsing rules, so I advise you to use CsvHelper. Finally, for performance reasons, avoid saving the 15,000 records at once: call SaveChanges every 100~1000 records, or better yet use SqlBulkCopy.

  • As I said, and as @Tobymosque mentioned above, you can use SqlBulkCopy. I needed to save 100,000 records in my database. Saving directly with foreach would take about 5 minutes. Instead, I used foreach to add the data to a list and then used a bulk insert to write it. The task went from 5 minutes to about 15 seconds. If you need it, let me know and I'll post it later (a rough sketch of the idea follows these comments).
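
A rough sketch of the SqlBulkCopy approach suggested in the two comments above — assuming the parsed rows are already in the produtos list from Option 2, that the destination table is dbo.Produtos with Descricao and Valor columns, and that connectionString points at the same database (all of these names are assumptions to adapt):

using System.Data;
using System.Data.SqlClient;

// Copy the parsed rows into a DataTable and push them in a single bulk operation.
var table = new DataTable();
table.Columns.Add("Descricao", typeof(string));
table.Columns.Add("Valor", typeof(string));

foreach (var produto in produtos)
{
    table.Rows.Add(produto.Descricao, produto.Valor);
}

using (var bulkCopy = new SqlBulkCopy(connectionString))
{
    bulkCopy.DestinationTableName = "dbo.Produtos"; // assumed table name
    bulkCopy.ColumnMappings.Add("Descricao", "Descricao");
    bulkCopy.ColumnMappings.Add("Valor", "Valor");
    bulkCopy.WriteToServer(table); // bulk load, no per-row SQL statements
}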


2 answers



Is there a better Option 3?

There is. Start by installing FileHelpers and BulkInsert for Entity Framework.

Use something similar to option 2:

using FileHelpers;
using EntityFramework.BulkInsert.Extensions;

var postedFile = Request.Files[0];

using (var reader = new StreamReader(postedFile.InputStream))
{
    // FileHelpers parses the CSV stream straight into an array of Produto
    var engine = new FileHelperEngine<Produto>();
    var produtos = engine.ReadStream(reader);

    using (TransactionScope scope = new TransactionScope())
    {
        // One bulk operation instead of one INSERT per entity
        db.BulkInsert(produtos);
        db.SaveChanges();
        scope.Complete();
    }
}

Don’t forget to decorate Produto with:

[DelimitedRecord(";")]
public class Produto { ... }
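
For reference, a minimal sketch of what the decorated class could look like, assuming a FileHelpers version that maps auto-properties (older versions require public fields) and the Descrição;Valor layout mentioned in the comments:

using FileHelpers;

[DelimitedRecord(";")]
public class Produto
{
    // FileHelpers maps columns in declaration order: first column -> Descricao, second -> Valor.
    // Both are kept as strings here, matching the question's code; adjust if Valor is numeric.
    public string Descricao { get; set; }

    public string Valor { get; set; }
}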

About Option 1

Adding products one at a time to the context slows the context down once there are a few thousand records. The operation can become excessively slow and take much longer than desired.
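
If adopting BulkInsert really isn't possible, a common mitigation (not part of this answer's recommendation, just a sketch assuming the DbContext API of EF 4.1+) is to flush in batches so the change tracker never holds all 15,000 entities:

const int batchSize = 500; // arbitrary; the comments suggest something between 100 and 1000

db.Configuration.AutoDetectChangesEnabled = false; // change detection dominates the cost here
var count = 0;

while (!reader.EndOfStream)
{
    var values = reader.ReadLine().Split(';');
    db.Produtos.Add(new Produto { Descricao = values[0], Valor = values[1] });

    if (++count % batchSize == 0)
    {
        db.SaveChanges(); // flush the current batch
    }
}

db.SaveChanges(); // remaining rows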

About Option 2

It's not much different from option 1, with one aggravating factor: you use extra memory to hold the list and then to add everything to the context (it isn't much more memory, but it is more).

During the transaction, is there a risk that the table will be blocked for queries by other users?

There is, considering that the default isolation level of TransactionScope is the highest possible (Serializable).

It is possible to change the isolation level (see TransactionOptions).
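
For example, a sketch that lowers the isolation level to ReadCommitted (SQL Server's usual default outside a TransactionScope):

using System.Transactions;

var options = new TransactionOptions
{
    IsolationLevel = IsolationLevel.ReadCommitted, // instead of the Serializable default
    Timeout = TransactionManager.DefaultTimeout
};

using (var scope = new TransactionScope(TransactionScopeOption.Required, options))
{
    db.BulkInsert(produtos);
    db.SaveChanges();
    scope.Complete();
}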

Does BulkInsert insert items in blocks?

Yes, as a single block. There is no separation into smaller blocks.

Can I control this?

Yes, by assembling the blocks in memory beforehand.
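
A sketch of that idea, splitting the array read by FileHelpers into blocks of 1,000 items (an arbitrary size) and issuing one BulkInsert per block:

using System.Linq;

const int blockSize = 1000; // arbitrary; tune for your environment

using (var scope = new TransactionScope())
{
    // produtos is the array read by FileHelpers in the earlier example
    for (var i = 0; i < produtos.Length; i += blockSize)
    {
        var block = produtos.Skip(i).Take(blockSize).ToList();
        db.BulkInsert(block); // one bulk operation per block
    }

    db.SaveChanges();
    scope.Complete();
}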

Does the transaction stay open during the insertion of the 15,000 items?

It stays open until the using block ends or scope.Complete() is called.

Doesn't this interfere with performance?

Transactions are slower than plain insert operations because the database performs a series of extra steps to enforce the transactional scope. The performance gain comes from the bulk insert.

Another issue is that my project is old and uses EF version 4, but BulkInsert has EF 4.1 among its dependencies, and I'm afraid that installing it will update EF and cause some headache.

I don't know of any problems with updating the EF version, unless you make use of OriginalValues on some entity somewhere. Other than that, I do advise updating.

  • Thank you very much for the reply! Could you explain a little more about why to use FileHelpers and BulkInsert, and what the problem is with the options presented?

  • During the transaction, is there a risk that the table will be blocked for queries by other users? Does BulkInsert insert items in blocks? Can I control this? Does the transaction stay open during the insertion of the 15,000 items? Doesn't this interfere with performance?

  • Another issue is that my project is old and uses EF version 4, but BulkInsert has EF 4.1 among its dependencies, and I'm afraid that installing it will update EF and cause some headache.

  • ;(


Another way would be to use a handmade bulk insert: a generic method that reads the properties and builds a very fast INSERT...

There is a GitHub repository that contains two very useful methods for bulk operations: BulkInsert and BulkUpdate, using MySQL and EF6+.

Basically, both functions found in the link read the properties of the generic type passed to the methods and assemble the corresponding queries (UPDATE or INSERT)...
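
The repository's code isn't reproduced here, but a rough sketch of the general idea — reflecting over the entity's properties and assembling a parameterized multi-row INSERT — might look like the following. The table/column naming convention and the ExecuteSqlCommand usage are assumptions, and for thousands of rows the statement would still have to be built per block, because databases cap the number of parameters per command:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Sketch only: builds one parameterized INSERT covering a batch of entities.
// Assumes simple scalar properties and a table named after the type.
public static string BuildInsert<T>(IList<T> items, out object[] parameters)
{
    var props = typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance);
    var columns = string.Join(", ", props.Select(p => p.Name));

    var rows = new List<string>();
    var args = new List<object>();

    foreach (var item in items)
    {
        var placeholders = new List<string>();
        foreach (var prop in props)
        {
            args.Add(prop.GetValue(item, null) ?? (object)DBNull.Value);
            placeholders.Add("{" + (args.Count - 1) + "}"); // {0}, {1}, ... placeholders
        }
        rows.Add("(" + string.Join(", ", placeholders) + ")");
    }

    parameters = args.ToArray();
    return string.Format("INSERT INTO {0} ({1}) VALUES {2}",
        typeof(T).Name, columns, string.Join(", ", rows));
}

// Hypothetical usage with EF's raw SQL support ({0}-style placeholders become DbParameters):
// object[] args;
// var sql = BuildInsert(blockOfProdutos, out args);
// db.Database.ExecuteSqlCommand(sql, args);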

Note 1: The project was developed for specific needs; if, for example, your database is different, adaptations will be necessary, but it is a starting point.

Note 2: The project is open to anyone who wishes to improve it and add value for the community.

Please take a look here.
