Read multiple lines from a file in parallel with C#

Viewed 2,296 times

0

I have a file of almost 700 MB that contains many lines, each holding a JSON document. I need to process the JSON line by line and insert each one into my database.

Currently I'm using the following code:

    using (StreamReader arquivo = new StreamReader(System.IO.File.OpenRead(file), Encoding.UTF8))
    {
        while (arquivo.Peek() > -1)
        {
            // Read the next line; without this call the loop never advances.
            string linha = arquivo.ReadLine();
            // process the line
        }
    }

How can I read the lines in parallel to make the process faster?

  • Why don't you load it all at once and process it in memory? Link1 - To read one line at a time, Link2

  • The file is 700 MB, and unfortunately there are many such files. I won't have the resources available for that.

  • Take a look at this thread on Soen-1. See this case too: Soen-2

2 answers

1

Since this is a text file whose lines may have different lengths, there is no efficient way to read it in parallel.

What you can do, however, is read the lines sequentially and process them in parallel. For example, you could use the System.Threading thread pool, or a pool of your own: put the lines to be processed in a queue, and whenever a thread becomes free it takes the next line:

public void ProcessaArquivo(string file)
{
    using (StreamReader arquivo = File.OpenText(file))
    {
        string linha;
        while ((linha = arquivo.ReadLine()) != null)
        {
            ThreadPool.QueueUserWorkItem(ProcessaLinha, linha);
        }
    }
}

private void ProcessaLinha(object parametro) {
    string json = (string)parametro;
    // process the line
}
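
One caveat with queuing every line straight onto the thread pool: the reader can run far ahead of the workers, so pending lines may pile up in memory. A minimal sketch of a bounded variant, assuming the standard `File.ReadLines` (which streams the file lazily) and `Parallel.ForEach` with `MaxDegreeOfParallelism`. The names `LineProcessor`, `ProcessFile`, and the `processLine` callback are hypothetical and only illustrate the idea:

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class LineProcessor
{
    // Reads lines lazily and processes them on at most maxWorkers threads.
    // Returns the number of lines processed.
    public static int ProcessFile(string path, int maxWorkers, Action<string> processLine)
    {
        int processed = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxWorkers };

        // File.ReadLines streams the file instead of loading it all at once.
        Parallel.ForEach(File.ReadLines(path), options, line =>
        {
            processLine(line); // e.g. parse the JSON and insert it into the database
            Interlocked.Increment(ref processed);
        });

        return processed;
    }
}
```

It could be called as `LineProcessor.ProcessFile(file, 4, json => { /* insert into the database */ });`. Note that lines are not processed in file order, and the callback must be safe to run on several threads at once.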

0

Basically, you create a queue for the lines read from the file, and then create multiple threads to process them. Reading the lines is fast, while the database work is slower, so...

Here is a code example:

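        // Requires: using System.Collections.Concurrent; using System.IO;
        //           using System.Text; using System.Threading;

        // Thread-safe, bounded queue. The capacity of 10000 is an arbitrary choice:
        // the reader blocks when that many lines are waiting, which caps memory use.
        BlockingCollection<string> linhas = new BlockingCollection<string>(10000);

        private void LerLinhas()
        {
            try
            {
                using (StreamReader reader = new StreamReader("Arquivo", Encoding.Default))
                {
                    string linha;
                    while ((linha = reader.ReadLine()) != null)
                    {
                        linhas.Add(linha);
                    }
                }
            }
            finally
            {
                // Tell the consumers that no more lines will arrive.
                linhas.CompleteAdding();
            }
        }

        private void Processa()
        {
            // Blocks while the queue is empty; ends once it is empty and marked complete.
            foreach (string linha in linhas.GetConsumingEnumerable())
            {
                // Process the line, database, etc.
            }
        }

        private void IniciaProcesso()
        {
            Thread tLerLinhas = new Thread(LerLinhas);
            tLerLinhas.Start();

            int nThreads = 5;
            for (int i = 0; i < nThreads; i++)
            {
                Thread t = new Thread(Processa);
                t.Start();
            }
        }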

Just call the IniciaProcesso() method; it starts 5 threads that process lines. You can change the number of threads, keeping in mind that too many can actually slow processing down because of excessive context switching.
