Unexpected result when using Parallel.ForEach

I have a class that holds a list of strings, with the following structure:

using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public class Teste
{
    private List<string> _codigos;

    public void InsertDB(string[] files)
    {
        _codigos = new List<string>();

        Parallel.ForEach(files, file => Processa(file));

        Console.WriteLine(_codigos.Count);
    }

    // Called concurrently from several threads by Parallel.ForEach
    private void Processa(string file)
    {
        // Performs some processing
        string resultado = "Gets a result";
        _codigos.Add(resultado);
    }
}

The problem: if my files array has 7000 elements, my _codigos list should end up with 7000 elements. But that doesn't happen; every time I run the program the list ends up with 6989, 6957, 6899, and so on. It is always a different number.

Interestingly, when I replace Parallel.ForEach() with a plain foreach loop, as follows:

foreach(string file in files) {
    Processa(file);
}

I do get the expected result: _codigos with 7000 elements.

What am I doing wrong?

  • This isn't the real code, is it? Does it have a try-catch?

  • No, the real code is different; this is only to illustrate the problem. This code already reproduces the same result I get.

  • Another thing: why use Parallel.ForEach?

  • Because the files array usually has about 2,000,000 elements, and processing them in parallel, using all processor cores, is much faster.

  • But you can't just add items to a list in parallel. Adding an item to a list involves several steps, so doing it from multiple threads at once will obviously cause problems.

2 answers

List<T> is not thread-safe. What is probably happening is that, in some cases, two threads try to add an item at the same time, which can produce something unexpected (such as only one of the two items being added, or an exception). In your case I recommend using ConcurrentBag<T>:

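private ConcurrentBag<string> _codigos;   // field type; needs using System.Collections.Concurrent;
_codigos = new ConcurrentBag<string>();   // in InsertDB, instead of new List<string>()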

ConcurrentBag<T> is a good fit because it keeps a separate internal list for each thread, so threads adding items mostly do not need to take a lock, and the problem goes away.
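As a rough sketch, assuming .NET 6 or later with top-level statements, here is the question's example reworked around ConcurrentBag<string>. The 7000 generated file names and the body of Processa are hypothetical placeholders, since the real processing was not shown.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical input: 7000 placeholder file names standing in for the real array.
string[] files = Enumerable.Range(0, 7000).Select(i => $"file{i}.txt").ToArray();

// ConcurrentBag<T> allows Add to be called from many threads at the same time.
var codigos = new ConcurrentBag<string>();

// Stand-in for the question's Processa method; the real processing is not shown.
void Processa(string file)
{
    string resultado = "Result for " + file;
    codigos.Add(resultado);
}

Parallel.ForEach(files, file => Processa(file));

// Every Add is kept, so this prints 7000 on every run.
Console.WriteLine(codigos.Count);
```

Note that ConcurrentBag<T> does not keep items in insertion order; if later code needs a List<string>, calling codigos.ToList() after the loop is one option.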

Gabriel Coletta is right. List is not thread-safe.

Whenever code running in parallel writes to a single shared structure, you have to make sure those writes cannot conflict.

Here is an example that shows why this care is needed (it's in C++, but it's simple):

int valorCompartilhado = 0; // shared by every thread

void AdicionarNumero(int valorNovo)
{
    // Not atomic: a read, an addition, and a write
    valorCompartilhado += valorNovo;
}

If two threads run this function in parallel, you may run into problems. The statement valorCompartilhado += valorNovo; may be compiled into instructions like these:

  1. Load the value of valorNovo into a register (storage inside the CPU).
  2. Load the value of valorCompartilhado into another register.
  3. Add the two registers and put the result in a register.
  4. Store the sum in valorCompartilhado.

If both threads reach step 3 at the same time, both have loaded the same value of valorCompartilhado. Each computes its own sum; one stores its result, and then the other stores its result over it. This means one of the updates is thrown away.

If you do not restrict this code to one thread at a time, you cannot control the result. If both threads call AdicionarNumero at the same time, with valorCompartilhado == 5 and arguments 3 and 1, valorCompartilhado can end up as 6, 8, or 9.
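The interleaving described above can be replayed by hand. This is a minimal C# sketch that simulates the two threads' register steps on a single thread (it does not start real threads), so the lost update is reproducible; the Interleave helper and its parameters are hypothetical names for illustration.

```csharp
using System;

// Replays the race by hand: both "threads" load the shared value
// before either one stores its sum. No real threads are involved.
int Interleave(int valorCompartilhado, int valorA, int valorB, bool aStoresLast)
{
    int registerA = valorCompartilhado; // thread A: load the shared value
    int registerB = valorCompartilhado; // thread B: load the same value
    registerA += valorA;                // thread A: add its argument
    registerB += valorB;                // thread B: add its argument

    // Step 4: whichever thread stores last overwrites the other's result.
    if (aStoresLast)
    {
        valorCompartilhado = registerB;
        valorCompartilhado = registerA;
    }
    else
    {
        valorCompartilhado = registerA;
        valorCompartilhado = registerB;
    }
    return valorCompartilhado;
}

// Serial execution: no overlap, so both updates survive.
int serial = 5 + 3 + 1;

int aLast = Interleave(5, 3, 1, aStoresLast: true);  // the +1 is lost
int bLast = Interleave(5, 3, 1, aStoresLast: false); // the +3 is lost

Console.WriteLine($"{serial} {aLast} {bLast}"); // prints "9 8 6"
```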

The way to restrict code so that only one thread can enter it at a time is with a lock (as Gabriel Coletta commented). You can also use a data structure and algorithm that stays correct under parallel access even without a lock (such as ConcurrentBag).
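A minimal sketch of both approaches in C#, assuming .NET 6 or later with top-level statements. Option 1 keeps the question's List<string> and wraps only the Add in a lock. Option 2 swaps in System.Threading.Interlocked.Add, which the answer does not mention, as a lock-free version of AdicionarNumero. The file names and the codigosLock object are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical input: 7000 placeholder file names.
string[] files = Enumerable.Range(0, 7000).Select(i => $"file{i}.txt").ToArray();

// Option 1: keep List<string>, but let only one thread call Add at a time.
var codigos = new List<string>();
var codigosLock = new object(); // dedicated object used only for locking

Parallel.ForEach(files, file =>
{
    string resultado = "Result for " + file; // processing still runs in parallel
    lock (codigosLock)
    {
        codigos.Add(resultado); // only this call is serialized
    }
});

// Option 2: for a shared counter like valorCompartilhado, Interlocked.Add
// performs the read, add, and write as one atomic operation.
int valorCompartilhado = 0;
void AdicionarNumero(int valorNovo) => Interlocked.Add(ref valorCompartilhado, valorNovo);

Parallel.ForEach(files, _ => AdicionarNumero(1));

Console.WriteLine($"{codigos.Count} {valorCompartilhado}"); // prints "7000 7000"
```

Keeping the processing outside the lock means only the Add waits for other threads. With around 2,000,000 files, contention on that single lock could still be noticeable, which is one reason to prefer a concurrent collection such as ConcurrentBag.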

  • I’m sorry if there are mistakes in my Portuguese.
