What is a race condition?

Viewed 10,868 times

67

What is this so-called race condition that can cause problems in applications? How can it be avoided?

  • 1

    Race condition: this is new to me :)

  • Stopping to think about the information given, these races really can happen, and the programmer may never discover the origin of the error.

  • 1

    @Diegofarias yes, and that’s serious. Most programmers today who have poor training (no matter how they got it) do not know these things and do not protect themselves.

  • I have seen cases where the same application that worked on almost all machines did not work in a few because of speed. Competing events caused errors.

  • 4 years later, it is new to me too :)

3 answers

65


It is a situation that can occur whenever a given computational resource is accessed concurrently (even if the concurrency is not apparent). The best summary: the code relies on something being in a certain state and acts on that assumption, but another line of execution changes the state between the moment the state was read and the moment the action is performed.

In other words, it is a situation where the timing of events can influence the outcome of their execution.

It can even occur in hardware, where it must be properly handled. Basic software such as operating systems, and servers such as databases and HTTP servers, also face race conditions. This answer focuses only on the development of ordinary applications, which is what you probably have to worry about.

How it occurs

If the resource exists only in memory, or it is guaranteed that only one application can access it, a race condition can only occur when there is more than one thread. Otherwise there is no concurrency, and time is governed linearly by the application.

Resources shared between several applications, on the other hand, are always susceptible to race conditions. They can happen, even if rarely, and most programmers don’t understand that. Because it works in tests and everything goes well most of the time, the problem is not reproducible, and the programmer blames the computer, the operating system, or anything other than their own application. Even when the problem does show up, it does not show up again right afterwards: "it looks like something supernatural".

A race condition is not a mistake in itself. It is inherent in some problems that you need to solve, and there is no way to escape it. The problem is the error caused by the race condition, and this error only occurs because the race condition was not properly handled. So the race condition is not the bug. The bug is caused by an unhandled race condition, and it is random.

Some race conditions, or attempts to solve race conditions, may cause a deadlock, which is a mutual dependence between two parallel operations: one prevents the other from executing, which in turn prevents the first from executing.

Example

One of the most classic examples is access to a file. Think that your application checks whether a file exists to determine whether it can use it:

if (File.Exists("nomeDoArquivo.txt")) {
    SendFile("nomeDoArquivo.txt");
}
  • The computer/operating system performs the file existence check.
  • Since the file exists, the code decides it may send it.
  • But right after that, some other application, or another thread of the same application, deletes the file.
  • Then the method SendFile() tries to send the file and fails, because the file no longer exists. That is an unhandled race condition: whoever arrived first won and prevented the other from succeeding.

But it can be worse:

if (contaJoao.OperacaoASerProcessada() == Operacao.Deposito) {
    contaJoao.Deposita(10000);
    contaJoao.Libera(Operacao.Deposito);
}
  • Here the code checks whether the object has a pending entry indicating that a deposit needs to be made.
  • With several threads running, more than one may try to perform this operation at almost the same time.
  • One thread sees that the operation is pending, another does the same; both find that a deposit is pending.
  • One of them enters the if.
  • The other enters right after.
  • The first makes the deposit.
  • The second makes the deposit too.
  • Two deposits were made, but only one was pending.
  • Then one releases the operation, and the second releases it too.

With luck the error will be detected at that point; if the code is badly written, not even that will happen. Even if it is detected, the error has already occurred. By the time someone fixes it, the full amount of both deposits may already have been withdrawn. The damage is done.

Another simple example:

if (x == 10) { // this x is a reference; there may be more than one "owner"
    y = x * 2;
}

Suppose several threads are running and x is a globally shared variable with no lock protecting access to it. Can you guarantee that y will be 20 if x was originally 10?

It may be that by the time of the multiplication x has changed to 11 and the calculation results in 22 when 20 was expected. Other unforeseen situations may occur.

See this example taken from Wikipedia.

Without real concurrency:

[Image: race condition example]

Concurrency with a race condition:

[Image: race condition example]

A very common example: you have an inventory application that updates the database.

  • At one terminal someone reads the product data and sees that there is stock to make the sale.
  • At another terminal someone does the same.
  • One makes the sale and decrements the stock.
  • The other makes the sale and decrements the stock too.
  • But there was stock for only one sale, not two. The stock is now negative.

I have lost count of the systems I have seen doing this. It is a very common mistake, and over the internet it is not that rare.

Even when the stock does not go negative, it can still go wrong.

  • Suppose there were 100 in stock.
  • One terminal sold 20, so it updates the stock to 80.
  • Another terminal had read 100, sold 30, and updates the stock to 70.
  • You sold 50 in total at these 2 terminals, but the stock only dropped by 30; the other decrement was lost.
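The lost update in the list above can be replayed step by step. The following is only a minimal sketch in C#: the class and method names are hypothetical, the first method runs the interleaving on a single thread so the outcome is deterministic, and the second shows one possible in-memory remedy (an atomic decrement with Interlocked.Add), not what a database would do.

```csharp
using System.Threading;

public static class StockDemo
{
    // Replays the interleaving from the list on one thread:
    // both terminals read the stock before either one writes it back.
    public static int SellWithLostUpdate()
    {
        int stock = 100;
        int readByTerminal1 = stock;    // terminal 1 reads 100
        int readByTerminal2 = stock;    // terminal 2 also reads 100
        stock = readByTerminal1 - 20;   // terminal 1 writes 80
        stock = readByTerminal2 - 30;   // terminal 2 writes 70, overwriting 80
        return stock;                   // 70, although 50 units were sold
    }

    // Same two sales, but each decrement is one atomic read-modify-write.
    public static int SellAtomically()
    {
        int stock = 100;
        var t1 = new Thread(() => Interlocked.Add(ref stock, -20));
        var t2 = new Thread(() => Interlocked.Add(ref stock, -30));
        t1.Start();
        t2.Start();
        t1.Join();
        t2.Join();
        return stock;                   // 50, whatever order the threads ran in
    }
}
```

The point of the second method is that the read and the write can no longer be separated by another thread, so no decrement is lost.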

A last typical example, among the many possible ones, is reading the time twice in a row and expecting the two values to be the same. There are no guarantees. With dates it is rarer in one sense, and easier to hit in another. Think of a report that starts running on one day and ends on the next, which is not that rare: part of it may filter data by a date that is not the expected one, or at least do so inconsistently.

Note that I mostly used in-memory examples, but one of the most common race conditions involves a database. What I see most often is someone using a SELECT to check whether a record already exists and then an INSERT if it does not. You have noticed why this is a problem, haven’t you? There are also cases like the balance shown above, and similar ones. With a database the chance of it happening is higher because concurrency is more common and the higher latency widens the error window.
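To make the SELECT-then-INSERT problem concrete without a real database, here is a hedged in-memory sketch. UserRegistry and its methods are hypothetical names, and the ConcurrentDictionary only stands in for a table with a unique key; in a real database the analogous tool is usually a unique constraint, with the application handling the violation error.

```csharp
using System.Collections.Concurrent;

public static class UserRegistry
{
    // Hypothetical stand-in for a table with a unique key.
    private static readonly ConcurrentDictionary<string, int> users =
        new ConcurrentDictionary<string, int>();

    // Racy: another thread can add the same key between the check and the write.
    public static bool RegisterCheckThenAct(string name, int id)
    {
        if (users.ContainsKey(name)) return false;  // the "SELECT"
        users[name] = id;                           // the "INSERT", may overwrite a concurrent one
        return true;
    }

    // Check and insert happen as one atomic step; exactly one caller wins.
    public static bool RegisterAtomically(string name, int id)
    {
        return users.TryAdd(name, id);
    }
}
```

With RegisterAtomically, two threads registering the same name cannot both be told that they succeeded, which is exactly what the two-step version cannot promise.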

How to solve?

There are basically three strategies to solve this:

  1. Don’t check anything, just do it

    Believe it or not, many of these checks are unnecessary. They are an attempt to make the code more robust by checking that it is in a safe state, but in practice they cause much worse problems, because the problem may go unnoticed, and even if someone notices it, it is very hard to reproduce.

    So just do it: let the action throw an exception or return an error code, and handle that appropriately. One of the main reasons exceptions were created is precisely to allow this sort of thing. If there are no two phases (check, then act), there is no risk of a race condition; it becomes an atomic operation*, and it is even faster in the common case.

     try {
         SendFile("nomeDoArquivo.txt");
     } catch (FileNotFoundException) {
         // do something to indicate that the file does not exist
     }
    
  2. Gain exclusive access

    There are cases that are not just a matter of checking, or where there is no way to separate the check from the execution. Then some form of locking is needed. You need to gain exclusive access to the object for a minimal period during which other lines of execution cannot touch it.

     contaJoao.Lock();
     if (contaJoao.OperacaoASerProcessada() == Operacao.Deposito) {
         contaJoao.Deposita(10000);
         contaJoao.Libera(Operacao.Deposito);
     }
     contaJoao.UnLock();
    

    Depending on the language, something better may be in order, in case an exception prevents the execution from reaching the unlock:

     contaJoao.Lock(); // acquired before the try, so finally never releases a lock that was not taken
     try {
         if (contaJoao.OperacaoASerProcessada() == Operacao.Deposito) {
             contaJoao.Deposita(10000);
             contaJoao.Libera(Operacao.Deposito);
         }
     } finally {
         contaJoao.UnLock();
     }
    

    Obviously the way each language, library or specific codebase does the locking varies. Some even offer several ways to take this lock.

    The problem here is not just about the if: the operation itself, regardless of any condition, has more than one part, and the parts need to be performed together.

    Beware of the database case.

    I have seen people solve it by locking access to the product (in the stock example above) as soon as a terminal queries it. Then the user forgets the screen open and does nothing with it, and no one can sell a product that is not actually in the process of being sold. This is a little more complicated to solve.

    You have to lock only when you make the actual sale (a fraction of a second). And you have to refresh the application’s in-memory data from the database before doing the update you want.

    That way it locks only at the right time and solves the problem of the information no longer being the same at the start of that routine. If the read and the write are both done while the lock is held, the operation is atomic and causes no problem. The earlier read was only used to start the sale process; it does not guarantee that the sale will go through. There is no much better way to do this. But that’s another matter.

  3. Save the data that needs consistency

    If you need the data to stay the same throughout the process, just store it instead of fetching a new value each time you need it. That way there are no surprises.
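As an illustration of the second strategy applied to the stock example, here is a minimal in-memory sketch using the C# lock statement, which releases the lock even when an exception is thrown. The Inventory class and its members are hypothetical; a real application would apply the same idea with whatever locking or transaction mechanism its database offers.

```csharp
public sealed class Inventory
{
    private readonly object gate = new object();
    private int stock;

    public Inventory(int initialStock)
    {
        stock = initialStock;
    }

    // Informational read, e.g. to show on screen. The value may be stale
    // by the time the caller acts on it, so it must not decide the sale.
    public int Peek()
    {
        lock (gate) { return stock; }
    }

    // The lock is held only for the sale itself.
    public bool TrySell(int quantity)
    {
        lock (gate)
        {
            if (stock < quantity)   // re-check the current value inside the lock
                return false;       // the caller handles "sold out"
            stock -= quantity;
            return true;
        }
    }
}
```

A terminal can call Peek() as often as it likes without blocking anyone for long; only TrySell() decides whether the sale happens, using the value it reads while holding the lock.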

There are other solutions depending on the case, but these are the most commonly used in general. One that people often forget is to eliminate the concurrency whenever possible. It was not always really necessary.

if (x == 10) { // this x is a value; it was copied here and is independent of its origin
    y = x * 2;
}

I put this on GitHub for future reference.

Almost invisible cases

Think of a numeric data type wider than the processor’s register. The program needs to do a simple arithmetic operation on it. Because the value is wider than the register, the operation has to be done in two steps. Many languages do not guarantee that this two-part operation is performed atomically. Could there be a race condition there, with the calculation done with one part of the value holding one thing and the other part holding something different from the original? It is possible. Rare, but it can happen. If you do not want a headache, you need to anticipate this when there is concurrency.

You have to study how your language and its libraries work and see whether you need to worry. Mutable data without atomic operations, as in the example just cited, should never be shared; if it is shared, there must be a mechanism that guarantees the atomicity of the operations that manipulate it.
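In C#, for instance, reads and writes of a long are not guaranteed to be atomic, which matters on 32-bit platforms. One mechanism the .NET library offers for this is the Interlocked class. The sketch below uses a hypothetical Counter64 class to show the idea; it is an illustration, not the only way to do it.

```csharp
using System.Threading;

public static class Counter64
{
    // 64-bit value shared between threads.
    private static long total;

    public static void Add(long amount)
    {
        // Atomic read-modify-write: no thread can observe a half-updated value.
        Interlocked.Add(ref total, amount);
    }

    public static long Read()
    {
        // Atomic 64-bit read, also on platforms where a plain read could be split in two.
        return Interlocked.Read(ref total);
    }
}
```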

The fact that it is rare makes people not worry. But again, when it happens you will never understand why. You will blame "others" when it is actually your own application that did not take the necessary care.

You may be wondering if there is an automated way to find the problem and solve it. There isn’t. You need to learn what it is, how to solve it and apply it to your code properly. There are tools that detect data races, which are a specific type of race condition, but they cannot detect semantic problems.

When not to worry

When there is no concurrent access to the resource, there is no need for the slightest concern. If an object in the application’s memory is only used by one thread at a time, there is no way to have a race condition. Threads themselves are not the problem; sharing objects is. Note that an object may even pass from one thread to another, provided each thread has it exclusively while holding it.

Performing an operation on the operating system’s file system is already more complicated. Files are shared by default. You need to ask for exclusive access to the file (a lock) and prevent other applications from accessing it simultaneously.

You may think, "I know no other applications will access it". Are you sure? Do you control everything in the operating system? Can’t this change in the future, through someone you don’t even know, who will not know that you did not write the application correctly? Don’t leave room for luck; do it right! Control access to the file. Ask for exclusivity. But don’t overdo it and fall into the database problem: lock it only for as long as really necessary.

This holds for almost every resource external to your application. If you don’t ask for exclusivity, there is potential for a race.
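As a sketch of asking for exclusivity on a file in .NET, the example below opens the file with FileShare.None and keeps it open only while writing. ExclusiveFile and TryAppendLine are hypothetical names, and how strictly the exclusivity is enforced against other programs depends on the operating system (on Windows it is enforced by the system).

```csharp
using System.IO;

public static class ExclusiveFile
{
    // Returns false if the file does not exist or cannot be opened exclusively.
    public static bool TryAppendLine(string path, string line)
    {
        try
        {
            // FileShare.None: no other handle may be open while we hold this one.
            using (var stream = new FileStream(path, FileMode.Open, FileAccess.ReadWrite, FileShare.None))
            using (var writer = new StreamWriter(stream))
            {
                stream.Seek(0, SeekOrigin.End);
                writer.WriteLine(line);
            }   // handle closed here, so the file is locked only for the write
            return true;
        }
        catch (IOException)   // includes FileNotFoundException and sharing violations
        {
            return false;
        }
    }
}
```

Note that there is no File.Exists check before opening: the open itself is the check, following the first strategy above.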


*Indivisible, a single thing: it does everything or does nothing.

  • A race condition happens when an object/variable shared between two or more threads has its state modified by one of them, and this modification can break an if that is in another thread. Would that be a race condition?

  • 3

    It doesn’t have to be an if; that was just a common example. There is an error related to the race condition if the change unexpectedly affects how the other thread works. In the database example, the stock is changed concurrently, which would affect the other application; this is an unavoidable race condition, both compete and whoever arrives first (or last, depending on the case) wins. But if you handle it correctly, it will not produce an error in the application. Questions here are important for me to see what is not yet clear and what I can improve, which I will do.

  • @Maniero if I understood correctly, it would be something like a bar that has 10 beers... customer 1 and customer 2 arrive and ask whether there is beer; there is, so both come in, but the first one buys all 10 and when the second one gets in there is nothing left for him... nobody asked how many each one wanted, so the 2nd customer goes to 'error handling' (come back later)... Is that it? Do and leave

  • 1

    The worst thing is that many times there is no error handling, the person gets lost, and then anything can happen: she never leaves, sues you, shoots herself in the head, or you, badmouths you, starts stopping other people from coming in, calls everyone to smash up your bar, and it doesn’t even have to be her to do anything, your bar can spontaneously combust, more beer can arrive and you keep saying that it’s not trauma of his case.

  • @Maniero I was curious to know how the method "contaJoao.Lock()" is implemented, or is there no way to do this?

  • @Danover that is just an example to indicate that you need to ensure the operation is atomic; there are numerous ways to do this.

14

When multiple processes share the same resources, a race condition may occur. When we are using threads and two of them access a shared variable at the same time, a race condition occurs.

Sharing problems can be avoided by finding a way to ensure that the resource is accessed by just one process at a time. When one process is in a critical region, no other process can enter that region. In other words, mutual exclusion must be implemented. However, this alone is not enough. According to this website, which is an adaptation of Andrew Tanenbaum’s book "Modern Operating Systems":

For a good solution to the problem, 4 conditions have to be met:

  • No two processes may be inside their critical regions at the same time.

  • No assumptions may be made about the relative speed of the processes or about the number of processors available in the system.

  • No process running outside its critical region may block the execution of another process.

  • No process may be required to wait indefinitely to enter its critical region.

There are several solutions that try to solve the mutual exclusion problem, such as disabling interrupts, lock variables, strict alternation, Peterson’s solution, and the TSL instruction.

However, these solutions rely on busy waiting: whenever a timer interrupt occurs and the scheduler picks another process to run, the processes waiting to enter the critical region may be chosen and will consume processor time without making any progress.

One solution that came to eliminate the problems that still existed was the semaphore.
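To give an idea of what protecting a critical region with a semaphore looks like in practice, here is a small sketch using .NET’s SemaphoreSlim initialized to 1, so that at most one thread is inside the region at a time. The CriticalRegion class and its Run method are hypothetical and only meant as an illustration of the concept, not of how an operating system implements it.

```csharp
using System.Threading;

public static class CriticalRegion
{
    // Initial count 1, maximum 1: at most one thread inside the region.
    private static readonly SemaphoreSlim semaphore = new SemaphoreSlim(1, 1);
    private static int counter;

    public static int Run(int threads, int incrementsPerThread)
    {
        counter = 0;
        var workers = new Thread[threads];
        for (int i = 0; i < threads; i++)
        {
            workers[i] = new Thread(() =>
            {
                for (int j = 0; j < incrementsPerThread; j++)
                {
                    semaphore.Wait();                 // wait until the region is free
                    try { counter++; }                // critical region
                    finally { semaphore.Release(); }  // always let the next thread in
                }
            });
            workers[i].Start();
        }
        foreach (var worker in workers) worker.Join();
        return counter;
    }
}
```

Because every increment happens inside the region, Run(4, 1000) returns 4000; without the semaphore, some increments could be lost.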

Example of inter-process communication: the print spool

Example taken from Inter-Process Communication.

When a process wants to print a file, it puts the name of the file in a special directory called the spool directory. Another process, called Printer, periodically checks whether there is a file in this directory; if there is, it prints the file and removes it from the directory.

Let’s imagine that the spool directory has an unlimited number of entries, each able to store the name of a file to be printed, and that there are two shared variables: IN, which points to the next free entry, and OUT, which points to the next file to be printed. Let’s assume that in our spool directory positions 1 to 3 and 7 onward are free, and positions 4 to 6 are occupied with names of files to be printed. The values of IN and OUT are 7 and 4, respectively.

Let’s assume that two processes, A and B, want to print a file at the same time. Process A reads the variable IN, and before it can place the file name in the spool, a timer interrupt occurs and the processor starts running process B. Process B follows the same steps as process A: it reads IN, which still has the value 7, writes its file name at that position and updates IN to 8. When the next timer interrupt occurs and process A resumes, it continues from the point where it was interrupted. At that point the value it had read from IN was 7, so process A puts its file name at position 7, overwriting the name that B wrote, and sets IN to 8 again. Thus process B will never have its file printed. Note also that the spool directory remains consistent, so the printing process will not notice anything wrong in the directory structure.
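The interleaving just described can be replayed on a single thread to see the outcome deterministically. In this sketch the file names, the SpoolDemo class and the variable names are made up for illustration; the dictionary plays the role of the spool directory and inPointer plays the role of the shared variable IN.

```csharp
using System.Collections.Generic;

public static class SpoolDemo
{
    public static Dictionary<int, string> Replay()
    {
        // Positions 4 to 6 are occupied, as in the text.
        var spool = new Dictionary<int, string>
        {
            { 4, "f4.txt" }, { 5, "f5.txt" }, { 6, "f6.txt" }
        };
        int inPointer = 7;               // the shared variable IN

        int slotSeenByA = inPointer;     // A reads IN (7) and is interrupted
        int slotSeenByB = inPointer;     // B reads IN, still 7
        spool[slotSeenByB] = "b.txt";    // B writes its file name at position 7
        inPointer = slotSeenByB + 1;     // B sets IN to 8
        spool[slotSeenByA] = "a.txt";    // A resumes and overwrites position 7
        inPointer = slotSeenByA + 1;     // A sets IN to 8 again

        return spool;                    // "b.txt" is gone, yet nothing looks wrong
    }
}
```

The returned spool has four well-formed entries and position 7 holds "a.txt", which is why the printer process has no way of noticing that B’s request was lost.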

3

When a defect occurs on the server but not on the developer’s machine, or vice versa, and the environment settings are "the same" in both cases (which is, of course, debatable), it may be a defect caused by a race condition.

Introduction to globalization/localization in .NET

A race condition that can occur on the Windows platform is in the globalization/localization of ASP.NET.

Globalization/localization in ASP.NET is done with XML dictionaries with the resx extension, called resources. Each entry in these dictionaries has a key, which maps to a value. For example, a resource named Fruits.resx may have entries such as BananaText, which is the key, with the value Banana; AppleText, with the value Apple; and PineappleText, with the value Pineapple. ASP.NET takes care of generating code from these dictionaries, which allows writing code like the following:

decimal price = 10m;
Console.WriteLine("{0}: {1}", Resources.Fruits.PineappleText, price); // prints Pineapple: 10

which makes static reference to the XML dictionary.

In ASP.NET, we say that the culture (.NET’s own jargon) is pt-BR in the case of localization for Brazil, and that the default culture is InvariantCulture. If the culture of the current thread is the default one, the default dictionaries are used; in our case above, that would be Fruits.resx. If the culture were pt-BR, the dictionary used would be automagically switched to Fruits.pt-br.resx, with entries like the following: BananaText, with the value Banana; AppleText, with the value Maçã; and PineappleText, with the value Abacaxi. In that case, the program shown above would behave as follows:

decimal price = 10m;
Console.WriteLine("{0}: {1}", Resources.Fruits.PineappleText, price); // prints Abacaxi: 10

Race condition with .NET globalization/localization

If the thread that takes care of the UI is not the same one that switches its culture, and if there is a third thread initializing placeholders with resource values (.resx files), a race condition occurs. Let’s see:

Placeholder code:

public static class Placeholder
{
    public static string APPLE = Resources.Fruits.AppleText;
    public static string PINEAPPLE = Resources.Fruits.PineappleText;
    public static string BANANA = Resources.Fruits.BananaText;
}

Threads:

    UI  o--filling the screen with webcontrols-------x-------^--- "Apple" X
start                                                | gets  |
                                                     | APPLE |
ASP.NET o-InvariantCulture------^-x---x----pt-br-----)-------)---
start                           | |   | switch       |       |
                                | |   | culture      |       |
PlaHold             o-x---------)-)---------^--------v-------x---
start                 | gets    | | gets    | "Apple"
                      | resource| | InvCul  |
Resx    o-------------v---------x-v---------x---------------------
start

On a developer workstation, the build configuration is usually DEBUG, which reduces code optimization, and the compiler also leaves extra metadata in the assembly. This generally makes the code run slower, which makes the defect hard to reproduce.

One way to mitigate this problem is to change the placeholders as follows:

public static class Placeholder
{
    public static string APPLE => Resources.Fruits.AppleText;
    public static string PINEAPPLE => Resources.Fruits.PineappleText;
    public static string BANANA => Resources.Fruits.BananaText;
}

so that the result of evaluating Placeholder.APPLE no longer depends on the execution of a separate thread, such as the PlaHold thread shown above.
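The difference between the two versions of Placeholder is that a static field with an initializer is evaluated once, when the type is initialized, whereas an expression-bodied property (=>) is evaluated on every access. The sketch below shows this without any .resx file: FakeResources, its Language field, Snapshot and Live are all hypothetical stand-ins (Language plays the role of the thread culture).

```csharp
public static class FakeResources
{
    // Hypothetical stand-in for the current culture.
    public static string Language = "";

    // Hypothetical stand-in for the generated Resources.Fruits.PineappleText.
    public static string PineappleText =>
        Language == "pt-BR" ? "Abacaxi" : "Pineapple";
}

public static class Snapshot
{
    // Field: evaluated once, at type initialization, and never again.
    public static readonly string PINEAPPLE = FakeResources.PineappleText;
}

public static class Live
{
    // Property: evaluated every time it is read.
    public static string PINEAPPLE => FakeResources.PineappleText;
}
```

If Snapshot.PINEAPPLE is first read before Language is switched to "pt-BR", it stays "Pineapple" forever, while Live.PINEAPPLE follows the switch and returns "Abacaxi".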
