TL;DR
SQL databases in general, not only Mysql, are transactional, that is, they allow you to perform a sequence of operations as an indivisible block in order to ensure data integrity in a concurrent access environment.
The Problem
In the example quoted in the question, imagine that the #1, #2 and #3 queries are operations that affect the database and we don’t use a transaction to control them. Let’s use as an example an e-commerce:
- Updates customer delivery data
- Enter a new record of the purchase made, checking if you have stock
- Debit the stock of products
Now imagine two customers trying to finish their purchases in this fictional e-commerce. The server receives two requests almost simultaneously and starts processing the requests in the above sequence. The two orders are being processed simultaneously in different threads.
Imagine also that both customer A and customer B selected a product that has only one unit in stock. We can end the next execution line:
- Thread A updates customer data A (step #1), checks the stock inserts the purchase record (step #2)
- Thread A is locked and B is executed
- Thread B updates customer data B (step #1), checks stock inserts purchase record (step #2)
- Thread B updates the stock, which is now zeroed
- Thread B is locked and A is executed
- Thread A updates the stock, which is now negative!
Note that although the code checks the stock the order of execution makes the check is not guaranteed in the next step.
The Solution
Transactional databases use the concept ACID:
- Atomicity: a transaction is a sequence of operations indivisible, or is executed as a whole, or all is undone.
- Consistency: at the end of the transaction, the status of the data shall be consistent.
- Isolation: although some systems allow breaking the isolation, in general an ongoing transaction cannot be accessed by other transactions so as to avoid reading an inconsistent state, a "dirt".
- Durability: in case of success (commit) the persistence of the data must be guaranteed
To ensure these concepts, in general, databases use locks when simultaneous access to the same data structure occurs. That is, if someone is already messing with the dice, the others have to wait for him to finish and wait their turn in line.
In practice
By using transactional databases, we can take advantage of this management control by the Relational Database Management Systems (DBMS).
Including the concept of ACID transaction in the previous example, let’s see how the execution:
- Thread A starts a transaction, updates customer A data (step #1), checks stock inserts purchase record (step #2)
- Thread A is locked and B is executed
- Thread B starts a transaction, but when trying to update client B data it is blocked because A’s transaction is not over
- Thread A updates the stock, which is now zeroed, and does commit in the transaction.
- Thread B is unlocked and runs
- Thread B updates customer data B (step #1), checks the stock and returns an error as it does not find the product available
- Thread B executes a rollback to undo the changes you have already made
The end result is as if only thread A had run and B never existed.
Not everything is a bed of roses
There are some problems inherent to ACID transactions, being the performance the biggest of them.
While it is important to ensure data integrity, for many systems where availability is the most critical factor, a model that blocks simultaneous access becomes unviable.
This is one of the main factors for the emergence and adoption of various nontransactional database systems and Nosql.
The important thing is to understand that the use of transactions has a cost and sometimes this can be too high. One of the most common representations of trade-off data persistence is as follows (withdrawal of this article):
The graph demonstrates that consistency, availability and partitioning (scaling the database at several nodes) are resources that affect each other. You just can’t have the best of the three, according to the theorem of CAP.
Relational databases generally sacrifice partitioning for consistency and availability, while some Nosql systems sacrifice data consistency.
Excellent! gives a more academic view on the subject, posted more practical, but your answer is also very valid!
– hernandev
@mgibsonbr "Note that an explicit rollback is not necessary for this guarantee to be met - if the system fails before the commit, when it is restarted the DBMS will take care to undo the partial actions." Not really necessary? mysql tries to do the reversal if something fails?
– Jorge B.
@Jorgeb. I refer to a "catastrophic" failure such as running out of power and turning the machine off. In this case, restarting the bank will do the reversal automatically. But in the normal course of the program, it is good practice to put an explicit rollback (even to avoid "junk" in the BD logs).
– mgibsonbr
@mgibsonbr If we have 2 transactions to be treated at the same time, in the first we are writing in table x, in the second if we try to write in table x at the same time, what happens?
– Jorge B.
@Jorgeb. That’s a subject quite a lot complex, it would be better to open a new question to that. Personally, I don’t know what happens, but I suspect that both writings will be successful, only the order of the same is undefined (i.e. one can overwrite the data of the other, if moving on the same line).
– mgibsonbr