Database and transaction control for monetary transactions

In my project I need to develop a transaction system where the user adds credit and the services they have hired consume those credits.

I built the following structure but I don’t know if it’s the best way to do it:

Usuario
id | Nome

Transacoes
id | usuario_id | valor | data

When the user adds credits I will insert into the transacoes table; when there is an automatic debit from the use of the services I will insert into transacoes with a negative value (for example, -12 for a monthly fee); and when I need to generate the user's balance, I will sum all the amounts to arrive at the current balance.

The problem I found is that as this table grows, the performance of generating the balance will degrade.

Anyway, I can’t think of a structure to cover this business rule without stress :(

  • In the user table I have a balance field, which is updated every time I insert, update or delete a record in the transaction table. If you use a database transaction to do both updates, you prevent the values from going out of sync.

  • Another option is to add a balance column to the transaction table. It will seem redundant, but it will help you in the future when generating statements for past periods.

  • In two other answers I show a pattern that I have successfully used to represent transactions and balances. In my case the problem was a different one, but if you are worried about the transactions table growing too much (tip: you don't yet know whether performance will actually suffer; beware of premature optimization), this separation between transactions and balances would allow you to move old transactions (say, from a year ago) to an archive without compromising the integrity of your data.

  • The two answers @mgibsonbr cited are great: read them!

  • Still, you could process the transactions in a Big Data environment, or partition the table (with a filter for the archived data), create a materialized view, etc.

4 answers

8


Let's say that building the balance by reading all transactions is in fact a performance bottleneck (you don't know that yet, but let's assume you do). You can then create a table to maintain the balance (a single record per user):

Usuário
id | Nome

Transação
id | usuario_id | valor | data

Usuário_Saldo
id | usuario_id | saldo_atual

For each transaction (credit or debit in the Transação table), you update the user's balance in the Usuário_Saldo table; that way you only have to read a single record when you want to check a user's balance.

So far, so obvious. The precaution you need to take is to ensure data integrity: if you first create the record in the Transação table and only then calculate the balance, the balance may already have been changed by another credit or debit transaction.

So you need to update the balance first and then record the credit or debit transaction. Of course, these two commands must form an atomic operation, that is, they must run within the same database transaction. So:

-- open database transaction
UPDATE Usuário_Saldo
SET saldo_atual = saldo_atual + @valor_transação
WHERE usuario_id = @usuario_id;

INSERT INTO Transação (usuario_id, valor, data)
VALUES (@usuario_id, @valor_transação, @data);
-- commit the transaction

If the system issues two credit or debit operations "at the same time", the second one will have to wait for the first to complete, because the user's row in Usuário_Saldo will be locked.

Update: what if I want to ensure that no debit exceeds the available balance?

The solution described above ensures the integrity of the balance without having to worry about the transaction isolation level. But if I added a SELECT before updating the balance, in order to first check whether there is enough balance for a new debit, I would break the solution, because that SELECT could read a previous version of the balance, or a version not yet committed, depending on the transaction isolation level.

In this solution, then, the available balance is checked last, causing a rollback if the operation resulted in a negative balance. Just add a balance-check SELECT before the end of the transaction. The complete solution looks like this:

-- open the database transaction

-- update the balance
UPDATE Usuário_Saldo
SET saldo_atual = saldo_atual + @valor_transação
WHERE usuario_id = @usuario_id;

-- record the debit/credit operation
INSERT INTO Transação (usuario_id, valor, data)
VALUES (@usuario_id, @valor_transação, @data);

-- check whether the operation resulted in a negative balance
SELECT saldo_atual
FROM Usuário_Saldo
WHERE usuario_id = @usuario_id;

-- if saldo_atual < 0, roll back
-- otherwise, commit the transaction

This is a simple solution in that it does not worry about the transaction isolation level (which depends on database settings or additional directives when executing the SQL commands); and it performs well by allowing the lowest isolation level, because the higher the isolation level, the more resources the database server uses and the lower the concurrency of query operations.

More complex solutions can emerge depending on specific needs. For example: if carrying out the entire operation only to then discover that the balance was insufficient proves costly, I can check the balance in advance and not trigger the update when it is insufficient - this reduces the occurrences of attempted debits that will not complete successfully. Of course I will not rely on this prior check to guarantee my balance - the final check that determines whether the transaction can be committed remains.
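
A hedged sketch of that advance check (an optimization only - this read can be stale, so the final in-transaction check above still decides):

-- before opening the transaction: give up early if funds are clearly short;
-- this read is advisory and may be outdated by the time the debit runs
SELECT saldo_atual
FROM Usuário_Saldo
WHERE usuario_id = @usuario_id;

-- if saldo_atual + @valor_transação < 0, skip the debit attempt entirely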

Obs 1: these commands assume auto-increment ids in the tables.
Obs 2: "@" symbolizes values passed as parameters to the query.
Obs 3: this code assumes Usuário_Saldo already has a record for each user, but you can have variations on this.

  • It's an interesting approach, +1. I'm just half in doubt whether the record really gets locked, or whether that depends on the isolation level and/or the database's locking (I don't have much experience in this area). This Wikipedia article, for example, suggests - if I understood correctly - that at the "read uncommitted" level the second transaction would not be blocked - and could write an incorrect value if the first one rolled back. Is that right?

  • It is locked, yes, @mgibsonbr, regardless of the isolation level. The isolation level determines how a record involved in a transaction can be read, but not how it can be changed. In a transaction-controlled RDBMS, a record modified by a pending transaction cannot be modified by another transaction. Note that if I did a SELECT saldo_atual in order to then do UPDATE ... saldo_atual = @saldo_atual + @valor_transacao, then yes, I would be in trouble, because the SELECT could well read a "dirty" value depending on the transaction isolation level.

  • "Standard" solution for a physical model.

  • @mgibsonbr I just read the Wikipedia article you referenced. It makes a serious mistake when it says that "if a lock is released immediately after the [insert, update, delete] command, the modified data can be changed by another transaction before this transaction is committed". That is not the behavior prescribed by ISO, and no reputable DBMS behaves like that. The isolation level determines only how records read by this transaction can be modified by another transaction, and how this transaction reads records modified by another.

  • @mgibsonbr Two references in English, from MS, that explain well the ISO standard for isolation levels (http://technet.microsoft.com/en-us/library/ms189122(v=sql.105).aspx) and the details of SQL Server's implementation of those standards (http://technet.microsoft.com/en-us/library/ms173763(v=sql.105).aspx).

  • Hehe, if only I had hovered the mouse over the "[citation needed]"... :P

  • I would add the following condition to the UPDATE's WHERE: saldo_atual + @valor_transação >= 0. If @@ROWCOUNT < 1, roll back right away. (This variant is sketched after this comment thread.)

  • @Akirayamamoto Great option too, +1 on your comment :-) The advantage of the SELECT is that with a single command you detect that the balance went negative and can tell how much was available, giving you information like "Could not debit @valor_transação because the available balance is saldo_atual + (@valor_transação * -1)". Another option is to run this SELECT right after the UPDATE, thus avoiding the cost of the INSERT.

  • Just one addition: it is possible to check the balance at the beginning of the transaction, before doing an update (and consequently consuming more system resources), by using row versioning. The size of the tables increases a bit, but the added safety under concurrency is amazing. Putting it on Usuário_Saldo alone would be enough.

  • @rodrigogq I don't understand. Are you referring to the record snapshot feature present in Oracle, MS SQL and other databases? Reading the latest committed version of the record does not prevent it from being changed by another transaction between the SELECT and the UPDATE; and even if the balance is not changed after the SELECT, you may already be reading an outdated balance precisely because you are reading a snapshot.

  • I'm actually talking about creating an extra column known as timestamp or rowversion. In a concurrency scenario where the first wins and the second loses (excellent for financial transactions on a current account, avoiding wrong duplication of transactions), you include in your UPDATE's WHERE the timestamp value obtained in the SELECT. If the value is different from when you read it, your change has no effect. Only the first one commits. This ensures that only the latest version can commit, along with a correct transaction.

  • @rodrigogq I get it. You are referring to optimistic concurrency control. You don't even need a version column; you can do a WHERE on the same balance obtained: UPDATE ... SET saldo_atual = saldo_atual + @valor_transação WHERE usuario_id = @usuario_id AND saldo_atual = @saldo_atual, where @saldo_atual is the one retrieved in the SELECT before the UPDATE. That's an option too. I still prefer the approach I suggested, where a second transaction, rather than failing, simply waits for the first to complete.

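A hedged T-SQL sketch of the conditional-update variant @Akirayamamoto describes above (the IF/ELSE wiring and the explicit transaction commands are assumptions for illustration):

BEGIN TRANSACTION;

-- the debit only succeeds if it would not leave the balance negative
UPDATE Usuário_Saldo
SET saldo_atual = saldo_atual + @valor_transação
WHERE usuario_id = @usuario_id
  AND saldo_atual + @valor_transação >= 0;

IF @@ROWCOUNT < 1
    ROLLBACK TRANSACTION;  -- insufficient funds: nothing was changed
ELSE
BEGIN
    INSERT INTO Transação (usuario_id, valor, data)
    VALUES (@usuario_id, @valor_transação, @data);
    COMMIT TRANSACTION;
END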

4

There are several approaches; it depends a lot on the scenario you have in mind, the hardware/software investment, and the speed you want to achieve.

The transaction-logging model you described is not bad; many systems today are based on it. It's called Event Sourcing: http://msdn.microsoft.com/en-us/library/dn589792.aspx

Event Sourcing is a technique in which you save the events one by one from the very first, and to know the current result (the balance, for example), you add them up record by record.

There is a big advantage to it: in relational databases, you get a very short response time for inserting new records, and you can greatly increase the system's parallelization capacity by adding new hardware/software. There are also techniques for materializing a version of the current balance to speed up balance queries. This is very important, because there will probably be a rule that a person cannot be charged without a minimum balance (which requires querying the balance and, for that, rebuilding the entire history from zero if those queries are not materialized).

Even so, if you are dealing with monetary values, Event Sourcing may not be a good fit. It's worth taking a lock on the balance record when you start a transaction that decreases the balance. If processing is parallel, your system may have to deal with a lot of contention and either lose information (one attempt to decrease the balance while another is already decreasing it) or face integrity issues.

It is important to have a historical record, yes, but having a balance table is important in order to perform the lock. So base your queries on the balance table.

As for saving the transactions: are they queried often? Do you really need an id on them? Remember that the id is also a constraint, making the database verify the integrity of every inserted record. Sometimes it is more worthwhile to have a table without a primary key, just indexed in a simple way, than to have a key that slows down the inserts. But note that this holds when queries are not very frequent.

To reduce the cost of those queries, it is worth placing restrictions on users. It is worth partitioning the tables by history. SQL Server allows partitioning a table based on the data in a column: logically speaking, it is a single table; physically speaking, they are separate files. You could create a partition for every 3 months of history and, if the user wants to query further back, require them to schedule it as a batch job (which will consume more database resources, but will certainly speed up queries over the most recent history). That way you can index only the recent transactions, and the rest sit there without consuming many database resources.
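
A hedged sketch of date-based partitioning in SQL Server (the function and scheme names and the boundary dates are hypothetical, for illustration):

-- partition function splitting rows into quarterly ranges (example boundaries)
CREATE PARTITION FUNCTION pf_transacoes_data (date)
AS RANGE RIGHT FOR VALUES ('2015-01-01', '2015-04-01', '2015-07-01');

-- map every partition to the same filegroup, for simplicity
CREATE PARTITION SCHEME ps_transacoes_data
AS PARTITION pf_transacoes_data ALL TO ([PRIMARY]);

-- the historical table is then created on the partition scheme
CREATE TABLE Transacoes (
    data date NOT NULL,
    usuario_id int NOT NULL,
    valor decimal(12,2) NOT NULL
) ON ps_transacoes_data (data);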

When performing the operations that update the balance, lock the balance record. This ensures the operation is atomic, and the insert into the historical table becomes much faster. When doing this transaction, be sure to lock the correct records, not letting the database lock a large region of the table.
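
A hedged illustration of locking just the balance row during the update, using SQL Server lock hints (UPDLOCK/ROWLOCK are assumptions; other DBMSs would use SELECT ... FOR UPDATE). The Transacoes columns follow the schema suggested below:

BEGIN TRANSACTION;

-- request an update lock on this user's balance row only,
-- asking for row-level locking rather than a larger region
SELECT saldo_atual
FROM Usuário_Saldo WITH (UPDLOCK, ROWLOCK)
WHERE usuario_id = @usuario_id;

UPDATE Usuário_Saldo
SET saldo_atual = saldo_atual + @valor_transação
WHERE usuario_id = @usuario_id;

INSERT INTO Transacoes (data, usuario_id, valor)
VALUES (@data, @usuario_id, @valor_transação);

COMMIT TRANSACTION;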

My suggestion is then the same as @Caffé's, with a modification regarding the transaction id:

 Usuario
 id | Nome 

 Transacoes
 data | usuario_id | valor (no pk)

 Usuário_Saldo
 id | usuario_id | saldo_atual

2

I have read the excellent answers. Just to show another approach, I would like to suggest the following:

I would model it this way:

Usuário
id | Nome

Transação
id | usuario_id | valor | data | data_processamento

Consolidado
competencia | usuario_id | saldo_atual

What changes? I find it more interesting to work with a cash-flow-style solution: I have a current balance, inflows and outflows. Then, on a fixed day each month, I would run a routine to consolidate the balance of the current month.

In addition, I included a data_processamento field in the Transação entity, making it possible to know when a given transaction was actually consolidated.
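
A hedged sketch of what the monthly consolidation job could run (generic SQL; @competencia, @competencia_anterior and @agora are hypothetical parameters, and the query is simplified - it ignores users with no activity in the month):

BEGIN TRANSACTION;

-- roll this month's unprocessed transactions into Consolidado,
-- starting from the previous month's consolidated balance
INSERT INTO Consolidado (competencia, usuario_id, saldo_atual)
SELECT @competencia,
       t.usuario_id,
       COALESCE(c.saldo_atual, 0) + SUM(t.valor)
FROM Transação t
LEFT JOIN Consolidado c
       ON c.usuario_id = t.usuario_id
      AND c.competencia = @competencia_anterior
WHERE t.data_processamento IS NULL
GROUP BY t.usuario_id, c.saldo_atual;

-- stamp the rows as consolidated
UPDATE Transação
SET data_processamento = @agora
WHERE data_processamento IS NULL;

COMMIT TRANSACTION;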

With the above approach I get the following benefits:

  • A smaller mass of data to work with, since I would only handle one month's worth of information.
  • It's much easier to look at the past, because nothing is lost.
  • If database size becomes a problem, it would be simple to create a routine to archive very old values (say, 5 years old).
  • Since the consolidation would be the responsibility of a job, controlling the transactions would not be too difficult.

Now, since not everything is roses, I think some disadvantages would be:

  • Your logic would get a little more complex (I say this imagining a basic system; I don't know what exactly your system is about).
  • Taking advantage of the parenthesis above: it is likely that this solution is not exactly what you are looking for.

Well, I hope it helps.

  • +1 In a system I developed I separated what was provisioned (e.g., if I wrote a check, for all intents and purposes I no longer have that money) from what was recorded during bank reconciliation (e.g., the check was cashed/cleared, and it appeared on my statement). Thus, "normal" users only recorded transactions (touching only one table) and a "special" user was the only one who touched the balance table - because it was only updated when there was an attached bank report proving the balance. And yes, the logic was a little more complex... :P

  • @mgibsonbr But there's no way around it: the complexity has to surface somewhere. I prefer to keep the data model simple and bring the complexity into my system. Of course my solution is very generic; I can't guarantee it will suit the problem well as a whole.

1

In Transação you can store a valor acumulado, which will always be the previous valor acumulado plus the current valor; the very first valor acumulado is simply the first valor. When you need the balance, you just read the last valor acumulado, without having to perform any calculations.
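
A hedged sketch of reading the balance under that scheme (T-SQL TOP syntax assumed; it relies on id - or data - growing monotonically):

-- the current balance is simply the most recent accumulated value
SELECT TOP 1 valor_acumulado
FROM Transacoes
WHERE usuario_id = @usuario_id
ORDER BY id DESC;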
