12
I am developing a system that keeps a record of all financial transactions occurring in a bank account (simple matches). There is a redundancy in my data model: a table saldo_conta
keeps a list of dates and balances for a particular account, and another table movimentacao_financeira
maintains a list of entries and exits for a particular account. These tables are redundant, because the information of one could be computed from the data of the other (in fact, there is a small difference, as will be explained below).
However, this redundancy has proved useful in the past: due to a bug in the user interface, several releases have been logged with the wrong value. But the bug did not affect the other table, so I got two tables with historical discrepancies. Once determined which of them was correct (and after fixing the bug, of course), it was a simple matter to execute a script to recalculate the incorrect table values based on the other.
Despite this, I am not 100% convinced that maintaining this redundancy is the most correct way to do the modeling. I want my system to be robust, and I don’t know how I would make this error recovery if it wasn’t for redundancy (logs don’t keep detailed information about the releases), I think I would have to manually check each of them with bank reports attached [in image format]. Even so, I have this "preconceived notion" that it is a bad practice, only that I lack objective arguments to guide my decision.
Is it always bad to maintain redundancies in a relational data model? And if the answer is "yes", would there be an alternative way to prevent bugs that might affect the most important data? (i.e. simplify recovery in case of errors)
Updating: a little more than background in my specific case - the table movimentacao_financeira
contains a list of transactions initiated but not necessarily confirmed. An example would be the issue of a check: the money has already been "spent" (in the sense that it is already provisioned, can not be used for anything else), but while the check is not cashed it is not yet in the bank records. Already the table saldo_conta
records the balances already reconciled, that is, what appears in the statement. There is a relationship N x 1
between these tables - as several individual transactions can be confirmed by a single bank conciliation.
The redundancy is in storing the balance in this second table - since it could be computed as the sum of all transactions already confirmed until then. However, there is a semantic difference: the balance recorded in this table must be exactly the balance recorded on the bank statement - if the computed value (i.e. sum of the previous transactions) is different of what is in the extract, something is wrong, and it is necessary to draw attention to the fact (this verification is made during conciliation, but if in the future new releases are recorded in the past discrepancies can be introduced - and without means of detecting them [if the redundant value is not present]).
+1 I liked the observation about safety.
– utluiz
That’s a very good answer, but I think you’ve got me wrong about "data security". The problem is not "restoring the bank to a previous state", for this in fact the solution is periodic backups (and routinely verified, of course). What happened was that the data entered in one of the tables was incorrect, and this was only detected days later (because it took time for discrepancy between balances to become apparent). In this scenario, the question remains - how to recover this error, plus manually check each transaction.
– mgibsonbr
@mgibsonbr the question is that the data does not follow a master table, the tables have apparently the same hierarchical level, there is a master table that can serve as reference?
– hernandev
@Ernandes No, the original purpose was for one to store initiated transactions (e.g., to write a check) and another to store confirmed transactions (e.g., the check paid off). In a way, the second can be considered "master" - because after bank conciliation all (
N
) the releases that appeared in the extract are associated with the (1
) new account balance on the date of the statement. Now that I read your inquiry, I realized that in fact it is not the table which is redundant, just a single column (saldo
). I’ll edit the question to make it clearer.– mgibsonbr
exactly @mgibsonbr, as I mentioned in the answer, data that undergo mutation should preferably not be redundant, to avoid consistency problems, however it is not general, there are exceptions
– hernandev