Is there any strategy for a "de-collapsible" database?

Question

Asked 11 years, 4 months ago

I’ve seen on some sites (even Stack Exchange itself) the ability to "undo" an action in the database (for example, edits to questions and answers).

I’ve been thinking about some possible solutions, but I’m afraid I’m reinventing the wheel. I would like to know whether this strategy already exists, standard or not, and whether it is efficient. Otherwise, I would like to know if there is any alternative.

(I accept answers for either relational or non-relational databases, but would prefer one that works for both.)

Just so we’re clear, you’re not talking about ROLLBACK of transactions, right? On SE, for example, nothing is removed from the database; it is only marked as deleted. The latest revision of a post is the one that counts, but all previous ones remain available, kept in a separate table. This is one of the possible strategies.

– bfavaretto

2014/04/06 at 02:49
@bfavaretto No, I’m not talking about rollback; it’s the second case. I had this idea but I don’t know the best way to implement it, and I was afraid something simpler already existed and I would be reinventing the wheel.

– André Leria

2014/04/06 at 02:49

I believe the best solution will always depend on the structure of your database. But as an example, some extreme options would be a) to have a delete flag (+ date and user) in each table; b) to keep a mirror of each table for data history; c) to keep a single log table for the entire database, with the data serialized in some way.

– bfavaretto

2014/04/06 at 02:58
@bfavaretto I had already thought of options (a) and (c). I hope someone can tell me which scenario is best, or whether one stands out over the others.

– André Leria

2014/04/06 at 03:01
I have experience with option (a), but right now I’m too tired to lay out the pros and cons in a decent answer. I’ll write it up properly another time, okay?

– bfavaretto

2014/04/06 at 03:03
@bfavaretto No problem, I’m on a work trip and can only use the computer at dawn. The question was more curiosity than need, so there’s no need to rush.

– André Leria

2014/04/06 at 03:07

1 answer

by Miguel Angelo • **28,526** points · Answer 1 · 2014-04-06T16:41:47+00:00

What is an undo operation?

The concept of undo is best illustrated by file version control systems. Among the behaviors that vary between version control systems, I will highlight some parameters that I consider important when building something similar.

These systems store every change made to the files (records) forever, either in full or as a diff against the previous version. This means any file can be rebuilt at any point in time, provided there is a record of each change made.

Each record carries an exact indication of which version it is based on, and a record may have at most one base. This can be done with a pointer, that is, a reference to the previous record. Because the reference points backward, two records may be based on the same previous record.

Note that this allows you to build a tree of changes. Undo (or redo) then merely means moving HEAD, whether HEAD is represented by a flag or by a pointer. HEAD indicates which of all the records in the tree is the current version.
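A minimal in-memory sketch of the change tree and the HEAD idea, assuming each revision is stored in full and points to at most one parent. The names `Revision`, `History`, `commit`, `undo` and `redo` are hypothetical, and the redo stack is an assumption about how "redo" remembers which child it came from; none of this is taken from an existing library.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Revision:
    """One node of the change tree: a full snapshot plus a pointer to its base."""
    id: int
    content: str
    parent_id: Optional[int]  # at most one base; None for the first revision


@dataclass
class History:
    revisions: dict[int, Revision] = field(default_factory=dict)
    head: Optional[int] = None  # which node of the tree is the current version
    _redo: list[int] = field(default_factory=list)  # children we undid away from

    def commit(self, content: str) -> int:
        """Add a revision based on the current HEAD and move HEAD to it."""
        rev_id = len(self.revisions) + 1
        self.revisions[rev_id] = Revision(rev_id, content, self.head)
        self.head = rev_id
        self._redo.clear()  # a new edit starts a new branch of the tree
        return rev_id

    def current(self) -> Optional[str]:
        return None if self.head is None else self.revisions[self.head].content

    def undo(self) -> None:
        """Undo = move HEAD to the parent; no revision is deleted."""
        if self.head is not None and self.revisions[self.head].parent_id is not None:
            self._redo.append(self.head)
            self.head = self.revisions[self.head].parent_id

    def redo(self) -> None:
        """Redo = move HEAD back to the child we came from."""
        if self._redo:
            self.head = self._redo.pop()
```

Committing after an undo leaves the undone revision in place, so two revisions end up sharing the same parent, which is exactly the tree described above.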

Finally, we have to consider the unit of work for the undo action. In version control systems it is possible to undo changes smaller than the unit of work, which would be a commit. However, that is not strictly necessary: you could simply use smaller units of work and allow undoing them only in full, not in part.

Deciding which features to implement

Store in full, or only diffs

Store in full: storing everything in one piece makes everything simpler. There is no need to diff the records, which is a complex task, and retrieving a record at any point requires no rebuilding from the beginning or from a previous snapshot.
Store diffs: diffs are more compact and therefore require less storage space; the trade-off is that retrieving a given version means rebuilding it from a base snapshot by applying the diffs in sequence.
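To make the trade-off concrete, here is a small sketch comparing the two storage strategies for line-based text. It assumes records can be treated as lists of lines and uses Python's standard `difflib.SequenceMatcher` to compute the diffs; the diff format (`"equal"` index ranges plus `"new"` line runs) and the class names `FullStore` and `DiffStore` are hypothetical choices for illustration.

```python
import difflib
from typing import Optional


def make_diff(old: list[str], new: list[str]) -> list[tuple]:
    """Compact line diff: unchanged runs are stored as index ranges into `old`,
    changed runs as the replacement lines taken from `new`."""
    ops: list[tuple] = []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("equal", i1, i2))    # reuse old[i1:i2]
        else:                                # 'replace', 'delete' or 'insert'
            ops.append(("new", new[j1:j2]))  # empty list for a pure deletion
    return ops


def apply_diff(old: list[str], ops: list[tuple]) -> list[str]:
    """Rebuild the newer version from the older one plus its diff."""
    out: list[str] = []
    for op in ops:
        if op[0] == "equal":
            out.extend(old[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


class FullStore:
    """Every version stored whole: reading version n is a direct lookup."""

    def __init__(self) -> None:
        self.versions: list[list[str]] = []

    def save(self, lines: list[str]) -> None:
        self.versions.append(list(lines))

    def get(self, n: int) -> list[str]:
        return self.versions[n]


class DiffStore:
    """First version stored whole, each later one as a diff against its predecessor:
    reading version n means replaying n diffs from the base."""

    def __init__(self) -> None:
        self.base: Optional[list[str]] = None
        self.diffs: list[list[tuple]] = []
        self._latest: Optional[list[str]] = None  # cached to compute the next diff

    def save(self, lines: list[str]) -> None:
        lines = list(lines)
        if self.base is None:
            self.base = lines
        else:
            self.diffs.append(make_diff(self._latest, lines))
        self._latest = lines

    def get(self, n: int) -> list[str]:
        lines = self.base
        for ops in self.diffs[:n]:
            lines = apply_diff(lines, ops)
        return lines
```

`FullStore.get` is a single lookup, while `DiffStore.get` must replay every diff up to the requested version; real systems often mitigate that by storing a full snapshot every few versions.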

How to know which version is the previous one

Pointer to the previous version: I recommend storing, in each record, a pointer to the previous record. I think this is how most programmers would expect it to be done, so it presents the lowest barrier to entry for other programmers.
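In a relational database, the pointer to the previous record can simply be a self-referencing foreign key. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the `revision` table, its columns and the helper functions are hypothetical. A recursive common table expression walks the pointers back to the first revision.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE revision (
    id        INTEGER PRIMARY KEY,
    post_id   INTEGER NOT NULL,
    parent_id INTEGER REFERENCES revision(id),  -- previous version, NULL for the first
    body      TEXT NOT NULL
);
""")


def add_revision(post_id, parent_id, body):
    """Insert a revision that points to its base; returns the new id."""
    cur = conn.execute(
        "INSERT INTO revision (post_id, parent_id, body) VALUES (?, ?, ?)",
        (post_id, parent_id, body),
    )
    return cur.lastrowid


def lineage(rev_id):
    """Follow parent pointers from rev_id back to the first revision (newest first)."""
    return conn.execute("""
        WITH RECURSIVE chain(id, parent_id, body, depth) AS (
            SELECT id, parent_id, body, 0 FROM revision WHERE id = ?
            UNION ALL
            SELECT r.id, r.parent_id, r.body, c.depth + 1
            FROM revision r JOIN chain c ON r.id = c.parent_id
        )
        SELECT id, body FROM chain ORDER BY depth
    """, (rev_id,)).fetchall()
```

Two revisions sharing the same `parent_id` form a fork in the tree; each one still has a single, unambiguous history.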

Implementing a HEAD indicator

Branches: some version control systems support multiple development branches, i.e., multiple views of what the current version is. This requires something in the system that represents a branch, and it is this object that tells you which record is its HEAD.
No branches: this is the simplest option. There are no multiple views of the content, only one branch with a single HEAD. Even so, there could be a branch-specific record in the database that points to HEAD, which keeps the system flexible and lets it evolve in other directions. Alternatively, you could use a flag in a field of one of the tree's records; besides being inflexible, I think this would be more complicated, mainly because you must ensure that only one record in the tree has the flag set.
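The two HEAD representations can be compared side by side in SQLite. This is a sketch with hypothetical table and column names (`branch.head_id` for the pointer record, `revision.is_head` for the flag). The flag variant needs a partial unique index to guarantee at most one HEAD per post, and moving HEAD then takes two updates in the right order inside one transaction, which illustrates the extra care the flag approach requires.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE revision (
    id        INTEGER PRIMARY KEY,
    post_id   INTEGER NOT NULL,
    parent_id INTEGER REFERENCES revision(id),
    body      TEXT NOT NULL,
    is_head   INTEGER NOT NULL DEFAULT 0   -- option B: flag on one tree element
);
-- Option A: one record per branch that points at its HEAD.
CREATE TABLE branch (
    post_id INTEGER NOT NULL,
    name    TEXT NOT NULL DEFAULT 'main',
    head_id INTEGER NOT NULL REFERENCES revision(id),
    PRIMARY KEY (post_id, name)
);
-- Option B needs a guard so at most one revision per post has the flag set.
CREATE UNIQUE INDEX one_head_per_post ON revision(post_id) WHERE is_head = 1;
""")


def move_head_pointer(post_id, rev_id, branch="main"):
    """Option A: undo/redo is a single UPDATE of the branch record."""
    with conn:
        conn.execute(
            "UPDATE branch SET head_id = ? WHERE post_id = ? AND name = ?",
            (rev_id, post_id, branch),
        )


def move_head_flag(post_id, rev_id):
    """Option B: clear the old flag before setting the new one, in one transaction;
    the reverse order would briefly create two heads and violate the index."""
    with conn:
        conn.execute(
            "UPDATE revision SET is_head = 0 WHERE post_id = ? AND is_head = 1",
            (post_id,),
        )
        conn.execute("UPDATE revision SET is_head = 1 WHERE id = ?", (rev_id,))
```

With the branch record, supporting several branches later only means adding rows with different `name` values; the flag column has no natural way to grow in that direction.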

About the work unit

The unit of work is each node of the tree. There are two ways to undo: one that lets you undo changes smaller than the unit of work, and another that only lets you undo entire units of work.

Undo entire units: this is the easiest to implement, since it only requires moving the HEAD pointer. To allow finer-grained actions, smaller units of work can be used. For example, if the system consists of posts updated by multiple users, like Stack Overflow, the unit of work can be an entire post, which is the case there; or, if you needed to undo tags independently of the content, there could be two change trees, one for the content and another for the tags, i.e., two finer-grained undo systems.
Partially undo units: this way of undoing can hardly be called undo, because it requires a MERGE and therefore a new change record. Using the previous example, where you want to undo a post's content and tags independently, if the unit of work were the whole post, then to undo only the tags back to a past point P0, a new post object P2 would have to be created by copying the tags from the desired point P0, and recording that the new post P2 is a MERGE of the previous P1 and the point P0:
```
In short: P2 => P0 + P1 (MERGE)
```
That being said, you can see that such a system is much more difficult to implement.
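A sketch of that partial undo, assuming the whole post (content plus tags) is the unit of work and that a revision may record two parents when it is a merge. `PostRevision` and `undo_tags_only` are hypothetical names. Unlike a full undo, nothing is achieved by moving HEAD alone: a new revision P2 must be created, taking the content from P1 and the tags from P0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PostRevision:
    id: str
    content: str
    tags: tuple[str, ...]
    parents: tuple[str, ...]  # one parent normally, two for a MERGE


def undo_tags_only(revs: dict, head: str, target: str, new_id: str) -> PostRevision:
    """Partial undo: revert only the tags to `target`, keep the content of `head`.
    Existing revisions are left untouched; a new merge revision is added instead."""
    cur, old = revs[head], revs[target]
    merged = PostRevision(
        id=new_id,
        content=cur.content,       # content stays as in HEAD (P1)
        tags=old.tags,             # tags copied from the chosen past point (P0)
        parents=(cur.id, old.id),  # P2 => P0 + P1 (MERGE)
    )
    revs[new_id] = merged
    return merged
```

The caller then moves HEAD to the returned revision; P0 and P1 remain in the tree, so the partial undo can itself be undone.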