Conceptual database and file backup strategy


This is a conceptual question that touches two on-topic themes: software architecture and engineering (concepts and practice), and database questions involving SQL on a server. Since this type of question tends to be too broad, the suggested approach and the specific situations below restrict it to something more tangible.

Suggested approach

  1. The answer should be conceptual, not practical. You can use a tool as an example of a concept, but the focus is the concept.
  2. Avoid relying heavily on personal opinion to the detriment of recognised methodologies or authors (unless you are a DBA with decades of experience).
  3. It is not necessary to cover issues such as high availability, but during the backup the application cannot stop without justification. An estimated restoration time should be cited.
  4. Give a notion, even a rough one, of the backup periodicity: how many backups every how many hours?
  5. Explain the strategy for deleting old backups. Consider that a restore may be requested for a recent event or for one that happened 7 or more days ago.
  6. Be succinct: in a few paragraphs or a list of points, present the concepts you believe every developer should know. Imagine explaining them to a new developer or to a non-technical supervisor who will pay for your hardware.

Specific situations to be considered

Consider a typical web application: it has some SQL database, the application is written in any language, and it stores user files in a standard file system. To avoid the question being closed as too broad, answers should address the two situations below:

Situation A

  • The database is small, can be extracted quickly, and you can store unlimited backups
  • The amount of files is small, they can be compressed quickly, and you have hardware to store up to 60 backup copies
  • The hardware is not extremely reliable: it is either a new desktop machine or server hardware with more than 4 years of use. There is no RAID redundancy.
  • Human error may occur

Situation B

  • The database is huge, and extracting it without due care locks the application. You have room for at most 5 backup copies
  • The amount of files is large, and compressing them on the file server is stressful. You have room for 2 to 5 backup copies
  • The hardware is extremely reliable: there is hardware redundancy and the machines use server-grade hardware
  • Human error may occur
  • Despite all the warnings that "this is a conceptual question", the problem remains broad. In questions like this I suggest grounding the answer at least in Wikipedia; for example, the English article on backup stresses the relevance of defining the backup goal (RPO, RTO or security). Another much-discussed point is architecture, because different data (e.g. different schemas in the database) may have different strategies and objectives.

4 answers

2


I think different strategies work better for different cases; it depends a lot on the context. I will enjoy learning other strategies here.

The strategy I present here is one I have applied successfully in practice for many years, in several cases that fit "Situation A": the typical small-scale web application.

The whole strategy is very simple. I will illustrate it with the tools I actually use, but it is easy to extract the concepts.

For the source code files, use a version control system (Git). The live website is a checkout (working tree) of a particular branch ("production"). Deployment is done by pushing to another branch ("master") on the remote, followed by a merge (manual, or automatic by means of a hook). As for the backup, I keep another clone of the repository on a remote system, updated daily via a cronjob.
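
As a concrete illustration (paths, host name and schedule below are placeholders, not part of the original setup), the mirror clone and its daily refresh could look roughly like this:

```bash
# One-time setup on the backup host: a bare mirror of the production repository.
git clone --mirror ssh://deploy@app.example.com/var/repos/myapp.git /backup/repos/myapp.git

# Daily refresh, scheduled via cron, e.g.:  30 3 * * * /backup/bin/git-mirror.sh
cd /backup/repos/myapp.git && git remote update --prune
```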

For files that are not source code, for example an uploads folder whose content generally should not live inside a repository, I back up using a file synchronization tool (rsync). Likewise, the sync is a daily task scheduled via cronjob.
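
A minimal sketch of such a cronjob, assuming placeholder paths and host names:

```bash
#!/bin/sh
# Runs daily on the backup host (e.g. "0 4 * * *" in crontab).
# -a preserves permissions and timestamps, -z compresses during transfer.
# Deliberately no --delete: files removed at the source stay in the mirror.
rsync -az deploy@app.example.com:/var/www/myapp/uploads/ /backup/files/myapp/uploads/
```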

Finally, for the database I use a dump tool (mysqldump). Again from the remote system, another daily cronjob generates and compresses the dump.
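
Again a hedged sketch with placeholder credentials and paths; --single-transaction keeps InnoDB tables consistent without locking them for the duration of the dump:

```bash
#!/bin/sh
# Daily dump cronjob on the backup host.
DAY=$(date +%F)
mysqldump --single-transaction -h app.example.com -u backup_user -p'secret' myapp_db \
  | gzip > /backup/db/myapp-"$DAY".sql.gz
```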


So I always have, for every application:

  • a mirror repository (with all the source code and its history)
  • synchronized files (mirroring the system in production)
  • daily snapshots of the database

All of this is accomplished without doing anything special on the production system, except perhaps configuring MySQL to accept remote access from a specific IP address. The "concept" here is that this "responsibility" belongs to the remote system:

  • the backup files live on it
  • the daily scheduled tasks run on it
  • "git update", "rsync" and "mysqldump" run from it, meaning:
  • the source code, the files and the dump are pulled by the external system and stored on the external system

In my case, with small applications and systems, a one-day interval between backups is perfectly acceptable, even more so considering:

In the case of source code, there will always be some other copy somewhere else, because after all there is the developer's work in progress.

In the case of uploads, whoever uploaded a file usually still has it the next day, especially if it is important. (Detail: configure rsync so it does not erase mirror files when they are removed at the source, i.e. do not use --delete.)

In the case of the database, in addition to the backup I also leave the binary log enabled, a MySQL feature that records all operations (queries) that change data, including the date and time of each change. By combining the daily backup with the binary log, I can restore the database to any exact moment in time. I had to do this once ("rewind the DB to a point in time") and it was successful.
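
A rough sketch of that point-in-time "rewind", assuming hypothetical file names and timestamps: restore the latest dump, then replay the binary log up to the instant just before the unwanted change.

```bash
# 1. Restore the most recent daily dump.
gunzip < /backup/db/myapp-2014-05-10.sql.gz | mysql -u root -p myapp_db

# 2. Replay the binary log from the dump time up to just before the bad query.
mysqlbinlog --start-datetime="2014-05-10 03:30:00" \
            --stop-datetime="2014-05-10 14:59:00" \
            /var/lib/mysql/mysql-bin.000042 | mysql -u root -p myapp_db
```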


To delete old database backups I have nothing automated; I do that maintenance manually, which is not recommended. The best strategy is simply to automate this process as well. For example: of the daily backups older than 30 days, keep only one per month, up to a maximum of 6 or 12 months back.
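
One possible way to automate that retention policy (a sketch only, assuming dumps named app-YYYY-MM-DD.sql.gz and GNU find):

```bash
#!/bin/sh
BACKUP_DIR=/backup/db

# Drop daily dumps older than 30 days, keeping those taken on the 1st of each
# month as the "one per month" copies.
find "$BACKUP_DIR" -name 'app-*.sql.gz' ! -name 'app-*-01.sql.gz' -mtime +30 -delete

# Drop even the monthly copies after roughly 12 months.
find "$BACKUP_DIR" -name 'app-*-01.sql.gz' -mtime +365 -delete
```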

The repository stores all the history, like any source repository. The file mirror has no history; it is just a mirror (although it keeps deleted files).

Finally, for the binary log, MySQL offers size and age limit settings. If it did not, it would be a case for setting up logrotate for those files.
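
For reference, those limits can be set in the MySQL configuration; the values below are arbitrary examples, and MySQL 8 replaces expire_logs_days with binlog_expire_logs_seconds:

```bash
# Append a binary-log section to the MySQL configuration (illustrative values).
cat >> /etc/mysql/conf.d/binlog.cnf <<'EOF'
[mysqld]
log_bin          = mysql-bin
expire_logs_days = 14      # purge binary logs older than 14 days
max_binlog_size  = 100M    # start a new log file at ~100 MB
EOF
```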


Anyway, it all really depends on the context, the scale and the tools available. The main feature of the strategy outlined above, born from practice, is that everything is done from the outside: I do not take care of the backup on the system being backed up, but on the external system that makes and stores the backups. I have a VPS for that, which handles the backups of several applications. This VPS, in turn, relies on the backup provided by the hosting provider (in this case, daily snapshots). I think that is good coverage.

Of course, for a larger database or application, or one with sensitive, fast-changing data (like a heavy-duty e-commerce site), we are in another universe and another strategy is needed. We can think of database replication, physical replication onto tangible HDs (RAID), something not so "in the clouds".

  • Good answer, Bruni! If other answers also include a strategy for backing up source code with history, I think I will ask another question specifically about that point. Other things I may ask are less conceptual, more practical questions, citing tools and giving an overview of how to use them, as with the MySQL backup you mentioned; people could then reply there with more technical details. PS: in a few hours, +1 on your answer.

  • 1

    I thought about editing it to bring it more in line with the intention of the question. It was missing more focus on restoration as well: the idea that the strategy needs to cover restoring the application even in the event of irreversible loss of the production server. That is, the importance of the backup being in ANOTHER location, the importance of documentation, and the experience of actually REBUILDING the original application from the backup ALONE...

1

It is difficult to give a conceptual answer because it varies a lot with the implementation. Examples:

  1. A strategy for backing up databases without loss of availability is to replicate your DB in master-slave fashion, so that when you want to back up you pause replication, back up the slave, and then resume replication (see the sketch after this list). This strategy is well supported in MySQL but presents problems in PostgreSQL (for example; other DBMSs may have other peculiarities).

    • Space requirement: at least 3x (one copy for the master, one for the slave, and one for the backup in progress, before it is moved to external storage)
    • Backup periodicity: while replication is paused, the master keeps accumulating writes that have to be replayed on the slave later. The sum of these two periods (backup time plus catch-up replication time) determines the minimum interval between one backup and the next.
    • Restore time: it will depend on how the backup is actually done (more on that later); the database would have to stay in a read-only state until the restore completes.
  2. Another strategy without loss of availability (this time specific to PostgreSQL, though others may support it too, just not to my knowledge) is continuous archiving: using the DBMS's own log (here called the WAL, Write-Ahead Log) both for replication (if needed) and for continuous backups. This log records all writes made to the database since the last checkpoint, allowing the database to be restored to an arbitrary previous state (a configuration sketch appears a bit further below).

    • Space requirement: I have no practical experience to comment on, but the documentation says that "backups tend to get quite large"... That is, using this strategy between one checkpoint and the next may be feasible, but those checkpoints must still exist, which brings back the problem mentioned in the previous item.
    • Backup periodicity: the incremental backup is done continuously, so that over short periods (how "short" depends on the write frequency of your specific application) you can restore the database to any arbitrary moment in the past. The checkpoint backups, on the other hand, suffer restrictions similar to those of the previous item.
    • Restoration time: I don’t have enough knowledge to estimate.
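
For item 1, a minimal sketch of the pause/dump/resume cycle against a MySQL slave (host names and credentials are placeholders):

```bash
#!/bin/sh
# Pause the SQL thread so the slave's data stops changing (the I/O thread can
# keep downloading the master's binary log in the meantime).
mysql -h slave.example.com -u backup_user -p'secret' -e "STOP SLAVE SQL_THREAD;"

# Dump the now-frozen slave.
mysqldump -h slave.example.com -u backup_user -p'secret' --all-databases \
  | gzip > /backup/db/all-$(date +%F).sql.gz

# Resume replication; the slave catches up with the writes accumulated meanwhile.
mysql -h slave.example.com -u backup_user -p'secret' -e "START SLAVE SQL_THREAD;"
```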

As for the backup itself, the strategy differs with scale: small databases can be archived with a simple dump operation, while larger ones require backing up the data files themselves (since a dump becomes prohibitively time-consuming). Again, I do not have enough knowledge to estimate restore time as a function of data volume, but note that it is always possible to keep a ready replica in the "last valid state" that could quickly take over as master in a catastrophic situation.
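
For the continuous-archiving approach from item 2, the PostgreSQL setup roughly amounts to the configuration and file-level base backup sketched below (the parameter names are real, the archive destination and hosts are placeholders, and "wal_level = replica" is the 9.6+ spelling; older releases use "archive"):

```bash
# postgresql.conf (excerpt):
#   wal_level       = replica
#   archive_mode    = on
#   archive_command = 'rsync -a %p backup@backup.example.com:/backup/wal/%f'

# The periodic "checkpoint" copy that the archived WAL segments build upon:
pg_basebackup -h db.example.com -U replication \
  -D /backup/pg/base-$(date +%F) -Ft -z
```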

Files

Everything will depend on the format of these files and their purpose. Examples:

  1. Predominantly textual files, which form the basis of the site itself

    In this case, using version control is the most appropriate approach, as already pointed out by @J. Bruni. The changes themselves are made in a different environment, and only the final result is promoted to the production environment. That is, the backup is not done on the application server but elsewhere.

  2. User uploads, in textual or binary format, that are not modified after being uploaded

    The space-optimizing strategy here is to have each backup folder reflect the server's folder structure [at the time of the backup], with each file being a link (preferably a hardlink) to the real file; a sketch using rsync's --link-dest appears after this list. Optionally, this can be preceded by de-duplication to save even more space. That way, deleting an old backup is simply a matter of deleting its folder and everything in it, since hardlinks ensure a file is only really deleted when no other backup folder references it.

    (Note: when the server does not have reliable storage, it may be worth making a "real" copy of each file first and only then creating the links. At least one existing backup project, for example, uses this strategy, including de-duplication. It may also be more viable when moving files to external storage, a case where the hardlink solution runs into problems.)

  3. Binary or text files used and modified by the application itself

    Here it is difficult to give a generic solution, except perhaps operating in copy-on-write mode to create a snapshot of the server without having to stop the application. This is supported on Windows through the Shadow Copy feature and on Linux through the Btrfs filesystem. Once the snapshot is taken, the backup proceeds as in items 1 or 2 above, depending on the data format.
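
A sketch of the hardlink approach from item 2, using rsync's --link-dest option (paths, host and GNU date are assumptions):

```bash
#!/bin/sh
TODAY=$(date +%F)
YESTERDAY=$(date -d yesterday +%F)

# Files unchanged since yesterday become hardlinks into yesterday's folder,
# so each dated folder looks complete but only new/changed files use space.
rsync -a --link-dest=/backup/uploads/"$YESTERDAY" \
  deploy@app.example.com:/var/www/myapp/uploads/ \
  /backup/uploads/"$TODAY"/

# Removing an old backup is then just removing its folder:
#   rm -rf /backup/uploads/2014-01-01
```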

  • Very good answer. Bruni's was more didactic and focused on Situation A, a situation most of us here go through, while yours leaned more toward Situation B: a more complex answer for a more complex problem. At some point it will be interesting to open very detailed questions about sub-questions of a more practical character raised here, such as backup with PostgreSQL.

1

As an infrastructure analyst rather than a DBA, I will suggest structural database backup solutions of the kind generally used in real disasters, solutions I have successfully implemented many times. Most of them require a database recovery process, because at certain points writes may be left half-applied, and because this is a structural backup a second machine is ALWAYS necessary.

First case

  • No secret here: scheduled backup routines on a monitored remote server. (With deduplication, the number of backups you can keep in this case tends toward infinity.)

Second case

  1. Use a filesystem with accessible snapshots and/or storage with this feature: a snapshot is scheduled every hour and the backup process runs on another server that mounts the snapshot read-only. There is still some performance loss from I/O, but this is the best solution for not burdening the server continuously. Not every database accepts this kind of solution out of the box, but it is always possible to script the steps safely.
  2. A second server receiving a continuous copy of the data for backup purposes only. On large databases with synchronous replication, the performance of the whole setup can end up capped by the mirror hardware and by the synchronicity, and the backup process on this machine can affect the write speed of the production machine; on the other hand, the backup can be taken very quickly and with virtually no data loss compared to the previous solution.
  3. The same as the previous one, but with the logs shipped asynchronously. The only overhead is sending the binary logs to be replayed; there is almost no interference from the mirror machine on the primary server, and backups can be taken with almost no effect on the main machine, but there will be small data losses on the mirror.
  4. A variation of the previous one, but with the mirror machine on a filesystem that supports lock-free snapshots, ideally with deduplication and access to the snapshots without restoring them, such as ZFS and GPFS (dream filesystems, free and paid - very well paid! - respectively). In this case there is no need for a backup routine as such: the data volume grows only incrementally, there is little locking that could affect the production database, and scheduling the backups is simple. The structure itself, however, is very complex and quite susceptible to human error.
  5. Local snapshots shipped over the network to a remote machine and then deleted locally (sketched just below); I only know ZFS with this capability. There will be some data loss and the network traffic is considerable, but it can be done minute by minute, and the target machine does not need to be able to run the database.
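
For item 5, a hedged sketch of shipping ZFS snapshots to a remote machine (pool, dataset and host names are placeholders):

```bash
#!/bin/sh
PREV=tank/appdata@2014-05-10_1200     # previous snapshot, already on both sides
NOW=tank/appdata@$(date +%F_%H%M)

# Take the new local snapshot.
zfs snapshot "$NOW"

# Send only the delta between the two snapshots to the backup machine.
zfs send -i "$PREV" "$NOW" | ssh backup@backup.example.com zfs receive -F tank/appdata-mirror

# The old local snapshot can then be destroyed to free space.
zfs destroy "$PREV"
```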

I intentionally did not mention LVM snapshots: a backup has to be safe, and LVM recovery is the worst of an admin's nightmares.

  • 1

    I know it is not an answer to the actual question, but I really think these are important points and perspectives that the infrastructure can provide as a solution.

0

For Situation B it is necessary to answer the question: "How many days can your application stay offline while a backup is restored?"
I believe the biggest problem is the restoration, not the creation, of a backup.

See this post by Fabio Telles: "Dump is not backup!"
http://savepoint.blog.br/dump-nao-e-backup/

  • 3

    "days"? If you count the time to restore in days, something is wrong there.

  • 1

    @user2478690, could you please improve the answer? The way it is, it will only attract -1s: not only did you not answer the question, you just dropped an external link.

  • 1

    Situation B involves a massive database. For a huge database the traditional backup and restoration strategies (restoration being the big problem) usually do not meet the availability requirements. For such databases it is advisable to replicate, e.g. in master-slave form, and to promote a slave to master when problems occur. For a genuinely large database the restoration time becomes prohibitive given the system's requirements.

  • I think @user2478690 is trying to remind us that the concept of "backup" nowadays (2014) is broader, and that RAID, replication, etc. fall precisely under that broad concept. Storefronts, banking applications and the like are harmed by even a few seconds or minutes of unavailability, and harmed further if they lose "transactions in flight".
