Replication of a database

Asked

Viewed 523 times

3

I have read some articles on database replication and mirroring, but a doubt I still have, and I did not find a "concrete" answer would refer and update the database.

For example: I have MASTER and SLAVE, I will implement a new functionality in my system and need to add new tables, or update existing tables, adding new fields or removing, among other required operations.

If I update master, these table changes will be replicated for the slave as well?

Or if I did this upgrade in the slave (assuming replication is bidirectional), it would also be replicated to the master?

Today, assuming this is really the process... It is the way used to perform these updates in table fields, without the application being unavailable?

  • You will probably have to leave some software to synchronize, maybe a routine with CRONTAB. Anyway it is a very broad subject. I cannot say that this is the ideal path, nor the idea you described, nor the idea of synchronizing.

  • Yes, I understand that question.... But my doubt would be, these software will synchronize everything, beyond the data?

  • There is no software ready for this (I mean there must be), but in general using native MYSQL-client tools yourself sets what you want to sync by doing a full-backup (should have easier way) and dropping the slave.

  • I understood, it is, until now, were these the possibilities that I had seen people commenting.... Really it is a broad and complex subject. I appreciate the response, in case I find something else interesting, I will share.

  • I use Symmetricds, do all the hard work smoothly, and can use any database and still use different databases on the servers. Works very well.

1 answer

3


Each replication method has its pros and cons. As stated in the question comments, you need to identify your needs to then find the most suitable replication scheme for your application.

Below, in general lines, a transcription of the "speech" that I always use when I need to get into this subject. Focus on Postgresql, which is the solution I most use daily and with which I have the most affinity.

One-way replication

Postgresql natively offers only the unidirectional replication method, where there is a server master and one or more servers slave in standby. The servers in standby can be used for queries, but not for writing, being called in this case hot standby. In this replication mode, all changes made to the master will also be done automatically on all servers slave.

The most basic replication method, which does not support darlings read on the servers standby is called log Shipping, or Warm standby. In this method, Postgres transaction log segments (WAL- write-Ahead log), of 16MB each, are sent over the network to the replica servers and run in cyclic process. Given the time to "fill" a segment of these, this replication method represents a certain slowness in the replication process, being recommended for low data volumes or for circumstances where it is not necessary that the replication timing is so up to date.

Beyond the log Shipping, Postgresql has a mode called streaming Replication, where a constant stream of changes is sent directly to the server slave through persistent connection. This allows constant checking of the integrity of the applied data and thus the referred hot standby.

Replication in hot standby can be configured as both asynchronous (standard), where data may take some time to be written to servers slave, how much synchronous, in which the COMMIT user is only considered fully satisfied when it is also performed on the server slave. It is possible to set up a mixed environment of both synchronous and asynchronous replicas, asynchronous cascading etc.

While the use of synchronous replication for high availability purposes is tempting, one should keep in mind the relationship between network transfer capacity and the volume of replicated data. If your network is slow, change commands executed on master will take time to be COMMITand the bench will appear "locked". Good performance synchronous replication architectures tend to be specified with the use of cables cross neighboring machines, without any other network intermediary assets, to avoid bottlenecks. Even so, large volumes of trafficked data tend to make this level of immediacy unfeasible. The infrastructure cost therefore increases according to the transaction volume of the application until it becomes unviable in high-volume data frames.

That being said, replication settings that use the hot standby represent the intermediate world: continuous replication asynchronous, but the volume of traffic data allows updates to happen almost synchronously (near-Sync), which appears sufficient for most uses without risk of interrupting the synchronous method.


Bidirectional replication

There is no native bi-directional replication solution in Postgresql, and it is necessary to install plugins that offer such functionality. The most stable and most used currently is called BDR, developed by 2nd Quadrant. It is understood that at some stage such functionalities will be incorporated by the Postgresql code base, but for now it is necessary to install the plugin BDR apart.

As the name implies, bidirectional replication allows changes to be made to all the servers involved, with a replication method multimaster. Simultaneous changes of the same object at different nodes can cause conflicts to be resolved by plugin, and such a resolution process can be costly. This means that the application must be prepared and scaled to avoid such types of conflict as far as possible.

BDR replication is inherently logical, offering the configuration functionality of the objects to be replicated. In this way, temporary tables and other less important data can be left out of replication, which can reduce the cost of conflict resolution and the processing time of the changes by the nodes that receive the data.

Another characteristic of BDR is that it is inherently asynchronous: the COMMIT is returned to the user by the node in which he commanded the change, and the other nodes will have the transaction applied some time in the future. BDR replication uses the same mechanism of streaming Replication above, therefore tending to result, when there is no need to resolve conflicts, in times of replication considered near-Sync.

The creation and configuration of bidirectional nodes, maintenance and quality monitoring of replication tend to require more attention and work than unidirectional replication methods. This is something to take into account in choosing the BDR.


Balancing, parallelism and connection pool

There are other working methods that can solve problems without having to resort to replication. Tools between the application and Postgresql, such as Pgbouncer, allow the division of work between more than one database server up to the level of the command itself, enabling logical partitioning of data between nodes. The Pgpool-II allows the formation of pools connection between nodes, load balancing and parallel execution of darlings on more than one server. It can also be used as an intermediary agent in schemas failover automatic.

Such technologies can be used alone or in conjunction with replication methods. Sometimes what your app needs is just a Pgpool-II sending the same darlings for two different banks: the time of execution on both sides may differ in a matter of milliseconds, these are different transactions, but the data exists on both sides.

In addition to these two tools there are others capable of this type of juggling, each with its specific functionalities.


Use cases

As seen above, each replication method offers a different "cost-benefit" ratio for the system. Before adopting a solution, one should keep in mind the specific needs of the application. It follows some typical tables.

  • Do you replicate to ensure availability at times of change? Maybe make a failover to a knot hot standby asynchronous in moments of lower transaction flow is sufficient. If there is a time when the volume of transactions is zero for longer, even one Warm standby would do without the advantage of having a separate node only for queries.
  • The problem is to ensure that no data is lost in case of disaster? Well, if absolutely No data can be lost, perhaps your system is a candidate for synchronous replication, but this can impact the speed at which bank transactions are handled. Ironically, if full synchronism does not impact your application, perhaps asynchronous replication will also meet without problems!
  • Is your main goal to divide customer demand between multiple servers in order to serve more quickly? This is a two-way replication job! Remembering that in a centralized environment tends to be rare the need for more than one writing server, and it should not be forgotten that the overhead maintenance tends to be higher than in other modes of replication.

It is also always good to emphasize the importance of effecting backups journals of your data, kept apart from your production systems, for recovery in case of major disasters. Replication can be understood as a online backup, but it does not protect from human flaws, as a DROP or TRUNCATE done at the wrong time!


References

Here are some references about replication with Postgresql:

  • Show ball @nunks.lol , I managed to remedy my doubts with your explanation, I’ll be taking a look at the references you made available! I am grateful.

  • @Marcelolx is struggling to help. I’m probably going to edit that answer slowly, improving here and there, so I can point out to people when they ask me about it at work too, haha. Just to point out that everything I said is for Postgresql, I do not know the solutions of replication in Mysql to be able to speak with property.

  • Yes, I was also looking for a solution related to Postgresql...I only included Mysql in the tags, in case someone knows an interesting solution for it.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.