What is the advantage of using one database for reading and another for writing?

Viewed 422 times

11

What is the advantage/difference of using separate databases, one for reading and the other for writing?

As I see it, the separation is never really clean. At some point the write database has to be read so that its data can be replicated to the read database, which in turn has to take writes in order to receive the replication.

I can understand that this replication could run at a time when the database/system is under little use, such as the early hours of the morning. In my case, though, I cannot afford that luxury: synchronization/replication between the databases has to happen at intervals of at most 5 minutes.

My database today receives thousands of inserts per minute, so if I run a synchronization process between the databases every five minutes, I believe it will involve millions of rows, and at that moment my read database would lose performance because it would be busy with the replication writes.

I am assuming this replication would be done "by hand", through a service I develop myself that works like a queue. Or would it be more advantageous/performant to use the replication features that DBMSs already provide?

2 answers

4

Advantages

  • Greater scalability (not performance)
  • Greater reliability of the infrastructure

Details

This is usually not quite how it is done. There is one server that receives the writes and can also receive reads. There are other servers that only receive reads (the gain starts to get interesting when there are several of them).
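The layout described above can be sketched as a small router in the application layer. This is only an illustration: the class name `ReadWriteRouter`, the server names, and the naive "starts with SELECT" check are all assumptions, and a real setup would more likely rely on a driver, proxy, or ORM feature that already does this.

```python
import itertools


class ReadWriteRouter:
    """Sends writes to the master and spreads reads over the read servers."""

    def __init__(self, master, read_servers):
        self.master = master
        # With no read servers configured, reads fall back to the master.
        self._read_cycle = itertools.cycle(read_servers or [master])

    def target_for(self, sql):
        # Naive classification (assumption): a statement starting with SELECT is a read.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._read_cycle) if is_read else self.master


router = ReadWriteRouter("master", ["read-1", "read-2"])
targets = [
    router.target_for("INSERT INTO events (id) VALUES (1)"),
    router.target_for("SELECT * FROM events"),
    router.target_for("select count(*) from events"),
    router.target_for("UPDATE events SET id = 2"),
]
print(targets)
```

Round-robin is the simplest way to share reads; it ignores how loaded or how far behind each read server is.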

Usually you want, and can get, faster reads by doing this separation. In most scenarios it is hard to get big gains on the write side, because writes are hard to split up. Of course, having a server that gives priority to writes helps.

Obviously the read servers are written to as well, but those writes are lighter, because the processing needed to write correctly has already been done on the master server, so they do not usually weigh as much.

Of course, each case is different. That is why this is not a magic solution. There are situations where it brings gains that make up for the increased complexity.

In general, replication is done in near real time (less than 1 ms behind; in some cases synchronously, where a write is only confirmed once all slaves are up to date). In most scenarios this is important.

This kind of setup takes great care. A lot can go wrong.

Separation does not give you more performance; it even eats a little of it. It just lets you scale further. You will not get 3X the performance because you have one server for writes and another for reads; you are more likely to get 1.5X with this configuration.
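A rough back-of-the-envelope model shows why adding a server gives less than proportional gains: every replicated write still costs something on every server. All the numbers below are made up, and `apply_cost` (how expensive a replicated write is compared with the original write) is an assumption, so the result illustrates the shape of the effect rather than the 1.5X figure itself.

```python
def total_throughput(capacity, write_load, replicas, apply_cost=0.5):
    """Useful operations per second for one master plus `replicas` read servers.

    capacity   -- work units per second each server can handle (made-up figure)
    write_load -- client writes per second, all sent to the master
    apply_cost -- assumed cost of applying one replicated write on a read
                  server, relative to the original write on the master
    """
    master_reads = max(capacity - write_load, 0)
    replica_reads = max(capacity - write_load * apply_cost, 0)
    return write_load + master_reads + replicas * replica_reads


single = total_throughput(capacity=1000, write_load=400, replicas=0)
with_one_replica = total_throughput(capacity=1000, write_load=400, replicas=1)
speedup = with_one_replica / single
print(f"{single} -> {with_one_replica} ops/s ({speedup:.1f}x with 2 servers)")
```

With these hypothetical figures, two servers deliver less than twice the work of one, and the gap grows as the share of writes grows.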

Reliability is greater because, with more than one server, a slave can take over if the master stops, or the master can keep running everything if a slave stops.

Also, if one of them stops, the other has the same data, so nothing is lost.
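The takeover behavior can be pictured with a toy model. The `Cluster` class and the node names are hypothetical, and real failover also has to detect the failure, pick the most up-to-date slave, and redirect clients, none of which is shown here.

```python
class Cluster:
    """Toy model of one master and several slaves holding the same data."""

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)

    def handle_failure(self, node):
        """Update the topology after `node` stops; return the current master."""
        if node == self.master:
            if not self.slaves:
                raise RuntimeError("no slave left to promote")
            # Simplification: promote the first slave in the list.
            self.master = self.slaves.pop(0)
        elif node in self.slaves:
            # Losing a slave only reduces read capacity.
            self.slaves.remove(node)
        return self.master


cluster = Cluster("db-1", ["db-2", "db-3"])
after_slave_failure = cluster.handle_failure("db-3")
after_master_failure = cluster.handle_failure("db-1")
print(after_slave_failure, after_master_failure)
```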

3


There are some points that make a lot of difference:

  • Read performance: if I have more machines available (read replicas), do you agree that reads will be faster? The load is balanced across the replicas.

  • High availability: imagine I have only one database instance (for both reads and writes). If for some reason that instance stops working, the entire application that depends on the database stops working too. If I have database replicas (even if they are read-only replicas), my application does not suffer an "outage"; it keeps working, even if only with basic features (database reads). High availability is a very important point for your business.

  • Durability of data: if I have N read replicas, I can "lose" the data of one instance and quickly recover it (without having to take a last-minute backup), because there are N replicas available with the data.

So, about this part of the question:

"As I see it, the separation is never really clean. At some point the write database has to be read so that its data can be replicated to the read database, which in turn has to take writes in order to receive the replication."

"I can understand that this replication could run at a time when the database/system is under little use, such as the early hours of the morning. In my case, though, I cannot afford that luxury: synchronization/replication between the databases has to happen at intervals of at most 5 minutes."

"My database today receives thousands of inserts per minute, so if I run a synchronization process between the databases every five minutes, I believe it will involve millions of rows, and at that moment my read database would lose performance because it would be busy with the replication writes."

I believe that this is not the case. Cluster managers have a "smarter" copy process that avoids stressing your database to the limit. I think it is worth researching this in more depth to better understand how it works.
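As one possible picture of what "smarter" can mean (an assumption; the exact mechanism depends on the product), replication is often based on an ordered change log: the replica remembers how far it has read and keeps applying small batches of new changes, instead of copying everything in one burst every five minutes. The `Primary` and `Replica` classes below are purely illustrative in-memory stand-ins.

```python
class Primary:
    def __init__(self):
        self.rows = {}
        self.log = []  # ordered list of (key, value) changes

    def write(self, key, value):
        self.rows[key] = value
        self.log.append((key, value))


class Replica:
    def __init__(self):
        self.rows = {}
        self.position = 0  # index of the next log entry to apply

    def catch_up(self, primary, batch_size=100):
        """Apply at most `batch_size` pending changes; return how many were applied."""
        batch = primary.log[self.position:self.position + batch_size]
        for key, value in batch:
            self.rows[key] = value
        self.position += len(batch)
        return len(batch)


primary, replica = Primary(), Replica()
for i in range(250):
    primary.write(i, f"row-{i}")

# The replica drains the backlog in small steps rather than in one big copy.
applied = []
while (n := replica.catch_up(primary)) > 0:
    applied.append(n)
print(applied, replica.rows == primary.rows)
```

Because the replica only needs the entries after its own position, running `catch_up` continuously keeps each step small, which spreads the write cost on the read side over time.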

One last comment, on this point: "I am assuming this replication would be done "by hand", through a service I develop myself that works like a queue. Or would it be more advantageous/performant to use the replication features that DBMSs already provide?"

I believe you don’t want to write a replication tool "by hand". Leave this work to cluster management tools, for example Galera Cluster (for MySQL) or the Read Replica option of AWS RDS.
