Creation of a database for each user

Asked

Viewed 1,978 times

5

I’m putting together a web sales system with php and mysql, where we will have several registered companies, each company being a user who will control their sales. Our team has a question: create a banco de dados for each user, or create only one banco de dados general and the tables have the column with the user id to execute the relevant user entries and queries? Do any of the options return higher performance in the bank selects? What are the advantages and disadvantages of each option?

  • 1

    Give more details. Example, if each company is an independent store, I imagine that each store will have different products. These products are accessible by a single shopping mall or are accessible independently?

  • Yes, each customer would be an independent store, with different products, without the possibility of relationship with each other, all independently. The intention is to have several small customers, preferably if all goes well, thousands of customers.

3 answers

5

You probably won’t notice any difference in bank performance unless you own millions of customers.

Some advantages of applying this method would be:

  1. Simpler seat model;
  2. Easier to export and backup data from a specific user;
  3. When any change, problem or maintenance occurs only the users involved are affected;
  4. This way it is easier to divert resources if a specific user needs it;
  5. Easy bank expansion, since it is simple to separate some users and put in a second server. This is also more economical, being that it is cheaper to buy new servers than to upgrade to an existing one in most cases.

Disadvantages:

  1. If you have any changes that need to be made on all users it will be quite complicated.
  2. Considerably complicated to create and maintain, mainly to leave organized, versioning of each database, change log...

The list of advantages may be longer, but in most cases it is not worth the headache that this can cause, I advise to think well before implementing.

For more information visit that article by Microsoft on the subject.

4


Depends on scale or volume

It all depends on the scale of the product. When the scale increases, many design decisions are counter-intuitive.

By scale I don’t just mean the volume of data. And scale and volume depend on context.

From a performance standpoint, a reasonable volume would be perhaps in the millions of records. I have worked on systems that processed hundreds of thousands of records without any difficulty, obviously with the proper optimizations. If each customer has their millions of records, then putting everything on the same table can be a problem.

However, from a business point of view, it depends on who the customers will be, how much the product intends to be evolved, the procedures that will be adopted for development and maintenance, having the data in a database can be totally desired or totally impossible.

The plan is to sell the system to a few companies and then the demand of each company for maintenance and evolution is expected to be great? Or it would be selling the system as a Saas to a mass of small businesses without them having more advanced access (think of the Free Market, for example)?

Problems with a shared database

Let’s think of some scenarios with a single bank:

  • You need to backup, all customers get slow because of this
  • A client unintentionally deletes data and requests the restoration of a backup (I didn’t want to be in the shoes of the person responsible for it)
  • Application errors cause one client’s data to appear to another, or worse, data is updated without a proper filter on WHERE.
  • You need to update the application and the new version changes the data structure. You necessarily need to update the system for all customers, all or nothing. It is common practice for serious companies to release first updates to a subset of customers and then, if all goes well, update others. That would be impossible.
  • Your company sells the system to different people around the world. From a moment on, it is decided that in order to improve performance in other countries or to adapt the current legislation, the system has to host the data in that country. If the database were separated, it would be enough to move the database to a virtual machine in a data center of that country and also to place an application instance there. But now you will have to extract data from a shared database and put into another void.
  • If there’s too much use for one customer, it can get in the way of the others. For example, a large client runs a routine that imports a million records. Other customers who want to read/write on the same table will be affected.
  • Generating ids and keys will be more complex and/or less intuitive, since two records with consecutive ids may belong to completely different clients.
  • All system queries and all database access methods will need an extra parameter. The domain objects and virtually the entire system would be polluted by tenant id back and forth.

Advantages of separate databases

  • Doing and restoring backups is trivial
  • Support does not go crazy to investigate a data loss/corruption problem
  • Different clients may be in different versions of the system
  • System instances can be spread around the world and moved as needed
  • Heavy usage, deadlocks, and other bank-related issues that occur in one customer do not affect the others

Difficulties in maintaining several banks

The big problem of a bank for each client is to maintain the strict organization of the changes in the database properly related to each version of the system.

Languages like PHP where it’s easy you simply overwrite a file in production should be extra careful.

First, use a migration library so your system has the ability to always update the database automatically and ensure that the structure is consistent with that version. This may give a little more work at first, but it pays (very well, by the way) over time, especially in projects that evolve constantly.

Second, always have strong control of system versions related to your code repository. For example, each version of the system that goes into production must have a corresponding tag in Git. So when any problem occurs you can easily join the database and that specific version and play any errors.

1

I would say that the greatest performance villain is the server, not exactly the way the database(s) is structured.

Regardless of the volume of data generated by each client, it is necessary to take into account the volume of accesses to the server, especially simultaneous accesses. If the server is not able to meet the demand for access, the final performance will be unsatisfactory in either case.

To give a practical example, I currently maintain a system that has about 10,000 customers and more than 35,000 orders containing on average two to three products; a single database, referencing customers, products and orders through their respective Ids; the most complex system history query, involving all tables, does not exceed 15 seconds.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.