Many tables with little data or few tables with lots of data?

Asked

Viewed 1,779 times

1

I have a bank that will receive a lot of data. I am in doubt whether it is better to organize the data into a few tables with a lot of data or further divide the data into several tables. In which case will the queries be faster? I’m using Mysql.

  • It depends on the modeling of the information, I believe that this already optimizes a lot.

  • depends.. Voce will break a table into several smaller and equal or will remodel the project?

  • @William, I don’t think any of the alternatives apply to my case. But finally, the doubt in which of the cases would be faster the search, independent of the project, considering a generic bank

  • Let’s suppose I still go model the bank

  • Hmmmm... if you already know where to look, then having few data is quicker, but this only starts to make a difference in the 10 6 records... Ideally, you have the number of tables that leave the system normalized... if the query is getting slow you can pass the database to cache or apply the Shard architecture... I’ll summarize this in a reply

  • 2

    I voted to close. That question will lead to a lot of "I think". And in fact, whatever. Each case is a case, each case will require a different structure. I also think the question has become too broad.

Show 1 more comment

3 answers

8


In accordance with the standards of the database:

It is better to have more tables with less attributes in each, and to make association by foreign keys.

1st Normal Form: all attributes of a table must be atomic, i.e., the table must not contain repeated groups or attributes with more than one value. To leave in this normal form, it is necessary to identify the primary key of the table, identify the column(s) that has (or) repeated data and remove it(s), create a new table with the primary key to store the repeated data and, finally, create a relationship between the main table and the secondary table. For example, consider the People table below.

PESSOAS = {ID+ NOME + ENDERECO + TELEFONES}

It contains the primary key ID and the attribute PHONES is a multivariate attribute and therefore the table is not in 1FN. To leave it in 1FN, we will create a new table called PHONES that will contain PESSOA_ID as foreign key of PEOPLE and PHONE as the multivariate value that will be stored.

PESSOAS = { ID + NOME + ENDERECO }
TELEFONES = { PESSOA_ID + TELEFONE }

2nd Normal Form: First of all, to be in 2FN you need to be in 1FN. In addition, all non-key attributes in the table must depend solely on the primary key (and may not depend only on part of it). To leave it in the second normal form, you need to identify columns that are not functionally dependent on the table’s primary key and then remove that column from the main table and create a new table with that data. For example, consider the following ALUNOS_CURSOS table.

ALUNOS_CURSOS = { ID_ALUNO + ID_CURSO + NOTA + DESCRICAO_CURSO }

In this table, the attribute DESCRICAO_CURSO depends only on the primary key ID_CURSO. Thus, the table is not in 2FN. To do so, a new table called COURSES is created, whose primary key is ID_CURSO and attribute DESCRICAO, thus removing the DESCRICAO_CURSO attribute from the ALUNOS_CURSOS table.

ALUNOS_CURSOS = {ID_ALUNO + ID_CURSO + NOTA}
CURSOS = {ID_CURSO + DESCRICAO}

3rd Normal Form: To be in 3FN, you need to be in 2FN. Moreover, the non-key attributes of a table must be mutually independent and dependent solely and exclusively on the primary key (an attribute B is functionally dependent on A if, and only if, for each value of A there is only one value of B). To achieve this normal form, it is necessary to identify the columns that are functionally dependent on the other non-key columns and extract them to another table. Consider, as an example, the following EMPLOYEES table.

FUNCIONARIOS = { ID + NOME + ID_CARGO + DESCRICAO_CARGO }

The DESCRICAO_CARGO attribute depends exclusively on ID_CARGO (non-key attribute) and therefore a new table should be created with these attributes. Thus, we have the following tables:

FUNCIONARIOS = { ID + NOME + ID_CARGO }
CARGOS = { ID_CARGO + DESCRICAO }

Source dsc

2

Like everything in IT. Depends!!!

In a relational database having normalized information is a good option. Of course using indexes is critical. The staff commented even the third normal way, and I agree with them... but it’s good to know that we have more, six in all and a few more details... And it’s up to you to know how far you should go!

When it comes to DML (Data Manipulation Language, or INSERT, UPDADE, DELETE and MERGE), this can change a little... Inserting records in different tables in large volumes can be "time consuming" and I have seen practices of having a single table for input and later fragment this data in its due tables.

1

As you are working with relational model, I recommend doing the modelo entidade-relacionamento following the três formas normais and not add more tables than the model generates.

As data grow, scalability techniques can be applied.

A famous technique is to pass tables from the database on cache to make queries faster.

Another is to use the concept of Sharding and break a large table into smaller tables, distributing the load and also expediting the search. Roughly speaking, is the concept of load balancer applied to the database.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.