I suggest something modular, avoiding linking the structure to localized business rules.
As an example, I would not create 2 tables, one for CPF and one for CNPJ, even if it was a localized system. I suggest something generic:
person
id (unique)
date_created
date_updated
other_column1
other_column2
person_document
id (unique)
date_created
date_updated
person_id (relacionado com person.id)
document_id (relacionado com document.id)
data char(50)
document
id (unique)
date_created
date_updated
country_id (relacionado com country.id)
title (nome do documento)
country
id (unique)
date_created
date_updated
title (nome do país)
Many programmers think that when there are many tables the system is more complex, complicated, heavy, etc. But this is not good arguments because it does not influence performance in a relevant way. To understand, follows the milk below.
When you are making a small system in which you are sure that you will never use for something bigger and mainly international, it doesn’t make much sense to build this whole structure. But anyway it is a structure that works well in a small and local project as in a large or internationalized project. So, why not normalize within a flexible pattern that serves both cases instead of creating something "plastered" that will not be reusable?
This is one of the reasons for standardisation.
Better explaining the structure above.
We have the people table (person). In this table there is no information about what documents the person has. In the table Documents, is where the documents are registered, however, not yet linked with the table of person. In this table, all documents required by the business model will be registered.
The important thing here is not to link the structure to the business model, that is, avoid it being something localized.
In this table of documents, the various types of documents will be registered. CPF, CNPJ, RG, Driver’s License, Passport, etc.
The table person_document is where the relationships are made. In this table, only the serial number or code of the document will be registered. What identifies which document is the column document.id
.
The column data
is the type char(50)
because not all documents are numerical. There are many types of documents that include alphanumeric characters. Hence the CHAR type is more suitable.
It is also important to note that the Documents table is related to the country table. This makes it possible to use the system for a business model where it is required to know the country of origin of the document independent of the person’s nationality.
That’s just a summary. Of course it is possible to improve, make some adjustments, etc. But it does not mean that it should do so and it does not claim to be the definitive solution.
With this generic structure, the system becomes more flexible and portable. You can use the same system for use in any country without having to modify the structure. Otherwise, if you do something linked to local rules, for example, with columns or tables specific to CPF, these tables or columns will be useless for other countries and often useless even within Brazil for cases of underage registration, immigrants, tourists, refugees, undocumented persons, temporary residents, etc.
It may seem a little strange to the table country
related to document
, because it could relate country
with person
, right? The problem in this is to plaster the system by disregarding people with multiple nationalities or even if you don’t even have dual nationality, you can have documents from other countries.
Note also that in all tables of the example I added the columns date_created
and date_updated
. Many systems follow this by default. Always have a column that tells the date and time it was registered and updated.
This is a subject that deviates a little from the subject, as it is related to activity log and optimization. If you prefer, you can ignore.
In the original table of the question tb_pessoa
, there is a column pessoa_tipo
. This is also a way to "plaster" the structure. For something more flexible, the ideal would be another table to define the person’s type. I preferred not to comment because it would become something repetitive by following the same logic presented in the proposed structure. And so it goes to any other columns or tables that can make the system (plastered) localized.
Do you really need so much? Even in a small project?
Here we come back to talk about whether this is really necessary. If you just want to build a box control system of a bar of the street corner on the outskirts of a small town in the countryside that has no more than 5,000 inhabitants, why complicate with all this?
Because I repeat again what was commented on some paragraphs above that, this structure serves both for a small project and for a large project. There may be a difference in performance but it is irrelevant. After all, in a small project there will not be a large flow. Therefore, the concern with performance is irrelevant. In a larger project will require much more than that to get a good performance.
The key here is portability. You can use the same structure for diverse business models.
Obs: The above examples are purely didactic.
About nomenclature, see this link: Table and column nomenclature
I may be mistaken, but unless you link other information in the table of Cpf, or cnpj, which would be more convenient, type information of the establishment, this "normalization" is unnecessary...
– MarceloBoni
What would be feasible? Do you have some parameter to establish this? Although short already has strange thing. This thing of having a table for each type of entity is such an old thing...
– Maniero