Can it be considered a bad modeling practice to store all people (customers, employees, suppliers) in the same table?

Asked

Viewed 4,023 times

23

I am trying to design a database for a small project but I am having difficulties regarding the normalization of the tables of persons (physical and legal):

  • Client
  • Supplier
  • Professional

I thought about trying to create a single "Person" table to aggregate common fields to all these people and create additional tables for the specific fields of each type of person.

Easy for Customer and Professional (both always individuals), but when the Supplier is added in the equation the amount of specialized fields in the table "Person" increases a lot.

Pessoas normalizadas

Is this kind of normalization a bad practice? It could bring me problems later that maybe I’m not realizing at the moment?

  • I’ll let someone with more experience answer, but in my opinion you’re on the right track - including in the detail of doing the id of specialized tables also be a foreign key to people’s table. You are in fact implemented an inheritance between pessoa and cliente/profissional/fornecedor! What if a supplier can be a natural or a legal person, because it does not repeat the strategy? Create a table with only the fields common to each supplier, and another that represents fornecedor_pessoa_fisica, with foreign keys both for fornecedor how much pessoa.

  • I would like to offer an answer to your question. You can give a brief conceptual description of "Professional" in this modeling?

  • @Caffé This tool that I am creating is aimed at medical clinics. So "Professional" will store people who work in the company. Secretaries, doctors, technicians, warehouses and etc. I will also change the table name "Client" to "Patient" in order to be clearer. Coming home edit the post with these clarifications.

2 answers

25


Introducing

On the contrary. The idea of separating the activities of the entities that have business relationship with the organization is old and was used because no one had thought about it properly. This was invented decades ago when computational resources were extremely scarce and there was no experience of what worked or not. Unfortunately there are those who keep doing this.

I’m not going to talk about good practices because they don’t usually help anyone in the actual practical case, understanding and solving the problem is what’s important. A manual telling you what to do or not to do does not solve the problem. Of course only experience will allow you to really understand the problem. But I see that these good practice manuals cause more harm than good because almost always people understand them as the right thing to do and not as general ideas that may or may not be applied in the specific case.

You may have a very specific need and what I’m about to present may not apply fully. But overall you should model the data as it really is. The problem is that it’s not always easy to see it that way. For example, if you think a document is exactly how tables should be modeled, it falls into the mistake that was made in the past which is to separate by activities.

In general the normalization correct is the easiest way to go. If it is done in the right way it is easier to maintain. Among other reasons, normalization was created to make it easier for the system to change without too much trauma. Not normalize should only be chosen in the rare cases where the performance proved bad and is affecting the functioning significantly. So don’t even think about making super tables with everything you can because in theory putting everything together can make access easier.

One of the key things is not to think so much about current requirements, but about requirements that might happen. I’m not saying to implement the hypothetical future requirements, I’m saying to facilitate their implementation. And this goes against the other needs of modeling.

Your specific model

You say that customer can only be a natural person. Are you sure? If a client appears and says that he can only work if he is treated as a legal person - even if people are actually treated - for tax or other reasons, he cannot be served because the system does not allow?

There’s one thing about your model that’s separating the specialized data into related tables. Treating all as entities is the right real-world modeling. Entities can be business relationships of different natures. Why can’t a customer be a supplier? In the old systems they could not, ended up having the same entity registered twice or more times if she had different business relations.

PJ X PF

Your modeling has data that is not normalized. Legal and physical persons have unrelated data and should not be in the same table (usually). So you "should" have a table of Pessoa or even Entidade with the general data. And related tables of PessoaFisica and PessoaJuridica with the specific data of each type of person. Actually maybe these two tables can be the basis and not have a table Pessoa that unifies these two types, after all their use is mutually exclusive. It is impossible for the same entity to have additional data in both. Perhaps it was the case to think about Pessoa As an abstract entity that has no physical table, I like it better that way. It doesn’t mean that a table can’t be used, but you have to think about it.

I’ve talked about it in another answer. Note that there I do not speak of relational databases. So speaking in abstract class there makes sense, here no, I’m just making an analogy. The table would have to be concrete and have a relation of composition and not of inheritance, which may not be ideal. And contrary to popular belief, no relational database mainstream possesses functional native heritage. The fact of one database having started an implementation of this resource 20 years ago and then abandoned does not make it able to handle inheritance appropriately.

Specialized data

So the biggest problem of the demonstrated example is the lack of normalization of the type of person. But there is something strange when you have repeated data on Pessoa and in Fornecedor. Even some who are only in Fornecedor They look weird because it’s data that’s not unique to a supplier. Related tables that store information specific to the type of business relationship should only have data that exists only in the relationship. If the data is useful to other business relationships, then they should be in a "higher table".

when the supplier is added to the equation the amount of specialized fields in the "Person" table increases greatly

There seems to be a wrong understanding. Adding vendor should not put specialized fields in the person table. But on the other hand there are no specialized fields in supplier, according to the example. I even understand that the model is not yet ready but in what has been demonstrated there is no need for a secondary table for customer and supplier. Of course you will have specific data, I just want to make it clear that the fields shown are not specialised, at least they should not be. It may be that someone said it is, but that’s another problem. One thing is the requirement that the user says, another thing is the real requirement that the user does not understand. With the advances of database technology I think until this can be done differently but needs care.

Professional

A minor doubt: I’m not quite sure what this is Profissional. Is he really someone who has a business relationship with the organization? Surely there can never be a change, even by law, that he can be a legal person? I ask because it is looking like a specialized type of supplier. It may not be. If it is I think it needs a proper treatment.

Cardinality

I do not know if I understood the drawing of this diagram but probably the relation between the tables should be 1:1 with each secondary table. It is not possible for a person to have two different vendor data, for example. I can even see a scenario where this is possible but it probably wouldn’t be the right way to implement it and it wouldn’t be easy to maintain it. Since need is rare, I think it’s best not to think about it.

Completion

What brings problem in the future is not understand correctly the model, and this is common, but has nothing to do, have to try to think of everything, ask for those who have more experience, try.

If in doubt, having everything together is always harder to solve after keeping everything separate. But what makes it even harder to solve is when information is missing. Imagine the bank having thousands of registered entities and to fix a problem you have to review all the registrations manually.

Does having more relationships complicate database manipulation? Yes, of course, but only in very simple systems that probably will not change is that simplification should be chosen.

If you decide not to follow any of this and use a "linguing" table, will you have many problems in the future? Probably not, changes are never easy, there are ways to facilitate, but they are always unpredictable in their entirety.

What causes the most problem is data redundancy. If one day you have to reconcile these data, it can be a huge storm and almost impossible. And in the diagram shown, this could be happening.

Separating the roles of people allows you to have in the future more easily besides customers, suppliers, professionals, also sellers, governments, banks, carriers, etc. And you will not worry if fields are being created to be left with nothing. The secret is to follow the normalization. Rarely should there be fields without data.

  • excellent approach, I think I could not add anything other than opinions. for example, I believe that here fits a domain table to inform the type of the Person (customer, professional, supplier, etc.) and remove the customer table (for not having specialized data).

  • @Tobymosque I imagine that there is nothing because it is still in the middle of the modeling process, it will put something there. In fact it would be useful to indicate which ones roles that person exercises in the organization. I believe that the ideal would be to have columns indicating this. It can be a single column with flags (I think it’s an exaggerated optimization) or columns - probably boolean - for each paper. Assuming that the primary key in roles really is itself id person. This may complicate a bit (but it is easy to solve) if you have separate tables p/ PF and PJ. But it is most correct.

  • As for using columns with flags or a specialization column, I believe this will depend on the number of specialization tables and the possibility of inserting new specializations over time. If it’s only 3 specialties, flags are really better. Another point is that maybe the address is contacts can be a separate table with 1:N cardinality

  • I think that flags in databases are rarely interesting, at least in "normal" registers, but I do not know if it is not the opposite, let more fields, more the column of flag It is interesting, it is only helps to save space, by the menso in this case, what is unnecessary. I thought about talking about the normalization of the address but I do not know if it has this need and then could open for other subjects that is not the focus of the question. Even the thing can be more complicated because the address does not necessarily need to be linked to a person. You’ve seen how the subject goes?

  • First of all, thank you for your interest in the matter. @bigown Really like the model is under construction there are some things there that I need to clarify. The tool I am designing is to work with medical clinics, in this case the nomenclature of the table "Client" became inadequate (I imagine that the ideal would be "Patient" same). So patients will always be physical people according to my research. Regarding the data repeated in "Person" and "Vendor" was my distraction. I had just put "Supplier" linked to "Person" and forgot to delete these fields from the model.

  • Coming home I will review your question again with my model side by side and clarify any other issue.

  • 1

    Do this. Just by giving a hint, it’s probably not the case: Do you know that customers and patients are different things? I’m not saying you need to make this differentiation, but in more complex clinical systems it should be treated differently. In most cases both may be the same but not necessarily. The client can be a relative, can be the insurer, which would probably be another entity also where there is business relationship. And can have insurance agents also that can be different from the insurer, and there is opening possibilities.

Show 2 more comments

8

When it comes to the issue of good practice you are on the right track, the key point of the question is:

Is this kind of normalization a bad practice? Could you bring me problems further down that maybe I’m not noticing at the moment?

"Troubleshooting" is a very relative issue, so when it comes to programming the approach is correct, you are applying a concept of inheritance in which usually all programming languages offer features such as ORM that provide support for this type of approach.

In terms of design, there is the issue of the problem, considering that you performed this sketch/project based or in a case study of some software or according to a survey of requirements of some client, these requirements may change and make your modeling incorrect or insufficient.

Let’s consider a brief impact study here, first of all regarding your modeling:

  • Perks: better detailing of the various levels of information for example: person -> legal/natural person -> etc. Some SGBD's as PostgreSQL provide native heritage support. The most modern programming languages have tools that aid in the relational object mapping of these heritages.
  • Disadvantages: Maintenance requires greater attention from the team, as the information is dispersed on several levels, so greater attention is needed for cases where it is necessary to add/change/remove information from them. The use of the inheritance resource provided by a SGBD may cause some problems if it is necessary to migrate to another SGBD due to the particularities of each of the platforms.

Considering now the modeling scenario where you don’t use the concept of inheritance:

  • Perks: information is presented independently, thus favouring decision-making in the event of a change. In systems where requirements are very changeable and change dramatically, due to the isolation of relational information this approach may be the one that results in the least impact.
  • Disadvantages: depending on the case there may be an excessive amount of duplicated information over several entities.

To summarize then, the decision of the modeling depends on the maturity of the professional in observing the work scenario, that is, there is no absolute truth, because sometimes good practice methodologies end up complicating some tasks when for example not even the client knows what he wants or changes his mind overnight.

Remembering that it is not always possible to build the best house in the world on top of the dump, IE, have cases that we have to do the best possible within the possibilities.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.