As a general rule, you should not apply object-orientation rules to a relational database.
CPF
To be honest, I found the CPF example a little strange, unless its purpose was precisely to show a misuse.
Why do you need new Cpf() in the class? I can't imagine why. If there is a reason, it might make some sense to replicate that special treatment in the database, but I doubt it would count as normalization.
Normalization
The CPF is already normalized. Normalization only makes sense to avoid repetition, duplication, redundancy. Where is a CPF redundant? I see it as canonical information that needs no normalization. Of course, if it is stored in the wrong place it would need normalization, but then the mistake was made earlier.
A common mistake is to treat the CPF as immutable information about a person. In general it is, but who guarantees that this is always true? I have seen systems that use the CPF as the primary key to identify people. Then someone's CPF changes and the system is in a bind.
It is very risky to use data you do not control as a primary key, so we prefer a surrogate key.
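A minimal sketch of the surrogate-key idea, written in Python with the standard sqlite3 module. The table names, columns and sample values (pessoa, pedido, the CPF strings) are assumptions made up for illustration, not taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE pessoa (
    id   INTEGER PRIMARY KEY,   -- surrogate key, controlled by the system
    nome TEXT NOT NULL,
    cpf  TEXT NOT NULL UNIQUE   -- natural data: unique, but allowed to change
);
CREATE TABLE pedido (
    id        INTEGER PRIMARY KEY,
    pessoa_id INTEGER NOT NULL REFERENCES pessoa(id)
);
""")

# Hypothetical sample data (the CPF values are placeholders, not real numbers).
conn.execute("INSERT INTO pessoa (id, nome, cpf) VALUES (1, 'Maria', '11111111111')")
conn.execute("INSERT INTO pedido (pessoa_id) VALUES (1)")

# The CPF changes; no row that references the person has to be touched.
conn.execute("UPDATE pessoa SET cpf = '22222222222' WHERE id = 1")

row = conn.execute("""
    SELECT p.nome, p.cpf
    FROM pedido d JOIN pessoa p ON p.id = d.pessoa_id
""").fetchone()
print(row)
```

Had cpf been the primary key, the same change would have to cascade to every table that references the person.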
At least this mistake was not made in this case. The CPF is just one piece of information in the person's record. Another common mistake is to think that a class Pessoa represents a person, when in fact it only represents a person's record.
Address
In the case of the address, it makes a little more sense to have another entity representing it, since it is probably composed of several fields. It is useful to have it as a member or through some other form of association.
In memory it is useful to have pointers to other data; in a database this is not always a good idea, for performance reasons. This is why object-oriented databases have not been very successful.
Should this be reproduced in the relational database?
It depends. Is this Endereco only associated with Pessoa, or is it part of the person? That is, does the Pessoa own the Endereco?
If the Endereco is a completely separate entity (we often misunderstand what an address really is), since several persons (natural or legal) can reside at the same address, then it makes sense to have a separate table.
If the Endereco really belongs to the Pessoa, it probably doesn't make sense to have a separate table, so you would merge the class Endereco into Pessoa, in what I would call class flattening. This creates an impedance mismatch between the relational and object models.
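The flattening described above can be sketched as follows, in Python with sqlite3. The fields chosen for Endereco, the column names and the helpers to_row and from_row are hypothetical, picked only to show the mapping between the nested object and the single table.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Endereco:
    logradouro: str
    numero: str
    cidade: str


@dataclass
class Pessoa:
    nome: str
    endereco: Endereco  # in memory, the address is a separate object


def to_row(p: Pessoa) -> tuple:
    """Flatten the nested object into the columns of a single row."""
    return (p.nome, p.endereco.logradouro, p.endereco.numero, p.endereco.cidade)


def from_row(row: tuple) -> Pessoa:
    """Rebuild the nested object from the flat row."""
    nome, logradouro, numero, cidade = row
    return Pessoa(nome, Endereco(logradouro, numero, cidade))


conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE pessoa (
        id                  INTEGER PRIMARY KEY,
        nome                TEXT NOT NULL,
        endereco_logradouro TEXT,
        endereco_numero     TEXT,
        endereco_cidade     TEXT
    )
""")

pessoa = Pessoa("Maria", Endereco("Rua A", "10", "Sao Paulo"))
conn.execute(
    "INSERT INTO pessoa (nome, endereco_logradouro, endereco_numero, endereco_cidade) "
    "VALUES (?, ?, ?, ?)",
    to_row(pessoa),
)
row = conn.execute(
    "SELECT nome, endereco_logradouro, endereco_numero, endereco_cidade FROM pessoa"
).fetchone()
restored = from_row(row)
```

The two small mapping functions are where the mismatch between the models shows up: the class has two levels, the table has one.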
What if the Pessoa can have multiple addresses of its own? Normalization can start to be interesting, since you have an indeterminate number of addresses for each Pessoa. But note that the motivation here is a different one. The person may have a collection address, a billing address, a delivery address, a maintenance address and other "creative" ones.
If these various addresses are temporal, i.e. you need to keep the addresses the person has had over time, probably for tax purposes on invoices, a separate table is likely to be necessary because the address needs to exist independently of the person; that is, some other table will reference the specific address. It is possible to avoid the extra table, but I don't know whether that is a good idea.
Database mechanisms
On the other hand, today's database systems can work reasonably well with multi-valued data of indeterminate size. PostgreSQL has an Array type. Others have an XML, JSON or BSON type that makes it easier to store data like this. Even when none of this is available, it is still possible to simulate it on top of a VARCHAR or BLOB.
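As a rough sketch of the last option, the snippet below simulates a multi-valued column by storing JSON text in an ordinary text column, using Python's json and sqlite3 modules. The column name enderecos, the keys inside each address and the sample values are assumptions for illustration only.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE pessoa (
        id        INTEGER PRIMARY KEY,
        nome      TEXT NOT NULL,
        enderecos TEXT NOT NULL DEFAULT '[]'  -- JSON array serialized as text
    )
""")

# Hypothetical addresses that belong only to this person.
enderecos = [
    {"tipo": "cobranca", "logradouro": "Rua A", "numero": "10"},
    {"tipo": "entrega", "logradouro": "Rua B", "numero": "200"},
]
conn.execute(
    "INSERT INTO pessoa (nome, enderecos) VALUES (?, ?)",
    ("Maria", json.dumps(enderecos)),
)

# Reading goes through the owner: the addresses have no table of their own.
stored = conn.execute(
    "SELECT enderecos FROM pessoa WHERE nome = ?", ("Maria",)
).fetchone()[0]
loaded = json.loads(stored)
entrega = [e for e in loaded if e["tipo"] == "entrega"]
```

The trade-off is that the database sees only a string: filtering or indexing by a field inside the address has to be done by the application, unless the engine offers a native type for it.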
I tend to use this kind of thing because, conceptually, in the relational model an object that is totally dependent on another, has no life of its own and cannot be accessed independently should not have its own table. Often we normalize the table to separate these objects for the sake of the mechanism, not because the business rule requires it.
I can't say that physically separating in this case is wrong; some people might disagree with me. I think it abuses the relational model by creating an unnecessary bottleneck, however small.
Of course, this works best when the number of elements is roughly known. In other words, if you know that almost every person will have between zero and five or ten addresses, that's one thing, but if there can be tens, hundreds or thousands of addresses, things get complicated. And that is not so absurd if the addresses are temporal.
Organizing the logical entity
To be clear, the address is a separate logical entity, there is no doubt about that; what we are analyzing is whether it should be physically separated.
Even when we decide to organize the data as a different entity, we can still do that and keep it in the same table, at least in the most powerful database systems. They allow you to create a domain. You then create a type Endereco that is used as a column type in the table; within this type there are other columns.
From the physical point of view this does not change anything in the table, but from the logical point of view you will access the Endereco as a single entity, with members inside it such as TipoLogradouro, Logradouro, Numero, Complemento, Bairro, Cidade, Estado and Cep. Some of this data should itself be standardized, but that is another matter.
So you create a separate entity that is physically allocated in the same table. This resembles a value type, because it seems to me that an address is immutable. Note that the Cpf is inherently immutable (read the link to make sure you understand what immutable means).
I think this line of thinking helps to decide whether there should be another table or not, but it shouldn't be the only criterion. The cardinality, the redundancy generated and the type of relationship also matter.
Conclusion
Then we get into a bigger discussion: whether the OOP model is even right. But I won't go into that, because PHP is not the ideal language if you want to do everything right.
To each model its own way.
In this case the CPF does not deserve a separate table. But it does not deserve a separate class either.[1]
The address deserves a separate class and probably deserves a separate table, since it may end up existing independently of the person: the person does not own the address, let alone exclusively. But not because the person may have multiple addresses.
One last reminder: there are situations that call for denormalization.
Likewise, always being conceptually correct is also silly. There is a time to do it wrong. As long as you know what you're doing, understand all the implications and have a good reason, anything goes.
May be useful: What is the difference between Association, Aggregation and Composition in OOP?
[1] Maybe it deserves a value object class for validation, but not because it needs anything specific associated with it. You need the class in the same sense that you need a string class to represent text (in PHP this statement is a little odd, since string is not a class, although it behaves like a value object, but I think you get the idea).
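To illustrate what such a value object could look like, here is a small sketch in Python rather than PHP. The class layout and the helper _check_digit are hypothetical. The check-digit rule used is the commonly documented modulo-11 calculation for the CPF, which is an assumption brought in from outside this answer.

```python
from dataclasses import dataclass


def _check_digit(digits):
    """Modulo-11 check digit; weights run from len(digits) + 1 down to 2."""
    weights = range(len(digits) + 1, 1, -1)
    total = sum(d * w for d, w in zip(digits, weights))
    remainder = (total * 10) % 11
    return 0 if remainder == 10 else remainder


@dataclass(frozen=True)  # frozen: the value cannot be reassigned after creation
class Cpf:
    value: str

    def __post_init__(self):
        # Keep only ASCII digits, so "529.982.247-25" and "52998224725" match.
        digits = [int(c) for c in self.value if "0" <= c <= "9"]
        if len(digits) != 11 or len(set(digits)) == 1:
            raise ValueError("a CPF needs 11 digits that are not all equal")
        if (_check_digit(digits[:9]) != digits[9]
                or _check_digit(digits[:10]) != digits[10]):
            raise ValueError("CPF check digits do not match")
        # Store the canonical digits-only form (bypasses the frozen guard once).
        object.__setattr__(self, "value", "".join(str(d) for d in digits))


cpf = Cpf("529.982.247-25")  # a widely used sample number, not a real person's
```

Two Cpf instances built from the same number compare equal regardless of punctuation, which is the value-object behavior the footnote alludes to. In the database it would still be stored as a plain column.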
"Another common mistake is to think that a Person class represents a person, when in fact it only represents a person’s record." How to represent a person then?
– Thalles Rangel
You don't. It is not computationally possible; a person is too complex for that. That is why I always say that to program properly you have to learn ontology, taxonomy and even dialectics.
– Maniero