Database normalization in HR system

Asked

Viewed 779 times

3

I am developing an HR application, in it I will have many forms in HTML that may or may not be very extensive, the employee table for example has 45 fields, some fields like company, function and workplace are Fks for other tables, however there are still 42 fields that can be saved in this same table, some may receive null values.

How can I standardize some fields in this table?

One observation is that both employees, companies and local_service have common address fields, in which case I could standardize everything in an address table?

Another idea would be the employee documents, there are 12 more fields, it would be ideal I standardize these documents in a specific table for this?

Table of employees is like this, it has 45 fields, notary Modelo

Modified table of the original question:

Modelo "normalizado"

  • 5

    And how do we know without knowing your problem deeply? One of the mistakes that people make when trying to model something is to think that there are magic rules and recipes that you make and you get it right. What’s right depends on the context. You could do all this and get good or bad, and you might have to do other things, but if we answer, we’ll be solving our problem or an alleged problem, not your problem. It depends better. There are things that within normality seem wrong and you didn’t even touch them, linked to normalization or not, some would question.

  • I understand, is that I used modeling only to express my idea, the database has not even been implemented yet and this model does not represent the final project. I may have mixed things up by putting the modeling that expresses my idea and not the problem itself, I will edit the question for better understanding.

  • 1

    At the conceptual modeling level you should not worry about the amount of attributes. When transposing into the physical model you can take into account whether such attributes can be separated into a set of the most commonly used and another set of the rarely used and then physically create two tables with 1:1 relationship.

1 answer

6


I will reply in a general way, but take note of the comment I made above.

I was writing and the question changed (the version I replied to), then the answer may seem meaningless. The current model of the question seems to make much more sense and has far fewer problems.

I saw that the new one made things worse. The address number is now int, this is bad. But the sizes seem better thought out, or almost, on the other hand it is clear that some things have not been or should be normalized. This is clear in the column estado with 40 characters. It’s still something without thinking because the size used remains a standardized number almost random, but less weird. Every state, or UF (the unit of the federation as it is usually the most correct name because it has a unit that is not state) should have only 2 characters and suffice this or else this would be a code for the description, so it is more normalized. Not that it’s completely absurd to do so, but in general it’s not appropriate.

There are other indications that some sizes are not very suitable and there will record what should not, but again, I am speculating without knowing what is the real problem.

Nomenclature

Why use tab_ in front of table names? It’s obvious information. Why abbreviate some columns? Some I have no idea what it is. Why put part of the columns name with something obvious? I am not going to enter into the debate on what is obvious or not, for example if birth is not always a date and so that information is obvious. Why is there inconsistency in the names? Nor will I get into the merit that people work in entities that are not companies. Nor will I complain that, if I understand correctly, the column logradouro store the type of the patio and not the patio itself.

There is a rule, which need not always be followed, that the names of fields that are always the same always have the same name, there is a controversy whether a id should have the table name prefixing or not (I don’t like prefixing in the original table because it generates redundant information), but should prefix where it is foreign key. Adopt a pattern and follow.

Typing

Workload should be one char?

Why did you use double for monetary values?

The sizes of some fields seem almost random, it seems that found it cute to do so, modeling follows a pattern that makes sense, it is not to be cute.

There are things you can question, you can’t say you’re wrong, but does an address number have 5 characters?

Structure

Does the order used in columns help readability? The foreign key is no more important than the diagram shows?

Are sure that in some tables will have a id and others will have a cod? Understand what a natural key is before using it. See more.

Not all tables should have a replacement key or even a primary key, but you need to have a good reason for this, and know what the consequences of this are.

Normalization

If you don’t have a clear primary key, you don’t have a id own, should this table exist? It’s not a huge problem to have a ratio 1:1, but almost always is unnecessary and bad.

Why have a separate address if that address can only belong to an employee? If it could belong to more than one employee, if the employee could have more than one address or if the address could belong to other tables, it might make sense. I saw no justification, everything you do has to justify. Why didn’t the company have the same treatment? You need to have the same treatment?

In my answer on the question of normalization I talk about the canonicality of information and that repetition does not matter. Deciding whether addresses should be in a separate table is by understanding what this is. Just because the address is the same doesn’t mean it’s the same information. Just as there are two José da Silva that are the same person. Or if two twin brothers univitellinos are the same person. Repetition is not the reason to normalize, be canonical is.

Or normalize or not. This dependent table suffers from the same problem (too normalized) and can at the same time be normalized less. Do you always have 5 dependents? No more and no less? If this is so, why isn’t it normalized? If it is not to normalize then at least put it in the same employee table. Today I even question whether this type of data should be normalized. It has a normalize cost like this and has a non normalize cost, it is not easy to decide. But if not normalize should use some SGDB mechanism or trick to simulate a normalization within the table itself, using a column array, JSON, or a varchar with a format allowing a variable number of inputs, thus simulating a array. This can give a simplification and performance in the model, but you will have to use a specific resource or own technique to access this data (it is usually better if you know how to do), you have to weigh. Usually when you are making only one composition can be interesting to internalize in the table itself. Normalization works best with association or aggregation.

To make the columns worse they can be null, all of them. One of the great reasons to normalize is not to have null, And then you got the worst of both worlds.

Reinforcement for rethink whether you can make the other columns non-zero. In some cases you should do the normalization or use the trick of the composition mentioned before. Just do not take as a rule that can not accept null, the name of the mother and father really seems to make sense to allow null, because otherwise it will have to have 3 tables, employees who have both parents known, those who have only mother, and those who have only known father, and would have to make a union throughout select, looks awful, bad to model, and as most RDBMS have no inheritance, bad to maintain, the magic rule that some follow can bring a lot of harm. And it’s worse, so I don’t like business logic in DB, usually the rule only accepts mom and dad, but you have to have one of them, so to be sure you’d have to have one constraint ensuring that it is a OR, and for that you may have to create a stored Procedure or a function to use in constraint, complicated, mainly because in some cases it will depend on a context that you do not have within your DB. Could have used the example of marital status influencing the name of the spouse who has conditional rule.

Should I not have normalization of schooling, naturalness, etc.? Usually yes, it is basic violation.

Are you sure documents should be separate? Have you read about the problem of using natural key and convinced that CPF is terrible as primary key?

Can’t the person have an individualized salary? Just the salary of the function? I’m not even going to get into the point that in real cases this model needs to be absurdly more complicated than this, this is a pretty naive way of seeing how a personal department works and what it has to deal with, I’m just saying that the real situation should define what the model should look like, if you try to guess you’ll miss badly, but if you’re creating a hypothetical case, this case is only in your head and only you can solve it, you can’t ask other people to help you with it.

There are relationships there that I do not know if they are right, but I can not say without knowing the specific problem, it may be that this organization operates in an unusual way. I don’t know the table’s motivation trabalho and so I don’t know whether it should exist or not and whether it should be in this form.

Completion

That was looking over, must have other problems.

Number of columns in the table indicates nothing if the modeling is right or wrong. People cling to details that don’t matter and don’t look at what matters because in general people don’t understand the function of modeling something.

Data modeling is probably the most important feature a developer should have, and most, even experienced, are pretty bad at it. With that in mind, start preparing. It needs to have an enormous capacity for interpreting text, and a text that doesn’t even exist, and mathematical understanding, especially sets, knows what it saw from 3a. grade (in my time) and that almost nobody cares and that many teachers do not even understand why they are teaching it? If these things are missing, start all over again, with nothing standing.

  • Excellent your very complete reply, I will be consulting your answer other times to go applying in the model, I understand that some things do not have as much need to be followed to the letter, as the case of the state, for being a small application I chose to receive the same full name, but other issues like using decimal for values, the explanation about natural and primary keys I found very useful when I was studying the discipline I had doubts about which to use, nomenclature is because I usually use views and therefore separate views and tab by name, it was a custom acquired.

  • In case you suggested to use an array or JSON structure for dependents, in case my application an employee can have none or up to 5 beneficiaries and are stored only names, so the fields are null, i have no idea how to work with an array or JSON directly in the field of a table, I know that when I send an array to the table it is stored as an array, but the information is not visible, it is written only 'array' in the table field, finally this subject I will have to do some research and study to try to do this way

  • In the case of salary it is linked to function, there are some other values that will be in other tables and the salary will be calculated using these values and I need to have these values separated by function. Finally, there are other subjects that I did not address in the modeling, this model is incomplete. I will follow your ideas I really liked your explanation, but it is not a subject that we catch suddenly, so I will always be rereading your answer, I will not make more comments not to extend too much comments, I do not know if it is correct to respond with several comments like this.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.