Database modeling, when it is possible to denormalize

Asked

Viewed 150 times

3

I’m modeling a database (MSSQL) and a question has arisen. In every business rule of the clients the data is visualized vertically ex:

  • Product,
  • Original Price,
  • Concurrent Price 01,
  • Concurrent Price 02,
  • Concurrent Price 03,
  • Concurrent Price 04,
  • Concurrent Price 05
  • Datareading

that is to define the 5 competitors straight away and at worst would have to add a new column for a new competitor.

But my knowledge in SQL says that I should normalize and put the data horizontally, even knowing that the chances of increasing a new competitor is tiny.

ex:

  • Product
  • Original Price
  • Concurrent Price
  • Competitor id;//1= competitor 01, 2=competitor 02, etc
  • Datareading

By doing it in the traditional (standardised) way, I have already realised that I will have a lot of work to generate the reports as the customer wants, as several lines actually represent only one line for the report. In SQL would have to make queries with PIVOT

Doubt:

What is the correct shape in this modeling? Following the business rule of client and then denormalize to facilitate or follow the rigid standards of normalization? This is a scenario that for me would be a denormalization advantage.

  • Competitors are always the same, for all products?

  • No, there are 5 separate competitors, each product is represented by 5 competitors.

  • @Dorathoto Have any answers solved what was in doubt? Do you need something else to be improved? Do you think it is possible to accept it now?

  • I’m still checking.....

3 answers

3

Generally speaking the ideal is normalize. Until I have a reason not to. I I have already responded to this in general terms.

There will never be more than 5 prices of competitors? I believe there will be, right?

Can you afford the price of having a small unused occupied space when there are less than 5 prices? I guess it’s not a problem. I don’t think that’s your concern, and we shouldn’t always worry about that. Even if you have this concern, it has a solution, although it may not simplify reports and other operations.

From the comments, it seems that not only will there always be 5, but they will be the same. It seems to me a clear case that normalization will only be done to follow the rule. It is disregarding the concept.

When going to model you have to think of the concept first. There in the answer I Linkei above says that there are cases that are not even doing denormalization, the model should be like this. I’m not saying this is the case, but the question is always why you’re doing it one way or another. Don’t follow rules, do what’s right for that case.

Do you think you might have problems in the future with a change because you did it in the current way? I don’t think so. This part of the database is used by the whole system and a change would be a nightmare? Maybe it’s something very specific used in something punctual.

Never think about the best model for the report because that’s the easy part and it’s no problem. But never disregard other factors that may indicate the non-existence of the "rigid" rule. I can’t answer with any certainty because there may be a hidden requirement there, but it seems like a case to do as you wish.

2

In my opinion, data modeling should be carried out independently of sgbd. After finishing the data modeling comes the physical implementation, when then it may be necessary to adapt the data model to the characteristics of the sgbd.

(1)
Considering 3FN, we could have the following physical implementation in sgbd SQL Server:

-- código #1
CREATE TABLE Produto (
  ID_produto int primary key,
  Nome_produto varchar(200) not null,
  Preço money not null
);

CREATE TABLE Concorrente (
  ID_concorrente smallint primary key,
  Nome_concorrente varchar(80) not null
);

and

-- código #2
CREATE TABLE ColetaPreço (
  ID_produto int not null references Produto,
  ID_concorrente smallint not null references Concorrente,
  Data_coleta date not null,
  Preço money not null
);

You can create a display (view) that returns the prices of products in a single line per product, making the visualization independent of the physical implementation.

There are advantages and disadvantages to this implementation.


(2)
Sometimes, because of performance, denormalizes some entity for the physical implementation. If you realize the need to implement in 2FN, you can then use 5 columns (one for each competitor). Something like this:

-- código #3 v2
CREATE TABLE ColetaPreço (
  ID_produto int not null references Produto,
  Data_coleta date not null,
  Preço_concorrente1 money null,
  Preço_concorrente2 money null,
  Preço_concorrente3 money null,
  Preço_concorrente4 money null,
  Preço_concorrente5 money null
);

The disadvantage of this option is that, if they decide to add a sixth competitor in the price collection, it is necessary to change the structure of the database as well as the existing programming.


(3)
Another more radical way would be to store the competition price list in a single column:

-- código #4 v2
CREATE TABLE ColetaPreços (
  ID_produto int not null references Produto,
  Data_coleta date not null,
  Lista_preço varchar(200) not null
);

Each of these options should come with the respective set of manipulation objects (display, function etc), to turn the data into something simple to view.

  • So the solution I’m going to use is this, horizontal, I made a simplified representation up there, but in this view makes it difficult to report to the client since it’s a BI system and everything in them is vertical. almost thinking of doing a way that for me it is gambiarra to facilitate the rest.

  • I will study about these terms 3fn, 2fn, thanks

  • on the code 3, I do not see the pq of having Id_competitor 2, price2, we could have a single column Precompetition2, and so look in the competitors table that is Id=2.

  • At first I asked if "Are the competitors always the same, for all products?" because I was in doubt about this part. /// Is the competitor 1 of product 1 (soap, for example), the same competitor 1 of product 2 (cheese bread, for example)? In all, independent of product, there are only 5 competitors?

  • are always the same..

-2


A conclusion I’ve come to:

When denormalizing or not modeling?

By default normalization is always a path that we should follow, in this my project was clear that it seemed one of the rare cases that denormalization would even be worth, by some specific rules of the client.

So my approach was denormalizing. But it had a parameter that had not commented on the question and that I put here to help those who want to venture by denormalized database.

I had a lot of trouble using EntityFramework, definitely it is not done for this!.

There is even a way around various problems he was having, but the solutions were more monumental than the difficulties he would have in the case of standardisation.

Summary: I will normalize the database (will generate difficulties in reports and some other points) but Entityframework + assync processes are not prepared for non-standard databases. It may be that other languages or even with Ado.net in synchronous, work.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.