Surrogate Key and Natural Key

Asked

Viewed 2,984 times

7

I’m studying about data Warehouse, but I was confused about the concepts of SK and NK.

From what I understand, in a DW, a natural key would be a column like foregin key in the table dimension, of the column that is PK referring to the same table in the database. That is, a copy of that column.

Already the surrogate key would be a KP of the size table in DW.

Exemplifying:
BD

tabela_usuario  
id_usuario (PK):  
U100  
U101  
U102...  

DW

dimensao_usuario  
sk_usuario (PK), nk_usuario (FK)  
1, U100  
2, U101  
3, U102  

My understanding is correct?

  • 1

    Note that there are 2 basic conceptions in the DW universe: Bill Inmon vs. Ralph Kimball. See https://www.1keydata.com/datawarehousing/inmon-kimball.html

2 answers

10


To replacement key (surrogate) or artificial is a data that is created for database control purposes, it does not exist outside of the software solution being developed. It’s usually a single, stable (does not change), sequential, and auto-incremented numeric ID. You have some control over it. That’s just it.

To natural key is the one that exists outside the application and is being used as a key. A CPF or CNPJ is a natural key, it exists independent of the database. It can be up to a name, although unusual, it can be an email, phone, or number of some document, it can be an EAN code or other standardized by some industry, in short, it can be any data that serves as a key and you have no control over it or have a limited control, is an inherent property of the real object being registered in the database.

Whether the key is primary or not is an issue that does not matter there. If primary needs to be unique, ideally short, predictable, that you have control and are stable, so it’s very common to use a replacement key.

Nothing prevents using natural key as primary, can only cause problems, people think they will never have problem using it, and one day you discover that there is problem.

The impression you give me is that id_ usuario uses a replacement key, and dimensao_usuario uses it as a secondary (foreign) key. Why did the same substitute key become natural when used elsewhere? For me, in this example, there are only substitute keys, unless U100 is an existing data outside the database that was transported to it, there this data in the two tables are natural and the sk_usuario would be a substitute.

3

There’s a great article on natural keys, primary keys, replacement keys (surrogate Keys) etc in Primary Key Primer for SQL Server. Note that this conceptualization is independent of the DW context.

Here is an excerpt on the choice between natural or substitute key:

Should I use a natural or surrogate key?
A key is just a Combination of column (attribute) values that provide an obvious way of distinguishing Rows, and a natural, or ːDomain' key is one that has meaning Outside the database Environment, such a Longitude/Latitude. Many people argue for a general Rule that Keys must be natural (or perversely that they shouldn’t be) or that they must be immutable. I’ve Even Heard it said that all joins must be on key values. As Always, it depends. On this topic, there are very few hard and fast Rules, because sometimes there are Conflicting Reskirts between the Knowledge level and the Reskirts of implementation.
(...)
Surrogate Keys are the normal way of Getting round the complexities of trying to Handle natural Keys that are ungainly or don’t quite conform to your business Rules. These are fine if they are Kept private Within the database: otherwise, they are a form of Technical Debt. They make the coding of Databases easier but are disliked by book-Keepers, Accountants, Retailers or anyone Else who has to Handle them. They aren’t Human-friendly. Sometimes, surrogate Keys ẓescape' into the world if Exposed as ːReference Numbers' and take on a Permanent and intended meaning that prevents any Refactoring or renumbering Within the database.

I suggest reading this article.

In the case of data Warehouse (DW) the focus changes somewhat, because of the temporality. For example, a given product may have its coding modified at a given time; this becomes transparent if DW has chosen to use a replacement key to identify the products. Same as in the OLTP the encoding has been changed, in the DW database it remains unchanged, making it possible to track the product throughout its existence.

There are also other reasons for using a replacement key in DW. One is that information about the same object can have different encodings, depending on the origin, when DW is fed from various sources (including outside the company). In this way, the substitute key chosen for the object becomes a standardization.

This can be confirmed by the definition of Surrogate Keys on the Kimball Group website, where it is mentioned that Actually, a surrogate key in a data Warehouse is more than just a substitute for a natural key. In a data Warehouse, a surrogate key is a necessary generalization of the natural Production key and is one of the basic Elements of data Warehouse design.

In fact, this article reinforces that Every Join between Dimension Tables and Fact Tables in a data Warehouse Environment should be based on surrogate Keys, not natural key.


TRANSLATION OF EXCERPTS ABOVE ORIGINALLY IN ENGLISH
A key is just the combination of column values (attribute) that provides an obvious way to differentiate lines and a natural key (or domain key) is one that has meaning outside the database environment such as latitude/longitude. Several people hold by a general rule that keys must be natural (or stubbornly not to be) or must be immutable. I myself have heard that all junctions should be through key values. As always, it depends. On this subject there are very few rigid and fast rules, because sometimes there are conflicting demands between the level of knowledge and implementation requirements.
(...)
Surrogate keys are the usual formal circumvention of the complexities of trying to handle natural keys that are clumsy or that do not conform to business rules. This is correct if they are kept restricted to the database: otherwise they are a type of technical debt. They facilitate database coding but are rejected by bookkeepers, counters, resellers or anyone who has to use them. They are not friendly. Sometimes substitute keys escape into the world if exposed as referential numbers and assume a permanent meaning that prevents any internal modification or renumbering in the database.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.