Table and column nomenclature

Asked

Viewed 25,273 times

13

For at least four years I’ve been working with the following table and column nomenclature which is exactly like this: The column name will always be tableName + column name = name_table_name_column, below, I exemplified:

  • table usuario
    • columns
    • usuario_id,usuario_nome,usuario_ativo,...
  • table cliente
    • columns
    • cliente_id,cliente_nome,cliente_ativo,...
  • table cliente_endereco
    • columns
    • cliente_endereco_id,cliente_endereco_cliente_id,cliente_endereco_cep,cliente_endereco_logradouro,...
  • table pedido
    • columns
    • pedido_id,pedido_empresa_id,pedido_cliente_id,...
  • table motivo_cancelamento
    • columns
    • motivo_cancelamento_id,motivo_cancelamento_nome,motivo_cancelamento_ativo,...
  • table pedido_motivo_cancelamento
    • columns
    • pedido_motivo_cancelamento_pedido_id,pedido_motivo_cancelamento_motivo_cancelamento_id,pedido_motivo_cancelamento_ativo

Note: I never mix English with Portuguese, which makes some names really big.

I as a developer and as a network user, I do not brief and I do not like abbreviation, I suffer from it and I have suffered a lot in development where I did not know the real name of the tables, columns and even variables because they are abbreviated and it happens that I do not know to answer to those who ask me if this my pattern is "right" or "not right", therefore, this "pattern" titled as "mine" the nomenclature I have presented above is either may be considered wrong and that will cause other developers to have problems with it? Or is it just wrong and won’t cause other developers to have problems with it? Or maybe a better question: how could I improve it?

3 answers

31


Short and direct response

In short, you can create the standard you want as long as it is well documented and internationalized.

So you can say you’re wrong when:
- Does not have adequate documentation (obviously).
- It has no internationalized standard. (still a controversial issue, unfortunately)




Detailed answer, with examples, case study, etc, follows reading below:

There’s not a rule that says whether it’s right or wrong.

Each decides a pattern they want. Obviously, if the pattern is well documented, there is no problem.

Redundancy

Particularly, the example you posted in the question I do not like because it is redundant to have the table name in the columns within the table. But it’s a personal taste of mine. Of course, if I take a project with a pattern of this kind, I won’t be crying.. rsrs As a professional, I tolerate, respect and do my duty. But if I take a ZERO project, I prefer to use a standard without redundant nomenclatures.

Example of redundancy:

tabela1
   tabela1_coluna1
   tabela1_coluna2
   tabela1_coluna3

It’s redundant and makes the name too long.

This is enough here:

tabela1
   coluna1
   coluna2
   coluna3

At least that’s the pattern I normally use.

When I need to reference one table with another, there yes, I describe the table name

tabela1
   id
   foo
   bar

tabela2
   id
   tabela1_id
   outra_coluna
   outra_coluna2

In this example in "table2", the column "table1_id" references the column "id" of the table "table1".

The thing complicates when a table has a slightly longer name.
Example "table_com_name_major"

If there is a column with an extended name, "coluna_com_name_large", the result would be "table_com_name_maior_coluna_com_name_big". Anyway, it’s not wrong, but it’s awful. And you can still have problems with systems that don’t allow very long names.

To avoid this I always try to name with as short names as possible.

Pattern with initial letters

In Brazil has a "standard" that consists of using only the initial letters.
Note that "default" is in quotes, ok?

Example of how I would do for a table of clients and addresses of these clients:

clientes
   id
   nome
   blablabla

clientes_enderecos
   id
   clientes_id
   cep
   bairro
   blabla

Now, translating the same structure to a "pattern" widely used in Brazil, as I mentioned above, the pattern consists of using only the initials.

c
   id
   nome
   blablabla

c_e
   id
   c_id
   cep
   bairro
   blabla

I remember that when I started programming around 1998, in a Delphi course the teacher taught with this pattern using only the initials. The time passed and I did not remember and I never used like this, but recently I met a person who uses this pattern and I remembered this thing from 20 years ago.

I don’t know how it is in Brazil, if it has other patterns, etc, but it seems that this pattern with initial letters still exists probably because it is taught in courses like what I did in 1998.

This pattern is nice because even a table with an extended name is quite short. The problem is that you will always have an extra job of consulting the documentation to only know what the name of a table means if you don’t remember. It’s a small nuisance, but it’s still a nuisance. Sometimes you’re on super-concentrated total gas so you have to stop to read the documentation to understand that merd** means c_o_i_o, for example. If the table has an intuitive name, you don’t need to stop and lose concentration. But anyway, it’s just a small, tolerable example.

The most "flawed" point in this pattern is the collision of names. In a small project hardly collides names but such a situation may be common. Illustrative example: customer_duplicated and customer_deleted would be c_d. It would have to change one or the other. Then "Deleted" can change to "excluded". But if there is already a "c_e" table, you will have to use creativity without deviating from the original meaning "Deleted". A new option could be "archived". It would be "c_a". But, what if "c_a" already exists?

It’s a simple limitation that causes a huge complication. You waste time curled up in something so small it could be solved with another more suitable pattern. At the end of the day it would probably solve in a hurry by putting a number "c_d2". It contradicts the pattern of the project and starts a series of gambiarras. After all, a misdemeanor "here" can be made "there" too.

Internationalization

Before I begin, I will cite a case to understand the importance of an internationalized standard.

Moving halfway around the world in Japan, I’ve dealt with a company where the programmer is an ultra-nationalist. These guys hate anything that isn’t national. In the database pattern, the names of the tables and columns were all in kanjis ideograms. The guy avoided even the hiragana, katakana or transliteration in Romaji. To make matters worse, it used terms that are not easily found in dictionaries. Even, the initial project was to use POSTGRE or Mysql, but both did not support multibyte table and column names. The brilliant solution to maintain the nationalist pose was to opt for Oracle. Forced the company to pay licenses, modify the codes of the programs and still have an extra cost with certified Oracle professional. Nothing against Oracle, but it was an extremely ridiculous and unnecessary situation because it uses not even 5% of the potential of SGDB.

As if that were not enough, it was quite difficult to have to read and translate the 300 tables, each with more than 20 columns. Everything in Kianji, one of the most beautiful.. rsrs. Even my Japanese colleagues at the time had difficulty understanding the names.

Finally, a childishness which cost expensive for the company. Obviously who paid was the customer who ordered the service. And it pays until today. rsrs.

To avoid situations like this, at the very least, you must have an internationalized standard. The language for internationalization is English.
The same goes for a Brazilian or any other nationality. One day your company can hire an American, an Indian, a Russian, a Vietnamese, expand services to another country, anyway. People from anywhere on the planet can handle English with ease. If the application is not internationalized, it will cost more because the programmer will waste time in translating and understanding the structure and still with risks of misunderstanding. Also remembering that obviously this is valid for documentation and not only for codes, table names, function names, etc.

Reserved names

It is recommended to avoid reserved names such as date, data, name, among others that vary according to the DBMS.

However, this has been changing recently. Currently Sgdbs allow to use the name "whatever", but some may still present conflicts with reserved names if not escaped for a literal interpretation.

Example with Mysql

tabela
   id
   date
   bla_bla_bla

Problems may occur in a SELECT, for example

SELECT id, date FROM tabela

To avoid problems, just escape the names of columns and tables with backtick

SELECT `id`, `date` FROM `tabela`

For more details: https://stackoverflow.com/questions/2889871/how-do-i-escape-reserved-words-used-as-column-names-mysql-create-table

You can also choose to use a different nomenclature:

SELECT id, date_created FROM tabela

Finally, each decides on the pattern of their own project.

Naming convention (naming Convention)

Example, for "Client Addresses" we would normally create a table like "customer_address".

Best known patterns:
- Camelcase.
Example: CustomerAddress or customerAddress


- Underscore.
Words are connected with an underscore/underscore. Example: customer_address


- Hyphenate.
Words are connected with a hyphen. Example: customer-address


- Uppercase / Lowercase.
A pattern where all letters are written in uppercase or lowercase letters.
Example: CUSTOMER_ADDRESS, CUSTOMERADDRESS, CUSTOMER-ADDRESS, customer_address, customeraddress, customer-address




Regarding the name Convention, each one chooses what they like or what is most convenient. Particularly, I always prefer the simplest, portable, etc.

An example of the difficulty you can have in choosing a pattern. The hyphen, for example, you can take a hosting system or host that doesn’t allow you to hyphenate tables and column names. Then you will have a big problem in adapting the codes and, as you will do everything in a hurry, the gambiarra will eat loose. rsrs

The use of uppercase letter may also have some problem, but it is usually easy to solve. An example, with Mysql, we have the configuration lower_case_table_names. It’s very common for an inexperienced person to fall into this problem. In the development environment creates a project where Mysql is with "lower_case_table_names" configured as "case insensitive". And when you install on the production server (online), the hosting is with lower_case_table_names as "sensitive", which starts to generate several bugs in the functioning of the system. If the bug runs in a hidden way, in a process where no errors are triggered, then you may have a serious problem if it involves financial issues and very important data.

For these and others, particularly prefer a simple standard and that works well in linux, windows, mac, online, offline, anyway. The default is all in lowercase letter and composed words separated by underline. That’s enough to avoid a lot of problems.

Let’s make one more complicated?

If you opt for Camelcase, as would be the default for compound names with words that are abbreviations and usually capitalized?

Example: "HTTP Something". It would look like this

HTTPAlgumaCoisa or HttpAlgumaCoisa ?

And if you need "XML HTTP Something"?

XMLHTTPAlgumaCoisa? or XmlHttpAlgumaCoisa ?

And when you come across plural terms?
addresses or address? city or cities? customer or customers?

Finally, they are just basic observations for you to survey the project from the beginning. Thus avoid wasting time in having to modify the documentation with the project in progress because, small breaks like this one go consuming time, dispersing the focus on the business model and, the deadline will shorten.




The structures of the examples are fictitious, with didactic purpose. The focus is not modeling, but rather nomenclature within the context of the question.

  • Very good, particularly prefer without underline _. Generally id, name, addressid, created

  • Excellent answer! I was just looking for this. In some readings I did on the Internet, some topics addressed the issue of BD performance when using very long names, such as the examples you cited in the redundancy case. Do you know if this really exists? Or just "rumor"?

  • The size of the name does not imply performance significantly. Of course any extra byte is a cost of memory, but it’s too small to say it slows down DB. Of course, the smaller the name, the better. Always try to create names as small as possible. There have to be creative and juggling to reconcile the nomenclature with the default application.

  • @Danielomine what you think of my share, below?

  • I do not recommend abbreviations like this, it is very bad maintenance because the documentation has to be consulted simply to know the name of a table, imagine the relationships with various levels as would be this. I suggest objective names and, if possible, short but not abbreviated: Example: "Contract", "Contratoitem". Knock your eye out and you know what it’s about.

  • What a great answer. Totally within the context. No running away to other kinds of arguments. Show!

Show 1 more comment

5

In nomenclatures there is no right or wrong. In thesis you can do as you wish. Anyone who says it’s right or wrong for this is being biased.

Of course there are some recognized parameters that work better. But they can be ignored if you have a good reason for this. You know those databases that number all the names? I’m totally against it, but there might be a reason to use it. Nowadays, no technician, of course.

The important thing is to be readable, something that seems to be aware of the importance.

Whether you use Portuguese or English depends on the project. You have to do what will be useful for the team. This is an important point, if you are told to do it in a way, you can debate, try to show that something is not ideal, but if you do not convince, follow what you are told.

If you can choose, put significant names. Size is not a problem in most cases, but it may be that some tool screws up. Then you can think about whether the tool should be used, it seems to be obsolete or poorly thought out. Oversized size for becoming a bit boring to use, not only to write (this one has solution), but to read too. Legibility is a broad concept, exaggeration for one side is as bad as for another.

There is an "error" (it is not enough to actually be an error, as I said) classic: use the table name in the column name. What’s the point? Redundant information is unnecessary. Some people say that the id and eventually some other column that will then be used as a foreign key in another table may have the table name to match that of the foreign key that will need the table name. Others find this unnecessary, and in fact it doesn’t usually cause problems (unless you use some tool that expects this pattern).

If there is a technical reason to redundant the names, I would not do this. I also prefer the use of pascalCase than use underline, but it’s taste.

Some people prefer to use plural or even collective names for table names. There are controversies about this.

There are even those who prefer a more fluent Portuguese with prepositions, articles, etc.

  • As for the size, I never really worried, the option to do everything in Portuguese is precisely to make it easier for the team, particularly, I find the English more attractive, is more homogeneous with the other lines of code of the project. But as for the redundancy I need to pretend I see nothing, why in some cases, in complex queries or with many inner join the redundancy I cause helps in identifying the columns more quickly and discards the use alias for column names, in addition, also help faster understanding of the scheme.

  • 1

    The lack of redundancy never caused me problem, after all when I want I use the table name. When I’m forced to use the table name, or when it’s totally unnecessary to make it legible, it’s horrible to read that redundant. If you like it like this, do it like this, if you invite me to work with you, I refuse :)

  • Opá, thanks for the comments @bigown. I’m in the lead of the next project and I was just thinking about problems that I would have in implant that way of doing things. I welcome your comment in the form of action, I will ask others from my next team if they accept to work in this way, and I will be more relieved to ask why, at least I have seen that this form may not be wrong and may cause problems, but perhaps a bad one ;)

2

https://github.com/incompletude/bancodedados

I wrote a documentation that aims to ensure good experience by establishing consistency rules in databases that meet Restful architecture, including micro services. There is a complete workflow with consistent rules for modern development, where the goal is to ensure the best user experience for developers of all layers. I suppose these premises also serve other architectures.

The thesis is that the choice of object names in the database is the most important process of the project, as the practice of repeating these names in all layers of development is not only encouraged, but is a requirement.

Suppose for example:

SELECT
    id_usuario,
    nome,
    senha,
    usuario,
    ativado,
    criado
FROM usuarios
WHERE
   usuario = 'Dev1' AND
   senha = 'Password1'

Whether it’s backend language objects, Javascript objects, HTML elements, the JSON response of the API, or even the URL, they all follow the same pattern:

GET /api/usuarios/1

<input type="text" id="nome" value="a">

for usuario in usuarios
   usuario.nome
   usuario.ativado

{ "usuarios": [{ "nome": "a", "ativado": true }, { "nome": "b", "ativado": false }]}

The reasons are:

  1. Makes it easy for the developer. No need to learn 2 dozen complicated acronyms.

  2. Makes it easy for the developer. The time spent in the process of choosing names is recovered throughout development by getting rid of the task of inventing coherent names for columns like fl_usr_atv, for some people posicao, to others status, and to others flag.

  3. Makes it easy for the developer. If the names are well defined, intuitively they are used in the course of the project, generating code more consistent and simple to understand.

  4. Facilitates for the developer. Developers hate to choose names, so how about doing it right and once only?

  5. Ides are extremely powerful. The default int_number was invented for languages and Ides from 3-4 decades ago, today to know the type of a variable just hover your mouse over it. This pattern is used paenas for special cases such as id_usuario.

  6. Transpilers are extremely powerful. Languages spit out any object or enumerable without any effort, an object javascript flipped JSON and c# at the next moment in a transparent, automatic manner. Why clutter up the middle of the field with a database full of complicated and weird names and patterns?

  7. These rules overturn habits created by old documentations and in English but that do not make sense in modern projects in Portuguese. For example, in English collective names are preferable to plurals for table names, staff is used in place of employees because staff clarifies better the table content and generates more readable code, for employee in staff. However Portuguese has a very limited number of collective names, after the tables alcateia, colmeia and manada there are not many others left. In the singular, a table called usuario invites the developer to implement the repository usuario, from there to for linha in usuario or for i in usuario is a leap. Terrible. Acceptable implementation is for usuario in repositorioUsuario, and the good one is for usuario in usuarios.

  8. These rules are used in modern English applications, but work even better in Portuguese because there is no collision of SQL reserved terms, order flipped ordem, for example.

  9. Creates better products. Compare the two Apis below, they have the same goal.

https://cryptowat.ch/docs/api

https://www.cryptocompare.com/api/#-api-data-pricehistorical-

Browser other questions tagged

You are not signed in. Login or sign up in order to post.