Type of the CPF or CNPJ field in the VARCHAR or INT database?

Asked

Viewed 66,144 times

65

What type of CPF or CNPJ field in the database VARCHAR or INT?

I’ve seen some posts suggesting using INT to optimize performance in the event of JOINand filters.

If you have 0 the left is only complete according to the size of the stored field (CPF or CNPJ) to perform the validation.

So what would be the best approach?

  • I think the question can be answered objectively, so much so that I did. The fact of having another answer that only issues opinion is not the fault of the question.

  • 2

    I saw no value in the question at first, but the fact that there is a consideration that the answer is a matter of opinion made me realize that the question is useful yes (although the answer is obvious, it seems that it is not so obvious to everyone).

  • 1

    I’ve always used VARCHAR, but despite not believing anything that is on the internet, already some comments advocating to use INT, although it does not present any foundation. And separating into: body ,filial, and control to use so: intCPFCNPJ = 000000000 and sintCPFCNPJFil = 0 and tintCPFCNPJCtrl = 90. I found a gambiarra terrible, but I found it interesting to see other opinions, although the answer seems obvious.

  • Better use VARCHAR.

6 answers

106


Optimization should only be done if it is really very necessary. I doubt that this optimization is worth in any scenario.

The basic rule for selecting the type when they can be confused is what you will use them for. Numbers represent quantities. CPF or CNPJ are quantities? No, they are identifiers that can even change, that can one day have letters (happened with RG). The correct semantics for this data is the varchar after all he is descriptive.

Controle deslizante para entrar com o número de telefone de 10 dígitos

The question of zero on the left alone is already good reason to use varchar. So you can skulk around to solve a problem caused by the wrong guy. I always say this for everything in computing and even in life, the person causes a problem and then causes new ones trying to solve something that they should have done right from the start (This happens more than almost everyone realizes, especially with experienced people who can no longer see their mistakes as the arrogance that develops, even speaking the opposite)

You should always choose to varchar on any given data until you find a reason to choose another type. A general ID, for example, has a good reason to be a int, It needs to be incremented, you do accounts with it. The same goes for salary, price, quantity. But it is not valid for zip code, phone, even a property number in the address when this field is separated. An expiration has reason to be of another type, you need to sort the information by date order, and as text this would not work correctly. These are just a few examples.

Some will even say that it is necessary to use the varchar because he has scores beyond numbers. But this is not a good argument since the punctuation should not even be recorded, at least not in most situations. This should be solved in the presentation only, the score is not part of the data (usually).

In some cases a numeric type cannot even be used since the number of digits used to identify something is greater than the numeric type holds.

Some will say that neither varchar should be used. As the size is fixed, a simple char solves the problem. But there are controversies about this. Depending on the DB system one type may actually be more advantageous than another but not enough to justify, in most cases a choice will be made because of the performance. Semantics should be the primary option.

Unfortunately there is a lot of wrong information on the internet about the subject (or any subject :) ).

as seen on the internet

  • Great answer! I’m just with a final question, you advise creating a Cpf column and another cnpj or create just a column to store both?

  • 4

    @Marciomazzucato relies heavily on the case. Today I always make different tables for legal and physical person, nor need to say that the columns are separate. I model records as they really are, not create an artificial division and a more artificial union yet. When you like something you can vote for the post. http://answall.com/q/54177/101 and http://answall.com/q/14839/101

  • @Maniero Great answer, took away any doubt that this subject generated me

  • Semantically varchar means "variable-sized character field". Since CPF has no variable size, char is most suitable semantically. In addition to the aforementioned performance advantage.

  • 1

    @Maniero, boy, where did you learn all this? what a formidable response!!!

  • 2

    @Party in 36 years of dedication and structured study. Thank you.

Show 1 more comment

28

I’ve always advocated using VARCHAR, because they are a string that coincidentally occurs to be a sequence of digits. Unlike a INT where the purpose of being a number is to allow you to perform mathematical operations, something that makes no sense at all in a CPF or CNPJ.

The same case occurs with RG, where each state has a different format, and use as INT would remove the zeros on the left in addition to bring problems concerning the check digit.

  • You advise the creation of a Cpf column and another cnpj or create just one column to store both?

  • Marcio Mazzucato, I have used the same column to store CPF/CNPJ, however using a "Type" auxiliary field (e.g.: CPF=1, CPNJ=2) to indicate what is in the main column (to facilitate some queries). How I save the data with the left zeros (CPF size 11, CNPJ size 14) do not need to use Constraint with the 2 fields to ensure uniqueness.

10

Use VARCHAR you will have problems if the user uses . (point) or/and - to separate the digits. If you do some checking and remove these or other characters, then you can use INT. But I recommend using VARCHAR. Use INT for records you know will only have digits/numbers.

  • only a weighting... you can only know that the field will always have digits/numbers if it is in your domain, and if its nature really is intrinsic to numbers, it cannot be something controlled by external agent as is the case of the IRS that controls the definition of Cpf/cnpj

2

If we look at the size of the INT(UNSIGNED) ranging from 0 to 4294967295, we have 10 supported positions.
As the CPF has 11 numbers without formatting, we would have 1 number more in this string, forcing the DB, at the time of insertion, to make a CPF with initial value say of "500.000.000-60" or if it is "4294967296" turn 4294967295.

The rule applies to the CNPJ as well, in the case of the CNPJ with 14 numbers without formatting. The ideal would be to use a BIGINT from 0 to 18446744073709551615. For speed issues in queries, an index can be added in this field and other rules ranging from each DBA.

In the case of the 0 left in Mysql for example we can define the field as ZEROFILL.

On formatting it is not necessary to allow the application to reach the DB with a CPF or CNPJ formatted with "." , "-" or "/", this rule should be made in a view.

  • 2

    Possible to use whole is, but I find it difficult to say that "is ideal". And another, if it is to store in "improviso", could use an INT without problems, after all a valid CPF only has 9 significant digits. The conference two are calculated (unless you want to store even if the conference digits don’t match). The important fact is to keep in mind the nature of the data, which is already well elaborated in other posts.

2

According to Victor Stafusa’s answer, it will help you make the best decision to see what types of operations will be carried out in the field in question. In the case of CPF and CNPJ, these do not undergo arithmetic operations (usually) and are treated as alphanumeric data types (char, varchar) since their validation algorithms, which by definition consider them fixed-sized character strings. Another point in favor of using alphanumeric types such as varchar is that it is easier to use searches using the LIKE operator, as opposed to fields of numerical types. Data fields of "identifier codes" (such as CPF and CNPJ) alphanumerics can also be considered more the proof of the future, because there is some possibility (maybe remote, but there is) of switching to using letters in these, since each letter greatly increases the number of different combinations and at the same time requires fewer characters to memorize.

But like everything else in the world of computing, everything has its price and depending on its performance requirements and database size, it can be much more performative to store Cpfs and Cnpjs as numerical types, since they take considerably less bytes (especially in relation to varchar with Unicode), resulting in physically smaller table indices, in addition to faster search algorithms. In fact, I have seen great performance improvement in Oracle databases when having Cpfs stored as number.

-1

Cara I advise you to use a string even, because you may have problems with 0 the left pq they can be disregarded.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.