What is the problem with using [n]varchar(max)?


Why can the practice of using [n]varchar(max) cause problems?

In my scripts, whenever I don't really know what the length of a string will be, I have used nvarchar(max) to prevent problems.

A few days ago my internship supervisor was reviewing my scripts and told me to change some nvarchar(max) declarations to nvarchar(8000), for example. And he mentioned in passing that I should avoid declaring everything as max.

For what reason should I avoid this practice? Does it have any relationship with effectiveness?
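To make it concrete, the change he asked for looked roughly like this (the variable name is made up for illustration):

    -- What I was doing everywhere:
    DECLARE @CustomerName nvarchar(max);

    -- What he asked me to use instead:
    DECLARE @CustomerName2 nvarchar(8000);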

2 answers

And you didn't ask him why? You should have; if he said it, he has to substantiate it. Unless he didn't actually know, and had read it somewhere and went off repeating it like a parrot. Advice without a foundation should be ignored, so you're right to ask.

You should start with the most semantically appropriate choice. Do you want to impose a limit? Is 8000 an acceptable number for that limit? Or would another number, perhaps smaller, be better? If you don't want a limit, why would you impose one?

As an optimization, you can consider the implementation details of the tool you are using. Today, and for a long time now, SQL Server stores a VARCHAR column in different ways depending on whether it is limited to 8,000 characters or can exceed that. With the limit, the text is saved as part of the row; when it can exceed the limit, the text is written separately on other pages of the database, and the engine performs a kind of transparent, implicit join for you, which has an extra read cost. In-row storage usually has the advantage in cases like this.

Keeping large texts together with the normal row is not always an optimization; it's hard to say categorically. But if SQL Server's choice is always to store the text separately when it can exceed 8,000 characters (I can't say that it always does this), and you know that most of your texts will be small, then limiting the column to 8000 can be a good optimization: everything is read directly in the row, without the need for an extra read of another page in most situations.
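A minimal sketch of how you could observe this yourself, assuming a SQL Server scratch database (the table names T_Limited and T_Max are hypothetical); sys.dm_db_index_physical_stats reports which allocation units the data landed in:

    -- Two tables holding the same text, one limited, one max.
    CREATE TABLE dbo.T_Limited (Id int IDENTITY PRIMARY KEY, Txt varchar(8000));
    CREATE TABLE dbo.T_Max     (Id int IDENTITY PRIMARY KEY, Txt varchar(max));

    -- Force max values off the row so the separate pages are visible;
    -- by default SQL Server keeps small max values in-row when they fit,
    -- which is exactly the "I can't say it always does this" caveat above.
    EXEC sp_tableoption 'dbo.T_Max', 'large value types out of row', 1;

    INSERT INTO dbo.T_Limited (Txt) VALUES (REPLICATE('x', 2000));
    INSERT INTO dbo.T_Max     (Txt) VALUES (REPLICATE('x', 2000));

    -- IN_ROW_DATA vs LOB_DATA shows where each table's text actually lives.
    SELECT OBJECT_NAME(s.object_id) AS table_name,
           s.alloc_unit_type_desc,
           s.page_count
    FROM sys.dm_db_index_physical_stats(
             DB_ID(), OBJECT_ID('dbo.T_Limited'), NULL, NULL, 'DETAILED') AS s
    UNION ALL
    SELECT OBJECT_NAME(s.object_id),
           s.alloc_unit_type_desc,
           s.page_count
    FROM sys.dm_db_index_physical_stats(
             DB_ID(), OBJECT_ID('dbo.T_Max'), NULL, NULL, 'DETAILED') AS s;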

In most cases this optimization is negligible, but with large volumes and certain access patterns it can pay off. You have to measure. But first look at the desired semantics.
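Measuring is straightforward with SET STATISTICS IO, which reports LOB page reads separately; continuing with the hypothetical tables from the sketch above:

    SET STATISTICS IO ON;

    -- The in-row text is read together with the rest of the row...
    SELECT Txt FROM dbo.T_Limited WHERE Id = 1;

    -- ...while the off-row text adds extra page reads, shown in the
    -- Messages tab as "lob logical reads".
    SELECT Txt FROM dbo.T_Max WHERE Id = 1;

    SET STATISTICS IO OFF;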

Note that this holds for SQL Server only, and only for versions up to this date; nothing guarantees it will always be this way, although it is likely, since there is some unofficial documentation about it, so a change would need a very strong reason.

There may be other implications as well; as I said, it is an implementation detail.

In general it is good advice, especially now that you know the reason and can make a better decision in each case. It's a matter of efficiency; effectiveness is something else, and more complicated to analyze, even more so without context.

  • Thank you @Maniero, I understood your explanation. When my supervisor made the change, he gave a brief explanation along with it, but I didn't fully understand it, because I work in another language and sometimes what he says isn't clear to me; I lack vocabulary in the other language. So I try to grasp the idea of what he is saying and then research what he was trying to tell me. And yes, the script of mine that my supervisor reviewed processed a huge range of data and ran for more than five hours. Thanks again.

  • And did changing it bring a measurable gain?

  • I can't say, because the script only ran in full after the variables had already been changed to varchar(8000), and now I no longer have permission to run it again to do such a test and satisfy my curiosity.

  • It's a shame; as far as the question goes, we're left guessing whether this was efficient or not (efficiency being different from effectiveness and efficacy).

  • Yes, it's a shame. And in the question, I was thinking "efficiency" but wrote "effectiveness". My mistake!


I’ll try to explain as best I can:

First, regarding the differences between the types:

  • char: a fixed-size data type, used to store values with a well-defined length (CPF, telephone numbers...). Since the size is fixed at declaration, if you declare char(12) and store only one letter, 11 additional blank spaces will be stored.

  • varchar: a variable-size data type, used for values whose length is not fixed (people's names, URLs...). If you define varchar(12) and store only one character, only that character will be saved.

    The VARCHAR and CHAR data types store data in the ASCII standard and use 1 byte (8 bits) to represent a character. Their maximum size is 8,000 bytes.

  • nchar: like char, but stores data in the Unicode standard, using 2 bytes (16 bits) to represent each character.

  • nvarchar: short for National Variable-length Character string; stores data in Unicode and uses 2 bytes (16 bits) to represent a character. Therefore, if you use nvarchar(12), up to 24 bytes can be allocated for the data (see the sketch after this list).
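A quick way to see these size differences for yourself, using DATALENGTH on throwaway variables:

    DECLARE @c  char(12)     = 'a';
    DECLARE @v  varchar(12)  = 'a';
    DECLARE @nv nvarchar(12) = 'a';

    SELECT DATALENGTH(@c)  AS char_bytes,      -- 12: padded with trailing blanks
           DATALENGTH(@v)  AS varchar_bytes,   --  1: only the character stored
           DATALENGTH(@nv) AS nvarchar_bytes;  --  2: Unicode, 2 bytes per character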

On the storage capacity:

A varchar value can be at most as big as the database page.

A data page is the location where your data is stored, i.e., when you create a table and insert records into that table, those records are allocated to data pages in SQL Server.

The size of a data page is 8,192 bytes (8 KB), and a row in a table cannot exceed 8,060 bytes. This in turn limits the maximum size of a VARCHAR to 8,000 bytes.

The use of MAX indicates that these data types (varchar or nvarchar) can store a value greater than their respective limits of 8,000 and 4,000 characters. A varchar(max) can store up to 2 GB and is allocated in a different place from the data page.

VARCHAR(MAX): indicates that the maximum storage size for the VARCHAR data type is 2^31 - 1 bytes (about 2 GB)

NVARCHAR(MAX): indicates that the maximum storage size for the NVARCHAR data type is 2^31 - 1 bytes (about 2 GB)
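A small sketch of those limits in practice; note that REPLICATE itself truncates at 8,000 bytes unless its input is already varchar(max), hence the CONVERT:

    -- 8000 is the largest explicit length; this declaration would not compile:
    -- DECLARE @too_big varchar(8001);

    -- max is the only way past that limit (up to 2^31 - 1 bytes):
    DECLARE @big varchar(max);
    SET @big = REPLICATE(CONVERT(varchar(max), 'x'), 100000);

    SELECT DATALENGTH(@big) AS bytes_stored;  -- 100000, well past 8000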

General summary:

nvarchar() takes twice the space of a varchar(), and nvarchar(max) takes twice the space of a varchar(max). If you always use max, you may end up losing database performance because of the need to always allocate an unnecessary amount of storage.

There are other questions here on SOpt about this:

What is the advantage of using CHAR instead of VARCHAR?

  • Did you read the question? Do you think this answers what was asked?

  • Sorry, I ended up clicking to post the answer before I had actually finished it.

  • Thank you @Cedrics. Your answer also helped me better understand the impact of the practice I was following.

  • I didn't quite understand this: "if you always use max, you may end up losing database performance because of the need to always allocate an unnecessary amount of storage" - but if it is variable, why would it allocate anything unnecessary? Or did you mean that this only happens IF the data is larger than 8000? And then I ask: in that case, is it better to store what goes over the limit, or to lose the information?

  • "but if it is variable, why would it allocate something unnecessary?" - When creating the new data instance in the field, the storage space is allocated to the memory to receive the value input. In case I’m saying something wrong I hope you correct me but that’s how I learned
