Space Allocation for MySQL Fields

First I create the model in my application, and then Entity Framework generates the SQL for creating the table. The first declaration below generates a column of type varchar(20); the second is generated as longtext.

Example

[StringLength(20)]
public string Codigo { get; set; }

public string CodigoDois { get; set; }

Questions

Is there any difference between these two declarations (in terms of space allocation in the database), even if they both store the same short value, such as "teste", which has only 5 characters?

If I know that a field's length varies within a known range, say between 10 and 15 characters, is it the best decision to limit it to the maximum length or to leave it "unlimited" (again, in terms of space allocation in the database)?

1 answer

When Entity Framework generates the tables in your database it checks the type of each field; in the case of the string type, when you specify a size it carries that same specification over to the database column using the corresponding type.

In the case of your

[StringLength(20)]
public string Codigo { get; set; }

The corresponding type in MySQL is varchar(20), but when the same string type is declared without a fixed size, Entity Framework allocates the largest possible type for it in the database, which in MySQL's case is longtext.
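
As an illustration, the DDL that Entity Framework ends up sending to MySQL for these two properties would look roughly like the sketch below (the table name and the Id column here are only assumptions for the example):

CREATE TABLE Exemplo (
    Id INT NOT NULL AUTO_INCREMENT,
    Codigo VARCHAR(20) NULL,    -- [StringLength(20)] => bounded column
    CodigoDois LONGTEXT NULL,   -- no size specified => largest text type
    PRIMARY KEY (Id)
);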

BLOB-type columns like LONGTEXT are inherently variable length and occupy almost no storage when not used. A NULL value adds essentially nothing to the space the row requires by itself; for a value such as the text "teste", the allocation is determined by the size of the string actually stored.

Advantages / disadvantages of Blobs vs. Varchars

All comments in this section concerning the VARCHAR type are valid for the CHAR type as well. Each item ends with a BLOB + or VARCHAR + tag to indicate which data type is better.


  • Do you know the maximum length of your data?

With VARCHARs you need to declare the maximum length of the string. With BLOBs you don't have to worry about that. BLOB +


  • Do you need to store very long strings?

A single VARCHAR column is restricted to 32 KB (that is, about 10,000 Unicode characters). The maximum BLOB size depends on the page size (according to the Usage Guide):

  • 1 KB page size => 64 MB
  • 2 KB page size => 512 MB
  • 4 KB page size => 4 GB
  • 8 KB page size => 32 GB

BLOB +


  • Do you need to store many long text columns in a single table?

The total row length (uncompressed) is restricted to 64 KB. VARCHARs are stored directly in the row, so you cannot store many long strings in one row. BLOBs are represented by their blob-id, so each one uses only 8 bytes of that 64 KB maximum (see the sketch below). BLOB +
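
A rough sketch of the difference, in Firebird SQL and assuming a single-byte character set (the table names are hypothetical):

-- Likely rejected: 4 x 20000 bytes of inline VARCHAR data exceeds the 64 KB row limit
CREATE TABLE t_varchar (
    a VARCHAR(20000), b VARCHAR(20000),
    c VARCHAR(20000), d VARCHAR(20000)
);

-- Accepted: each BLOB contributes only its 8-byte blob-id to the row
CREATE TABLE t_blob (
    a BLOB SUB_TYPE TEXT, b BLOB SUB_TYPE TEXT,
    c BLOB SUB_TYPE TEXT, d BLOB SUB_TYPE TEXT
);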


  • Do you want to minimize round trips between client and server?

VARCHAR data is fetched along with the other row data in a fetch operation, and typically several rows are sent over the network at once. Each individual BLOB requires an extra open/fetch operation. VARCHAR +


  • Do you want to minimize the amount of data transferred between client and server?

The advantage of BLOBs is that when fetching the row you get only the blob-id, so you can decide whether or not to fetch the BLOB data. In older versions of InterBase there was a problem in which VARCHARs were sent over the network at their full declared length. This has been fixed in Firebird 1.5 and InterBase 6.5. Draw (BLOB + for older versions of the server)


  • Do you want to minimize the space used?

VARCHARs are RLE-compressed (actually the whole row is compressed, except BLOBs). At most 128 bytes can be compressed to 2 bytes, so even an empty VARCHAR(32000) will occupy 500 + 2 bytes (32000 / 128 runs x 2 bytes = 500 bytes).

BLOBs are not compressed, but an empty (i.e. NULL) BLOB will occupy only the 8 bytes of the blob-id (and will later be RLE-compressed). A non-empty BLOB can be stored on the same page as the rest of the row data (if it fits) or on a separate page. A small BLOB that fits on the data page has an overhead of about 40 bytes (or a little more). A large BLOB has the same 40 bytes of overhead on the data page, plus 28 bytes of overhead on each blob page (30 bytes on the first). A blob page cannot contain more than one BLOB (i.e. blob pages are not shared the way data pages are). For example, with a 4 KB page size, if you store a 5 KB BLOB, two blob pages will be allocated, which means you lose 3 KB of space! In other words, the larger the page size, the greater the probability that small BLOBs will fit on the data page, but also the more space wasted when separate blob pages are required for large BLOBs. VARCHAR + (except for VARCHARs with an extremely large declared length, or tables with many NULL BLOBs)


  • Do you need a table with an extremely large number of rows?

Each row is identified by a DB_KEY, which is a 64-bit value: 32 bits identify the relation (table) and 32 bits are used to locate the row. The theoretical maximum number of rows in a table is therefore 2^32 (but for various reasons the true maximum is even smaller). Blob-ids are assigned from the same address space as DB_KEYs, which means that the more BLOBs there are in the table, the fewer DB_KEYs remain for the rows. On the other hand, when the stored rows are wide (for example, if they contain long VARCHARs), fewer rows fit on each data page and many DB_KEY values remain unassigned anyway. VARCHAR + ?


  • Do you want good performance?

Because large BLOBs are stored outside the data pages, they increase the "density" of rows on the data pages and therefore the cache efficiency (reducing the number of I/O operations during a scan). BLOB +


  • Do you need to search the content of the text columns?

On a VARCHAR column you can use operators such as '=', '>', BETWEEN, IN(), the case-sensitive LIKE and STARTING WITH, and the case-insensitive CONTAINING. In most cases an index can be used to speed up the search. BLOBs cannot be indexed, and you are restricted to the LIKE, STARTING WITH and CONTAINING operators. You cannot compare BLOBs directly with operators such as '=', '>', etc. (unless you use a UDF), so you cannot, for example, join tables on BLOB fields (see the sketch below). VARCHAR +
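
A minimal sketch of the difference, assuming a hypothetical table clientes with a VARCHAR column nome and a text BLOB column descricao, and following the restrictions described above:

-- VARCHAR: full set of operators, and an index on nome can speed these up
SELECT * FROM clientes WHERE nome = 'Maria';
SELECT * FROM clientes WHERE nome BETWEEN 'A' AND 'M';
SELECT * FROM clientes WHERE nome STARTING WITH 'Mar';

-- BLOB: restricted to LIKE, STARTING WITH and CONTAINING, never via an index
SELECT * FROM clientes WHERE descricao CONTAINING 'maria';
-- SELECT * FROM clientes WHERE descricao = 'maria';  -- direct comparison not allowed on a BLOB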


  • Do you want to search the content of these texts with CONTAINING?

CONTAINING can be used to perform a case-insensitive search for content in a VARCHAR field (no index is used). Because it is not possible to define a collation order for BLOB columns, you cannot use a fully case-insensitive search with national characters in BLOB columns (only the lower half of the character set will be case-insensitive). (Alternatively, you can use a UDF.) Firebird 2 already allows you to define a collation for text (and binary) BLOB columns. VARCHAR +


  • Do you need to uppercase the text column content?

You can use the built-in UPPER() function on a VARCHAR, but not on a BLOB. (CAST, MIN and MAX also cannot be used with BLOBs.) VARCHAR +
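
For example, with the same hypothetical clientes table as above:

SELECT UPPER(nome) FROM clientes;          -- fine on a VARCHAR column
-- SELECT UPPER(descricao) FROM clientes;  -- rejected: UPPER() cannot take a BLOB here
-- CAST(), MIN() and MAX() on descricao would be rejected for the same reason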


You cannot sort by a BLOB column (nor use it in GROUP BY, DISTINCT, UNION, or JOIN ON). It is also not possible to concatenate BLOB columns. VARCHAR +


There is no built-in conversion function (CAST) to convert a BLOB to VARCHAR or a VARCHAR to BLOB. (But it is possible to write a UDF for this purpose.) Since Firebird 1.5 it is possible to use the built-in SUBSTRING function to convert a BLOB to a VARCHAR (but the FROM and FOR clauses cannot exceed 32K). Draw
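
For example, on Firebird 1.5 or later (still the hypothetical clientes table):

-- Converts up to the first 200 characters of the BLOB into a string value
SELECT SUBSTRING(descricao FROM 1 FOR 200) FROM clientes;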


You cannot assign a value to a BLOB directly in an SQL command, for example INSERT INTO tab (MyBlob) VALUES ('abc'); (but it is possible to use a UDF to convert the string to a BLOB). VARCHAR +

Firebird 0.9.4 already has this feature. Draw


  • Do you need good security on these text columns?

To retrieve table data you need to have been granted the SELECT privilege. To retrieve a BLOB, you only need to know its blob-id (stored in the table); Firebird/InterBase will not check whether you have any rights on the table the BLOB belongs to. This means that anyone who knows or guesses the right blob-id can read the BLOB without having any rights on the table. (You can try it with ISQL and the BLOBDUMP command.) VARCHAR +


You can see more details in the links below

Reference 1

Reference 2

Reference 3

Reference 4

  • The second link is behaving really oddly; it might be more useful to post the direct link to the English-language SO question instead.

  • Whoa, I hadn't noticed that.
