Working with the ID limit on large volumes

I have some questions about large volumes of data in MySQL, concerning the limit of the ID. As an example I will use a tinyint ID to simplify the explanation. A signed tinyint field accepts values up to 127.

Assuming you insert 100 new records, this creates IDs 1 to 100. Eventually the first 50 records are deleted, so I will have only 50 records, with IDs 51 to 100, plus room for 27 more records, which results in 77 records. That is, we end up with only 77 records in a table that should be able to hold up to 127.
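The ranges involved here are easy to check. A minimal sketch (plain arithmetic in Python, independent of MySQL) of the maximum positive value of each signed integer type:

```python
# Maximum positive value of a signed N-byte integer: 2**(8*N - 1) - 1
def signed_max(n_bytes):
    return 2 ** (8 * n_bytes - 1) - 1

print(signed_max(1))  # TINYINT  -> 127
print(signed_max(4))  # INT      -> 2147483647
print(signed_max(8))  # BIGINT   -> 9223372036854775807
```

MySQL also offers UNSIGNED variants, which double the positive range (a tinyint unsigned goes up to 255).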

I know that MySQL does not go back to fill the vacant positions; my question is a different one. It is not uncommon to have to remove entries, and with each removal a slot in the table is lost.


  1. What would be an efficient way to allocate the actual number of records without losing the reserved IDs?

  2. Would using bigint instead of int, which is the most commonly used, only increase the time it takes to reach the limit?

Disregard any errors in the calculations above :)

  • In my opinion, your doubt is somewhat mistaken... When you set a type for a column, you are not setting the number of records the table can hold; the table may contain 'N' records. What you are setting is the size of the column type; in this case, tinyint goes up to 127. Your table would support as many records as you want if this field is not the PK. What is not supported is a duplicate value in the PK field, and that is what limits your table. What you can do, as your database grows, is change to int and later to bigint...

  • @Michelsimões, when I mention the ID, it is understood that it is the PK of the table

  • Could the answer include alternatives to int and bigint?

  • @Ciganomorrisonmendez, but using bigint would only increase the 'time' it takes to reach the input limit, correct?

  • @Papacharlie Yes. I will write an answer with another alternative that you haven't thought of yet.

  • Actually, I had. The answer from @Flaviogranato was what I was going to post.

  • @Ciganomorrisonmendez, I left a comment on the answer. Does the risk of collision still hold?

  • No. Take a look at this.


3 answers

6


I don't share the sympathy for UUID. I think it is useful in some cases, but there has to be a reason to use it, and I don't think that's the case here. UUID has several problems that don't apply here, but space consumption alone is already one of them.

I would just use BIGINT. I doubt you will ever have a problem with it: physically you won't be able to fit that many rows in the table even over the next few decades. If you keep deleting many rows, it might be worth having an ID-compaction mechanism that runs periodically. I doubt you will need it, but if necessary, renumber all rows, making the proper adjustments wherever the IDs are used in other tables. Yes, of course, this is not simple, and in some cases it pays off, in others it doesn't. Remember that if one day you have to change the column type, you will face similar work.

I reinforce that I doubt all this machinery will be necessary, but if you want to avoid it, I think it is better to create a combination of columns to increase the maximum number, provided you do this in advance. You can also use a string type, as is done with UUID, and control the increment in the database itself.

If you really need to save space, besides using the smallest possible type, and if you have many deletions and the deleted IDs need to be reused soon so the counter doesn't overflow, I would create a free list and reuse rows instead of removing them. Each time you have to remove a row, mark it as removed (a dedicated column for this, which will be used for filtering) and put the ID in a table that stores the list of available IDs. When inserting, if there is an ID in this list, remove it from the list and reuse the row, wiping it completely and writing the new data. You have to know how to do this the right way, especially in concurrent environments, as is common.
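The free-list idea described above can be sketched as follows. This uses an in-memory SQLite database as a stand-in for MySQL, and the table and column names (`item`, `free_ids`, `removed`) are made up for illustration; in real MySQL the free-ID lookup would need a transaction with `SELECT ... FOR UPDATE` to be safe under concurrency.

```python
import sqlite3

# In-memory SQLite stand-in for MySQL; schema names are hypothetical.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, data TEXT, removed INTEGER DEFAULT 0)")
db.execute("CREATE TABLE free_ids (id INTEGER PRIMARY KEY)")

def remove_item(item_id):
    # Instead of DELETE: flag the row and remember its ID for reuse.
    db.execute("UPDATE item SET removed = 1, data = NULL WHERE id = ?", (item_id,))
    db.execute("INSERT INTO free_ids (id) VALUES (?)", (item_id,))

def insert_item(data):
    # Reuse a freed ID when one exists; otherwise let the DB assign a new one.
    row = db.execute("SELECT id FROM free_ids LIMIT 1").fetchone()
    if row:
        db.execute("DELETE FROM free_ids WHERE id = ?", (row[0],))
        db.execute("UPDATE item SET data = ?, removed = 0 WHERE id = ?", (data, row[0]))
        return row[0]
    return db.execute("INSERT INTO item (data) VALUES (?)", (data,)).lastrowid

a = insert_item("first")   # -> 1
b = insert_item("second")  # -> 2
remove_item(a)             # ID 1 goes to the free list
c = insert_item("third")   # reuses ID 1 instead of consuming 3
print(a, b, c)             # prints: 1 2 1
```

Note the trade-off the comments below mention: every insert now costs an extra lookup against the free list.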

None of these is great, but they are some alternatives for what can be done, besides UUID, which is still an option.

  • Great answer. Remember that if the option is a free list, there is a very clear trade-off: you will have more processing steps to obtain an ID. You really need to assess whether it is needed.

  • Great answer. "You have to know how to do this the right way", no doubt. I will look for more references, but this scheme sounds very good. Thanks a lot.

5

I suggest using UUID, since it is unique.

We usually sort the data by other columns, not by the ID. To know which record comes before another, sort by the creation date, if the table has one and if necessary.
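For reference, UUIDs can be generated on the application side with Python's standard `uuid` module: `uuid1()` is built from the host's MAC address and a timestamp (the variant the comments below refer to), while `uuid4()` is purely random.

```python
import uuid

time_based = uuid.uuid1()    # MAC address + timestamp based
random_based = uuid.uuid4()  # random; collision odds are negligible in practice

print(time_based)
print(random_based)
print(len(str(random_based)))  # canonical text form is 36 characters
```

MySQL can also generate these server-side with its `UUID()` function; either way, a 36-character string key costs noticeably more space than a BIGINT, which is the objection raised in the accepted answer.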

  • From what I have read, the UUID is created based on the MAC address, date, and time; a collision would be difficult, but not impossible. I even considered this alternative.

  • Are you using multiprocessing and a very high volume of data? I think that would be the case where I would consider something like UUID. My example is based on our system for processing XML files that we receive from vendors: we have 30 threads and perform inserts every 5 ms, and UUID has served us very well...

  • I understand; I wasn't criticizing your answer, I upvoted it. What I mentioned was the possibility of collision, which could be handled by checking the UUID.

  • Don't take it that way; I was just adding a successful implementation case of what I suggested.

3

If the ID you refer to is a surrogate key (https://en.wikipedia.org/wiki/Surrogate_key), that is, a value that exists only to give identity to each record and has no meaning to your application, then you should not worry about it. Use a bigint and relax. Don't worry about reusing IDs that were lost through deletions; that will be a lot of work for zero benefit.

Obviously, everything has a limit, but MySQL's unsigned bigint goes up to 18446744073709551615 (https://dev.mysql.com/doc/refman/5.5/en/integer-types.html). That is a lot of records for one table, and if your application ever reaches it, you will need another way to solve the problem, perhaps another table with the same structure for the next 18446744073709551615 records.
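The figure quoted above is simply the maximum of an unsigned 64-bit integer, which is easy to verify:

```python
# BIGINT UNSIGNED in MySQL is a 64-bit unsigned integer: 2**64 - 1
print(2 ** 64 - 1)  # -> 18446744073709551615
```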
