MySQL performance with InnoDB on a large table

Viewed 4,278 times

5

I currently have a table with about 6 million records that receives a large volume of I/O operations, so when designing the project I chose InnoDB instead of MyISAM in MySQL; after all, locking would be per row rather than per table.

But I have a big problem: the majority of the queries made against this table filter by a date period (datetime). Because of that, I tried to partition the table, but I came up against this InnoDB limitation.

What do you suggest to improve the performance of these queries, considering that I have severe hardware limitations?

Below is the structure of the table.

  CREATE TABLE `sensores` (
    `id` int(11) NOT NULL AUTO_INCREMENT,
    `equipamento_id` int(11) NOT NULL,
    `data_hora` datetime DEFAULT NULL,
    `valor_primario` float(10,6) DEFAULT NULL,
    `valor_secundario` float(10,6) DEFAULT NULL,
    PRIMARY KEY (`id`),
    KEY `fk_sensor_equipamento_idx` (`equipamento_id`),
    KEY `data_hora` (`data_hora`),
    CONSTRAINT `fk_sensor_equipamento_idx` FOREIGN KEY (`equipamento_id`) REFERENCES `equipamento` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION
  ) ENGINE=InnoDB AUTO_INCREMENT=3515782247 DEFAULT CHARSET=utf8;

The workload consists of numerous "sensors" writing equipment readings into this table every 15 seconds.

Most queries made against it are similar to:

SELECT * FROM sensores WHERE data_hora BETWEEN ? AND ?
  • Can you give more detail about the structure of this table and what kind of updates, inserts and deletes it usually receives? What kind of queries are run? Without this, it is difficult to give useful answers.

  • InnoDB’s great asset is TRANSACTIONS; if you don’t use them, you don’t really need to choose InnoDB. MyISAM is much faster for reads.

  • @Victor, sorry for the lack of information; I edited the question and added the table structure.

  • @Havenard, true, but I also have a lot of writes, and their response time matters. For the business it is difficult to say which operation is more important.

4 answers

5

A simple SELECT like this should be no big deal for MySQL to run, even on a table with 6 million records.

You should, however, make sure that the columns involved in the condition are indexed, in this case the column data_hora, so that MySQL can perform a binary search and run far more efficiently.

See if performance improves after creating the following index:

CREATE INDEX `data_hora` ON `sensores` (`data_hora`);
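To confirm the index is actually being used, you can inspect the execution plan. A sketch, using the table and column names from the question; the date literals are placeholders:

```sql
-- Check whether MySQL chooses the data_hora index for the range scan.
EXPLAIN
SELECT * FROM sensores
WHERE data_hora BETWEEN '2014-01-01 00:00:00' AND '2014-01-31 23:59:59';
-- In the output, "key" should show data_hora and "type" should be
-- "range"; "ALL" would mean a full table scan over all 6 million rows.
```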
  • If the index is created on a datetime column, would it create an index entry for every "second"? Do you know how MySQL would behave?

  • @Mauroalexandre It creates index entries for the values that are actually in the table; there is no "for every second" or any other time unit. Indexes are generally stored as B-trees.

  • The index is a kind of meta-table containing pointers to each record, ordered by the specified column. Because they are in order, MySQL can perform a binary search on the table.

  • Another thing you can do is increase InnoDB’s cache; this causes more data, possibly even the whole table, to be kept in RAM, ready for fast lookups. If I’m not mistaken this is done by adjusting innodb_buffer_pool_size in the my.cnf file. However, it is essential that your server has enough memory for MySQL.

  • @Mauroalexandre If I insert three dates into a table with an indexed date column, one in 1950, one in 2000 and one in 2050, the index will contain only 3 entries, not a few billion. So relax about that.

  • @Victor, right, I understand. When I say one per second, it relates to the behavior of my sensors: suppose I have a record at 01/29/2014 00:00:01, another one second later at 01/29/2014 00:00:02, and so on. In that case I would have one index entry per second, you see? I don’t know to what extent creating an index on this column would be worthwhile.

  • @Havenard, thanks for the tip. There’s really no way around it, I’ll have to do a hardware upgrade. Thank you!

  • @Mauroalexandre The index size is bounded by the number of records. The primary index (on the primary key) already does this; this secondary index would never be larger than the primary one.

  • 1

    But that is the point. You look at the table and see that the records are already naturally ordered by date, but you have a brain capable of spotting that pattern; MySQL does not. It isn’t smart, it won’t notice the pattern no matter how obvious it is. You have to teach it, and the way to do that is by creating an index. Without the index, MySQL will blindly read every record in the table looking for the ones within the specified period.


4

Short explanation

  • Tune InnoDB so the table can stay in memory
  • Tune InnoDB to sync changes every 1 second instead of constantly
  • Reshape your table: remove unnecessary indexes, or add new ones

Settings I recommend you don’t forget to tune are innodb_buffer_pool_size (to keep the data in RAM and reduce I/O), innodb_flush_method (to prevent the OS from duplicating the cache; requires testing) and innodb_flush_log_at_trx_commit. Others can be seen in the references at the end of this answer.
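An illustrative my.cnf fragment with those three settings. The values are assumptions for the sake of example; size the buffer pool to your actual RAM and test before adopting:

```ini
[mysqld]
# Keep the working set in RAM; often ~70% of a dedicated server's memory.
innodb_buffer_pool_size = 4G
# O_DIRECT avoids double-buffering between InnoDB and the OS page cache (Linux).
innodb_flush_method = O_DIRECT
# 2 = flush the log to disk about once per second instead of at every commit.
# Trades up to ~1 second of transactions on a crash for far less I/O.
innodb_flush_log_at_trx_commit = 2
```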

Long explanation

Partitioning shouldn’t help much in your case. Since your problem is I/O, the way to improve is to use an SSD instead of an HDD, or else to minimize disk access.

Just switching to an SSD will make it fast, but still not fast enough. You are better off having enough memory and configuring your MySQL/MariaDB so that the entire table stays in RAM, and limiting the database to writing changes to disk at intervals of no less than one second, because even with the database fully in memory that synchronization still has to happen.

As for the MyISAM engine, it tends to perform worse than InnoDB when updates and writes are heavy. The MEMORY engine may be useful in some specific cases, but it should be a last resort, and not infrequently a well-configured InnoDB can be nearly as efficient as MEMORY.

I know you may have limited hardware, but it will be hard to optimize without at least enough memory. In that situation, the best I can do is recommend what is in the following paragraph.

As for reshaping your table: if you usually only modify recent records, it is useful to create two tables and periodically move rows from the recent table to the historical one. I do this with tables that hold much more data than yours and it works great. Of course, this only helps if you don’t UPDATE old data. When well planned, this split is more efficient than partitioning and makes caching easier.
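A minimal sketch of that hot/cold split, assuming a 30-day cutoff; sensores_historico is a hypothetical archive table name:

```sql
-- Archive table with the same columns and indexes as the hot table
-- (note: CREATE TABLE ... LIKE does not copy foreign key constraints).
CREATE TABLE sensores_historico LIKE sensores;

-- Periodically move rows older than the cutoff, inside one transaction.
START TRANSACTION;
INSERT INTO sensores_historico
  SELECT * FROM sensores
  WHERE data_hora < NOW() - INTERVAL 30 DAY;
DELETE FROM sensores
  WHERE data_hora < NOW() - INTERVAL 30 DAY;
COMMIT;
```

Run this in a low-traffic window (or in smaller date-bounded batches) so the DELETE does not hold locks on the hot table for too long.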

References you should read

  1. http://dev.mysql.com/doc/refman/5.5/en/optimizing-innodb.html
  2. http://dev.mysql.com/doc/refman/5.5/en/innodb-parameters.html
  3. http://www.mysqlperformanceblog.com/2007/11/03/choosing-innodb_buffer_pool_size/
  4. https://blogs.oracle.com/MySQL/entry/comparing_innodb_to_myisam_performance
  • Thank you for your contribution. Unfortunately the MEMORY engine is not an option at the moment, but I thought of using it for a secondary table storing the latest records, which are in fact the most accessed. Would you consider that a good option?

  • Mauro, tuning the database so the InnoDB table stays in memory is different from declaring the table’s engine as MEMORY.

  • @Mauroalexandre I edited the answer and made the settings explicit, with links to reliable, detailed references on optimizing InnoDB for heavy reads and writes. Even though splitting into two tables (one recent, one old) is an option, tuning InnoDB to fit in memory should already give acceptable performance and avoids the extra complication of keeping two tables in sync.

1

  • One thing you can do is follow @Havenard’s suggestion, if the table does not suffer constant modifications.

  • Another is to fetch only what you really need in your query, no SELECT * FROM, and make sure the column you filter on is NOT NULL.

  • You can also paginate the search, because you probably won’t need to view hundreds of thousands of records at once.

  • Along with all this you can index the column, but check the type of your date column, because performance can vary by type (date, datetime or timestamp).

  • Removing "*" did not change performance, I believe because the number of columns is insignificant. But indexing is important, and it worries me: if I index the date column, in DATETIME format, would it create an index entry for every "second"? This worries me because the index grows in proportion to the number of values.
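The pagination idea above can be sketched with keyset pagination, which stays cheap on large tables where big OFFSETs get slow; the date literals and page size are placeholders:

```sql
-- First page of the period: only the needed columns, at most 1000 rows.
SELECT id, equipamento_id, data_hora, valor_primario
FROM sensores
WHERE data_hora BETWEEN '2014-01-01' AND '2014-02-01'
ORDER BY data_hora, id
LIMIT 1000;

-- Next page: resume after the last (data_hora, id) pair of the previous page,
-- instead of re-scanning skipped rows with OFFSET.
SELECT id, equipamento_id, data_hora, valor_primario
FROM sensores
WHERE data_hora BETWEEN '2014-01-01' AND '2014-02-01'
  AND (data_hora, id) > ('2014-01-03 12:00:00', 123456)
ORDER BY data_hora, id
LIMIT 1000;
```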

1

Not that this answer exactly answers your question. But, given that inserted records are never changed or deleted, and that your searches are time-based, the focus of partitioning should be on time.

A very simple way of partitioning by time is to create tables per time period: something like sensores_11_2013, sensores_12_2013, sensores_01_2014, and so on.
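A sketch of that per-month scheme; the table names follow the pattern suggested above, and the report query is an assumption about how the application would read across months:

```sql
-- One table per month, same columns and indexes as the original
-- (CREATE TABLE ... LIKE does not copy foreign key constraints).
CREATE TABLE sensores_01_2014 LIKE sensores;
CREATE TABLE sensores_02_2014 LIKE sensores;

-- The application writes to the current month's table; a report spanning
-- two months unions the tables that cover the requested period.
SELECT * FROM sensores_01_2014 WHERE data_hora BETWEEN ? AND ?
UNION ALL
SELECT * FROM sensores_02_2014 WHERE data_hora BETWEEN ? AND ?;
```

The cost of this approach is exactly what the comment below points out: queries must know which monthly tables to address, so it suits applications that can generate those table names programmatically.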

  • Indeed, I had considered this possibility: archiving the data of previous periods and leaving only the latest in the main table. However, the application generates reports over past periods, which would force me to make constant changes to the application’s queries, something impracticable for the business.
