Better BD structure with large number of data and filterable columns

Asked

Viewed 217 times

5

I have a Mysql database with a table that is taking too long to complete a query. I would like to know what would be the best database structure indicated for a table with many fields that are capable of filters, have more than 5 million records and that is fed constantly.

All filterable fields are indexed.

The VM the database is in has the attributes:

Sistema Operacional CentOS 6.X 64 bits
Mysql 5.7
CPU 2 x 2 Cores
6 GB Memória

Currently the structure that exists is as follows:

 CREATE TABLE IF NOT EXISTS `vendas` (
  `identificador` int(11) NOT NULL,
  `doc` varchar(20) COLLATE utf8_unicode_ci NOT NULL,
  `emissao` date NOT NULL,
  `recno` varchar(20) COLLATE utf8_unicode_ci DEFAULT NULL,
  `filial` int(11) DEFAULT NULL,
  `filial_origem` varchar(5) COLLATE utf8_unicode_ci DEFAULT NULL,
  `cliente` int(11) NOT NULL,
  `dap` bigint(15) DEFAULT NULL,
  `referencia` varchar(20) COLLATE utf8_unicode_ci DEFAULT NULL,
  `grupo` int(11) DEFAULT NULL,
  `prefixo` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
  `descricao` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `data_hora` varchar(20) COLLATE utf8_unicode_ci DEFAULT NULL,
  `quantidade` int(11) DEFAULT NULL,
  `valor_unitario` double DEFAULT NULL,
  `total` double DEFAULT NULL,
  `desconto` double DEFAULT NULL,
  `custo_unitario` double DEFAULT NULL,
  `custo_medio` double DEFAULT NULL,
  `pedido` varchar(20) COLLATE utf8_unicode_ci DEFAULT NULL,
  `posicao` varchar(10) COLLATE utf8_unicode_ci DEFAULT NULL,
  `tabela` int(11) DEFAULT NULL,
  `atividade` int(11) DEFAULT NULL,
  `risco` varchar(5) COLLATE utf8_unicode_ci DEFAULT NULL,
  `uf` varchar(3) COLLATE utf8_unicode_ci DEFAULT NULL,
  `municipio` varchar(10) COLLATE utf8_unicode_ci DEFAULT NULL,
  `regiao` varchar(10) COLLATE utf8_unicode_ci DEFAULT NULL,
  `fornecedor` int(11) DEFAULT NULL,
  `vendedor` int(11) DEFAULT NULL,
  `vendedor2` varchar(10) COLLATE utf8_unicode_ci DEFAULT NULL,
  `aplicador` int(11) DEFAULT NULL,
  `tipo` varchar(1) COLLATE utf8_unicode_ci DEFAULT NULL,
  `motivo` varchar(5) COLLATE utf8_unicode_ci DEFAULT NULL,
  `pm` double(15,2) DEFAULT NULL,
  PRIMARY KEY (`identificador`,`emissao`),
  KEY `filial` (`filial`),
  KEY `dap` (`dap`),
  KEY `referencia` (`referencia`),
  KEY `grupo` (`grupo`),
  KEY `tabela` (`tabela`),
  KEY `atividade` (`atividade`),
  KEY `municipio` (`municipio`),
  KEY `fornecedor` (`fornecedor`),
  KEY `vendedor` (`vendedor`),
  KEY `aplicador` (`aplicador`),
  KEY `flaged` (`flaged`),
  KEY `tipo` (`tipo`),
  KEY `cliente` (`cliente`),
  KEY `emissao` (`emissao`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

This query example is taking about 11 seconds, which is unsatisfactory:

select CONCAT_WS("-",YEAR(emissao),LPAD(MONTH(emissao),2,'0')),
sum(total) as faturamento
from `vendas` as `operacao` 
where (`emissao` >= '2009-01-01')
group by 1
order by 1

What I could improve in this structure so that queries like the one in the example bring results with less than 1 second?

------------------ UPDATE

Result of the query below (with annual partitioning):

explain select CONCAT_WS("-",YEAR(emissao),LPAD(MONTH(emissao),2,'0')),
sum(total) as faturamento
from `vendas` as `operacao` 
where (`emissao` >= '2009-01-01')
group by 1
order by 1

inserir a descrição da imagem aqui

The result of the query show index from vendas:

inserir a descrição da imagem aqui

  • 1

    Did you ever look at the query execution plan? can you tell if the most wanted items are always the latest? if yes you can cache them. Ever thought of partitioning the table?

  • 2

    Updated statistics ? Use of indexes ? https://dev.mysql.com/doc/refman/5.7/en/index-statistics.html https://dev.mysql.com/doc/refman/5.5/en/optimization-indexes.html http://dev.mysql.com/doc/refman/5.7/en/mysql-indexes.html

  • @rray now updated with a few more details using the explain and show index. Yes, I have tried using partitioning, but it has not given any significant improvement.

  • 1

1 answer

2

As far as I could tell EXPLAIN, your query is not using any data, as you are forcing the use of functions for both sorting and grouping.

An alternative to your query would be:

SELECT 
  CONCAT_WS("-",YEAR(emissao), LPAD(MONTH(emissao),2,'0')),
  SUM(total) AS faturamento
FROM `vendas` AS `operacao` 
WHERE (`emissao` >= '2009-01-01')
GROUP BY emissao

This query will group the records per day, generating a slightly different result than you have. If you want to group by month and year as your query already did, use this query:

SELECT 
  CONCAT_WS("-",YEAR(emissao), LPAD(MONTH(emissao),2,'0')),
  SUM(total) AS faturamento
FROM `vendas` AS `operacao` 
WHERE (`emissao` >= '2009-01-01')
GROUP BY YEAR(emissao), MONTH(emissao)

Note 1: Instead of using the number 1, indicating the first query field, I am using the field name, ensuring the use of the index.

Note 2: How you are using the command GROUP BY in the column emissao, no need to use command ORDER BY because the grouping already guarantees the order of the records.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.