Better BD structure with large number of data and filterable columns

Question

Better BD structure with large number of data and filterable columns

Asked 8 years, 9 months ago

Viewed 217 times

5

I have a Mysql database with a table that is taking too long to complete a query. I would like to know what would be the best database structure indicated for a table with many fields that are capable of filters, have more than 5 million records and that is fed constantly.

All filterable fields are indexed.

The VM the database is in has the attributes:

Sistema Operacional CentOS 6.X 64 bits
Mysql 5.7
CPU 2 x 2 Cores
6 GB Memória

Currently the structure that exists is as follows:

 CREATE TABLE IF NOT EXISTS `vendas` (
  `identificador` int(11) NOT NULL,
  `doc` varchar(20) COLLATE utf8_unicode_ci NOT NULL,
  `emissao` date NOT NULL,
  `recno` varchar(20) COLLATE utf8_unicode_ci DEFAULT NULL,
  `filial` int(11) DEFAULT NULL,
  `filial_origem` varchar(5) COLLATE utf8_unicode_ci DEFAULT NULL,
  `cliente` int(11) NOT NULL,
  `dap` bigint(15) DEFAULT NULL,
  `referencia` varchar(20) COLLATE utf8_unicode_ci DEFAULT NULL,
  `grupo` int(11) DEFAULT NULL,
  `prefixo` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
  `descricao` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `data_hora` varchar(20) COLLATE utf8_unicode_ci DEFAULT NULL,
  `quantidade` int(11) DEFAULT NULL,
  `valor_unitario` double DEFAULT NULL,
  `total` double DEFAULT NULL,
  `desconto` double DEFAULT NULL,
  `custo_unitario` double DEFAULT NULL,
  `custo_medio` double DEFAULT NULL,
  `pedido` varchar(20) COLLATE utf8_unicode_ci DEFAULT NULL,
  `posicao` varchar(10) COLLATE utf8_unicode_ci DEFAULT NULL,
  `tabela` int(11) DEFAULT NULL,
  `atividade` int(11) DEFAULT NULL,
  `risco` varchar(5) COLLATE utf8_unicode_ci DEFAULT NULL,
  `uf` varchar(3) COLLATE utf8_unicode_ci DEFAULT NULL,
  `municipio` varchar(10) COLLATE utf8_unicode_ci DEFAULT NULL,
  `regiao` varchar(10) COLLATE utf8_unicode_ci DEFAULT NULL,
  `fornecedor` int(11) DEFAULT NULL,
  `vendedor` int(11) DEFAULT NULL,
  `vendedor2` varchar(10) COLLATE utf8_unicode_ci DEFAULT NULL,
  `aplicador` int(11) DEFAULT NULL,
  `tipo` varchar(1) COLLATE utf8_unicode_ci DEFAULT NULL,
  `motivo` varchar(5) COLLATE utf8_unicode_ci DEFAULT NULL,
  `pm` double(15,2) DEFAULT NULL,
  PRIMARY KEY (`identificador`,`emissao`),
  KEY `filial` (`filial`),
  KEY `dap` (`dap`),
  KEY `referencia` (`referencia`),
  KEY `grupo` (`grupo`),
  KEY `tabela` (`tabela`),
  KEY `atividade` (`atividade`),
  KEY `municipio` (`municipio`),
  KEY `fornecedor` (`fornecedor`),
  KEY `vendedor` (`vendedor`),
  KEY `aplicador` (`aplicador`),
  KEY `flaged` (`flaged`),
  KEY `tipo` (`tipo`),
  KEY `cliente` (`cliente`),
  KEY `emissao` (`emissao`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

This query example is taking about 11 seconds, which is unsatisfactory:

select CONCAT_WS("-",YEAR(emissao),LPAD(MONTH(emissao),2,'0')),
sum(total) as faturamento
from `vendas` as `operacao` 
where (`emissao` >= '2009-01-01')
group by 1
order by 1

What I could improve in this structure so that queries like the one in the example bring results with less than 1 second?

------------------ UPDATE

Result of the query below (with annual partitioning):

explain select CONCAT_WS("-",YEAR(emissao),LPAD(MONTH(emissao),2,'0')),
sum(total) as faturamento
from `vendas` as `operacao` 
where (`emissao` >= '2009-01-01')
group by 1
order by 1

The result of the query show index from vendas:

1

Did you ever look at the query execution plan? can you tell if the most wanted items are always the latest? if yes you can cache them. Ever thought of partitioning the table?

– rray

2016/11/14 at 12:00
2

Updated statistics ? Use of indexes ? https://dev.mysql.com/doc/refman/5.7/en/index-statistics.html https://dev.mysql.com/doc/refman/5.5/en/optimization-indexes.html http://dev.mysql.com/doc/refman/5.7/en/mysql-indexes.html

– Motta

2016/11/14 at 13:39
@rray now updated with a few more details using the explain and show index. Yes, I have tried using partitioning, but it has not given any significant improvement.

– KQUEIROZ

2016/11/14 at 17:10
1

I recommend reading: How to improve reading performance of an SQL database?

– Marllon Nasser

2017/02/14 at 12:34

1 answer

Browser other questions tagged mysql sql database índices

You are not signed in. Login or sign up in order to post.

by jlHertel • **4,733** points · Answer 1 · 2017-02-14T11:01:53+00:00

As far as I could tell EXPLAIN, your query is not using any data, as you are forcing the use of functions for both sorting and grouping.

An alternative to your query would be:

SELECT 
  CONCAT_WS("-",YEAR(emissao), LPAD(MONTH(emissao),2,'0')),
  SUM(total) AS faturamento
FROM `vendas` AS `operacao` 
WHERE (`emissao` >= '2009-01-01')
GROUP BY emissao

This query will group the records per day, generating a slightly different result than you have. If you want to group by month and year as your query already did, use this query:

SELECT 
  CONCAT_WS("-",YEAR(emissao), LPAD(MONTH(emissao),2,'0')),
  SUM(total) AS faturamento
FROM `vendas` AS `operacao` 
WHERE (`emissao` >= '2009-01-01')
GROUP BY YEAR(emissao), MONTH(emissao)

Note 1: Instead of using the number 1, indicating the first query field, I am using the field name, ensuring the use of the index.

Note 2: How you are using the command GROUP BY in the column emissao, no need to use command ORDER BY because the grouping already guarantees the order of the records.