Optimize 20 Million registry mysql database

Asked

Viewed 439 times

0

Create code:


CREATE TABLE `tb_empresas_ativas` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `cpf` varchar(255) DEFAULT NULL,
  `nome_cliente` varchar(255) DEFAULT NULL,
  `razao_social` varchar(355) DEFAULT NULL,
  `nome_fantasia` varchar(355) DEFAULT NULL,
  `situacao` int(11) DEFAULT NULL,
  `data_situacao` varchar(255) DEFAULT NULL,
  `motivo_situa` int(11) DEFAULT NULL,
  `cod_natureza_juridica` int(11) DEFAULT NULL,
  `data_inicio_atividade` varchar(255) DEFAULT NULL,
  `tipo_registro` varchar(255) DEFAULT NULL,
  `data_insert` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  UNIQUE KEY `indexid` (`id`,`cpf`)
) ENGINE=InnoDB AUTO_INCREMENT=35085341 DEFAULT CHARSET=latin1;

I have a client table in mysql with 20 million records. When I count how many clients per state, per city or even consult a single Cpf the query takes a long time. Follow the schema of the table, remembering that it is already with index. Will I turn this table one for many would solve the problem ?!

id                int(11)
cpf               varchar(255)
nome_cliente      varchar(255)
razao_social      varchar(355)
nome_fantasia     varchar(355)
situacao          int(11)
data_situacao     varchar(255)
motivo_situa      int(11)
cod_nat_jurid     int(11)
data_in_ativ      varchar(255)
tipo_registro     varchar(255)
data_insert       timestamp

Or some other idea of how I can optimize the query of this data ? The database change is also feasible and would have how to do. Mongodb, sqlserver...

I am using php7 and mysql version 5. Thanks in advance :)

  • 1

    What are the queries that are made (and that are taking time)? Which ones have an index (I think it would be convenient to post the CREATE code since it says the index)? When you run a EXPLAIN and a SHOW PROFILE (and INFORMATION_SCHEMA.PROFILING) what are the results?

  • @Inkeliz Posted the create code. Ex: SELECT Cpf, name_client FROM tb_business_active WHERE Cpf = '00008784884' is already a slow-moving query.

  • 1

    What is the meaning of this field with this size? Or Cpf does not have the current connotation and widely used by Brazilians? I don’t understand nome_cliente int(11) DEFAULT NULL, if it is a name why int? Idem data_situation, etc.

  • I posted the wrong table create, I will edit here.

  • 1

    Maybe the problem is the order of UNIQUE, since it is running only with cpf. I would test two things: change the order of UNIQUE KEY \indexid` (`id`,`Cpf`)paraUNIQUE KEY`indexid` (`Cpf`, `id`)ou cria um novo index somente para oCpf, _oid` already has the PK index_.

  • You should only create indexes to speed up frequent queries, some queries will take , a solution can be to create consolidated tables and load them in idle machine time. What indexes do the tables have ? What kind of queries are made ? How often ? Pq Cpf is varchar(255) cnpj goes up to 14 without editing.

Show 1 more comment

1 answer

1

You can partition tables horizontally, this consists of due tables by indexes, for example date, Cpf, id. Thus there will be several tables subdivided by the index decreasing the consultation time. I don’t know the complete process for partitioning, but I already have with it, so I’ll leave some links.

About the partition of tables: https://docs.microsoft.com/pt-br/sql/relational-databases/partitions/partitioned-tables-and-indexes?view=sql-server-2017

An interface I used a lot for partition, is called Citus: https://docs.citusdata.com/en/v8.2/

I hope I’ve helped.

  • Thanks Vinicius I’ll give this guy a read :)

Browser other questions tagged

You are not signed in. Login or sign up in order to post.