how to perform a SELECT COUNT faster!

Asked

Viewed 266 times

2

I need to perform a query to know a total of operators I have. However the table has almost 18 million records, and it is taking a long time to return almost 3 minutes, and this is because I am performing direct test in the bank.

The problem will be bigger when this table is complete, more than 150 million, yes it will take an eternity, will have some way to carry out this query faster?

SELECT *, COUNT(operadora) AS total_op
FROM telefones
GROUP BY operadora

Table structure

CREATE TABLE IF NOT EXISTS `telefones` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `numero` varchar(30) NOT NULL,
  `ddd` int(3) NOT NULL,
  `operadora` varchar(15) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB  DEFAULT CHARSET=latin1 AUTO_INCREMENT=19606524 ;

--
-- Extraindo dados da tabela `telefones`
--

INSERT INTO `telefones` (`id`, `numero`, `ddd`, `operadora`) VALUES
(31, '11920160030', 11, 'OI - Celular'),
(32, '11920160031', 11, 'TIM - Celular'),
(33, '11920160032', 11, 'Claro - Celular'),
(34, '11920160033', 11, 'VIVO - Celular'),
(35, '11920160034', 11, 'Claro - Celular'),
(36, '11920160035', 11, 'Claro - Celular'),
(37, '11920160036', 11, 'Claro - Celular'),
(38, '11920160037', 11, 'Claro - Celular'),
(39, '11920160038', 11, 'Claro - Celular'),
(40, '11920160039', 11, 'Claro - Celular'),
(41, '11920160040', 11, 'Claro - Celular'),
(42, '11920160041', 11, 'Claro - Celular'),
(43, '11920160042', 11, 'Claro - Celular'),
(44, '11920160043', 11, 'Claro - Celular'),
(45, '11920160044', 11, 'Claro - Celular'),
(46, '11920160045', 11, 'Claro - Celular'),
(47, '11920160046', 11, 'Claro - Celular'),
(48, '11920160047', 11, 'Claro - Celular'),
(49, '11920160048', 11, 'Claro - Celular'),
(50, '11920160049', 11, 'Claro - Celular'),
(51, '11920160050', 11, 'Claro - Celular'),
(52, '11920160051', 11, 'Claro - Celular'),
(53, '11920160052', 11, 'Claro - Celular'),
(54, '11920160053', 11, 'Claro - Celular'),
(55, '11920160054', 11, 'Claro - Celular'),
(56, '11920160055', 11, 'Claro - Celular'),
(57, '11920160056', 11, 'Claro - Celular'),
(58, '11920160057', 11, 'Claro - Celular'),
(59, '11920160058', 11, 'Claro - Celular'),
(60, '11920160059', 11, 'Claro - Celular'),
(61, '11920160060', 11, 'Claro - Celular'),
(62, '11920160061', 11, 'Claro - Celular'),
(63, '11920160062', 11, 'Claro - Celular'),
(64, '11920160063', 11, 'Claro - Celular'),
(65, '11920160064', 11, 'Claro - Celular'),
(66, '11920160065', 11, 'Claro - Celular'),
(67, '11920160066', 11, 'Claro - Celular'),
(68, '11920160067', 11, 'Claro - Celular'),
(69, '11920160068', 11, 'Claro - Celular'),
(70, '11920160069', 11, 'Claro - Celular'),
(71, '11920160070', 11, 'Claro - Celular'),
(72, '11920160071', 11, 'Claro - Celular'),
(73, '11920160072', 11, 'Claro - Celular'),
(74, '11920160073', 11, 'Claro - Celular'),
(75, '11920160074', 11, 'Claro - Celular'),
(76, '11920160075', 11, 'Claro - Celular'),
(77, '11920160076', 11, 'Claro - Celular'),
(78, '11920160077', 11, 'Claro - Celular'),
(79, '11920160078', 11, 'Claro - Celular'),
(80, '11920160079', 11, 'Claro - Celular'),
(81, '11920160080', 11, 'Claro - Celular'),
(82, '11920160081', 11, 'Claro - Celular'),
(83, '11920160082', 11, 'Claro - Celular'),
(84, '11920160083', 11, 'Claro - Celular'),
(85, '11920160084', 11, 'Claro - Celular'),
(86, '11920160085', 11, 'Claro - Celular'),
(87, '11920160086', 11, 'Claro - Celular'),
(88, '11920160087', 11, 'Claro - Celular'),
(89, '11920160088', 11, 'Claro - Celular'),
(90, '11920160089', 11, 'Claro - Celular');
  • Is there any index by the column operadora?

  • @Arturtrapp No , this direct with the name

  • 'Cause you’re grouping around operadora? Also, try creating an index if you don’t have it, I’ve had occasions where I was able to make queries that would take three minutes to go down to less than eight seconds

  • 1

    Why select *, count()? It wasn’t just Count?

  • @Arturtrapp - The Customer imports everything from a TXT, and plays on the bench, and so everything is grouped together! If I change to include Indice, I think in the import it takes a long time, will double!

  • @Arturtrapp The File reaches him in this way - 11999999999;Nextel - Cellular

  • @jbueno I need to list the name of operators and the amount of records in each one. I can do this without having to set a variable to it?

  • @jbueno without the GROUP by it will add up everything, if you do grouped it will show a line for each operator.

  • @Marcospaulo I think I understand what you need, I really can’t test 150 million, but generally engines like innoDB have an internal cache, also have myISAM that they say have better performance because it is more simple. But I can’t say much.

  • @Marcospaulo, what is the impediment for you to put an input in the carrier column? Note that Inserts will lose a little performance, but that will depend a lot on the way you are doing it. If you do an SQL transaction the difference will be much smaller.

  • @Guilhermenascimento, what killed me was this amount, I’ll see what I can, the worst I’ll have to count by DDD too, then it weighs even more!

  • @jlHertel I will try to change here, the import system and another programmer, he created a program to play directly in the BD, I think I will have to create a tab in the system where will be imported the TXT, but have almost 500 megas TXT

  • @Marcospaulo, the size of the TXT does not matter. The important thing is to ensure that the data are of the smallest possible type and are correctly indexed. An alternative solution that you can try is to create one more column with the operator using a numerical value, because the numbers are much smaller and much better at the time of indexing.

Show 8 more comments

1 answer

1

There’s not much you can do, but you can try to find out what the problem is, with the EXPLAIN.

First, Mysql always uses right, your table has no indexing of this column because it does not create?

Mysql has three choices:

  • Utilizes the index.

  • If you cannot use the index (not to exist is one of the reasons):

    • Will use a "Temporary table".
    • Will use a "filesort".

You can force the "Temporary table" using the SELECT SQL_SMALL_RESULT ... or force the "filesort" using the SELECT SQL_BIG_RESULT ... as stated in the manual of SELECT.


You can create index using the CREATE INDEX ..., as stated in the manual or using the ALTER TABLE, example:

CREATE INDEX nome_do_index on nome_da_tabela(nome_da_coluna) using HASH;

The reason for using the HASH instead of BTree is because you will not compare "carrier > operator", but will compare "carrier != operator" or "carrier = operator".


One of the ways to get around the problem is to create a table "Account operator" and update it frequently, ie a "CACHE". That way instead of always calculating just do SELECT contagem WHERE operadora = 'oi', as it already is prepaid the return will be much faster.

  • Truth, well thought out, as operators will always be the same, will not appear one so often, I can modify, so I believe it solves 95% of my problem

  • It seems that even indexed columns (I did a test with 9 million records) is a bit time consuming, maybe the best way really is cache.

  • There would be difference in the Count if in place of operadora varchar(15), were operadora int(11)? You know if it influences you in any way?

  • 1

    On Count in particular do not know, but... Difference would be, the INT is always 4 bytes, just as the TINYINT is 1 byte. VARCHAR has no fixed size, because there are characters that are multi-bytes, in the case of VARCHAR(15 CHAR), there are 15 characters, whereas in UTF-8usa from 1 to 4 bytes per letter we have up to 60 bytes. Using TINYINT would save at least 252 MB (18000000 * 14 / 1000000) and at most 1062MB (18000000 * 59 / 1000000). Obviously this should accelerate, in addition to the Dexes being lighter.

  • Thanks for the info

Browser other questions tagged

You are not signed in. Login or sign up in order to post.