how to perform a SELECT COUNT faster!

Question

how to perform a SELECT COUNT faster!

Asked 8 years, 5 months ago

Viewed 266 times

2

I need to perform a query to know a total of operators I have. However the table has almost 18 million records, and it is taking a long time to return almost 3 minutes, and this is because I am performing direct test in the bank.

The problem will be bigger when this table is complete, more than 150 million, yes it will take an eternity, will have some way to carry out this query faster?

SELECT *, COUNT(operadora) AS total_op
FROM telefones
GROUP BY operadora

Table structure

CREATE TABLE IF NOT EXISTS `telefones` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `numero` varchar(30) NOT NULL,
  `ddd` int(3) NOT NULL,
  `operadora` varchar(15) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB  DEFAULT CHARSET=latin1 AUTO_INCREMENT=19606524 ;

--
-- Extraindo dados da tabela `telefones`
--

INSERT INTO `telefones` (`id`, `numero`, `ddd`, `operadora`) VALUES
(31, '11920160030', 11, 'OI - Celular'),
(32, '11920160031', 11, 'TIM - Celular'),
(33, '11920160032', 11, 'Claro - Celular'),
(34, '11920160033', 11, 'VIVO - Celular'),
(35, '11920160034', 11, 'Claro - Celular'),
(36, '11920160035', 11, 'Claro - Celular'),
(37, '11920160036', 11, 'Claro - Celular'),
(38, '11920160037', 11, 'Claro - Celular'),
(39, '11920160038', 11, 'Claro - Celular'),
(40, '11920160039', 11, 'Claro - Celular'),
(41, '11920160040', 11, 'Claro - Celular'),
(42, '11920160041', 11, 'Claro - Celular'),
(43, '11920160042', 11, 'Claro - Celular'),
(44, '11920160043', 11, 'Claro - Celular'),
(45, '11920160044', 11, 'Claro - Celular'),
(46, '11920160045', 11, 'Claro - Celular'),
(47, '11920160046', 11, 'Claro - Celular'),
(48, '11920160047', 11, 'Claro - Celular'),
(49, '11920160048', 11, 'Claro - Celular'),
(50, '11920160049', 11, 'Claro - Celular'),
(51, '11920160050', 11, 'Claro - Celular'),
(52, '11920160051', 11, 'Claro - Celular'),
(53, '11920160052', 11, 'Claro - Celular'),
(54, '11920160053', 11, 'Claro - Celular'),
(55, '11920160054', 11, 'Claro - Celular'),
(56, '11920160055', 11, 'Claro - Celular'),
(57, '11920160056', 11, 'Claro - Celular'),
(58, '11920160057', 11, 'Claro - Celular'),
(59, '11920160058', 11, 'Claro - Celular'),
(60, '11920160059', 11, 'Claro - Celular'),
(61, '11920160060', 11, 'Claro - Celular'),
(62, '11920160061', 11, 'Claro - Celular'),
(63, '11920160062', 11, 'Claro - Celular'),
(64, '11920160063', 11, 'Claro - Celular'),
(65, '11920160064', 11, 'Claro - Celular'),
(66, '11920160065', 11, 'Claro - Celular'),
(67, '11920160066', 11, 'Claro - Celular'),
(68, '11920160067', 11, 'Claro - Celular'),
(69, '11920160068', 11, 'Claro - Celular'),
(70, '11920160069', 11, 'Claro - Celular'),
(71, '11920160070', 11, 'Claro - Celular'),
(72, '11920160071', 11, 'Claro - Celular'),
(73, '11920160072', 11, 'Claro - Celular'),
(74, '11920160073', 11, 'Claro - Celular'),
(75, '11920160074', 11, 'Claro - Celular'),
(76, '11920160075', 11, 'Claro - Celular'),
(77, '11920160076', 11, 'Claro - Celular'),
(78, '11920160077', 11, 'Claro - Celular'),
(79, '11920160078', 11, 'Claro - Celular'),
(80, '11920160079', 11, 'Claro - Celular'),
(81, '11920160080', 11, 'Claro - Celular'),
(82, '11920160081', 11, 'Claro - Celular'),
(83, '11920160082', 11, 'Claro - Celular'),
(84, '11920160083', 11, 'Claro - Celular'),
(85, '11920160084', 11, 'Claro - Celular'),
(86, '11920160085', 11, 'Claro - Celular'),
(87, '11920160086', 11, 'Claro - Celular'),
(88, '11920160087', 11, 'Claro - Celular'),
(89, '11920160088', 11, 'Claro - Celular'),
(90, '11920160089', 11, 'Claro - Celular');

Is there any index by the column operadora?

– Artur Trapp

2017/03/09 at 19:07
@Arturtrapp No , this direct with the name

– Marcos Paulo

2017/03/09 at 19:09
'Cause you’re grouping around operadora? Also, try creating an index if you don’t have it, I’ve had occasions where I was able to make queries that would take three minutes to go down to less than eight seconds

– Artur Trapp

2017/03/09 at 19:12
1

Why select *, count()? It wasn’t just Count?

– Jéf Bueno

2017/03/09 at 19:13
@Arturtrapp - The Customer imports everything from a TXT, and plays on the bench, and so everything is grouped together! If I change to include Indice, I think in the import it takes a long time, will double!

– Marcos Paulo

2017/03/09 at 19:14
@Arturtrapp The File reaches him in this way - 11999999999;Nextel - Cellular

– Marcos Paulo

2017/03/09 at 19:16
@jbueno I need to list the name of operators and the amount of records in each one. I can do this without having to set a variable to it?

– Marcos Paulo

2017/03/09 at 19:18
@jbueno without the GROUP by it will add up everything, if you do grouped it will show a line for each operator.

– Guilherme Nascimento

2017/03/09 at 19:48
@Marcospaulo I think I understand what you need, I really can’t test 150 million, but generally engines like innoDB have an internal cache, also have myISAM that they say have better performance because it is more simple. But I can’t say much.

– Guilherme Nascimento

2017/03/09 at 19:49
@Marcospaulo, what is the impediment for you to put an input in the carrier column? Note that Inserts will lose a little performance, but that will depend a lot on the way you are doing it. If you do an SQL transaction the difference will be much smaller.

– jlHertel

2017/03/09 at 19:51
@Guilhermenascimento, what killed me was this amount, I’ll see what I can, the worst I’ll have to count by DDD too, then it weighs even more!

– Marcos Paulo

2017/03/09 at 19:52
@jlHertel I will try to change here, the import system and another programmer, he created a program to play directly in the BD, I think I will have to create a tab in the system where will be imported the TXT, but have almost 500 megas TXT

– Marcos Paulo

2017/03/09 at 19:54
@Marcospaulo, the size of the TXT does not matter. The important thing is to ensure that the data are of the smallest possible type and are correctly indexed. An alternative solution that you can try is to create one more column with the operator using a numerical value, because the numbers are much smaller and much better at the time of indexing.

– jlHertel

2017/03/09 at 19:56

Show 8 more comments

1 answer

Browser other questions tagged php mysql

You are not signed in. Login or sign up in order to post.

by Inkeliz • **20,671** points · Answer 1 · 2017-03-09T19:51:38+00:00

There’s not much you can do, but you can try to find out what the problem is, with the EXPLAIN.

First, Mysql always uses right, your table has no indexing of this column because it does not create?

Mysql has three choices:

Utilizes the index.
If you cannot use the index _{(not to exist is one of the reasons)}:
- Will use a "Temporary table".
- Will use a "filesort".

You can force the "Temporary table" using the SELECT SQL_SMALL_RESULT ... or force the "filesort" using the SELECT SQL_BIG_RESULT ... as stated in the manual of SELECT.

You can create index using the CREATE INDEX ..., as stated in the manual or using the ALTER TABLE, example:

CREATE INDEX nome_do_index on nome_da_tabela(nome_da_coluna) using HASH;

The reason for using the HASH instead of BTree is because you will not compare "carrier > operator", but will compare "carrier != operator" or "carrier = operator".

One of the ways to get around the problem is to create a table "Account operator" and update it frequently, ie a "CACHE". That way instead of always calculating just do SELECT contagem WHERE operadora = 'oi', as it already is prepaid the return will be much faster.