How to reduce the search time in a table with more than 200,000 records?


I'm having problems listing and searching a data table with more than 200,000 records.

From what I've read, the ideal for the search system would be to create FULLTEXT indexes on the fields that will be searched.
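As far as I understood, it would be something along these lines (just a sketch; the table and column names here are illustrative, not my real schema):

    -- A FULLTEXT index only speeds up MATCH ... AGAINST searches,
    -- not LIKE "%...%" comparisons.
    ALTER TABLE `empresas` ADD FULLTEXT INDEX `ft_busca` (`nome`, `email`, `website`);

    SELECT `nome`
      FROM `empresas`
     WHERE MATCH (`nome`, `email`, `website`) AGAINST ('search term');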

So I did it this way:

EDIT: I changed the tables, creating the indexes suggested in the comments. (Image: field indexing, updated.)

I also read that, for counting records for pagination, it would be better to use the function:

SELECT FOUND_ROWS();
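As I understood it, FOUND_ROWS() pairs with SQL_CALC_FOUND_ROWS on the same connection, roughly like this (a sketch with illustrative names):

    -- SQL_CALC_FOUND_ROWS makes MySQL compute the total number of
    -- matching rows, ignoring the LIMIT...
    SELECT SQL_CALC_FOUND_ROWS `nome`
      FROM `empresas`
     WHERE `ativosys` = "1"
     LIMIT 0, 100;

    -- ...and FOUND_ROWS() then returns that total:
    SELECT FOUND_ROWS() AS `rowcount`;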

I haven't built the search part yet, but listing the records for a given user is already slow.

I have an administrative area with several users responsible for data entry. The system has to display on screen, with pagination, all the records created by a given user.

So I did it this way:

    public function AdmListEmpresas($link, $usercod, $pagina = 1, $ItensPorPagina = 100) {

        // Determine which record the query starts from, according to the current page.
        $inicio = ($pagina - 1) * $ItensPorPagina;

        // Note: $usercod is interpolated directly into the SQL string; it should
        // be escaped or bound as a parameter to avoid SQL injection.
        $sql = 'SELECT SQL_NO_CACHE SQL_CALC_FOUND_ROWS
                `empresas`.`nome`,
                `usuarios`.`nome` AS `usuarionome`,
                `equipes`.`nome` AS `equipenome`,
                `setores01`.`titulo` AS `setor01titulo`,
                `setores02`.`titulo` AS `setor02titulo`,
                `setores03`.`titulo` AS `setor03titulo`,
                `empresas`.`telprincipal`,
                `empresas`.`email`,
                `empresas`.`website`,
                `empresas`.`plantype`,
                `empresas`.`datavencimento`,
                `empresas`.`datacadastro`,
                `empresas`.`ativoinfo`
              FROM
                `' . BDtables['BD_EMPRESAS'] . '` `empresas`
                LEFT JOIN `' . BDtables['BD_USERS'] . '` AS `usuarios` ON `empresas`.`useradmcod` = `usuarios`.`cod`
                LEFT JOIN `' . BDtables['BD_USERS'] . '` AS `equipes` ON `empresas`.`codequiperesp` = `equipes`.`cod`
                LEFT JOIN `' . BDtables['BD_SETORES'] . '` AS `setores01` ON `setores01`.`cod` = `empresas`.`codsetor1`
                LEFT JOIN `' . BDtables['BD_SETORES'] . '` AS `setores02` ON `setores02`.`cod` = `empresas`.`codsetor2`
                LEFT JOIN `' . BDtables['BD_SETORES'] . '` AS `setores03` ON `setores03`.`cod` = `empresas`.`codsetor3`
              WHERE
                (`empresas`.`ativouser` = "1" AND
                `empresas`.`ativosys` = "1" AND
                `empresas`.`useradmcod` = "' . $usercod . '") OR
                (`empresas`.`codequiperesp` = "' . $usercod . '") OR
                (`empresas`.`usersfunccods` LIKE "%' . $usercod . '%")
              ORDER BY
                `empresas`.`useradmcod`
              LIMIT ' . $inicio . ', ' . $ItensPorPagina . ';';

        // FOUND_ROWS() must run on the same connection, right after the main query.
        $sqlcount = 'SELECT FOUND_ROWS() AS `rowcount`;';

        $result  = $this->conn->query($sql);
        $numrows = $this->conn->query($sqlcount);
        $numrows = $numrows->fetch_array(MYSQLI_ASSOC);

        $listaarray = [];
        if ($result->num_rows >= 1) {
            while ($row = $result->fetch_array(MYSQLI_ASSOC)) {
                $listaarray[] = [
                    'empresa'        => $row['nome'],
                    'usuarionome'    => $row['usuarionome'],
                    'equipenome'     => $row['equipenome'],
                    'setor01titulo'  => $row['setor01titulo'],
                    'setor02titulo'  => $row['setor02titulo'],
                    'setor03titulo'  => $row['setor03titulo'],
                    'telprincipal'   => $row['telprincipal'],
                    'email'          => $row['email'],
                    'website'        => $row['website'],
                    'plantype'       => $row['plantype'],
                    'datavencimento' => $row['datavencimento'],
                    'datacadastro'   => $row['datacadastro'],
                    'ativoinfo'      => $row['ativoinfo']
                ];
            }
        }

        if (empty($listaarray)) { $listaarray = false; }

        // Pagination array
        $paginacaoarray = $this->Paginacao($link, $numrows['rowcount'], $pagina, $ItensPorPagina);

        return array('paginacao' => $paginacaoarray, 'lista' => $listaarray);
    }

The table BD_EMPRESAS holds all the companies, with more than 200,000 records.

The table BD_USERS may also contain more than 200,000 records, and when listing companies the query has to look up each user's name by ID.

The table BD_SETORES is small, around 300 records. Each company has up to 3 sectors, and the query has to fetch each sector's name by its ID.

The WHERE clause takes the logged-in user's code and matches it against 3 table fields, since the user can be the administrator, part of the system team, or a company employee. This finds all the companies that user participates in, which returns on average about 5,000 records, and pagination displays them 200 at a time. (I had no problems with the pagination itself, only with the time it takes to return the records.)

Still in the WHERE, the fields ativouser and ativosys only fetch records that are active; they are ENUM fields storing 0 for disabled and 1 for enabled. Normally only enabled companies appear in the list, but a super administrator will also have to list the disabled records, which can further increase the number of displayed records.

All of this is stored in an array and sent to another part of the PHP code, which displays the results in a template compiled by Smarty.

Could someone who has worked with huge tables help me with a better solution?

EDIT 01: As suggested by @bfavaretto, below is the result of EXPLAIN:

(Image: EXPLAIN output.)

EDIT 02: I ran some tests here. The query as written, with LIMIT paging 200 records at a time, takes 7 seconds per query.

I realized that SQL_CALC_FOUND_ROWS weighs heavily on the query; on its own, without the JOINs, it can take even longer than the query with the JOINs.

If I keep the JOINs but remove SQL_CALC_FOUND_ROWS, it drops from 7 seconds to 4 or 5 seconds.

If I remove both SQL_CALC_FOUND_ROWS and the JOINs, the query takes less than 1 second, which would be perfect.

The problem is that the JOINs are needed to display the results on screen. Now I need to find a way to replace the JOINs and still show the same data.
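One thing I still want to test (just a sketch, reusing the WHERE from my query above, with the user code hard-coded as 123 for illustration): dropping SQL_CALC_FOUND_ROWS and getting the total from a separate COUNT(*) query instead:

    -- Separate count with the same WHERE clause, instead of SQL_CALC_FOUND_ROWS:
    SELECT COUNT(*) AS `rowcount`
      FROM `empresas`
     WHERE (`empresas`.`ativouser` = "1" AND
            `empresas`.`ativosys` = "1" AND
            `empresas`.`useradmcod` = "123")
        OR (`empresas`.`codequiperesp` = "123")
        OR (`empresas`.`usersfunccods` LIKE "%123%");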

  • 1

    Having 200,000 records is far from gigantic. But MySQL has the Profiler and EXPLAIN; they exist precisely to tell you where the bottleneck is and what is problematic... Moreover, if ativouser, ativosys, useradmcod and codequiperesp are in the WHERE as direct comparisons, you can create one index covering all of them. Indexes use RAM; if there isn't enough, that's where things break down. You can also create keys between the useradmcod columns of the two tables to speed up the JOIN. Anyway, the main thing is to find out what is taking so long. FULLTEXT only affects MATCH ... AGAINST, as far as I know.

  • usercod is a primary key, I imagine... and you use it with the LIKE operator? Other than that, just create indexes on the fields that are compared, both in the WHERE and in the JOINs.

  • 2

    Put an EXPLAIN before the SELECT of your query and edit the question with the result. There is probably room for optimization in your tables.

  • 1

    @Inkeliz I thought 200,000 records would be heavy, since the delay is so long; there are more than 200,000 companies and more than 200,000 users, and the companies SELECT joins to the users table, which I believe can also slow things down. I used EXPLAIN but understood nothing of the result, and I couldn't find anything online about how to use the Profiler. I created the index on the fields you suggested but nothing changed. I'm studying some way to avoid using JOIN, to see if that gets faster.

  • 1

    @Rovannlinhalis, usercod is the primary key in the users table, but in the companies table the field usersfunccods stores, in a text field, the codes of all users who are employees of that company, so the search uses LIKE to also check whether the logged-in user is among the employees of a given company. I created the indexes for the fields used in the JOINs but nothing changed at runtime. I think I must be doing something wrong.

  • @bfavaretto I edited the question and added the EXPLAIN result. I don't understand how it works or what should be done with these results. I've never used this function before.

  • Did you put an index on every column? That is not recommended.

    @Renatosilva No, I didn't put one on every column. I had 37 columns and 13 indexes: two were UNIQUE, just to avoid duplicate data in the table, and I ended up keeping only 1 of those to reduce the index count. Six were FULLTEXT indexes that I use in the frontend search; I reduced those to 2, one of which covers the 4 fields used for word searches. And there are 3 indexes covering 6 more fields for the JOINs. When I do the listing I use USE INDEX to pick the indexes used by each query.


5 answers

6

The EXPLAIN result shows that the query scans 60k+ rows of the companies table. With all those JOINs, each of those rows gets combined with the rows fetched from the other tables. It looks like your performance bottleneck is there.

What I would try, looking at this result, is creating some indexes. Note that none was used on the companies table (the key column came back NULL in the EXPLAIN). The first candidate would be an index on the useradmcod column, which is used both in the WHERE and in the ORDER BY. The Using filesort that showed up in the EXPLAIN is a common cause of slowness and is related to ordering on unindexed columns. If that is not enough, also put an index on the column you compare with LIKE, because that type of comparison is one of the slowest.
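Something along these lines (the index name is illustrative):

    -- Single-column index on the column used in the WHERE and ORDER BY;
    -- re-run EXPLAIN afterwards and check whether key is still NULL.
    ALTER TABLE `empresas` ADD INDEX `idx_useradmcod` (`useradmcod`);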

Be careful when creating indexes in your tables. Having no indexes beyond the PKs is usually a problem, but so is having too many, since they have to be rebuilt or adjusted on every INSERT, UPDATE and DELETE.

  • 1

    I don't understand then, because the companies table is the main table being searched, and all the indexes I created (shown in the question's image) were created on that companies table; I don't know what other indexes could be created there. I updated the question's image with some new indexes I created, and still nothing changed in the query speed. I'm thinking of a way to reduce the number of JOINs and get the same result. For the LIKE I created a FULLTEXT index. About creating many indexes: can combining several fields into a single index help?

  • 2

    There's no point in creating indexes at random. An index spanning several columns will only be used if the query has conditions and/or joins on those columns (starting from the leftmost ones). My suggestion was to create an index on the useradmcod column alone and see how that affects performance and the EXPLAIN output. As I mentioned, it is not easy to strike a balance when creating indexes; each query may need different indexes, and every index has an impact on the database as a whole. Go slowly and observe the effect of each attempt; do not create several new indexes at once.

  • 1

    Ahh OK, I'll run new tests then. I was researching indexes and saw that a query can be told to ignore certain indexes; I'll test that way, so if it works I can keep the indexes on the fields I'll need for searching, and test each index as you suggested. I'll try it and post here if I get any better results.

3


After many hours of testing, and with no more time to spend on this problem, I solved the situation by modifying the table, the query, and what PHP returns to the screen.

Before, the WHERE had the fields ativouser and ativosys, which indicated whether the company had been activated or deactivated by the company owner or by the system administrator. I merged the two, keeping only ativosys in the table.

I also changed the field usersfunccods to funcsenha: instead of searching for the logged-in user inside that field, I now create a special password for employees.

I deleted two FULLTEXT indexes on fields I decided to no longer include in the search.

On the listing screen I no longer display the sectors, so I eliminated all the JOINs that looked up sector names by their codes.

I also no longer display the team leader's name, only the company owner's name. That eliminated one more JOIN, leaving only the most important one.

The modified table looks like this: (Image: altered table.)

Some indexes are used by other queries, so here I use USE INDEX (codusers, ativadostatus), restricting this query to only the indexes that matter most for it.

So my new query looks like this:

    $sql = 'SELECT SQL_CALC_FOUND_ROWS
            `empresas`.`nome`,
            `usuarios`.`nome` AS `usuarionome`,
            `empresas`.`plantype`,
            `empresas`.`datavencimento`,
            `empresas`.`datacadastro`,
            `empresas`.`ativoinfo`
          FROM
            `' . BDtables['BD_EMPRESAS'] . '` `empresas`
          USE INDEX (codusers, ativadostatus)
            LEFT JOIN `' . BDtables['BD_USERS'] . '` AS `usuarios` ON `empresas`.`useradmcod` = `usuarios`.`cod`
          WHERE
            `empresas`.`ativosys` = "1" AND
            (`empresas`.`useradmcod` = "' . $usercod . '" OR `empresas`.`codequiperesp` = "' . $usercod . '")
          ORDER BY
            `empresas`.`useradmcod`
          LIMIT ' . $inicio . ', ' . $ItensPorPagina . ';';

The query time, which before was almost 7 seconds, dropped to under 1 second. And if I remove the ORDER BY it gets even faster, dropping to under half a second.

It's not the solution I wanted, but it will do for a long time.

Thank you all for your patience. Even though I ended up making all these changes, I learned many things from you that I didn't know. Thanks again. Hugs.

2

200,000 is not many records, but you can try breaking the result into pages by limiting on the logical ID of the rows, i.e. returning rows 1 to 10, 11 to 20, 21 to 30, and so on.

For that there is ROW_NUMBER in SQL Server, and LIMIT/OFFSET in MySQL.
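For example (illustrative columns):

    -- Page 1: rows 1 to 10
    SELECT `cod`, `nome` FROM `empresas` ORDER BY `cod` LIMIT 10 OFFSET 0;

    -- Page 2: rows 11 to 20
    SELECT `cod`, `nome` FROM `empresas` ORDER BY `cod` LIMIT 10 OFFSET 10;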

  • The SELECT I put in the question already has a LIMIT for paging. But I return 100 or 200 records at a time, because the table ends up paginated across more than 4,000 pages.

  • If I'm not mistaken, LIMIT is only applied at the end. So its effect on performance is only on the amount of data transferred between the MySQL server and the application; it does not affect the performance of the query itself.

  • But if the amount of data is reduced, there is less to keep in memory.

  • @Fernandovr, you can try taking your query and running it directly in the DBMS to see if performance is better there. If it is, you can create a function to return the result and have the application call it.

  • True, @bfavaretto, I tested with smaller LIMITs and it only helped the speed of displaying on screen with PHP; the query time really did not change. And @Melanie, from PHP it took almost 10 seconds to return the result, MySQL Front and MySQL Workbench took 7 seconds, and only the MySQL terminal performed better, taking a little under 2 seconds. You're saying to create a function inside MySQL? Do you have any suggestions for me?

  • @Fernandovr there are two options for MySQL... A function, if it returns a single value: https://ajuda.locaweb.com.br/pt-br/Como_cria_Functions_no_MySQL_5.0 Or a procedure, if it returns a list (it works like returning a table): https://stackoverflow.com/questions/23421771/how-to-return-table-frommysql-function

  • Thanks @Melanieribeiro, I ended up solving it another way and posted how in an answer here: https://answall.com/a/234284/89580 . I already knew about functions, and I even considered them, but they didn't fit my case; procedures I didn't know about, but I'll study them too, who knows, they may help with future problems in this same project. Thanks for the tips ^^

  • @Fernandovr you're welcome! The changes were good! ^^

  • @Melanieribeiro thanks, I read up on procedures and found them very interesting. Cheers ^^


1

In addition to my colleagues' suggestions, there is a tool that can help with the analysis. Neor Profile SQL is free and lets me see all the queries that pass through to the database (it works as a tunnel). It reports the estimated execution time and the size of the result, and lets you run EXPLAIN on the commands. It can be a good way to measure query performance and test changes.

  • 2

    Thanks for the suggestion, I'll install the program and check out its analysis. I hope it helps. If it produces any results I'll post them here later.

-1

Man, looking at it there, you have a lot of JOINs. Something simple that can already help: research HASH and MERGE JOIN; maybe that alone improves performance.

  • 1

    This answer does not provide a solution to the problem. Please point out the problematic parts and propose a change.

  • 1

    I will look into it, thanks for the tip.
