How to improve reading performance of a SQL Server database?


25

I have a database with more than 250 tables; some of them have millions of records, and when I need to search or change some of these records it takes a long time, which ends up making the experience very bad for the user.

The Customers table has over 50 million records. This table is in production 24 hours a day, averaging 200 queries/min at certain times of the day. Whenever I need to look up a customer, the fastest way is to search by the table's primary key, but I don't always know the ID of a given customer. In those cases, I use a partial name search (with the LIKE operator); that is, searching for Dar% returns:

Darth Vader
Darcisio Araujo
Darlene Silva
[...]

Of course, besides the name there are other filters that can be applied in the same query. Of the two forms presented, searching by Name ends up taking a long time in certain cases.

  • What to do to optimize the speed of queries?
  • What usually slows queries down? (besides having millions of records in the table, I believe there are bad practices and implementation problems that make them even less performant)

An observation: in my queries, I usually filter columns of type varchar(n) and bit. I also filter date ranges on columns of type date using the BETWEEN operator.
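For reference, a query of the shape described might look roughly like this (table and column names here are illustrative, not the real schema):

SELECT id, nome, ativo, data_cadastro
FROM clientes
WHERE nome LIKE 'Dar%'       -- partial name search on a varchar column
  AND ativo = 1              -- bit filter
  AND data_cadastro BETWEEN '2016-01-01' AND '2016-12-31';  -- date range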

  • 1

    A very important piece of information was not given: which DBMS and which version you use. Some DBMSs have ready-made solutions for the kind of search you need to do, and optimization (tuning) techniques vary from DBMS to DBMS. In general, though, they involve: use of indexes, up-to-date statistics, and analyzing the query execution plans.

  • Which database do you use, Vinicius?

  • SQL Server @Marllonnasser

  • What version of SQL Server do you use? (besides the year, say whether it is Standard, Enterprise, Datacenter, etc.)

  • Searching records with LIKE may not even use the index on that field (if it has one). To properly use a text-column index, switch to a "starts with" search, or whatever is equivalent in your DBMS. In that case, though, it will only return records containing words that start with the text you specify.

  • Do you have a maintenance routine for this database, for example rebuilding indexes and updating statistics?


6 answers

22


What to do to optimize the speed of queries?

Well, creating indexes is fundamental in both large and small tables.

However, there are cases where an index can be harmful. More details here.

In your specific case, since you did not specify which columns you want to filter, I will assume your problem is with the columns of type NVarChar.

For those specific columns, I advise creating a Full-Text index, which is nothing more than a "ready-made" solution some databases offer, and it is much more effective and faster than the traditional LIKE. It also restricts the search specifically to the columns you want to filter, which, in my opinion, leaves your query visually much "cleaner".

In your particular case it would look something like this (MySQL-style syntax):

ALTER TABLE clientes ADD FULLTEXT(nome, sobrenome, email, ....);

This way you ensure that the columns "mapped" as FULLTEXT get indexing that is better suited to text than a "traditional" index used with the LIKE operator.

And the query would look something like:

SELECT * FROM clientes WHERE MATCH (nome, sobrenome, email, ....) AGAINST ('dar');

Other examples of Full-Text
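Since the question is about SQL Server, the rough equivalent there is a full-text index queried with CONTAINS. A sketch, assuming the Full-Text Search feature is installed and that the primary key index is called PK_clientes:

-- One-time setup: a full-text catalog and the index itself
CREATE FULLTEXT CATALOG ftClientes AS DEFAULT;

CREATE FULLTEXT INDEX ON dbo.clientes (nome, sobrenome, email)
    KEY INDEX PK_clientes          -- requires a unique, single-column, non-nullable key index
    WITH CHANGE_TRACKING AUTO;

-- Prefix search roughly equivalent to LIKE 'dar%'
SELECT *
FROM dbo.clientes
WHERE CONTAINS((nome, sobrenome, email), '"dar*"');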

And for the other columns, I advise creating indexes on the columns you join with other tables or filter on.
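A hypothetical example of such an index, in SQL Server syntax (table, column, and index names are assumptions, just to illustrate the shape):

-- Index supporting a join on cliente_id plus a filter on status,
-- with data_pedido included to avoid extra key lookups
CREATE NONCLUSTERED INDEX IX_pedidos_cliente_status
    ON dbo.pedidos (cliente_id, status)
    INCLUDE (data_pedido);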

What usually slows queries down? (besides having millions of records in the table, I believe there are bad practices that make them even less performant)

There are several reasons for this:

  • Table mapping that could be improved.
  • Lack of indexes on columns where joins are made with other tables.
  • Poorly written joins; joining on strings, in particular, can be a problem (depending on the database).
  • Lack of periodic review of the system's crucial tables.
  • Depending on your table structure, table partitioning may help a LOT with performance.
  • 1

    Done, I added a section advising the creation of indexes on those columns. :)

12

Since it is a very large database, ideally you should bring in more in-depth knowledge to make a more accurate analysis. The solutions presented here may help, but they will be a long shot without proper query optimization. Since, as you said, your database has millions of records in the same table, you will inevitably need tuning and possibly even database partitioning. For that, I recommend you seek technical expertise in the area, or you may cause an even worse slowdown in your system. There are techniques such as heuristic and cost-based optimization, but this is work for a DBA. SQL Server itself has an internal optimizer, but it is not as efficient as hand-crafted work.
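As a rough illustration of what table partitioning looks like in SQL Server (names and boundary dates are made up, and partitioning has edition/version restrictions, so treat this as a sketch only):

-- Partition function and scheme splitting rows by year
CREATE PARTITION FUNCTION pfClientesPorAno (date)
    AS RANGE RIGHT FOR VALUES ('2015-01-01', '2016-01-01', '2017-01-01');

CREATE PARTITION SCHEME psClientesPorAno
    AS PARTITION pfClientesPorAno ALL TO ([PRIMARY]);

-- A table or index can then be created ON psClientesPorAno(data_cadastro)
-- so each year's rows land in a separate partition.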

9

Creating an index on this column seems to be the best solution. Depending on the database, this may not be possible for text columns.

It is also possible to optimize the query by placing the wildcard character only on the right side. In your example it would be:

(...) where nome like 'dar%'
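Assuming SQL Server and an illustrative index name, the idea is that a plain nonclustered index on the column can be seeked when the wildcard appears only on the right:

-- Index on the searched column (name is hypothetical)
CREATE NONCLUSTERED INDEX IX_clientes_nome ON dbo.clientes (nome);

-- Can use an index seek: wildcard only on the right
SELECT id, nome FROM dbo.clientes WHERE nome LIKE 'dar%';

-- A leading wildcard ('%dar%') defeats the index and forces a scan.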

Another way would be to use a dedicated text search engine for this kind of query, such as Solr.

  • I didn’t know Solr, thank you very much!

9

I'll add an answer to complement the others.

Use an SSD

It is worth checking the plan cache to know whether the query is cached or not. Beyond that, I also recommend an SSD; it makes a huge difference when the database has to be read from the physical file. This appears to be a large database, which can make it impossible to keep it entirely cached in RAM.

-- View Adhoc plans (ad-hoc queries compiled but used only once)

use master
go
SELECT  cast(text as varchar(8000)) as Query,(cp.size_in_bytes/1024) as KB
    FROM sys.dm_exec_cached_plans AS cp 
    CROSS APPLY sys.dm_exec_sql_text(plan_handle) 
WHERE cp.cacheobjtype = 'Compiled Plan' AND cp.objtype = 'Adhoc' AND cp.usecounts = 1 
order by KB desc

Check I/O consumption

WITH Agg_IO_Stats
AS
(
  SELECT
    DB_NAME(database_id) AS database_name,
    CAST(SUM(num_of_bytes_read + num_of_bytes_written) / 1048576.
         AS DECIMAL(12, 2)) AS io_in_mb
  FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS DM_IO_Stats
  GROUP BY database_id
)
SELECT
  ROW_NUMBER() OVER(ORDER BY io_in_mb DESC) AS row_num,
  database_name,
  io_in_mb,
  CAST(io_in_mb / SUM(io_in_mb) OVER() * 100
       AS DECIMAL(5, 2)) AS Porcento
FROM Agg_IO_Stats
ORDER BY row_num;

Execution plan

Also check the SQL execution plan and look for bottlenecks in the indexes. (execution plan screenshot)

In this case there is one index responsible for 100% of the query time. But in the case shown there would not be much to fix, since a "Clustered Index Seek" is already the fastest type of index access.
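To look at this yourself, one simple way (assuming SSMS) is to enable I/O and time statistics, or turn on the actual execution plan (Ctrl+M), before running the query:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Run the slow query and check the Messages tab for logical reads and CPU time
SELECT * FROM dbo.clientes WHERE nome LIKE 'dar%';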

Create new indexes

BEWARE: creating new indexes on varchar fields can cause an exponential increase in database size, especially in databases with hundreds of thousands of records. Just as an example, one of my indexes on a Varchar(100) field occupies 5 GB. Another detail to consider is whether the table takes a lot of inserts, because every index adds a certain slowness to inserts and updates.

Defragment indexes

Check whether the existing indexes are fragmented. I usually use a script that defragments all indexes above 20% fragmentation.
Learn more: https://msdn.microsoft.com/pt-br/library/ms189858.aspx?f=255&MSPPError=-2147217396
Learn more: http://www.fabriciolima.net/blog/2011/02/16/monitorando-a-fragmentacao-dos-indices/
Code I use: http://pastebin.com/iaFbCik8
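Just to illustrate the idea (the actual script I use is at the pastebin link above; the thresholds and the cursor-based approach below are only a sketch):

-- Reorganize indexes with 20-30% fragmentation, rebuild anything above 30%
DECLARE @sql nvarchar(max);
DECLARE frag CURSOR FOR
    SELECT 'ALTER INDEX ' + QUOTENAME(i.name) + ' ON ' +
           QUOTENAME(SCHEMA_NAME(t.schema_id)) + '.' + QUOTENAME(t.name) +
           CASE WHEN s.avg_fragmentation_in_percent > 30
                THEN ' REBUILD' ELSE ' REORGANIZE' END
    FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS s
    JOIN sys.indexes AS i ON i.object_id = s.object_id AND i.index_id = s.index_id
    JOIN sys.tables  AS t ON t.object_id = s.object_id
    WHERE s.avg_fragmentation_in_percent > 20 AND i.name IS NOT NULL;
OPEN frag;
FETCH NEXT FROM frag INTO @sql;
WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC sp_executesql @sql;
    FETCH NEXT FROM frag INTO @sql;
END
CLOSE frag;
DEALLOCATE frag;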

Move this query to a stored procedure

The performance gain here is not huge, but a stored procedure is compiled, which brings several benefits, such as plan caching inside SQL Server.
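A minimal sketch of what that could look like for the partial-name search (procedure and column names are assumptions):

CREATE PROCEDURE dbo.BuscarClientesPorNome
    @prefixo varchar(100)
AS
BEGIN
    SET NOCOUNT ON;
    -- Wildcard only on the right so an index on nome can still be used
    SELECT id, nome, email
    FROM dbo.clientes
    WHERE nome LIKE @prefixo + '%';
END
GO

EXEC dbo.BuscarClientesPorNome @prefixo = 'dar';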

Codes that might be interesting to you

-- Number of times a query has been executed

SELECT  text,plan_handle, cp.size_in_bytes,usecounts--,*            
FROM sys.dm_Exec_cached_plans AS cp         
      CROSS APPLY sys.dm_exec_sql_text(plan_handle)             
WHERE text not like '%dm_exec_sql_text%' -- so this query itself does not appear
    and text not like '%dm_Exec_cached_plans%' -- so this query itself does not appear
    and text like '%select%' -- put the beginning of your query here
ORDER BY usecounts DESC 

--Check missing index #### USE WITH EXTREME CARE!!!!!!

SELECT 
dm_mid.database_id AS DatabaseID,
dm_migs.avg_user_impact*(dm_migs.user_seeks+dm_migs.user_scans) Avg_Estimated_Impact,
dm_migs.last_user_seek AS Last_User_Seek,
OBJECT_NAME(dm_mid.OBJECT_ID,dm_mid.database_id) AS [TableName],
'CREATE NONCLUSTERED INDEX [SK01_'
 + OBJECT_NAME(dm_mid.OBJECT_ID,dm_mid.database_id) +']'+ 
 ' ON ' + dm_mid.statement+ ' (' + ISNULL (dm_mid.equality_columns,'')
+ CASE WHEN dm_mid.equality_columns IS NOT NULL AND dm_mid.inequality_columns IS NOT NULL THEN ',' ELSE
'' END+ ISNULL (dm_mid.inequality_columns, '')
+ ')'+ ISNULL (' INCLUDE (' + dm_mid.included_columns + ')', '') AS Create_Statement,dm_migs.user_seeks,dm_migs.user_scans
FROM sys.dm_db_missing_index_groups dm_mig
INNER JOIN sys.dm_db_missing_index_group_stats dm_migs
ON dm_migs.group_handle = dm_mig.index_group_handle
INNER JOIN sys.dm_db_missing_index_details dm_mid
ON dm_mig.index_handle = dm_mid.index_handle
WHERE dm_mid.database_ID = DB_ID()
ORDER BY Avg_Estimated_Impact DESC

  • What is the danger of the latter?

    As I said, creating indexes can create bottlenecks: every time you do an insert, delete, or update, the index needs to be updated as well. The problem with the script above is that it can suggest indexes with dozens of fields, and an index like that can bring more harm than benefit.

4

Updating your database statistics can also improve performance. The first execution may take a while, but it's worth it.

Just run the command below in an SQL query window:

use NomeDoSeuDatabase

EXEC sp_updatestats
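If you only want to refresh a single table, a narrower alternative (the table name here is illustrative):

UPDATE STATISTICS dbo.clientes WITH FULLSCAN;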
  • Be careful when executing certain commands; ideally you should have a maintenance window so you can run them safely.

2

I know little about databases, but I learned good development practices in college, and of course the people here probably already know this, so it's nothing very special.

During the database modeling process (assembling relational diagrams and business rules), it is very important to normalize the tables. There are up to five normal forms, if I'm not mistaken. Generally the 3rd normal form is enough, but if you want even better performance, normalize up to the 5th. However, this practice can only be applied while the database is being designed, so if your database is already implemented and already has records, it is no longer possible. Another detail is that the number of tables usually increases considerably after normalization. Since your database has 250 tables, which is quite a lot, I believe normalization was used, right?

For more details on normalization: https://www.dcc.fc.up.pt/~ricroc/classes/0506/bd/notes/partVIII.pdf

Another important concept for good database performance is how the application communicates with the database and how transactions are handled.

A mistake I've seen in many examples of MVC-architecture database applications found on the web is the following: some programmers have the bad habit of not closing the connection to the database after running the query in their DAO class methods.

This is a serious mistake, as it can compromise the security of the database and, as incredible as it sounds, also its performance. When you do not close the connection after the application executes a query, the DBMS ends up having to do it on its own to avoid conflicts between transactions touching the same record and thus maintain data integrity. However, the DBMS only closes that connection when two transactions try to access the same record at the same time. While that doesn't happen, the connection stays open and, of course, "consumes" database resources. With one, two, or three records it doesn't matter much, but what about a database with millions of records, where the flow of transactions is usually higher? Many databases get slow over time because of this. Attackers may also find it easier to access database information when connections are left open for long periods, since they may use some means to capture the rows retrieved by the query over the open connection.

Overall, that's all I had to say about it. I hope it helped. If I've made a mistake in my answer, please give me feedback.

  • 3

    Are colleges now teaching good practices? Good times when they taught concepts and fundamentals... ;) I don't understand what security problem there could be in not closing the database connection, although I agree it is usually a good idea. Because some people follow rules without understanding how the mechanism works, I also see people opening and closing connections over and over when there's no need. It seems to me that data integrity is guaranteed regardless of whether the connection is closed; after all, we are talking about guarantees, and those cannot be conditional on correct use.

  • 3

    Not everything fits here, but there seem to be quite a few assumptions in the answer (generally unwarranted), and nothing in it will actually improve performance, which is what was asked.

  • 5

    @Cristiano I think your intention to share these ideas is cool, but in your next answers it is very important to try to give a solution that goes straight to the point. You can enrich and illustrate points with ancillary information, but it is essential to keep the "axis" of the answer on what was asked.
