Filters or Sort, which should be the first when creating an index in the database

Asked

Viewed 511 times

7

Given, for example, the following Query:

SELECT 
  ClienteId,
  Nome,
  DataNascimento,
  Cidade,
  Estado,
  DataCadastro
FROM
  Cliente
WHERE
  Estado = :Estado AND
  Cidade = :Cidade
ORDER BY
  DataCadastro, DataNascimento DESC

in terms of better use of the index and performance, my index should first consider ordering and then filter.

CREATE INDEX TESTE1 (DataCadastro ASC, DataNascimento DESC, Estado, Cidade)

Or first the filter and then the sort?

CREATE INDEX TESTE2 (Estado ASC, Cidade ASC, DataCadastro ASC, DataNascimento DESC

2 answers

4


I view index creation as something that requires a "trial and error".
However there are basic rules to follow:

1 - Indexes do not add performance in small tables.
2 - Many indexes may decrease performance in INSERT, UPDATE and DELETE
3 - The indexes should contain few columns.
4 - The columns used in the clauses WHERE and BETWEEN or participating in a JOIN should be placed first. The remaining columns should be organised on the basis of their level of distinction.

In view of this the answer to your question is:

CREATE INDEX TESTE2 (Estado ASC, Cidade ASC, DataCadastro ASC, DataNascimento DESC)  

However, if the clause ORDER BY Datacadastro, Datanascimento DESC is often used and prevails upon ordering by primary key , the creation of a CLUSTERED INDEX in those columns.

Then two would be created Indexes:

CREATE INDEX index1 (Estado ASC, Cidade ASC)
CREATE CLUSTERED INDEX index2 (DataCadastro ASC, DataNascimento DESC)  

The data would be recorded ordered by DataCadastro ASC, DataNascimento DESC instead of by primary key.

UPDATE
The exchange of comments between me and jean, led me to research more on the subject.

The use of a CLUSTERED INDEX in a field other than "ever-growing" may give rise to times of INSERT higher.
However, ordinations are always longer than a INSERT.
The cost/benefit ratio should be assessed.

In the case of the question, we can ensure that the index is "ever-growing" if DataCadastro be the type Datetime.

If you want to become one expert in the creation of Indexes follow this link.
See also this explanation of how the CLUSTERED INDEX works

  • downvote because of the misuse of the clustered index. See my answer for explanation (the other answer is correct).

  • 1

    @jean You are right. In principle this type of index should be created with the clause UNIQUE, however, this is not a problem if the set of columns represents a large number of unique records. This is what happens in this case, it will not be easy to have situations where on the same day you register more than one person with the same date of birth.

  • The problem is not one of uniqueness. Imagine that we have a table of people with 100 million records. Inserting a person of average age will cause the table to be split in half so that the record is inserted in the "correct" position. Now imagine the overhead of moving 50 million records from one position within pages

  • @jean I can find nothing in the documentation that refers to this situation. Follow this link and see the section on Column Considerations where it is suggested the creation of a CLUSTERED INDEX in the fields Lastname, Firstname, Middlename this suggestion is justified on the same grounds as I gave in my reply.

  • 1

    You really won’t find this level of detail in msdn but check this out: Columns that undergo Frequent changes This causes in the Whole Row to move, because the Database Engine must Keep the data values of a Row in Physical order

  • Although this refers to columns that have their value edited something similar occurs in insertion, where inserting will give this kind of problem. You will find lots of references to this more specialized sites for each DBMS, in the case of MSSQL suggest the SQL Server Central

  • @So I don’t understand why you can create yourself CLUSTERED INDEX’s. Since there can only be one per table this would have to be the primary key, by default this is what happens. But, to agree with what you say, the primary key I’d always have to be the AUTOINCREMENT to ensure that it is sequential.

  • Do not confuse Logical Order with Physical Order. When I said that data is recorded in that order I did not mean physically. CLUSTERED INDEX just forces the logical order. This is achieved by the way the b-Tree is created, which is different from b-Tree of UNCLUSTERED INDEX’s

  • I refer physically (within the logic of B-Tree). DBMS lets you create the indexes as you want. I’ve seen large production bases without qq FK and with clustered Indexes in GUID columns. Not necessarily autoincrement but it is the ideal yes.

Show 4 more comments

2

Index creation order does not change the use of the index.

As for leaving an index covering several columns I suggest you create two, one for the filters and the other for sorting because the DBMS engine uses the filters and does the sorting at different stages of the processing.

What indices to create...

It depends on the use you will make of the table, the table size of the amount of times/time you will scroll through the columns, make queries, Inserts, etc.

I advise to make trial and error and study the tools that your DBMS has like profiles, Wizards, query plans, etc.

Indexes d+ get in the way, wrong indexes idem. Read a lot because this is a very complex subject to fit in an answer. If you want a more detailed analysis create a more specific question, but then you will need to post more details, such as object creation queries and the most frequent queries being used, query plans, etc.

As for the clustered indexes mentioned by @ramaral it is a bad practice to use them with fields "Random"*.

*Random: In this case I mean that are values not sequential, can be a date of birth or a guid. The problem is that as the table will be kept ordered physically by this column(s) (s) when you enter a new value it will force a table reordering (causing page splits, etc.). This creates an immense overhead plus that potentially much of the table will get locked during the operation of Insert (due to page splits). So only create clustered indices in sequential fields (Obs, Pks in general are clustered indices)

  • In this case the set of columns DataCadastro and DataNascimento, practically, it can be considered as unique, which will not cause the effect that you refer.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.