Can the lack of indexes or foreign keys on a table make a query slow?

15

Something I keep reading in MySQL tutorials, and have also heard from fellow programmers, is that the lack of indexes or foreign keys on a table can make a query slow.

For example, I have a system where the table usuarios has a column nivel_id that references the id of niveis. But I did not declare nivel_id as a foreign key to niveis.id; I just defined it as INT UNSIGNED.

I have a few questions about that:

1 - Is it really true that this can make a query slow?

2 - If the answer to the question above is "yes", I would like to know why it slows down.

3 - Can the lack of indexes or foreign keys slow down a query that uses a JOIN, for example?

Example for question 3:

SELECT A.*, B.nome FROM usuarios AS A
JOIN niveis AS B ON B.id = A.nivel_id

If the query above returned about 10,000 rows, for example, would it be faster with a FOREIGN KEY or an INDEX?

I’m asking because I have had to maintain systems that have no FOREIGN KEYs on columns that should be related. Besides leaving some data inconsistent, I want to know whether I also run the risk of slowness because of that.

  • 1

    With foreign keys it is just the opposite: since they are constraints, table maintenance (writes) is faster without them. The same goes for indexes as far as maintenance is concerned, but queries need them, and their absence hurts performance.

4 answers

14


First let’s separate things. Creating an index and adding a foreign key constraint are different things and not directly related.

Is it really true that this can make a query slow?

For a missing index, almost certainly yes, in most cases. The DBMS may be smart at times and manage to do it reasonably well, or it may all work out the same, but that is rare. See more in What are the advantages and disadvantages of using indexes in databases?. The DBMS can create a temporary index in the absence of one. If this always happens, it is because an index is missing.

But there are cases where too many indexes make writes slower.

Slow is a very broad concept. There are several kinds of slowness, some more important than others.

As far as is generally known, the foreign key does not affect anything here, though nothing prevents some database from doing something unusual with it.

If the answer to the question above is "yes", I would like to know why it slows down.

Having a foreign key does not, by itself, cause or prevent problems. In fact, in some cases its existence can cause slowness, because it may force an unnecessary check at some point, but nothing critical. The fact is that, with or without it, you need to be careful and know what you are doing, and check whether the query plan is adequate. Because it means extra work, it tends to hurt performance more than help it. And if you have a foreign key with no index on it, it is almost certain to be worse, but that is mostly because of the missing index.

If the query above returned about 10,000 rows, for example, would it be faster with a FOREIGN KEY or an INDEX?

With an index, certainly yes, provided it is created properly, of course. With a foreign key, no: it is only a constraint, not a form of optimization. The query stands on its own, no matter how the foreign key was defined; a foreign key is a tool for integrity, not for access.

When the query fetches everything in no specific order, the need for an index is smaller or nonexistent, at least for the usuarios table in this example.

niveis without an index, on the other hand, can be tragic. But it may not be so bad: there may be few levels to fetch, or everything may end up cached and be fast. The question does not say how many levels there are.
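
A minimal sketch of the distinction above between an index and a foreign key, using SQLite through Python’s sqlite3 module as a stand-in because it runs without a server. The table names usuarios_plain, usuarios_fk and usuarios_idx and the index name idx_usuarios_nivel are made up for illustration. SQLite does not create an index when a foreign key is declared, so it isolates the constraint from the index; MySQL’s InnoDB, by contrast, automatically creates an index on the referencing column, so the plan there can change, but because of that index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE niveis (id INTEGER PRIMARY KEY, nome TEXT);
    -- Same columns, three variants of the referencing table.
    CREATE TABLE usuarios_plain (id INTEGER PRIMARY KEY, nivel_id INTEGER);
    CREATE TABLE usuarios_fk (
        id INTEGER PRIMARY KEY,
        nivel_id INTEGER REFERENCES niveis(id)  -- constraint only, no index
    );
    CREATE TABLE usuarios_idx (id INTEGER PRIMARY KEY, nivel_id INTEGER);
    CREATE INDEX idx_usuarios_nivel ON usuarios_idx (nivel_id);
""")


def plan(table):
    """Return the planner's description of a lookup by nivel_id."""
    rows = conn.execute(
        f"EXPLAIN QUERY PLAN SELECT * FROM {table} WHERE nivel_id = 3"
    ).fetchall()
    # The last column of each row holds the human-readable detail.
    return " | ".join(row[-1] for row in rows)


plan_plain = plan("usuarios_plain")  # full scan
plan_fk = plan("usuarios_fk")        # still a full scan: the FK adds no access path
plan_idx = plan("usuarios_idx")      # lookup through idx_usuarios_nivel

for name, text in [("plain", plan_plain), ("fk", plan_fk), ("idx", plan_idx)]:
    print(f"{name:>5}: {text}")
```

The exact wording of the plan lines depends on the SQLite version, so the point to look for is only which variant mentions the index.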

Using EXPLAIN

It may be slow one day and less so another, depending on usage, and a new version of MySQL may improve things. In other words, these are details. Caching counts for a lot here too. That is why EXPLAIN should be used with care. People think it will give the same result every day, which is not true, so it is misleading to look at one example of it for general understanding. What it shows at one moment, in a specific case, for the same table and the same query, may be different in another usage scenario. Without knowing the overall setup, what data is there, and the stored statistics, EXPLAIN just fools you, so there is nothing exact about it. And if you do not know how to interpret the result, it can do more harm.

Consistency

This idea that the absence of a FOREIGN KEY and inconsistency go hand in hand is only true when you leave it to the database to take care of that for you; if your application takes care of it, the constraint becomes unnecessary. Some people never or almost never use FOREIGN KEY, and they are people who care about performance, as you should know about this friend :). It comes down to knowing how to get the best performance; a shame not everyone does, which is why there is a lot of bad software out there.

Performance

The secret of performance is sound basic modeling designed for performance and proper use of indexes. Avoid:

  • having to access more than one table in a query
  • accessing too much, and unnecessary things, in most queries
  • and accessing the table directly when the access pattern differs from its basic structure.

See How to apply indexes to improve the performance of queries?. And also Can subqueries decrease performance? Myth or truth?, since a JOIN is still a subquery.

  • 3

    "Some people never or almost never use FOREIGN KEY", "Avoid having to access more than one table in the query"... I swear I couldn’t picture a database for a simple "no FK" system. The little I know about FKs is that they serve mainly to avoid repeated records, saving space, and even making the table more efficient when no query with JOINs is needed. So my question: in the parts I quoted, you mean there is no FK but there is still a reference between the tables, right? And that is what the application takes care of!?

  • 2

    An FK has nothing to do with repetition, space saving or efficiency; it has to do with integrity, which can be handled by the application. And I can’t imagine why an FK would eliminate the use of JOIN; at most it eliminates the need to write them, which can be worse (I said this in the answer without going into detail), because if it does the JOIN automatically when there is no need, that is waste. The last part is correct, that’s it. I use the database to store data, period! I don’t use it to be part of my application.

  • I think until today I associated FKs with the points I mentioned, and I had never realized that an FK is basically an integrity "lock", and pretty much just that, right!? As for JOIN, I wrote less than I was thinking: an FK would not eliminate the JOIN but rather the "relationship" it "hangs on", which I confused with "having no relationship", with no related table...

  • 1

    Yes, only integrity. You can handle the relationship however you please; the FK is one way of doing it, and not a very flexible one, which can be a good thing.

  • 2

    As @Maniero said, an FK has nothing to do with duplicate records. It is a constraint, and as such its purpose is to apply some restriction to a column, which in this case is "validate that the value exists in another table" or, depending on what is added to the foreign key, "delete in cascade" or "set to null on delete". As for JOIN, you do not need a PK to do a JOIN; it can be done on any column, regardless of PK/FK, in the FROM (with JOIN) or in the WHERE.

  • 2

    @Rbz there is a document about high-performance websites, which I’ll try to find, that mentions a test on a database with millions of hits per hour, with and without FKs. The difference is relevant: small per transaction, but multiplied by millions it makes a site a few seconds faster. Of course in that case the validations have to be done in code, but that is not unusual even for a relational database.

  • @Ricardopontual Whoa, I would love to see that article! ... After "updating my concept of FK", I was thinking: there really would be no problem in not having FKs, but the application has to be well built. In that case, you could build the whole application with FKs and then remove them! lol

  • 1

    This leads to a philosophical discussion: "who is responsible for ensuring data integrity?" Surely a DBA will say it is the database, but what about the rest of the data? Isn’t it the application that validates against script injection, for example, among other things, before the data reaches the database? Interesting to think about :)

  • Dude, do a JOIN of a table with 10 million records against another with 10 million records. Yes, it makes a difference, and a big one! Now, 10k rows joined to a table with half a dozen should not make a difference to the user.


7

In general

  1. maybe, depends
  2. maybe, depends
  3. maybe, depends

A little less general

INDEXES improve searches for specific values in indexed columns (including joins) but add overhead to INSERT|UPDATE; see What are the advantages and disadvantages of using indexes in databases?.

FOREIGN KEYS negatively affect the time of INSERT|UPDATE, which is logical, since guaranteeing referential integrity involves extra checks (think about how you would implement this mechanism and you will see that there is no way around the extra cost). They can also speed up SELECT operations, although how and why depends on the database being used.
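
To make the "extra check" concrete, here is a small sketch, again using SQLite via Python’s sqlite3 module as an assumed stand-in for MySQL. With enforcement switched on, every insert into usuarios has to look up the parent row in niveis before it is accepted; that lookup is the extra cost described above. The sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign key enforcement off unless it is enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE niveis (id INTEGER PRIMARY KEY, nome TEXT);
    CREATE TABLE usuarios (
        id INTEGER PRIMARY KEY,
        nivel_id INTEGER NOT NULL REFERENCES niveis(id)
    );
    INSERT INTO niveis (id, nome) VALUES (1, 'admin');
""")

# Accepted: the database finds niveis.id = 1 before storing the row.
conn.execute("INSERT INTO usuarios (id, nivel_id) VALUES (1, 1)")

# Rejected: the same lookup finds no niveis.id = 99.
try:
    conn.execute("INSERT INTO usuarios (id, nivel_id) VALUES (2, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

user_count = conn.execute("SELECT COUNT(*) FROM usuarios").fetchone()[0]
print(orphan_rejected, user_count)  # True 1
```

Without the constraint, the second insert would succeed and leave an orphan row, which is the inconsistency the question mentions.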

The "negative" impact of FKs and INDEXES on inserts/updates is clearly documented for some databases, e.g. PostgreSQL recommends removing indexes and foreign keys for bulk inserts, and the same goes for MySQL.

The positive impact of FKs on SELECTs is kind of obscure. I had never heard of it and have not found any explicit documentation about it, but there are reports on the net, and even an example below showing how it actually happens.

If need be

Use EXPLAIN: PostgreSQL, MySQL, SQLite, Oracle, SQL Server.

Let’s take your example and do a shallow analysis of it on MySQL:

CREATE TABLE niveis (
    id INT,
    nome VARCHAR(255)
);
CREATE TABLE usuarios (
    id INT,
    nivel_id INT
);

Without indexes or FKs, for the query

EXPLAIN SELECT A.*, B.nome FROM usuarios AS A JOIN niveis AS B ON B.id = A.nivel_id \G

we have:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: A
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1
     filtered: 100.00
        Extra: NULL
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: B
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1
     filtered: 100.00
        Extra: Using where; Using join buffer (Block Nested Loop)

Let’s focus on the type column, with values ALL and ALL. This indicates that all rows of table A will be scanned and, for each of them, all rows of table B will be scanned, the equivalent of the following pseudo-code:

for id in A:
    for id_2 in B:
        if id == id_2:
            print(id, id_2)

An O(n^2) operation.

Now using INDEXES (a PRIMARY KEY creates a unique clustered index in MySQL’s InnoDB engine):

ALTER TABLE niveis ADD PRIMARY KEY (id);
ALTER TABLE usuarios ADD PRIMARY KEY (id);

mysql> EXPLAIN SELECT A.*, B.nome FROM usuarios AS A JOIN niveis AS B ON B.id = A.nivel_id \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: A
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1
     filtered: 100.00
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: B
   partitions: NULL
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: test.A.nivel_id
         rows: 1
     filtered: 100.00
        Extra: NULL

Again we have ALL for A, but now eq_ref for B. ALL traverses all rows of A, and eq_ref uses the index to fetch only the necessary rows, equivalent to:

for id in A:
    if id in B:
        print(id, id)

O(n).
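
The two pseudo-code loops above can be turned into a runnable Python sketch that counts the work done by each strategy. The sizes (1,000 users, 100 levels) and the function names are made up for illustration, and a dict stands in for the primary key index; a real B-tree lookup costs O(log n) rather than O(1), but the contrast with the nested loop is the same.

```python
def nested_loop_join(usuarios, niveis):
    """Join with no index: compare every user with every level (ALL + ALL)."""
    comparisons = 0
    rows = []
    for user_id, nivel_id in usuarios:
        for level_id, nome in niveis:
            comparisons += 1
            if level_id == nivel_id:
                rows.append((user_id, nome))
    return rows, comparisons


def indexed_join(usuarios, niveis):
    """Join with an index on niveis.id, modelled as a dict (ALL + eq_ref)."""
    index = dict(niveis)  # built once, like the primary key index
    lookups = 0
    rows = []
    for user_id, nivel_id in usuarios:
        lookups += 1
        nome = index.get(nivel_id)
        if nome is not None:
            rows.append((user_id, nome))
    return rows, lookups


niveis = [(i, f"nivel-{i}") for i in range(1, 101)]      # 100 levels
usuarios = [(u, (u % 100) + 1) for u in range(1, 1001)]  # 1,000 users

slow_rows, slow_cost = nested_loop_join(usuarios, niveis)
fast_rows, fast_cost = indexed_join(usuarios, niveis)

print(slow_rows == fast_rows)  # True: same result either way
print(slow_cost, fast_cost)    # 100000 1000
```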

Finally, let’s analyze it with the FOREIGN KEY:

ALTER TABLE usuarios ADD FOREIGN KEY (nivel_id) REFERENCES niveis(id);

mysql> EXPLAIN SELECT A.*, B.nome FROM usuarios AS A JOIN niveis AS B ON B.id = A.nivel_id \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: A
   partitions: NULL
         type: index
possible_keys: nivel_id
          key: nivel_id
      key_len: 5
          ref: NULL
         rows: 1
     filtered: 100.00
        Extra: Using where; Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: B
   partitions: NULL
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: test.A.nivel_id
         rows: 1
     filtered: 100.00
        Extra: NULL

index + eq_ref. According to the documentation, index is "practically" ALL, but usually faster because it scans the whole index structure instead of the table itself. That is, still O(n), but faster than ALL + eq_ref. Note that key: nivel_id refers to the index that InnoDB creates automatically on the foreign key column when none exists, so the gain here most likely comes from that index rather than from the constraint itself.

Finally, it is important to make clear that EXPLAIN output varies, and greatly, with the data and its volume, so do not get stuck on the first analysis: it will need to be repeated as your database grows, and the indexing strategy may change several times. (This does not mean EXPLAIN is useless on an empty database: the database will not do magic to turn inefficient queries into something decent, and an ALL, ALL, for example, is almost always a bad sign.)

Commentary

First, unless you are sure this will have a noticeably negative impact on your application (it almost never will), use foreign keys. Reasons:

  • The database is optimized to handle this; trying to replicate these checks in your application will definitely be slower.
  • The great strength of SQL is being a declarative language: you do not specify "how", just "what". This goes for referential integrity too: you declare which relationships exist without having to implement them in code. Moving the idea into your code, you would have to do both, declare the relationships and implement the checks, which is not very productive.
  • If your model is large, your app is complicated, and you don’t have a long beard and call yourself Ken Thompson (or one of the other bearded antediluvians), you will generate inconsistency in your data if you try to keep it consistent by handling everything in the application. It is a fact of life; accept it.

As for INDEXES, there is no recipe: you will have to evaluate the overall usage of your application and what kinds of queries are made in order to determine when to use them. Yes, EXPLAIN is complicated and tedious, but you do not need to use it all the time, only when you are in doubt about how the RDBMS behaves on complex queries; there is no way around it.

2

My friend, I will summarize what I know about this.

An unnecessary FK makes the database perform unnecessary lookups, increasing the cost. If your model needs FKs, implement them.

About indexes: almost all DBMSs create some indexes behind the scenes (for primary keys, for example), but declaring them explicitly is perhaps the best way. This gives you better performance and in some queries can reduce the cost enormously.

2

Various factors can slow down a query:

SELECT * FROM usuarios

is slower than

SELECT id, usuario, senha FROM usuarios

Why? In the first example I forced the database to fetch and return every column of the usuarios table. Of course, in small tables the time is imperceptible, but imagine a table with a relatively high number of columns: the performance of the query would be affected.

Indexes are created automatically by the smarter DBMSs, but not all are optimized for this kind of work. Creating an index is a way to relieve stress on the database.

As an example, I will show what happens when an index on the name is created.

[image: enter image description here]

Note that ordering by the ID index is the normal behavior of the database.

When I created an index on the name, the database ordered the rows that way under the hood; note the "Information by Name".

If I run a query for all names that start with L, with which index will it be faster? You are right if you said the name index, because the database will go straight to the part where all names starting with L are.
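
The idea of "going straight to the part where names start with L" can be sketched in Python with a sorted list and binary search standing in for the name index. This is only an analogy under that assumption: real databases typically use B-trees, and the names and the helper starts_with are invented for illustration.

```python
from bisect import bisect_left

# A sorted list plays the role of an index on the name column.
nomes = sorted(["Ana", "Bruno", "Carla", "Lara", "Leo", "Lucas", "Maria", "Pedro"])


def starts_with(sorted_names, prefix):
    """Return names with the given prefix using two binary searches."""
    lo = bisect_left(sorted_names, prefix)
    # "\uffff" sorts after any ordinary character, so this marks the end of
    # the block of names that share the prefix.
    hi = bisect_left(sorted_names, prefix + "\uffff")
    return sorted_names[lo:hi]


resultado = starts_with(nomes, "L")
print(resultado)  # ['Lara', 'Leo', 'Lucas']
```

Because the list is ordered by name, the search never looks at Ana, Bruno or Pedro one by one; ordered by ID instead, every row would have to be checked.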

As for FKs, as stated above, if the database does not need them, do not implement them: unnecessary FKs are equivalent to unnecessary lookups.
