In general
- maybe, depends
- maybe, depends
- maybe, depends
A little less general
INDEXES
improve searches with specific values in indexed columns (including joins) but incur overhead in INSERT|UPDATE
, vine What are the advantages and disadvantages of using indexes in databases?.
FOREIGN KEYS
will negatively affect time of INSERT|UPDATE
, which is logical since having to guarantee referential integrity involves extra checks (think about how you would implement this mechanism and you will see that there is no way out of an extra cost). They can also accelerate operationsSELECT
, however how and why it will depend on the database being used.
The "negative" impact of FKs
and INDEXES
in inserts/updates is clearly document in some banks, ex: Postgresql recommends removing Dexes and Foreign Keys in Bulk Inserts, the same for Mysql.
The positive impact of FKs
in SELECTS
It’s kind of obscure, I’ve never heard of it and I haven’t found any explicit documentation about it, but there are reports on the net and even an example below showing how it actually happens.
If need be
Use EXPLAIN
: Postgresql, Mysql, Sqlite, Oracle, SQL Server.
Let’s take your example and do a shallow analysis of it on MySQL
:
CREATE TABLE niveis (
id INT,
nome VARCHAR(255)
);
CREATE TABLE usuarios (
id INT,
nivel_id INT
);
No Indexes and fks with query
EXPLAIN SELECT A.*, B.nome FROM usuarios AS A JOIN niveis AS B ON B.id = A.nivel_id \G
we have:
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: A
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 1
filtered: 100.00
Extra: NULL
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: B
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 1
filtered: 100.00
Extra: Using where; Using join buffer (Block Nested Loop)
Let’s focus on the column type
with values ALL
and ALL
. What that indicates is that all the rows of the table A
will be scanned and for each of them we will scan all the rows of the table B
, the equivalent of the following pseudo-code:
para id in A:
para id_2 in B:
if id == id_2:
print(id, id_2)
An operation O(n^2)
.
Now using INDEXES
(PRIMARY KEY
generates UNIQUE CLUSTERED INDEXES
in MySQL
):
ALTER TABLE niveis ADD PRIMARY KEY (id);
ALTER TABLE usuarios ADD PRIMARY KEY (id);
mysql> EXPLAIN SELECT A.*, B.nome FROM usuarios AS A JOIN niveis AS B ON B.id = A.nivel_id \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: A
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 1
filtered: 100.00
Extra: Using where
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: B
partitions: NULL
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: test.A.nivel_id
rows: 1
filtered: 100.00
Extra: NULL
Again we have ALL
for A
, but now with eq_ref
for B
. ALL
will traverse all lines of A
and eq_ref
will use the index
and pick up only the necessary lines, equivalent to:
para id in A:
if B possui id:
print(id, id)
O(n)
.
At last we will analyze with the FOREIGN KEY
:
ALTER TABLE usuarios ADD FOREIGN KEY (nivel_id) REFERENCES niveis(id);
mysql> EXPLAIN SELECT A.*, B.nome FROM usuarios AS A JOIN niveis AS B ON B.id = A.nivel_id \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: A
partitions: NULL
type: index
possible_keys: nivel_id
key: nivel_id
key_len: 5
ref: NULL
rows: 1
filtered: 100.00
Extra: Using where; Using index
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: B
partitions: NULL
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: test.A.nivel_id
rows: 1
filtered: 100.00
Extra: NULL
index
+ eq_ref
. According to the documentation index
is "practically" ALL
, but usually faster because it does the complete search in the index structure instead of directly in the table. That is, still O(n)
, but faster than ALL
+ eq_ref
.
Finally, it is important to make it clear that EXPLAIN
vary, and greatly, according to data and quantity of them, so it is important not to get stuck the first analysis because it will need to be repeated as your database grows and the indexing strategy can change several times (this does not mean that the EXPLAIN
is useless in an empty database, the bank will not do magic to turn your inefficient queries into something decent, a ALL
, ALL
for example is almost always a bad sign).
Commentary
First if you are not sure that this will cause a noticeably negative impact on your application (almost never will), use Foreign Keys. Reasons:
- The database is optimized to handle this, trying to replicate these checks in your application will definitely be slower.
- The great strength of
SQL
is to be a declarative language, you do not specify "how" to do, just "what". This goes for referential integrity, you declare which relationships exist without having to implement them in code. Taking the idea to your code you would have to do both, declare relationships and implement the checks, not very productive.
- If your model is great, if your app is complicated and you don’t have a long beard and call yourself Ken Thompson (or one of the other bearded antediluvians) you will generate incoherence in your data if you try to keep them consistent by transforming everything in the application. It is a fact of life, accept.
About the INDEXES
does not have cake recipe, you will have to evaluate the general use of your application and what types of queries
are made to be able to determine when to use them. Yes, EXPLAIN
is complicated and tedious, but you do not need to use at all times, only when you are in doubt about the behavior of RDBMS
in queries
complex, there is no escape.
Foreign Keys happens just the opposite, as is a Constraint, if there is not, the table maintenance is faster, already indexes for maintenance also, but at the time of the queries, they are needed, damaging the performance
– Ricardo Pontual