What good is a "self Join"?

Asked

Viewed 3,726 times

11

I would like an example for me to understand why I use a self relationship on the same table?

I have intermediate knowledge in SQL but still do not understand Self Join.

3 answers

8

Trees can be represented in a single table. For example:

eletrônicos
    áudio e vídeo
        TVs
            HD
            Full HD
            4k
        Home Theaters
        ...
    informática
    ...
...

Table-shaped:

id       parent_id     nome
----------------------------------------
1        NULL          eletrônicos
2        1             áudio e vídeo
3        2             TVs
4        3             HD
5        3             Full HD
6        3             4k
7        2             Home Theaters
8        1             informática  

A query that starts from any node in the tree can use self joins to climb the ladder:

SELECT
    categorias.nome AS categoria,
    pai.nome AS pai,
    avo.nome AS avo
FROM categorias
    LEFT OUTER JOIN categorias pai
    ON pai.id = categorias.parent_id
    LEFT OUTER JOIN categorias avo
    ON avo.id = pai.parent_id
WHERE categorias.id = 6 -- 4k

Upshot:

categoria   pai    avo
---------------------------------
4k          TVs    áudio e vídeo

Like I said at the beginning, trees can be represented in this structure, but the best representation may be another depending on how you need to use the data. For more details about tree structures in databases, see How to model a tree data structure using a relational database?

7


A use of the self Join that I consider "practical" (since its use to represent trees can be problematic) is to do competition control. As I detailed in another answer, Suppose you have a die that could be the subject of several simultaneous scripts, but it is important that only one of them succeed (example: buying the last item of stock). Transactions only will not prevent a race condition (i.e. one overwrites the other’s data), and pessimistic lock can degrade performance.

A solution then is to use a "stock chain": represents the current stock by a row in the table, and when the stock changes instead of changing the row creates a new one - with reference to the previous row:

create table estoque(
    id         integer   not null,
    valor      integer   not null,
    anterior   integer   null        unique,
    primary key (id),
    foreign key (anterior) references estoque(id)
);

This way, you can make parallel transactions at will, because if you have the bad luck of two trying to change the stock at the same time, the second one will always fail - because the new line added will violate the restriction UNIQUE.

And where does the self Join enter? When searching for what is the current stock:

select e1.id, e1.valor
from estoque e1
  left outer join estoque e2 on e2.anterior = e1.id
where e2.id is null;

i.e. "select the row that is not the previous of any other row".


Updating: the answer of Caffé in a related question shows a better way to control the competition. However, there are still cases where this "chain of records" can be useful - for example, when you want to store not only the most recent record but also the history of past values (while controlling the competition). An example would be content like wiki, where two or more people can try to make an issue at the same time. A chain of revisions not only ensures that editing one does not overwrite the other, but also keeps all edits in history.

  • 2

    A simple technique to ensure integrity is to update the balance table first, then the record will be locked and the concurrent transaction will only wait instead of the user getting an error. I suggested this in this answer: http://answall.com/questions/42130/banco-data

0

Self Join is used when a table references its own data. Like that:

SELECT * 
FROM Table t1, Table t2
WHERE t1.Id = t2.ID

Browser other questions tagged

You are not signed in. Login or sign up in order to post.