Performatively speaking, is it good to use Self Join?

Asked

Viewed 287 times

4

Regarding this question: What good is a "self Join", technically speaking it’s performative to do this, or using 1N, 2N and 3N (normal shapes) is the best way?

Just remembering that I never claimed that one breaks another’s rule, it’s purely about performance between the two.

  • 2

    At least in the times I used (common in accounting, when an account references your mother account) it worked very well both in SQL and in those damn 4GLs that were fashionable in the 80/90’s. Self-join has always been considered an elegant solution.

1 answer

5

I do not think there is any difference in performance between making a junction of one table with another, or a table with itself. I have no data to support this answer, except the absence of evidence to the contrary (I could do benchmarks to prove this - in different Dbms - but from the logical point of view it is the same thing one or the other).

(lie: there is the possibility that they are even more efficient that external joins - since there is only a set of meta-data, indexes and data itself to store in memory during a query. But this is speculation, only benchmarks could prove this in fact.)

If you mean implementing a tree structure using self Join, it’s not a good idea, but not because of the performance issue - but because it complicates certain queries too much (particularly if the tree is deep). But the queries you can do, should be as efficient as queries with joins to other tables.

But other uses should be ok. I just added a new answer to the linked question, giving another example of practical use of self Join (in this case, a left Outer Join). Unfortunately I can not talk much about his performance, because although I have implemented it in a real system, this has never had a large volume of data or accesses, so I do not know what will be his performance when/if it reaches large scale.

P.S. By the way, there is nothing in the self joins that violates in itself the normal forms. You might as well have a standard model with references from a table for yourself.

  • 3

    Yes, I also wondered about the question of normal forms, not because a self-join could not/should be considered normalization. And on the trees, it’s extremely common to model trees into a single table. That’s what I see most. In many cases, one level is sought at a time, so performance is not a problem. However, if the intention is to list all nodes accompanied by their complete hierarchy, it really complicates things. As I said there in the other post, the best structure depends on the purpose. And sometimes the solution may be to combine more than one structure.

  • Maybe I didn’t explain but I didn’t say that one breaks the other I meant performatically, which is better! I once joined a company and it was all so self Join system was too slow

  • 1

    @bfavaretto I agree, both that said "...the queries that you can do, should be as efficient as...". I have little practical experience, particularly involving third-party code. But I don’t doubt that this will be used successfully in real systems - sometimes the theoretical "purists" exaggerate a little the concerns...

  • 1

    @Harrypotter Then I don’t know what to say, because for me one thing has nothing to do with the other... (i.e. we would be comparing apples to oranges) What I can say is that sometimes normalization worsening performance - one of the valid reasons for denormalizing a model is when performance is bad. But if the system you worked with was denormalized and slow, then it should have other problems more serious than simply using or not the X or Y feature.

  • Okay @mgibsombr ...

Browser other questions tagged

You are not signed in. Login or sign up in order to post.