LINQ’s witty comment (Jéf Bueno, or whatever name he uses now) is accurate. Part of the problem is performing a JOIN where you don’t need one. It’s an automatic mechanism that can fetch things you don’t need.
But the reason people usually call the JOIN problematic is that it has to look for information in different places. Having to assemble information is much worse than having it ready for direct consumption.
Normalization
No one uses JOIN because they want to, or at least that is how it should be. If it is necessary, there is no big problem. It is necessary in normalized databases.
Not normalizing can cause problems. It can work well in typical OLAP activities, but not in OLTP. Correct denormalization avoids the JOIN. But so what?
In general, building a result from several sources, besides requiring reads from several physical places, complicates database optimization, loads more things into memory, overloads the cache, increases the use of the internal structures needed to build the desired set, causes more disk seeks (this is much improved with SSDs, even more when there is enough RAM to keep the whole DB there, and soon there will be NVRAM), and makes it harder to take advantage of indexes and to rewrite the query into the best possible execution plan.
Merging data manually vs. JOIN
The JOIN is basically syntactic sugar, so it is not the JOIN that is bad, it is the need for it. Doing manually the same thing that a well-written JOIN does amounts to the same (or it should). See:
SELECT * FROM a INNER JOIN b ON a.id = b.a_id
SELECT * FROM a, b WHERE a.id = b.a_id
I put it on GitHub for future reference.
Some DBMSs can optimize the first one better.
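One practical way to check whether your DBMS treats the two forms the same is to compare their execution plans. A minimal sketch, assuming PostgreSQL or MySQL and the same hypothetical tables a and b:
-- If the two plans match, the explicit JOIN is pure syntactic sugar here.
EXPLAIN SELECT * FROM a INNER JOIN b ON a.id = b.a_id;
EXPLAIN SELECT * FROM a, b WHERE a.id = b.a_id;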
If you put everything from tables a and b together into one thing, read performance would be better. But it would probably make writes worse and cause inconsistencies, unless you build a complex mechanism. It could also make reads worse in another scenario. NoSQL typically opts for this.
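To make the trade-off concrete, here is a minimal sketch of what that "one thing" could look like, with hypothetical names: a denormalized table that copies the data of b into every row of a, so reads are direct, but every change to the copied value must be repeated in all rows that duplicate it.
-- Denormalized: one wide table, no JOIN on reads,
-- but b_value is duplicated and must be kept in sync by hand.
CREATE TABLE a_with_b (
    id      INT PRIMARY KEY,
    a_value VARCHAR(100),
    b_value VARCHAR(100)
);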
Of course, in some cases doing it manually can make something that helps optimization more obvious, but again, the mechanism is not to blame; the developer is.
If you do several SELECTs it can be even worse, depending on what you do. So, as opposed to avoiding the JOIN on normalized tables, the JOIN is usually great because the result is usually the same or better. You are gathering the data in the same way anyway, and your RDBMS may not be able to apply some optimizations because it does not know that the intention is to do a join.
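As an illustration, a sketch with hypothetical orders and customers tables: the first version assembles the data by hand, one round trip per row, while the second lets the optimizer see the whole intention at once.
-- Manual assembly: one query per order to fetch its customer (N+1 round trips).
SELECT id, customer_id FROM orders;
SELECT name FROM customers WHERE id = 42;  -- repeated for each order
-- Single query: the DBMS knows a join is intended and can pick the best plan.
SELECT o.id, c.name
FROM orders o
INNER JOIN customers c ON c.id = o.customer_id;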
The problem with the phrase "JOIN makes the database slow" is that it is repeated as if it were good practice; people do not explain why, and those who use it are prejudiced against the keyword and do not look at what really causes the problem.
Having said all this, it is rare for a well-done JOIN, manual or automatic, to be a real problem. Normalization does not cause all this trouble either, although it has a real cost.
If you are going to keep the normalization, and I think you should, do not fear the JOIN; just use it the right way and optimize the tables to work well with it. See whether covering indexes can help, see whether you have indexes that help the optimizer do its job. See whether there are settings or ways of using the JOIN that work better.
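A minimal sketch of a covering index, again with hypothetical tables and columns; the INCLUDE clause is SQL Server and PostgreSQL 11+ syntax, and in MySQL you would put the extra columns in the index key instead.
-- The query below can be answered from the index alone,
-- without touching the table pages.
CREATE INDEX ix_orders_customer ON orders (customer_id) INCLUDE (order_date, total);
SELECT order_date, total FROM orders WHERE customer_id = 42;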
I was told that if I did it the way I had built an ASP page, using several SELECTs, I could "put my hand in the fire" that the result would be consistent, as opposed to using a single query with 3 or 4 JOINs.
I don’t know what this means, but it could be what I said in the previous paragraphs.
Foreign keys
The use of FKs is often overrated, actually. In fact, used wrongly they cause performance tragedies. They may be one of the reasons the JOIN (the original, automatic one) is very slow; after all, to satisfy the constraint they impose, the database can end up consulting much more information than necessary. It does not know whether it needs to traverse the whole chain of information. Of course, you can achieve greater consistency if you use foreign keys.
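A common concrete case of that cost, sketched with the same hypothetical tables: in DBMSs that do not create the index automatically, every DELETE or UPDATE on the parent table has to scan the whole child table to check the constraint unless the referencing column is indexed.
CREATE TABLE customers (
    id   INT PRIMARY KEY,
    name VARCHAR(100)
);
CREATE TABLE orders (
    id          INT PRIMARY KEY,
    customer_id INT REFERENCES customers (id),  -- the FK being enforced
    order_date  DATE,
    total       DECIMAL(10, 2)
);
-- Without this index, DELETE FROM customers must scan all of orders
-- to verify that no row still references the deleted customer.
CREATE INDEX ix_orders_customer_id ON orders (customer_id);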
Some people opt for views to abstract access to the physical tables, creating a logical model that saves you from having to deal with so many JOINs. This is not always a good idea; abstractions can hide important things and lead you to make mistakes because of that.
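For reference, a minimal sketch of that approach with the same hypothetical tables: the view hides the JOIN behind a single logical "table", which is convenient but also hides the cost of the query behind it.
-- The application reads order_summary as if it were a plain table,
-- but every read still pays for the JOIN inside the view.
CREATE VIEW order_summary AS
SELECT o.id AS order_id, o.order_date, c.name AS customer_name
FROM orders o
INNER JOIN customers c ON c.id = o.customer_id;
SELECT * FROM order_summary WHERE customer_name = 'Alice';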
The more tables involved, the more performance can suffer, even exponentially. The more tables, the more possible combinations, the more data is generated, the more sources must be consulted, and the fewer opportunities there are for optimization, or at least the more complicated it is to optimize.
What produces inconsistent results is wrong use. There is no way the mechanism does that by itself. A screwdriver does not drive a screw in wrong. Of course, it will have a hard time putting a nail into the wall.
And not normalizing usually gives much more inconsistent results, and worse, results the application cannot work around, because everything is being recorded wrong.
Today it is fashionable to talk about NoSQL, even when people really mean NoRel. It avoids the JOIN, but there are plenty of queries that are tragic, and... inconsistent, by definition. In fact, the correct name for this type of technology should be NoConsistency, because that is what it really lacks. More and more of them are adopting relations and even SQL, because they were wrong.
Today all the major DBMSs allow you to create a hybrid model that does more or less what NoSQL does, without its drawbacks.
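As one example of such a hybrid, PostgreSQL (and, with similar JSON types, the other major DBMSs) lets you keep the stable part of the model normalized while storing loosely structured data as a document in the same row. A hypothetical sketch:
-- The relational columns stay consistent and can be joined and indexed as usual;
-- the JSONB column holds the schema-free, document-style part of the data.
CREATE TABLE products (
    id         INT PRIMARY KEY,
    name       VARCHAR(100) NOT NULL,
    attributes JSONB  -- e.g. {"color": "red", "size": "M"}
);
SELECT name FROM products WHERE attributes ->> 'color' = 'red';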
What to do
I recommend starting by understanding how the database works as a whole, running tests, and not believing in absolute answers; in other words, doing differently from what almost everyone does, which is blindly believing the opinion of people who do not know your problem or do not understand what they are talking about. Only you can find the solution to your problem. And unlike the people who work with you, who do not seem to want to go deeper, you can only know what is right by studying in depth, without skipping steps and without relying on superficial summaries. That attitude is missing in almost the entire IT market.
Usually it is the programmer.
– Jéf Bueno
Hearing from colleagues that successive queries without JOIN are more consistent than a single query with JOINs is the first sign that you are working with people who should not be programming.
– Oralista de Sistemas