What makes a JOIN bad in a database?

Asked

Viewed 948 times

8

Where I currently work I had to make some queries in the database to generate a report or take emails from students who did not do remakes in other courses.

However, I was given the idea of making these reports on an ASP page with tables using several SELECTs instead of a query only using the JOIN. And worse than the results (at least in number of lines found) had many differences between using the JOIN or several SELECTS.

I was told to do the way to create an ASP page and using several SELECTs "put your hand in the fire" that result was consistent. Otherwise use only one query with 3 up to 4 JOINS.

Detail: Here where I work they do not put, during the creation of the tables, the use of Foreign Keys. Just put a column with the name, for example, cod_aluno but without using the Foreign key connecting this column directly with the table Alunos.

Utilise JOINs in cases of 3 or 4 tables would be bad? What can lead to inconsistent results? What, through experience, would recommend to do so that I can generate or perform query operations without having to split into several querys.

  • 8

    Usually it is the programmer.

  • 6

    Hearing from colleagues that queries followed without JOIN are more consistent than a single JOIN query is a first sign that you’re working with people who shouldn’t be programming.

3 answers

13


Linq’s witty comment (Jéf Bueno or whatever name he uses now) is true. A part of the problem is realizing the JOIN where you don’t need it. It’s an automatic mechanism that can fetch things you don’t need.

But the reason people usually talk about him being problematic is that he has to look for information in different places. Having to build information is much worse than having it ready there for direct consumption.

Normalization

No one uses JOIN because either, or at least it should be so. If it is necessary there is no big problem. It is necessary in banks normalized.

Not normalizing can cause problems, can work well in typical activities OLAP, but not in OLTP. A denormalization correct avoids the JOIN. But so what?

In general build from various sources, besides having to read from various physical places, complicates database optimization, has more things to load in memory, overloads the cache, increases the use of internal structures to build the desired set, has more disk displacement (this is much improved with SSD and even more when it has RAM to keep the whole DB there, and soon will have the NVRAM), makes it difficult to take advantage of indexes and reset the query to have the best possible execution.

Merge data X JOIN

The JOIN is basically a syntactic sugar, so it’s not the JOIN which is bad, is your need. Do manually the same thing as a JOIN well done does, gives in the same (or should). See:

SELECT * FROM a INNER JOIN b ON a.id = b.a_id

SELECT * FROM a, b WHERE a.id = b.a_id

I put in the Github for future reference.

Some Dbms can better optimize the first.

If you put it all together from the table a and b In one thing, the reading performance would be better. But it would probably make the writing worse, cause inconsistencies, unless it makes a complex mechanism. It could make reading worse in another scenario. Nosql typically opts for this.

Of course in some cases doing manually can become more obvious something that helps optimize more, but there is that thing, the mechanism is not to blame, it is the developer.

If you do several SELECTs can be even worse depending on what you do. So in contrast to not using the JOIN in standard tables, the JOIN It’s usually great because it’s usually the same or better. You’re gathering data in the same way. And it may be that your RMDBS can’t do some optimizations because it doesn’t know that the intention is to do a join.

The problem with the phrase "JOIN leaves the database slow" is that it is said as good practice, people do not explain why, and those who use the phrase has prejudice with the keyword and do not look at what really causes problem.

Having said all this, it is rare JOIN, manual or automatic, well done, be a real problem. Not even normalization causes this whole problem, although it has a real cost.

whether it will maintain normalisation, and think that should keep, not fear the JOIN, just use it the right way, optimize the tables to work well with them. See if coverage indexes can help, see if it has indexes that help the optimization work. See if you have any settings or ways to use JOIN work better.

I was told that if I did it the way I created an ASP page and using several Selects I could "put my hand in the fire" that that result was consistent. As opposed to using a single query with 3 up to 4 JOINS.

I don’t know what this means, but it could be what I said in the previous paragraphs.

Foreign keys

The use of FK is often overestimated even. In fact if used wrong causes tragedies in performance. It may be one of the responsible for the JOIN (original, automatic) to be very slow, after all to meet the restriction that it imposes can end up consulting much more information than necessary. He doesn’t know if he needs to go through the entire information chain. Of course he can achieve greater consistency if he uses foreign keys.

Some people choose for views to abstract access to physical tables, thus creating a logical model that avoids you having to deal with such JOINs. This is not always a good idea, abstractions can hide important things and cause you to make mistakes so.

Performance and consistency

The more tables involved, the more it can harm performance, even exponentially. The more tables, the more possible combinations, the more data is generated, the more sources to be consulted, the less opportunities for optimization, or at least the more complicated it is to optimize.

What brings inconsistent results is the wrong use. There is no way the mechanism does this. A screwdriver does not screw the screw wrongly. Of course she’ll have a hard time putting a nail through the wall.

And not normalize usually give much more inconsistent results, and worse, that the programming can not work, everything is being registered wrong.

Today it is fashionable to talk about Nosql, even if they really want to talk about Norel. Avoid the JOIN, But there’s a lot of consultation that’s tragic, and... inconsistent, by definition. In fact, the correct name for this type of technology should be Noconsistency because that’s what it doesn’t really have. More and more they are using relations and even SQL because they were wrong.

Today all major Sgdbs allow you to create a hybrid modeling that does more or less what Nosql does without his drawbacks.

What to do

I recommend starting to understand the whole functioning of the database, do tests, do not believe in absolute answers, ie do different than almost everyone does, which is to blindly believe the opinion of others who do not know their problem or who do not understand what they are talking about. Only you can find a solution to your problem. But just like people who work like you or who don’t seem to want to go deeper, you can only know what’s right by studying in depth, without skipping steps, without using superficial summaries. That attitude is missing in almost every IT market.

  • I had written a long response about it, but it was kind of gross and I was embarrassed of my post after reading your :\

  • 1

    @Renan is not finished, need for a few more things, but since I’m running out of time, posted what gave. I like your answers, including the "rudeness" I live doing :D I think that shock can produce results in those who are open to change thinking. Of course, it doesn’t change, and it even drives away anyone who wants to see only what they want. Then I’ll read yours, but I know it’s technically perfect, I don’t think I should erase it, even before I see it.

  • @Renan got better, I took something of yours that is good. Maybe you want to recover by editing something that you think was exaggerated and ran too far from focus.

  • I think you’ve covered it all, +1.

  • @bigown forgive me for the delay of leaving a comment on your reply. She is excellent and has further opened my view on database matters. The databases I am handling do not have normalizations, data type specification (it is mostly varchar, no date, int, smallint and etc), do not link tables to each other and etc. I am always suggesting that in the large systems that there where I work they can begin to improve the structure, because certain works there inside are quite complex because of the lack of good use of the resources of a DBMS

  • 2

    If the database has no normalization no need to JOIN, but if this is true you have a huge problem at hand. To tell you the truth, I find it unlikely that there is no normalization. It may not have all possible. It is possible, but it is unlikely. DB based on string? I think the JOIN is the least of your problems.

  • Actually several tables are not normalized. Some "are" (at least until the second form) and others are not. I believe these are some of the reasons for the inconsistent results that the joins are bringing.

Show 2 more comments

5

Joins are not bad except in case the query generates a product (which is combination of all rows of a table with the result set or other table).

To solve the performance problem the first step is to see the plan of execution of the query that points out where the bottlenecks are. The simplest solutions are the creation of indices for columns used in clasps join, where order by (not a rule but a starting point), partitioning and even rewriting.

3

You should check if the database has indexes.

The issue of Fks is a problem, but for the consistency of the data. And thus avoid orphan records. It can be controlled via application, of course it can, but any bug can leak the problem.

I think your colleagues should check this out. And make some improvements to ensure a more consistent database, and prevent some problems.

Joins are great, there’s no reason not to use them. But indexes help a lot.

Creation of Views can also facilitate.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.