Query in two tables without INNER JOIN

Question

Asked 11 years, 5 months ago

Viewed 9,148 times

10

I would like to know whether there is any difference in performance between these two queries in MySQL.

SELECT employee.*, company.name  FROM 
company,
employee
WHERE
employee.company_id = company.id AND
company.id = '365'

or

SELECT employee.*, company.name  FROM 
company 
INNER JOIN employee ON employee.company_id  = company.id
WHERE
company.id = '365'

2

Performance in which database? The first query's syntax is used in old databases that do not support the ANSI-92 standard JOIN syntax.

– rray

2014/03/26 at 18:46
In MySQL.

– Tuyoshi Vinicius

2014/03/26 at 18:47
1

Okay, edit the question and make it clear that it is about MySQL, because the tags include other databases.

– rray

2014/03/26 at 18:49
I didn’t think it would make a difference in other databases.

– Tuyoshi Vinicius

2014/03/26 at 18:52
The syntax may be the same, but the internal engines of the databases are different, so response times can differ.

– rray

2014/03/26 at 19:05

3 answers

4

This answer may not be correct for this case, but I think it is worth sharing what I know.

In my knowledge and experience, when you use INNER JOIN you are in fact tying down the search and filter method the database will use between the tables.
In a way, it is like imposing on the database: first fetch the "parent" record, then the "child" record.
If you are sure of the relationship, know the content of the data, and that content is not dynamic enough to affect the way you set up the relationship, the INNER JOIN will always perform well.

However... not everything is perfect, and it is common to see monstrous or "analytical" SQL statements over data with relatively complex relationships and table contents that change over time (I explain below).

When you use the "non-ANSI" style (without the JOIN clause), you leave it open for the engine's optimizer to choose how to search the data, based on the indexes of the tables involved and, mainly, on the statistics collected from the data (in MySQL, via the ANALYZE TABLE command).
Depending on the join, the filters, the indexes and the data sampling, the optimizer may choose to look up the child record first and then the parent record... something that INNER JOIN would not allow (not for technical reasons, but because it is written in the ANSI rules, though I doubt MySQL respects 100% of these rules).
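
One way to test this on a given server, as a rough sketch using the company and employee tables from the question: EXPLAIN lists the tables in the order MySQL decided to read them. As far as I know, the keyword MySQL documents for forcing the written order is STRAIGHT_JOIN, so comparing the two outputs shows whether a plain INNER JOIN really pins the order.

```sql
-- Rows in the EXPLAIN output appear in the order MySQL reads the tables.
EXPLAIN
SELECT employee.*, company.name
FROM company
INNER JOIN employee ON employee.company_id = company.id
WHERE company.id = '365';

-- STRAIGHT_JOIN forces the left table (company) to be read before the right one.
EXPLAIN
SELECT employee.*, company.name
FROM company
STRAIGHT_JOIN employee ON employee.company_id = company.id
WHERE company.id = '365';
```

If both outputs list the tables in the same order with the same keys, the optimizer picked that order on its own.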

Copying and translating (with Google Translate) the text on the ANALYZE TABLE command from the MySQL manual:

MySQL uses the stored key distribution to decide the order in which tables should be joined when you perform a join on something other than a constant. In addition, key distributions can be used when deciding which indexes to use for a specific table within a query.

Update Statistics

That is why, with any database, when someone complains of slowness the first question is usually: have you updated the statistics?
This goes for MySQL, Oracle, SQL Server, Informix, PostgreSQL, etc.
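
In MySQL, updating the statistics could look like the sketch below. The table names are the ones from the question; which tables actually need it depends on your schema.

```sql
-- Recompute and store the key distribution for both tables.
ANALYZE TABLE company, employee;

-- The Cardinality column shows the estimated number of distinct values per index.
SHOW INDEX FROM employee;
```

The Cardinality values are estimates, so they do not have to match the exact row counts.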

"Mutation" of data...

This is the example I see most often, and it is easy to illustrate.
Let’s assume a table of purchase orders from a website, where the child table of purchased items has a status field (Delivered, Pending, Canceled).

The number of orders with child rows in Pending status depends on what you filter.

If you have a database with a long history (thousands of records), the share of items in Pending status will naturally be low... but if you look only at the last 2 months, it will probably be very high.
Depending on what your SQL searches for when it relates the parent and child tables and filters on period and status, the engine may consider it faster and more advantageous to start by searching the child records.

I will try to "draw" the situation. Imagine:

A parent table of orders, with a 5-year history and 100,000 order records.
A child table of items, with 800,000 records (an average of 8 items per order).
This child table has the status column and an index on it.
That means:
800,000 items / 5 years = 160,000 items per year
160,000 / 12 months = about 13,300 items per month.

Assume the company maintains good delivery quality, so you will hardly ever see orders more than 2 months old with items in Pending status...
(That means a maximum of about 27 thousand item records.)
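
To see how many rows each status really has in your own data, rather than relying on an estimate like the one above, a simple count is enough. This sketch assumes the items table is called artigos and uses 'P' for Pending, as in the query further down.

```sql
-- Number of item rows per status; compare the 'P' count with the table total.
SELECT status, COUNT(*) AS total
FROM artigos
GROUP BY status;
```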

You want to fetch all pending items...
But you wrote the SQL like this:
select * from pedido, artigos where pedido.pedido_id = artigos.pedido_id and artigos.status = 'P'
Given the statistics sampled from the data, there is a good chance that the database will choose to start filtering from the table artigos, because the first step then fetches only about 27 thousand records, and afterwards far fewer records from the table pedido.

But if you write the SQL in ANSI style, or if for some reason the database chooses to start searching from the table pedido, it will fetch the 100,000 records from pedido plus the 800,000 records from artigos and only then filter by status.
This is because the filter is on the second level of the search...
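
For reference, the ANSI form of that query would look roughly like the sketch below. Putting EXPLAIN in front of it shows which table MySQL actually starts from: the first row of the output is the first table read, and the rows column is the optimizer's estimate of how many rows it examines there.

```sql
-- ANSI-style version of the pedido/artigos query, prefixed with EXPLAIN.
EXPLAIN
SELECT *
FROM pedido
INNER JOIN artigos ON artigos.pedido_id = pedido.pedido_id
WHERE artigos.status = 'P';
```

If artigos shows up in the first row with the status index in the key column, the engine started from the child table even though the query was written in ANSI style.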

Of course a period filter would help, but there are cases where the business logic does not allow it, which ends up forcing the developer to write the SQL above.
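
Where a period filter is allowed, it could look like this sketch. The column data_pedido is an assumption of mine for the order date; it does not appear in the original query.

```sql
-- data_pedido is a hypothetical order-date column, used only for illustration.
SELECT *
FROM pedido, artigos
WHERE pedido.pedido_id = artigos.pedido_id
  AND artigos.status = 'P'
  AND pedido.data_pedido >= CURRENT_DATE - INTERVAL 2 MONTH;
```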

Again... I have doubts whether this explanation applies 100% to MySQL.

by Rodrigo Rigotti • **12,139** points · Answer 1 · 2014-03-26T18:55:14+00:00

It makes no difference. The only difference is that the first query uses an implicit JOIN and the second uses an explicit JOIN.

In the past there was indeed some difference in performance (the explicit JOIN was preferable), but not today.
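
If you want to check this on your own server, one option is to look at how the optimizer rewrites each form. As a sketch for the implicit form from the question: after an EXPLAIN, SHOW WARNINGS prints the statement as MySQL rewrote it (older MySQL versions may need EXPLAIN EXTENDED for this).

```sql
-- Plan for the implicit (comma) join from the question.
EXPLAIN
SELECT employee.*, company.name
FROM company, employee
WHERE employee.company_id = company.id
  AND company.id = '365';

-- Prints the query as rewritten by the optimizer.
SHOW WARNINGS;
```

Running the same two statements with the INNER JOIN form and comparing the results shows whether the server treats the two queries differently.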

by AndersonBS • **3,194** points · Answer 2 · 2014-03-27T00:02:58+00:00

Apparently it makes no difference when the queries are small, because for the user the end result is the same. However, INNER JOIN was developed not only to improve the code's syntax visually, but also to obtain a small performance gain that varies from server to server, but is usually about 1%.

When you use INNER JOIN, before the server runs the SELECT it organizes the statement into a plan, which is a set of steps optimized to collect the data efficiently.

Not to mention that INNER JOIN is part of, and recommended by, the ANSI standard.