This answer may not apply exactly to your case, but I think the knowledge is worth sharing.
From my knowledge and experience, when you use INNER JOIN you are in fact fixing the search and filter strategy that the database will use between the tables. In a way, it is like telling the DB: first fetch the "parent" record, then the "child" record.
If you are sure of the relationship, know the content of the data, and know that this content is not dynamic enough to affect how you set up the relationship, INNER JOIN will always perform well.
However, not everything is perfect: it is common to see monstrous or "analytical" SQL queries over data with relatively complex relationships, and table data that changes over time (I explain below).
When you use the "non-ANSI" pattern (without the JOIN clause), you leave the engine's optimizer free to choose how to search the data. That choice is based on the indexes of the tables involved and, mainly, on the statistics collected from the data (in MySQL's case, with the ANALYZE TABLE command).
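In MySQL these statistics are refreshed with ANALYZE TABLE (for example, ANALYZE TABLE pedido, artigos;). Since a MySQL server can't be bundled into a self-contained snippet, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in: SQLite's ANALYZE stores per-index statistics in the sqlite_stat1 table, which its planner consults when choosing join order. The schema (pedido, artigos, and the hypothetical index idx_artigos_status) and the data are made up for illustration, and SQLite's optimizer is not MySQL's, so this only shows where statistics come from, not how MySQL will use them.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a small in-memory version of the hypothetical pedido/artigos schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE pedido (pedido_id INTEGER PRIMARY KEY);
        CREATE TABLE artigos (
            artigo_id INTEGER PRIMARY KEY,
            pedido_id INTEGER NOT NULL REFERENCES pedido(pedido_id),
            status    TEXT NOT NULL  -- 'D' delivered, 'P' pending, 'C' canceled
        );
        CREATE INDEX idx_artigos_status ON artigos(status);
        """
    )
    # 1,000 orders with 8 items each; only the newest 20 orders still have pending items.
    conn.executemany("INSERT INTO pedido (pedido_id) VALUES (?)", [(i,) for i in range(1, 1001)])
    rows = []
    for pedido_id in range(1, 1001):
        status = "P" if pedido_id > 980 else "D"
        rows.extend((pedido_id, status) for _ in range(8))
    conn.executemany("INSERT INTO artigos (pedido_id, status) VALUES (?, ?)", rows)
    conn.commit()
    return conn


def collect_statistics(conn: sqlite3.Connection):
    """Run ANALYZE (SQLite's counterpart to MySQL's ANALYZE TABLE) and return what it stored."""
    conn.execute("ANALYZE")
    return conn.execute("SELECT tbl, idx, stat FROM sqlite_stat1 ORDER BY tbl, idx").fetchall()


if __name__ == "__main__":
    demo = build_demo_db()
    for tbl, idx, stat in collect_statistics(demo):
        print(tbl, idx, stat)
```

In this made-up data the row for idx_artigos_status should read 8000 4000: 8,000 rows in artigos and about 4,000 rows per distinct status value.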
Depending on how the join, filters, indexes and data sampling are set up, the optimizer may choose to search the child records first and then the parent records... something INNER JOIN would not allow (not for technical reasons, but because it is written in the ANSI rules; I doubt, though, that MySQL follows 100% of these rules).
Quoting the MySQL manual on the ANALYZE TABLE command:
"MySQL uses the stored key distribution to decide the order in which tables should be joined for joins on something other than a constant. In addition, key distributions can be used when deciding which indexes to use for a specific table within a query."
Update Statistics
That is why, with any database, when someone complains about slowness the first question is usually: have you updated the statistics?
This applies to MySQL, Oracle, SQL Server, Informix, PostgreSQL, etc.
"Mutation" of data...
This is the example I see most often, and it is easy to illustrate.
Suppose a website has a table of purchase orders, and its child table of purchased items has a status field (Delivered, Pending, Canceled).
How many orders have child items in Pending status depends on what you filter.
If the database has a long history (thousands of records), the share of Pending items will naturally be low... but if you look only at the last 2 months, it will probably be very high.
Depending on what your SQL searches for when it joins the parent and child tables and filters by period and status, the engine may consider it more advantageous and faster to start by searching the child records.
Let me try to "draw" the situation. Imagine:
- Parent table of orders (pedido), with a 5-year history and 100,000 order records.
- Child table of articles (artigos), with 800,000 records (an average of 8 articles per order). This table has the status column and an index on that status.
That means:
800,000 articles / 5 years = 160,000 articles per year
160,000 / 12 months ≈ 13,300 articles per month
Assuming the company delivers reliably, you will rarely see orders more than 2 months old with items still Pending...
(which means at most about 27 thousand article records)
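The estimate above is simple arithmetic, sketched here so the assumptions are explicit. The figures are the scenario's own; the 2-month window is the assumption that nothing older is still pending. The text rounds the results to 13,300 and "about 27 thousand".

```python
# Figures from the scenario (assumed, not measured).
TOTAL_ARTICLES = 800_000
YEARS_OF_HISTORY = 5
PENDING_WINDOW_MONTHS = 2  # assumption: pending items are at most ~2 months old


def articles_per_year(total=TOTAL_ARTICLES, years=YEARS_OF_HISTORY):
    return total / years


def articles_per_month(total=TOTAL_ARTICLES, years=YEARS_OF_HISTORY):
    return articles_per_year(total, years) / 12


def max_pending_articles(window_months=PENDING_WINDOW_MONTHS):
    """Upper bound on pending articles if nothing older than the window is still pending."""
    return articles_per_month() * window_months


if __name__ == "__main__":
    print(f"per year:  {articles_per_year():,.0f}")     # 160,000
    print(f"per month: {articles_per_month():,.0f}")    # 13,333
    print(f"pending <= {max_pending_articles():,.0f}")  # 26,667
```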
You want to fetch all pending items...
and you wrote the SQL like this:
select * from pedido, artigos where pedido.pedido_id = artigos.pedido_id and artigos.status = 'P'
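To see what a given engine does with this query, you can run it in both syntaxes and ask the engine for its plan (in MySQL, by putting EXPLAIN in front of each query). The sketch below does that with Python's sqlite3 module as a stand-in, on a tiny made-up dataset: it checks that the comma-style query and its INNER JOIN rewrite return the same rows, then prints the plan steps for each so you can compare which table is searched first. The two selected columns replace SELECT * only to keep the output short; SQLite's planner is not MySQL's, so the printed plans say nothing definitive about MySQL.

```python
import sqlite3

# The answer's query, comma ("non-ANSI") style.
COMMA_JOIN = """
    SELECT pedido.pedido_id, artigos.artigo_id
    FROM pedido, artigos
    WHERE pedido.pedido_id = artigos.pedido_id
      AND artigos.status = 'P'
"""

# The same query written with ANSI-92 INNER JOIN syntax.
ANSI_JOIN = """
    SELECT pedido.pedido_id, artigos.artigo_id
    FROM pedido
    INNER JOIN artigos ON pedido.pedido_id = artigos.pedido_id
    WHERE artigos.status = 'P'
"""


def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE pedido (pedido_id INTEGER PRIMARY KEY);
        CREATE TABLE artigos (
            artigo_id INTEGER PRIMARY KEY,
            pedido_id INTEGER NOT NULL,
            status    TEXT NOT NULL
        );
        CREATE INDEX idx_artigos_status ON artigos(status);
        """
    )
    conn.executemany("INSERT INTO pedido VALUES (?)", [(i,) for i in range(1, 101)])
    # 8 items per order; only orders 96..100 still have pending ('P') items.
    items = [(p, "P" if p > 95 else "D") for p in range(1, 101) for _ in range(8)]
    conn.executemany("INSERT INTO artigos (pedido_id, status) VALUES (?, ?)", items)
    conn.commit()
    conn.execute("ANALYZE")  # give the planner statistics, as discussed above
    return conn


def run(conn, sql):
    # Sort in Python so the comparison does not depend on the plan's row order.
    return sorted(conn.execute(sql).fetchall())


def plan(conn, sql):
    # Each EXPLAIN QUERY PLAN row has 4 columns; the last is a readable step description.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


if __name__ == "__main__":
    demo = build_demo_db()
    comma_rows, ansi_rows = run(demo, COMMA_JOIN), run(demo, ANSI_JOIN)
    print(len(comma_rows), comma_rows == ansi_rows)  # 40 True
    for name, sql in (("comma", COMMA_JOIN), ("INNER JOIN", ANSI_JOIN)):
        print(name, plan(demo, sql))
```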
Because of the sampled statistics, there is a good chance the database will choose to start from the artigos table, since the first step fetches only about 27 thousand records, followed by far fewer records from the orders (pedido) table.
But if you write the SQL in ANSI style, or if for some reason the database chooses to start from the pedido table, it will fetch the 100,000 records from pedido plus the 800,000 records from artigos, and only then filter by status.
This happens because the filter is applied at the second level of the search...
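A rough count of rows read under the two strategies, as this answer describes them, makes the gap concrete. It is only a sketch: it ignores index page reads, caching and the engine's real cost model, and the "one lookup per distinct order" figure assumes pending articles come in groups of about 8 per order, which is a hypothetical simplification.

```python
ORDERS = 100_000
ARTICLES = 800_000
PENDING_ARTICLES = 27_000  # "at most about 27 thousand", from the scenario
ITEMS_PER_ORDER = 8        # average from the scenario


def rows_parent_first(orders=ORDERS, articles=ARTICLES):
    """Starting from pedido: every order and every article is read before the status filter."""
    return orders + articles


def rows_child_first(pending=PENDING_ARTICLES, items_per_order=ITEMS_PER_ORDER):
    """Starting from artigos via the status index, then one lookup per distinct order.
    Assumes (hypothetically) that pending articles are grouped ~items_per_order per order."""
    orders_touched = -(-pending // items_per_order)  # ceiling division
    return pending + orders_touched


if __name__ == "__main__":
    parent_first, child_first = rows_parent_first(), rows_child_first()
    print(f"parent first: {parent_first:,} rows")       # 900,000 rows
    print(f"child first:  {child_first:,} rows")        # 30,375 rows
    print(f"ratio: {parent_first / child_first:.1f}x")  # 29.6x
```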
Of course, a period filter would help, but there are cases where the business logic does not allow one, which ends up forcing the developer to write the SQL above.
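Here is a minimal sketch of what such a period filter looks like, again using sqlite3 as a stand-in. The date column data_pedido and its index are hypothetical (the original schema has no date column), and dates are stored as ISO-8601 strings so they compare correctly as text. The made-up data includes one old order that still has a pending item, which shows the trade-off: the filter narrows the search, but it also hides stale pending items that the business may still need to see.

```python
import sqlite3

PENDING_SINCE = """
    SELECT pedido.pedido_id, artigos.artigo_id
    FROM pedido, artigos
    WHERE pedido.pedido_id = artigos.pedido_id
      AND artigos.status = 'P'
      AND pedido.data_pedido >= ?   -- period filter on a hypothetical date column
"""


def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE pedido (pedido_id INTEGER PRIMARY KEY, data_pedido TEXT NOT NULL);
        CREATE INDEX idx_pedido_data ON pedido(data_pedido);
        CREATE TABLE artigos (
            artigo_id INTEGER PRIMARY KEY,
            pedido_id INTEGER NOT NULL,
            status    TEXT NOT NULL
        );
        CREATE INDEX idx_artigos_status ON artigos(status);
        """
    )
    # One old order with a stale pending item, two recent orders.
    conn.executemany(
        "INSERT INTO pedido VALUES (?, ?)",
        [(1, "2019-01-10"), (2, "2024-05-02"), (3, "2024-06-15")],
    )
    conn.executemany(
        "INSERT INTO artigos (pedido_id, status) VALUES (?, ?)",
        [(1, "P"), (1, "D"), (2, "P"), (3, "P"), (3, "C")],
    )
    conn.commit()
    return conn


def pending_since(conn, cutoff):
    """Pending (pedido_id, artigo_id) pairs for orders placed on or after cutoff."""
    return sorted(conn.execute(PENDING_SINCE, (cutoff,)).fetchall())


if __name__ == "__main__":
    demo = build_demo_db()
    print(pending_since(demo, "2024-05-01"))  # [(2, 3), (3, 4)]
    print(pending_since(demo, "0000-01-01"))  # [(1, 1), (2, 3), (3, 4)]
```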
Again... I am not sure this explanation applies 100% to MySQL.
Performance in which database? The first query style is used in old databases that do not support the ANSI-92 (JOIN) syntax/standard. – rray
In the MySQL database. – Tuyoshi Vinicius
Okay, edit the question and make it clear that it's about MySQL, because the tags list other databases. – rray
I didn't think it would make a difference in other databases. – Tuyoshi Vinicius
The syntax may be the same, but the databases' internal search engines are different, and so are their response times. – rray