Multiple LEFT JOIN and slow search

Asked

Viewed 2,126 times

10

In a client application (a school calendar and record of absences/presences) there is this query:

SELECT *, the_courses.course_desc as course_name, schedule.id as schedule_id, students.id as student_id, attendance.id_state as presence, attendance.id as presence_id
FROM ".T_BOOKINGS." AS attendance
LEFT JOIN ".T_BOOKINGS_CLIENT_BOOKINGS." AS schedule ON attendance.id_client_booking = schedule.id
LEFT JOIN ".T_STUDENT_SUBJECT_MATRIX." AS subject_matrix ON schedule.classes_id=subject_matrix.class
LEFT JOIN ".T_SCHOOL_COURSE." AS the_courses ON schedule.course_id = the_courses.id     
LEFT JOIN ".T_BOOKINGS_CLIENTS." AS students ON subject_matrix.student = students.id
LEFT JOIN ".T_CLASS_DETAILS." AS classes ON schedule.classes_id= classes.id 
    AND attendance.student_id= students.id 
WHERE classes.state<>0 ".$sql_condition." ORDER BY schedule.the_date DESC";

The query takes more than 1 minute to answer and I wonder if there is a more effective way to do this query in Mysql.

T_BOOKINGS_CLIENT_BOOKINGS records lesson schedules, ~3,500 records (Innodb)
T_BOOKINGS record of absences. One entry each class/student/day, ~24,000 records (Myisam)
T_STUDENT_SUBJECT_MATRIX records that each student is enrolled. ~ 450 records (Innodb)
$sql_condition is a filter to show specific student, class or teacher, depending on the front-end selects.

The other tables are record of students and classrooms for Mysql to fetch names to complete in the HTML that ajax returns. All tables have PRIMARY/BTREE/single index in the table id column which are int(11).

EXPLAIN:

inserir a descrição da imagem aqui

  • 2

    You can run a EXPLAIN and add the screenshot? Helps assess the problem.

  • @Zuul joined the explain.

  • This AND attendance.student_id= students.id should not be after the WHERE ? Where you are enters the last `LEFT JOIN but does not work on it which generates an extra check to all records.

  • 1

    Another thing to bear in mind, in the first line of the EXPLAIN: The Using temporary indicates that it was necessary to create a temporary table to resolve the query. This is slow and could be the big cause behind your waiting time. Note that in this same line no index has been chosen to resolve the query.

  • For a base with a few thousand records like yours this delay is strange. If you are returning many records an index by the date column, by which the ordering is being done may help. Also check if you can exchange some outter Join for a Inner Join.

  • 1

    With * I can already tell you that it can be this. I say this because I went through the same problem, but the table had several columns. Maybe in your case nothing will change explicitly select the columns.

  • You really need LEFT JOIN?

Show 2 more comments

2 answers

6


Explanation of consultation

Whenever an appointment is delayed, one runs EXPLAIN because the same gives you crucial information to understand what may be wrong.

In your case, the EXPLAIN stresses in the column Extra that in the consultation as regards the scale subject_matrix you are not finding any indexing to construct the result, so you are creating a temporary table to store the results in order to resolve the requested query.

The causes have several origins, but usually there is a key field used to link to other tables or on which the search is performed whose same is not indexed, forcing Mysql to run all the records for each time it goes to the table in question.

In your particular case, the camp subject_matrix.class seems to be responsible, so you should check if it is indexed.

Good practice:
Always see the result of EXPLAIN. Some problems only "see" them when applications grow, but a EXPLAIN consultation helps us predict potential headaches!


Ordering of results

In complex queries, ordering has an impact on the performance of the database query.

In your case, given the amount of JOIN, the field used for sorting must be indexed by how you want to receive the results.

The field the_date table schedule must be indexed in a DESC.

Good practice:
Always have the sort column indexed, nothing is lost.


Conditions of consultation

The columns about which we will check data for results in a query must be indexed.

Whenever Mysql checks for a record with the value X in the column Y, it will have to go row by row check and compare the column value Y with the value X received.

If indexation is available in the column Y, Mysql will perform the query on the index of the same, thus avoiding to go through the whole table to check if it has results.

$sql_condition

This variable contains conditions, conditions that work on certain columns of certain tables. The columns in question must be indexed to make the research work on them more effective.

This scenario may worsen even more when there are JOIN and the condition requires checking each record of the main table with each record of the secondary table.

JOIN tabela ON ( campo = campo )

The columns on which we establish a relation between tables should equally allow an indexed comparison, giving preference whenever possible to relations by numerical keys.

Good practice:
If the column will be searched, index.


Syntax and Indentation

In general terms, syntax and indentation are our friends because they help us to read more effectively and also help us to identify small details or typos that can be the cause of the problem at hand.

One of the things I recommend to beginners in Mysql is the use of parentheses. They end up being discarded by Mysql in most cases, but in others they help us realize potential problems.

In your case, an off-site line causes some redundancy in determining the results:

-- ...
AND attendance.student_id = students.id
-- ...

She’s getting caught by JOIN wrong. When adding syntax and indentation gets:

-- ...
LEFT JOIN ".T_CLASS_DETAILS." AS classes ON (
    schedule.classes_id = classes.id
    AND
    attendance.student_id = students.id
)
-- ...

But in reality, since both fields do not concern the alias classes, the intended is presumed to be:

-- ...
LEFT JOIN ".T_BOOKINGS_CLIENTS." AS students ON (
    subject_matrix.student = students.id
    AND
    attendance.student_id = students.id
)
LEFT JOIN ".T_CLASS_DETAILS." AS classes ON (
    schedule.classes_id= classes.id
)
-- ...

In order to prevent potential problems caused by the above mentioned, I suggest your indented query as follows:

$query = "
SELECT
    the_courses.course_desc AS course_name,
    schedule.id AS schedule_id,
    students.id AS student_id,
    attendance.id_state AS presence,
    attendance.id AS presence_id
FROM ".T_BOOKINGS." AS attendance
LEFT JOIN ".T_BOOKINGS_CLIENT_BOOKINGS." AS schedule ON (
    attendance.id_client_booking = schedule.id
)
LEFT JOIN ".T_STUDENT_SUBJECT_MATRIX." AS subject_matrix ON (
    schedule.classes_id = subject_matrix.class
)
LEFT JOIN ".T_SCHOOL_COURSE." AS the_courses ON (
    schedule.course_id = the_courses.id
)
LEFT JOIN ".T_BOOKINGS_CLIENTS." AS students ON (
    subject_matrix.student = students.id
    AND
    attendance.student_id = students.id
)
LEFT JOIN ".T_CLASS_DETAILS." AS classes ON (
    schedule.classes_id = classes.id
)
WHERE classes.state<>0
".$sql_condition."
ORDER BY schedule.the_date DESC ";

Good practice:
Even though I am working on a file that relates to another programming language, in your case it seems to be PHP, the Mysql code must be indented and with the syntax properly highlighted for easy reading and future maintenance.


Fields to be collected

Notice that above when indenting the query I removed the * of the list of fields to be collected.

When we perform a query on a single table, telling Mysql we want everything has no impact on performance:

SELECT * FROM superBuBu

But when we are working with queries where we join several tables, we have to take into account that the * will tell Mysql: I want all fields of all tables.

Good practice:
In practical terms, if for some reason Mysql has to process the data in a more extensive way, it will have much less information to worry about which will make it faster.


Regarding your particular case:

After the explanations given above, it is possible to conclude that they justify your performance problems in your query.

If we summarize the extra efforts that Mysql is having with the query:

  • We saw that Mysql is creating a temporary table to save the results of the query while working them;
  • We saw that Mysql is going through the results of the query line by line to discern whether each of the lines should be included in the results;
  • We saw that after all this the results still have to be ordered by a field of descending form;
  • And we saw that we are telling Mysql to select all columns of all tables involved, which results in a temporary MEGA table.

we conclude that if not the whole problem, at least the largest portion of it is solved if those points have been dealt with as suggested.


Note:
The answer may have gotten a little dense, but the idea was to transmit in a non-technical way some of the care to take to carry out queries in Mysql focusing on the most common problems taking as a practical example the problem of @Sergio.

The response of @utluiz I was already talking about these problems, but I thought it was relevant to leave a more detailed explanation for the less initiated.

2

Adding some comments (@Zull, @jean and @Earendul) with some more information, I’ve put together a set of actions that should help improve performance:

  1. Remove the * and select only the fields really needed.
  2. Create an index in the field schedule.the_date for ordering to be more efficient.
  3. The clause AND attendance.student_id= students.id there should be a line up, after all she does the Join table students.
  4. Create an index in the field subject_matrix.class, because Mysql is making a table scan in all records to find what you need.
  5. Create one or more indexes involving the fields classes.state, student, class and teacher. They are used in their $sql_condition and this is causing another table scan. Note on explain thousands of records are being read one by one to make the filter. The index would avoid that amount of reading.

Note: the answer was a comment that got very large.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.