Improve SQL query performance that accounts for other tables

Asked

Viewed 195 times

1

I have an E-learning system and I need to create a report to return to me how many students have passed, failed, students who do not access more than 30 days, students who are taking the course and those who have not accessed yet.

The tables that were used are

`company`            Tabela de empresas;
`company_place`      Tabela de centro de treinamentos;
`user`               Tabela de usuários;
`user_course`        Tabela que armazena a inscrição de usuários nos cursos;
`user_course_lesson` Registra as aulas assistidas pelo usuário.

The rules are as follows:

For approved students: must contain date of completion and the field approved like 1;

For failed students: must contain date of completion and the field approved like 0;

For dropout students: without completion date, the field approved as 0, and the last update in the table user_course_lesson must be more than 30 days.

For students who have not accessed yet: There should be no classroom displayed in the table user_course_lesson

For students with the course in progress: total records with no completion date and with the field approved as 0, less dropouts and students who did not access.

The SQL I’m using is the following

SELECT
    /* Nome da empresa */
    c.`name`,

    /* Alunos aprovados */
    SUM( case when uc.`approved` = "1" AND uc.`finished_at` IS NOT NULL then 1 else 0 end ) as approved,

    /* Alunos reprovados */
    SUM( case when uc.`approved` = "0" AND uc.`finished_at` IS NOT NULL then 1 else 0 end ) as reproved,

    /* Alunos desistentes */
    SUM( case when uc.`approved` = "0" AND uc.`finished_at` IS NULL AND (
        SELECT 
            MAX( case when `updated_at` IS NULL then `created_at` else `updated_at` end ) as "last_access"
        FROM `user_course_lesson` as cl
        WHERE cl.`user_course_id` = uc.`id`
        ORDER BY `last_access` DESC
        LIMIT 1
    ) <= DATE_ADD(now(), INTERVAL(-30) DAY) then 1 else 0 end ) as dropout,

    /* Alunos que não acessaram ainda */
    SUM( case when uc.`approved` = "0" AND uc.`finished_at` IS NULL AND (
        SELECT 
            COUNT( cl.`id` ) as "access"
        FROM `user_course_lesson` as cl
        WHERE cl.`user_course_id` = uc.`id`
    ) = 0 then 1 else 0 end ) as not_access,

    /* Alunos com o curso em andamento */
    SUM( case when uc.`approved` = "0" AND uc.`finished_at` IS NULL then 1 else 0 end ) - 
    SUM( case when uc.`approved` = "0" AND uc.`finished_at` IS NULL AND (
        SELECT 
            MAX( case when `updated_at` IS NULL then `created_at` else `updated_at` end ) as "last_access"
        FROM `user_course_lesson` as cl
        WHERE cl.`user_course_id` = uc.`id`
        ORDER BY `last_access` DESC
        LIMIT 1
    ) <= DATE_ADD(now(), INTERVAL(-30) DAY) then 1 else 0 end ) -
    SUM( case when uc.`approved` = "0" AND uc.`finished_at` IS NULL AND (
        SELECT 
            COUNT( cl.`id` ) as "access"
        FROM `user_course_lesson` as cl
        WHERE cl.`user_course_id` = uc.`id`
    ) = 0 then 1 else 0 end ) as pending,
    count( uc.`id` ) as `total`
FROM `company` as c

INNER JOIN `company_place` as p
    ON p.`company_id` = c.`id`

INNER JOIN `user` as u
    ON u.`place_id` = p.`id`

INNER JOIN `user_course` as uc
    ON uc.`user_id` = u.`id`;

The problem

SQL repeats many queries, and the time to calculate with few records is already high, around 0.056 seconds, with the increase of records this time will increase exponentially.

Ideally the queries in the table user_course_lesson did not repeat themselves, or had another way to write this SQL with Joins or another form of relationship.

  • 1

    I didn’t go into the query itself, but by her face it might be better to consider pre-processing these statistics from time to time, and storing them in some sort of cache (for example, in another table).

  • I have considered this, but the maintenance cost it had in the old version of the same project was very high. Constantly had to check the accounts, not to mention that the project has several registered companies, and access of about 15000 students daily per company. I really believed that keeping it in another table would be the solution, but I suffered too much to do it. It was much given in one table, and little reliability. I was thinking about creating some view or something, but I don’t know if the performance gain would be very advantageous.

  • I need to improve the little that is the query, because you can store in the memcache later and not have to run it every time. Another detail is that the project is constantly audited, so the concern with data reliability.

1 answer

2


The solution had a reduction of four consultations, but had to include a LEFT JOIN, because I need the table to return me the table records user_course even if you have no relationship with user_course_lesson.

I changed the syntax CASE for IF only to improve reading.

Before the consultation took on average 0.056 seconds, now decreased to 0.006 seconds.

SELECT
    /* Nome da empresa */
    c.`name`,

    /* Alunos aprovados */
    SUM( IF( uc.`approved` = "1" AND uc.`finished_at` IS NOT NULL, 1, 0 ) ) as approved,

    /* Alunos reprovados */
    SUM( IF( uc.`approved` = "0" AND uc.`finished_at` IS NOT NULL, 1, 0 ) ) as reproved,

    /* Alunos desistentes */
    SUM( IF( uc.`approved` = "0" AND uc.`finished_at` IS NULL AND uc.`date` <= DATE_ADD(now(), INTERVAL(-30) DAY), 1, 0 ) ) as dropout,

    /* Alunos que não acessaram ainda */
    SUM( IF( uc.`approved` = "0" AND uc.`finished_at` IS NULL AND uc.`date` IS NULL, 1, 0 ) ) as not_access,

    /* Alunos com o curso em andamento */
    SUM( IF( uc.`approved` = "0" AND uc.`finished_at` IS NULL, 1, 0 ) ) - 
    SUM( IF( uc.`approved` = "0" AND uc.`finished_at` IS NULL AND uc.`date` <= DATE_ADD(now(), INTERVAL(-30) DAY), 1, 0 ) ) -
    SUM( IF( uc.`approved` = "0" AND uc.`finished_at` IS NULL AND uc.`date` IS NULL, 1, 0 ) ) as pending,

    /* Total */
    count( uc.`id` ) as `total`
FROM `company` as c

INNER JOIN `company_place` as p
    ON p.`company_id` = c.`id`

INNER JOIN `user` as u
    ON u.`place_id` = p.`id`

INNER JOIN (
    SELECT 
        uc.`id`,
        uc.`user_id`,
        uc.`approved`,
        uc.`finished_at`,
        MAX( IF( cl.`updated_at` IS NULL, cl.`created_at`, cl.`updated_at` ) ) as "date"
    FROM `user_course` as uc
    LEFT JOIN `user_course_lesson` as cl
    ON cl.`user_course_id` = uc.`id`
    GROUP BY uc.`id`
)  as uc
ON uc.`user_id` = u.`id`

GROUP BY c.`id`;

Browser other questions tagged

You are not signed in. Login or sign up in order to post.