Mysql query - Cross Tables count

Asked

Viewed 287 times

5

I have the following tables:

Diagrama

Each tuple of the table Pageviews (main) contains a single page view. If by chance the same person in the same session (Sessions) view the same page (Pages), a new tuple is not added in Pageviews. Instead the field is incremented Pageviews..

In one of the reports, I need the amount of unique visitors (COUNT DISTINCT Visitors id.) and pageviews (SUM Pageviews.) at a particular time (day, week, month and year) of a specific store (of which the Stores.code).

The appointments I’m making at the moment are:

Visitors

SELECT DISTINCT Visitors.id
    FROM PageViews
      INNER JOIN VisitorsStores ON VisitorsStores.id = PageViews.visitor
      INNER JOIN Stores ON Stores.id = VisitorsStores.store
      INNER JOIN Sessions ON Sessions.id = VisitorsStores.session
      INNER JOIN Visitors ON Visitors.id = Sessions.visitor
    WHERE
      DATE(PageViews.createdAt) BETWEEN '2017-11-01' AND '2017-11-30' AND 
      Stores.code = 'loja1';

Pageviews

SELECT SUM(quantity) AS qtde
    FROM PageViews
      INNER JOIN VisitorsStores ON VisitorsStores.id = PageViews.visitor
      INNER JOIN Stores ON Stores.id = VisitorsStores.store
    WHERE
      DATE(PageViews.createdAt) BETWEEN '2017-11-01' AND '2017-11-30' AND 
      Stores.code = 'loja1';

Both

SET @loja := 'loja1';
SET @inicio := '2017-11-01';
SET @fim := '2017-11-31';

SELECT
  COUNT(tVisitors.qtdVisitors) AS visitors,
  SUM(tVisitors.qtdPageViews) as pageViews
FROM (
  SELECT
    Visitors.id   AS qtdVisitors,
    SUM(quantity) AS qtdPageViews
  FROM PageViews
    INNER JOIN VisitorsStores ON VisitorsStores.id = PageViews.visitor
    INNER JOIN Stores ON Stores.id = VisitorsStores.store
    INNER JOIN Sessions ON Sessions.id = VisitorsStores.session
    INNER JOIN Visitors ON Visitors.id = Sessions.visitor
  WHERE
    DATE(PageViews.createdAt) BETWEEN @inicio AND @fim AND Stores.code = @loja
  GROUP BY qtdVisitors
) as tVisitors;

It works great when a store has few daily access. But when a store has many accesses (some around 300 visitors and 5,000 pageviews) the consultation takes a long time to complete (32s was the longest in the mentioned situation. Stores with less than 1,000 pageviews usually take less than 3s).

The question is: How else could I make this query to be faster?


EXPLAIN query Both for a single day (05/12)

(Visitor: 696 | pageviews: 5809 | query runtime: 32s)

[
    {
        "id": 1,
        "select_type": "PRIMARY",
        "table": "<derived2>",
        "type": "ALL",
        "possible_keys": null,
        "key": null,
        "key_len": null,
        "ref": null,
        "rows": 6192,
        "Extra": ""
    },
    {
        "id": 1,
        "select_type": "PRIMARY",
        "table": "<derived3>",
        "type": "ALL",
        "possible_keys": null,
        "key": null,
        "key_len": null,
        "ref": null,
        "rows": 6192,
        "Extra": ""
    },
    {
        "id": 3,
        "select_type": "DERIVED",
        "table": "Stores",
        "type": "index",
        "possible_keys": "PRIMARY",
        "key": "code",
        "key_len": "62",
        "ref": null,
        "rows": 1,
        "Extra": "Using where; Using index"
    },
    {
        "id": 3,
        "select_type": "DERIVED",
        "table": "VisitorsStores",
        "type": "ref",
        "possible_keys": "PRIMARY,FK_VisitorsStores_Stores",
        "key": "FK_VisitorsStores_Stores",
        "key_len": "5",
        "ref": "tmw_views.Stores.id",
        "rows": 6192,
        "Extra": "Using index"
    },
    {
        "id": 3,
        "select_type": "DERIVED",
        "table": "PageViews",
        "type": "ref",
        "possible_keys": "FK_PageViews_VisitorsStores",
        "key": "FK_PageViews_VisitorsStores",
        "key_len": "8",
        "ref": "tmw_views.VisitorsStores.id",
        "rows": 1,
        "Extra": "Using where"
    },
    {
        "id": 2,
        "select_type": "DERIVED",
        "table": "Stores",
        "type": "index",
        "possible_keys": "PRIMARY",
        "key": "code",
        "key_len": "62",
        "ref": null,
        "rows": 1,
        "Extra": "Using where; Using index; Using temporary"
    },
    {
        "id": 2,
        "select_type": "DERIVED",
        "table": "VisitorsStores",
        "type": "ref",
        "possible_keys": "PRIMARY,FK_VisitorsStores_Sessions,FK_VisitorsStores_Stores",
        "key": "FK_VisitorsStores_Stores",
        "key_len": "5",
        "ref": "tmw_views.Stores.id",
        "rows": 6192,
        "Extra": "Using where"
    },
    {
        "id": 2,
        "select_type": "DERIVED",
        "table": "Sessions",
        "type": "eq_ref",
        "possible_keys": "PRIMARY,FK_Sessions_Visitors",
        "key": "PRIMARY",
        "key_len": "8",
        "ref": "tmw_views.VisitorsStores.session",
        "rows": 1,
        "Extra": ""
    },
    {
        "id": 2,
        "select_type": "DERIVED",
        "table": "Visitors",
        "type": "eq_ref",
        "possible_keys": "PRIMARY",
        "key": "PRIMARY",
        "key_len": "8",
        "ref": "tmw_views.Sessions.visitor",
        "rows": 1,
        "Extra": "Using index"
    },
    {
        "id": 2,
        "select_type": "DERIVED",
        "table": "PageViews",
        "type": "ref",
        "possible_keys": "FK_PageViews_VisitorsStores",
        "key": "FK_PageViews_VisitorsStores",
        "key_len": "8",
        "ref": "tmw_views.VisitorsStores.id",
        "rows": 1,
        "Extra": "Using where; Distinct"
    }
]

SHOW TABLE STATUS FROM tmw_views;

{
    "table": "TABLES",
    "rows":
    [
        {
            "Name": "PageViews",
            "Engine": "InnoDB",
            "Version": 10,
            "Row_format": "Compact",
            "Rows": 1560181,
            "Avg_row_length": 84,
            "Data_length": 131694592,
            "Max_data_length": 0,
            "Index_length": 194314240,
            "Data_free": 12582912,
            "Auto_increment": null,
            "Create_time": "2017-10-27 23:31:14",
            "Update_time": null,
            "Check_time": null,
            "Collation": "utf8_general_ci",
            "Checksum": null,
            "Create_options": "",
            "Comment": ""
        },
        {
            "Name": "PageViewsYear",
            "Engine": "InnoDB",
            "Version": 10,
            "Row_format": "Compact",
            "Rows": 25,
            "Avg_row_length": 655,
            "Data_length": 16384,
            "Max_data_length": 0,
            "Index_length": 32768,
            "Data_free": 0,
            "Auto_increment": 62,
            "Create_time": "2017-11-27 11:17:44",
            "Update_time": null,
            "Check_time": null,
            "Collation": "utf8_general_ci",
            "Checksum": null,
            "Create_options": "",
            "Comment": ""
        },
        {
            "Name": "Pages",
            "Engine": "InnoDB",
            "Version": 10,
            "Row_format": "Compact",
            "Rows": 94648,
            "Avg_row_length": 116,
            "Data_length": 11026432,
            "Max_data_length": 0,
            "Index_length": 0,
            "Data_free": 4194304,
            "Auto_increment": 205925,
            "Create_time": "2017-10-27 23:31:14",
            "Update_time": null,
            "Check_time": null,
            "Collation": "utf8_general_ci",
            "Checksum": null,
            "Create_options": "",
            "Comment": ""
        },
        {
            "Name": "Sessions",
            "Engine": "InnoDB",
            "Version": 10,
            "Row_format": "Compact",
            "Rows": 366180,
            "Avg_row_length": 73,
            "Data_length": 26804224,
            "Max_data_length": 0,
            "Index_length": 41058304,
            "Data_free": 3145728,
            "Auto_increment": 531262,
            "Create_time": "2017-10-27 23:31:14",
            "Update_time": null,
            "Check_time": null,
            "Collation": "utf8_general_ci",
            "Checksum": null,
            "Create_options": "",
            "Comment": ""
        },
        {
            "Name": "Stores",
            "Engine": "InnoDB",
            "Version": 10,
            "Row_format": "Compact",
            "Rows": 0,
            "Avg_row_length": 0,
            "Data_length": 16384,
            "Max_data_length": 0,
            "Index_length": 16384,
            "Data_free": 0,
            "Auto_increment": 31,
            "Create_time": "2017-10-27 23:31:14",
            "Update_time": null,
            "Check_time": null,
            "Collation": "utf8_general_ci",
            "Checksum": null,
            "Create_options": "",
            "Comment": ""
        },
        {
            "Name": "SystemLogs",
            "Engine": "InnoDB",
            "Version": 10,
            "Row_format": "Compact",
            "Rows": 5913,
            "Avg_row_length": 1687,
            "Data_length": 9977856,
            "Max_data_length": 0,
            "Index_length": 540672,
            "Data_free": 4194304,
            "Auto_increment": null,
            "Create_time": "2017-11-20 15:21:31",
            "Update_time": null,
            "Check_time": null,
            "Collation": "utf8_general_ci",
            "Checksum": null,
            "Create_options": "",
            "Comment": ""
        },
        {
            "Name": "Visitors",
            "Engine": "InnoDB",
            "Version": 10,
            "Row_format": "Compact",
            "Rows": 31067,
            "Avg_row_length": 51,
            "Data_length": 1589248,
            "Max_data_length": 0,
            "Index_length": 1589248,
            "Data_free": 4194304,
            "Auto_increment": 102579,
            "Create_time": "2017-10-27 23:31:14",
            "Update_time": null,
            "Check_time": null,
            "Collation": "utf8_general_ci",
            "Checksum": null,
            "Create_options": "",
            "Comment": ""
        },
        {
            "Name": "VisitorsStores",
            "Engine": "InnoDB",
            "Version": 10,
            "Row_format": "Compact",
            "Rows": 297216,
            "Avg_row_length": 44,
            "Data_length": 13107200,
            "Max_data_length": 0,
            "Index_length": 17842176,
            "Data_free": 4194304,
            "Auto_increment": 562767,
            "Create_time": "2017-10-27 23:31:14",
            "Update_time": null,
            "Check_time": null,
            "Collation": "utf8_general_ci",
            "Checksum": null,
            "Create_options": "",
            "Comment": ""
        }
    ]
}

SHOW VARIABLES LIKE 'query_cache%';

query_cache_limit             1048576
query_cache_min_res_unit      4096
query_cache_size              33554432
query_cache_strip_comments    OFF
query_cache_type              ON
query_cache_wlock_invalidate  OFF
  • can post the result of EXPLAIN statement of the query Both? And if possible put also the SHOW TABLE STATUS FROM `<db_name>`; Thus, we have how to know which are the engines of the tables.

  • @pss1support Added requested information

  • 1

    Thanks for putting it in JSON format! It makes it much easier!

  • Hello Szag, did you come to create some index to try to optimize these queries? How are you using Innodb the Fks implicitly create indexes. That said, indexes like CREATE INDEX pv_createdAt_idx ON PageViews(createdAt) USING BTREE; should help with the consultation (in return they make inserts a little slower). I assume that Stores.code is also unique key or is using a single correct index?

  • Yes, the fields used in the query are all indexed

  • Is there a lot of time difference between the two individual queries and the unified query (using Subqueries)? For this store that takes 32 seconds, how long each of the individual queries takes?

  • Yes. Separately, the first (visitors) took 18s and the second (pageviews) 17s. That is, unified they already have a better performance.

  • Um... if the individual queries are taking around half the time I remain suspicious of indexes. Especially as you are using a function DATE(PageViews.createdAt) depending on how you created the index Mysql will not be able to use, you could post the command you used to create the contents needed for the query?

  • 1

    OK, I will fry here another solution! I will delete the above comments.

  • 2

    I got a solution with slightly better performance, I will post above (19s in total, against the 32s of the previous method).

  • Szag-Ot, no, no .... but gave in the same!! His words above: "the first (visitors) took 18s and the second (pageviews) 17s.". Something else! You can answer your own question!

  • Here, in practice only reduced a query, but not optimized the query yes. That already has the cost of 17 or 18s! Understood?

  • 1

    You can inform the output: SHOW VARIABLES LIKE ?query_cache%';?

  • @pss1support Adding what was requested in the post

Show 9 more comments

2 answers

5


The problem is in the use of the function DATE(), since it ends up hindering the use of the index.

Think of the following: for each tuple, Mysql has to perform the function before checking if the date is within the limit.

I recommend that you instead put the start date at the start time (2017-12-31 00:00:00) and at the end date the final time (2017-12-31 23:59:59).

So you wouldn’t have to use the DATE() and the index will improve the speed of the query, as it should be.

SET @loja := 'loja1';
SET @inicio := '2017-11-01 00:00:00';
SET @fim := '2017-11-31 23:59:59';

SELECT
  COUNT(tVisitors.qtdVisitors) AS visitors,
  SUM(tVisitors.qtdPageViews) as pageViews
FROM (
  SELECT
    Visitors.id   AS qtdVisitors,
    SUM(quantity) AS qtdPageViews
  FROM PageViews
    INNER JOIN VisitorsStores ON VisitorsStores.id = PageViews.visitor
    INNER JOIN Stores ON Stores.id = VisitorsStores.store
    INNER JOIN Sessions ON Sessions.id = VisitorsStores.session
    INNER JOIN Visitors ON Visitors.id = Sessions.visitor
  WHERE
    PageViews.createdAt BETWEEN @inicio AND @fim AND Stores.code = @loja
  GROUP BY qtdVisitors
) as tVisitors;
  • 1

    Perfect! Thank you very much! After the changes, the queries are executed in maximum 2s! Excellent!!!

  • I said the problem was in the query not being able to use the s:D index. Just be careful with columns of the type TIMESTAMP because Mysql does calculations using the time zone when storing and retrieving values (basically it converts to UTC and back using the settings of Timezone server or session). I’ve seen a lot of people having problems with other daylight savings and things like that.

  • In fact you insisted that the problem was on the index, even after I informed you that the index was correct. Unlike this, the colleague here showed where the problem really was, why the problem was and how to fix it. Even so, thank you also, @Anthonyaccioly, for wanting to help. Case closed, vlw!

2

Being quite direct: at the moment an implicit JOIN is made between tVisitor with tPageViews in the query Both.

We can improve and optimize with Semi-join [ WHERE Stores.id IN (Subqueries)], because both use the same data in the clause WHERE and Subqueries tPageViews is just one COUNT().

NOTE 1: not tested query in my environment, is just a theoretical abstraction of a possible solution.

-- Teste de pageviews
SELECT SUM(quantity) AS qtdePageViews, COUNT(Visitors.id) AS visitor
FROM PageViews
  INNER JOIN VisitorsStores ON VisitorsStores.id = PageViews.visitor
  INNER JOIN Stores ON Stores.id = VisitorsStores.store
  INNER JOIN Sessions ON Sessions.id = VisitorsStores.session
  INNER JOIN Visitors ON Visitors.id = Sessions.visitor
WHERE Stores.id IN (
    SELECT Stores.id
    FROM PageViews
      INNER JOIN VisitorsStores ON VisitorsStores.id = PageViews.visitor
      INNER JOIN Stores ON Stores.id = VisitorsStores.store
    WHERE
      DATE(PageViews.createdAt) BETWEEN '2017-11-01' AND '2017-11-30' 
      AND Stores.code = 'loja1'
);

Just to contribute and facilitate verification, follow the tables with SQL in the dialect of Mysql:

Table Visitors:

CREATE TABLE `Visitors` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `ip` varchar(50) NOT NULL,
  `createdAt` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Table Stores:

CREATE TABLE `Stores` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `code` varchar(250) NOT NULL,
  `visitor` bigint(20) unsigned DEFAULT NULL,
  `createdAt` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  UNIQUE KEY `code_UNIQUE` (`code`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Table Pages:

CREATE TABLE `Pages` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `url` varchar(500) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Table VisitorsStores:

CREATE TABLE `VisitorsStores` (
  `id` bigint(20) unsigned NOT NULL,
  `session` bigint(20) unsigned DEFAULT NULL,
  `store` int(10) unsigned DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `session_FKIndex_idx` (`session`),
  KEY `store_FKIndex_idx` (`store`),
  CONSTRAINT `session_FKIndex` FOREIGN KEY (`session`) 
    REFERENCES `Sessions` (`id`) ON DELETE 
        NO ACTION ON UPDATE NO ACTION,
  CONSTRAINT `store_FKIndex` FOREIGN KEY (`store`) 
    REFERENCES `Stores` (`id`) 
        ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Table PageViews:

CREATE TABLE `PageViews` (
  `page` int(10) unsigned NOT NULL,
  `visitor` bigint(20) unsigned NOT NULL,
  `quantity` int(10) unsigned DEFAULT NULL,
  `updatedAt` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
  `createdAt` datetime DEFAULT NULL,
  PRIMARY KEY (`page`,`visitor`),
  KEY `visitor_FKIndex_idx` (`visitor`),
  CONSTRAINT `page_FKIndex` FOREIGN KEY (`page`) 
    REFERENCES `Pages` (`id`) 
        ON DELETE NO ACTION ON UPDATE NO ACTION,
  CONSTRAINT `visitor_FKIndex` FOREIGN KEY (`visitor`) 
    REFERENCES `VistitorsStores` (`id`) 
        ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Table Sessions:

CREATE TABLE `Sessions` (
  `id` bigint(20) unsigned NOT NULL,
  `code` varchar(250) DEFAULT NULL,
  `visitor` bigint(20) unsigned DEFAULT NULL,
  `createdAt` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  KEY `session_FKIndex_visitor_idx` (`visitor`),
  CONSTRAINT `session_FKIndex_visitor` FOREIGN KEY (`visitor`) 
    REFERENCES `Visitors` (`id`) 
        ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

There are several optimization strategies that we can use to make queries faster. And the one that I consider a priority is:

Queries work more efficiently when you choose one Storage Engine with properties that best meet the application requirements.

Optimizing Innodb queries [Mysql 5.6 - 2015, 1085 p.]

  • If you frequently have recurring queries for tables that are not updated frequently, enable the query cache:

    [mysqld]
    query_cache_type = 1
    query_cache_size = 10M
    
  • Do not create a separate secondary index for each column, as each query can only use one index. Column indexes or columns rarely tested with only a few different values may not be useful for any doubt. If you have many queries for the same table, testing different combinations of columns, try to create a small number of concatenated indices instead of a large number of indices in a single column. If an index contains all the columns required for the result set (known as the hedge index), the query can avoid reading the table data.

  • If an indexed column cannot contain NULL values, declare it as NOT NULL when you create the table. The optimizer can better determine which index is most effective for a query, when it knows whether each column contains NULL values or not.

    NOTE 2: In the context here of the question, would be the tuple: `Stores`. `code` (varchar(250)) which is too large should be indexed as below, using coverage index combining `Stores`. `id`with `Stores`. `code`:

    ALTER TABLE `Stores` 
    DROP INDEX `code_UNIQUE` ,
    ADD UNIQUE INDEX `code_UNIQUE` USING BTREE (`id` ASC, `code`(62) ASC);
    

Optimizing the Server :: Memory Parameters [Paul Dubois et al - 2006, 543 p.]

Interesting to keep in mind that when we think about performance and optimizations we cannot forget about the server. I know it might fall outside the scope of the question, but it would be an incomplete response if we didn’t optimize the server. In the context here, let’s focus on creating an archive my-large.cnf on the pasture /usr/share/mysql with the following values:

[mysqld]
key_buffer_size = 256M
table_cache = 256
sort_buffer_size = 1M
query_cache_type = ON
query_cache_size = 16M

Optimizing the Server :: Query Cache [Paul Dubois et al - 2006, 549 p.]

[mysqld]
query_cache_type = 1
query_cache_size = 10M
query_cache_limit = 2M

NOTE 3: Whenever changing configuration files it is necessary to restart the server Mysql

To check the values Query Cache current in your environment:

SHOW VARIABLES LIKE ‘query_cache%’;

Reference:
[Mysql 5.6 - 2015], MysqlTM, Mysql 5.6 Reference Manual: Including Mysql Cluster NDB 7.3-7.4 Reference Guide. Document generated on: 2015. May 08. (Revision: 43103)
[Paul Dubois et al - 2006], Copyright 2006 by Mysql AB, Mysql 5.0 Certification Study Guide
[Sheeri Cabral, Keith Murphy - 2009], Published by Wiley Publishing, Inc., Mysql® Administrator’s Bible

  • Thanks. But as commented, the above query is wrong, since it is multiplying the results to be counted (Visitors). Also, Stores.id is not the problem itself, so the IN query is no improvement. However the POST is valid as the suggestions for improvement in the server. I will analyze to see if all are being applied. Grateful!

  • Yes, I put that this is a possible solution using the semi-join of in the WHERE clause with the IN(subquery operator).

  • This method does not work because of the tuple systematics. It sums and counts wrongly. Still, thank you so much for your help! It must have taken a lot of work to develop the answer. Vlw itself. I will continue in the search for a better method

Browser other questions tagged

You are not signed in. Login or sign up in order to post.