Select the last months, also showing the months without records

Asked

Viewed 63 times

0

Hello.

I am doing a SELECT to return the last 06 months, showing the count of records (in the example: sales), grouping by year-month.

My query currently looks like this:

**Schema (MySQL v5.7)**

CREATE TABLE vendas (
   id INT NOT NULL,
   data_venda DATE NOT NULL,
   PRIMARY KEY (id),
   UNIQUE (id)
);

INSERT INTO vendas
   (id, data_venda)
VALUES
    (1, '2020-10-01'),
    (2, '2020-10-01'),
    (3, '2020-12-01'),
    (4, '2020-12-01'),
    (5, '2020-12-01'),
    (6, '2020-12-01'),
    (7, '2021-02-01'),
    (8, '2021-02-01'),
    (9, '2021-02-01'),
    (10, '2021-02-01'),
    (11, '2021-02-01'),
    (12, '2021-03-01'),
    (13, '2021-03-01'),
    (14, '2021-03-01'),
    (15, '2021-03-01'),
    (16, '2021-04-01');

Query #1

SELECT
    DATE_FORMAT(data_venda, "%Y-%m") AS ano_mes,
    COUNT(id)
FROM
    vendas
WHERE
    /* Utilize o NOW para mudar automaticamente,
    * deixei fixo 2021-04-30 para o exemplo
    * ainda funcionar no futuro.
    */
    /* data_venda >= DATE_SUB(NOW(), INTERVAL 6 MONTH) */
    data_venda >= DATE_SUB('2020-04-30', INTERVAL 6 MONTH)
GROUP BY
    DATE_FORMAT(data_venda, "%Y-%m");
anomes COUNT(id)
2020-12 4
2021-02 5
2021-03 4
2021-04 1

View on DB Fiddle

Note that, to be the last 06 months, should return also the month 2020-11 and 2021-01, showing as 0 (zero). Thus:

anomes COUNT(id)
2020-11 0
2020-12 4
2021-01 0
2021-02 5
2021-03 4
2021-04 1

I have already thought of doing the treatment after the query, directly in the system programming (create a function to correct the return of SELECT, direct in PHP, for example). But it seems to be more readable and simple if the consultation is already complete (with zero months).

I believe that the correction goes through some SUBSELECT that builds the last 06 months and then combines with my current SELECT (example above), but I’m not able to do it. I saw some questions here, but I couldn’t apply the answer either. Some talked about creating a trial or adjusting the month within a fixed year, but note that in this case there are two years involved.

Thank you to anyone who can assist.

  • This answers your question? Return results for all months of the interval

  • Thanks, Sorack. But I still have the problem of the year changing, right? Try applying with the creation of the 12-month list created in the given answer, but the query does not know which year the created month belongs to. Maybe the output is to create a base table of dates (leave it there in my database), with all possible months and years (until a very long date in the future), and filter from it. But I would like the support of those who have more knowledge, whether this is a good option or would be an easy/gambiarra exit.

  • Which version of Mysql you are using?

  • v5.7, same as in the example in the question.

1 answer

1


As suggested in the comment by @Sorack, you need to generate a sequence of dates and do LEFT JOIN (or RIGHT JOIN depending on which side are the generated dates) with sales chart.

I use an example based on this link: https://www.shayanderson.com/mysql/generating-a-series-of-dates-in-mysql.htm

Here for reference:

SELECT DATE(cal.date)
FROM (
      SELECT SUBDATE(NOW(), INTERVAL 30 DAY) + INTERVAL xc DAY AS date
      FROM (
            SELECT @xi:=@xi+1 as xc from
            (SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) xc1,
            (SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) xc2,
            (SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) xc3,
            (SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) xc4,
            (SELECT @xi:=-1) xc0
      ) xxc1
) cal
WHERE cal.date <= NOW()
ORDER BY cal.date DESC

Basically this code does the following: the SELECT with UNION generate sequential numbers because of the variable @xi which is added up. Note that, each block of SELECT generates 4 numbers, and they are together (UNION) as if it were "FROM", but as there is no relationship between them, is made a Cartesian, ie the each SELECT generates 4 numbers, then 4x4x4x4 = 256, and the variable "counts" this, generating values from 0 to 255. Therefore, dates can only be generated in a 256 interval, in this case days (see range + INTERVAL xc DAY). To generate more values, simply increase the UNION or add more "tables" (xc5, xc6, etc).

From there, as only needs month year, we can group this and use DATE_SUB, as in your example:

SELECT 
      DISTINCT DATE_FORMAT(SUBDATE(NOW(), INTERVAL 5 MONTH) + INTERVAL xc DAY, "%Y-%m") AS data
    FROM 
      (
        SELECT @xi:=@xi+1 as xc from
            (SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) xc1,
            (SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) xc2,
            (SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) xc3,
            (SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) xc4,
            (SELECT @xi:=-1) xc0
      ) xxc1

This will generate the month-to-year pairs (I used DISTICT to remove duplicates, after all it was generated from a date, or would have to group), something like this:

| data    |
| ------- |
| 2020-11 |
| 2020-12 |
| 2021-01 |
| 2021-02 |
| 2021-03 |
| 2021-04 |

Now that we have the dates, just make a LEFT JOIN (in this case, these dates will be left in JOIN, to bring even the data not existing in "sales")

SELECT cal.data,
       IFNULL(vda.tot, 0) tot
FROM 
  (
    SELECT 
      DISTINCT DATE_FORMAT(SUBDATE(NOW(), INTERVAL 5 MONTH) + INTERVAL xc DAY, "%Y-%m") AS data
    FROM 
      (
        SELECT @xi:=@xi+1 as xc from
            (SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) xc1,
            (SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) xc2,
            (SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) xc3,
            (SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) xc4,
            (SELECT @xi:=-1) xc0
      ) xxc1
  ) cal 

LEFT JOIN (
    SELECT 
      DATE_FORMAT(data_venda, "%Y-%m") AS ano_mes, 
      COUNT(id) tot 
    FROM 
      vendas 
    WHERE 
      data_venda >= DATE_SUB('2020-04-30', INTERVAL 6 MONTH) 
    GROUP BY 
      DATE_FORMAT(data_venda, "%Y-%m")
  ) vda on vda.ano_mes = data  
WHERE cal.data <= DATE_FORMAT(NOW(), "%Y-%m")
ORDER BY cal.data

Some remarks:

  • the "sales" table was added as subquery, with alias "vda";
  • to return zero when not finding a period, the function was used IFNULL, either returns the first value and, if null, returns the second, in this case zero (0);

You can see it working here: https://www.db-fiddle.com/f/gUbg5HjhdpS819VHfLxDq/1

  • Perfect! Thanks for the help and explanation, really did not know this use of variables (use and manipulation/calculation), I will delve into the subject.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.