COUNT not returning 0 in Presto

My query is this:

with dates as (
  SELECT CAST(date_column AS DATE) DAY 
    FROM (
      VALUES (
        SEQUENCE(cast('2019-10-29' AS date), current_date, INTERVAL '1' DAY)
      )
    ) AS t1(date_array)
   CROSS JOIN UNNEST(date_array) AS t2(date_column)
)

SELECT p.profile_id, coalesce(ct,0) AS ct, DAY 
  FROM connect_profiles p
  left JOIN (
    SELECT profile_id, COUNT(distinct visit_id) as ct, dates.DAY 
      FROM connect_visits v 
      right join dates on dates.DAY = cast(v.visit_created_at as date)
      where 
        web_site_id in ('10','11') and
        metadata like '%logged%'
      GROUP BY profile_id, dates.DAY
  ) CountQuery ON p.profile_id = CountQuery.profile_id
  where p.profile_id = 733194
  order by DAY asc

I've tried everything I could find online to make COUNT() return 0 on days when the profile_id has no visits, but it never works, and I don't know what I'm doing wrong. The query only returns the days on which the profile_id had visits, but I want it to return every day in the queried date range, whether or not there were visits on that day.

Could someone help me?

This is the result I get:

#   profile_id  ct   DAY
1   733194      4    2019-11-04
2   733194      9    2019-11-06
3   733194      6    2019-11-07
4   733194      3    2019-11-09
5   733194      101  2019-11-10
6   733194      38   2019-11-11
7   733194      16   2019-11-12
8   733194      6    2019-11-14
9   733194      3    2019-11-17
10  733194      5    2019-11-18
11  733194      5    2019-11-19
12  733194      3    2019-11-20
13  733194      6    2019-11-21
14  733194      3    2019-11-22
15  733194      1    2019-11-23
16  733194      4    2019-11-24
17  733194      7    2019-11-25
18  733194      5    2019-11-26
19  733194      3    2019-11-27
20  733194      4    2019-11-28
21  733194      4    2019-11-30
22  733194      4    2019-12-01
23  733194      6    2019-12-02
24  733194      6    2019-12-03
25  733194      7    2019-12-05
26  733194      1    2019-12-06
27  733194      4    2019-12-07
28  733194      2    2019-12-08
29  733194      8    2019-12-09
30  733194      5    2019-12-10
31  733194      6    2019-12-11
32  733194      2    2019-12-12
33  733194      1    2019-12-13
34  733194      2    2019-12-14
35  733194      2    2019-12-15
36  733194      2    2019-12-16

I want the count for every day since 10/29/2019, with 0 on the days that had no visits.
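One reason the empty days vanish is that the filters on web_site_id and metadata sit in the WHERE clause. WHERE is applied after the outer join, and for a date with no matching visit those columns are NULL, so the condition is not true and the padded row is dropped; the RIGHT JOIN then behaves like an inner join. The sketch below illustrates this under several assumptions: Python's built-in sqlite3 module stands in for Presto, the three-day `dates` table and the two visits are made-up sample data, the join is written as `dates LEFT JOIN connect_visits` (the mirror image of the question's `connect_visits RIGHT JOIN dates`), and SQLite's `date()` replaces `CAST(... AS DATE)`.

```python
import sqlite3

# Hypothetical miniature data: three calendar days, visits on only one of them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dates (day TEXT);
INSERT INTO dates VALUES ('2019-10-29'), ('2019-10-30'), ('2019-10-31');
CREATE TABLE connect_visits (
    visit_id INTEGER, profile_id INTEGER, web_site_id TEXT,
    metadata TEXT, visit_created_at TEXT
);
INSERT INTO connect_visits VALUES
    (1, 733194, '10', 'user logged in', '2019-10-30 08:00:00'),
    (2, 733194, '11', 'user logged in', '2019-10-30 09:30:00');
""")

# Filters in WHERE: evaluated after the outer join, so the NULL-padded
# date-only rows fail "web_site_id IN (...)" and are discarded.
where_filtered = conn.execute("""
SELECT d.day, COUNT(DISTINCT v.visit_id)
FROM dates d
LEFT JOIN connect_visits v ON d.day = date(v.visit_created_at)
WHERE v.web_site_id IN ('10', '11') AND v.metadata LIKE '%logged%'
GROUP BY d.day ORDER BY d.day
""").fetchall()

# Filters in ON: part of the join condition, so unmatched dates survive
# with NULL visit columns and COUNT(DISTINCT ...) yields 0 for them.
on_filtered = conn.execute("""
SELECT d.day, COUNT(DISTINCT v.visit_id)
FROM dates d
LEFT JOIN connect_visits v
  ON d.day = date(v.visit_created_at)
 AND v.web_site_id IN ('10', '11')
 AND v.metadata LIKE '%logged%'
GROUP BY d.day ORDER BY d.day
""").fetchall()

print(where_filtered)  # only the day that had visits
print(on_filtered)     # every day, with 0 where there were no visits
```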

  • Have you tried using nvl/coalesce?

  • Yes, it's already in the query above: SELECT p.profile_id, coalesce(ct,0) AS ct, DAY
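A note on why coalesce(ct, 0) in the outer SELECT has no effect: it can only replace a NULL in a row that exists. A day with no visits comes out of CountQuery (if it survives at all) with a NULL profile_id, because profile_id comes from connect_visits, so the outer join on p.profile_id = CountQuery.profile_id never matches it and there is no row to coalesce. One common way around this is to build every (profile, day) pair first with a CROSS JOIN and then LEFT JOIN the visits onto those pairs, keeping the visit filters in the ON clause. A minimal sketch, assuming SQLite via Python's sqlite3 in place of Presto, hypothetical sample rows, and a recursive CTE with a fixed end date instead of SEQUENCE(..., current_date, ...) plus UNNEST:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data, not taken from the question.
conn.executescript("""
CREATE TABLE connect_profiles (profile_id INTEGER);
INSERT INTO connect_profiles VALUES (733194), (800000);
CREATE TABLE connect_visits (
    visit_id INTEGER, profile_id INTEGER, web_site_id TEXT,
    metadata TEXT, visit_created_at TEXT
);
INSERT INTO connect_visits VALUES
    (1, 733194, '10', 'logged', '2019-10-30 08:00:00'),
    (2, 733194, '10', 'logged', '2019-10-30 09:00:00'),
    (3, 733194, '12', 'logged', '2019-10-31 10:00:00'),  -- excluded: web_site_id
    (4, 800000, '11', 'logged', '2019-11-01 11:00:00');  -- other profile
""")

rows = conn.execute("""
WITH RECURSIVE dates(day) AS (
    -- Stand-in for Presto's SEQUENCE(...) + UNNEST; fixed end date for the demo.
    SELECT '2019-10-29'
    UNION ALL
    SELECT date(day, '+1 day') FROM dates WHERE day < '2019-11-02'
)
SELECT p.profile_id, d.day, COUNT(DISTINCT v.visit_id) AS ct
FROM connect_profiles p
CROSS JOIN dates d                     -- one row per profile per day
LEFT JOIN connect_visits v
  ON v.profile_id = p.profile_id
 AND date(v.visit_created_at) = d.day
 AND v.web_site_id IN ('10', '11')     -- filters live in ON, not WHERE
 AND v.metadata LIKE '%logged%'
WHERE p.profile_id = 733194
GROUP BY p.profile_id, d.day
ORDER BY d.day
""").fetchall()

for row in rows:
    print(row)
```

In Presto, the same shape should work with the question's own dates CTE and CAST(v.visit_created_at AS DATE) = d.DAY in the ON clause. Keeping WHERE p.profile_id = 733194 in WHERE is safe here because it filters the preserved side of the join, not the outer-joined visits.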

1 answer

The problem with aggregate functions (COUNT, SUM, AVG, etc.) is that they ignore NULL values.

When a column holds NULLs, whether they are stored in the table itself or produced by a LEFT JOIN or RIGHT JOIN, the database has no way of knowing what they should stand for. For example, how should it answer questions such as:

  • What is a NULL worth when counting?
  • What is a NULL worth in a sum?
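To make "ignore" concrete: COUNT(column) skips NULLs, so over a group whose only row is NULL-padded it already returns 0, while COUNT(*) counts the row itself and SUM returns NULL because there is nothing to add. A small sketch, assuming Python's sqlite3 module as a stand-in for Presto and a hypothetical one-row table named padded that mimics an outer-join row for a day without visits:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A single NULL-padded row, like the one an outer join produces for a day with no visits.
conn.executescript("""
CREATE TABLE padded (day TEXT, visit_id INTEGER);
INSERT INTO padded VALUES ('2019-10-29', NULL);
""")

count_col, count_star, total, total_or_zero = conn.execute("""
SELECT COUNT(visit_id),            -- NULLs are skipped
       COUNT(*),                   -- counts the row itself
       SUM(visit_id),              -- nothing to add: result is NULL
       COALESCE(SUM(visit_id), 0)  -- replace the NULL result afterwards
FROM padded
""").fetchone()

print(count_col, count_star, total, total_or_zero)  # 0 1 None 0
```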

We need to tell the database how to handle this in the query. For that, the value passed to COUNT must not be NULL, so you can use the coalesce function that is already in your code, but inside the COUNT, for example: COUNT(coalesce(visit_id, 0)).

Since zero is a value, it will be counted. The same idea applies to a sum with 1 as the replacement, SUM(coalesce(visit_id, 1)): in both cases coalesce converts the NULL and "tells" the aggregate function how to treat it.
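The sketch below shows what the substitution does per group, assuming sqlite3 in place of Presto and a hypothetical table named joined whose rows mimic the output of an outer join. The placeholder 0 is counted like any other value, so a day whose only row is NULL-padded gives 1 with COUNT(coalesce(visit_id, 0)) and 0 with plain COUNT(visit_id), while SUM(coalesce(visit_id, 1)) adds 1 for that row. Which expression fits depends on whether the padded row should count as a visit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE joined (day TEXT, visit_id INTEGER);
INSERT INTO joined VALUES
    ('2019-10-29', NULL),   -- day with no visits: one NULL-padded row
    ('2019-10-30', 1),
    ('2019-10-30', 2);
""")

rows = conn.execute("""
SELECT day,
       COUNT(COALESCE(visit_id, 0)) AS count_coalesced,  -- placeholder 0 is counted
       COUNT(visit_id)              AS count_plain,      -- NULL is skipped
       SUM(COALESCE(visit_id, 1))   AS sum_coalesced     -- NULL treated as 1
FROM joined
GROUP BY day
ORDER BY day
""").fetchall()

for row in rows:
    print(row)
```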

Keep in mind that coalesce is evaluated for every record to check whether a NULL needs replacing, so on a very large table this can become a performance problem. In that case, it may be necessary to count the NULLs separately, something like this:

SELECT COUNT(*) FROM connect_visits WHERE visit_id IS NULL (COUNT(*) rather than COUNT(visit_id), which skips NULLs and would always return 0 here). This requires two queries or a subquery, and is only meant to illustrate the idea and give other options.
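A sketch of counting the NULLs separately, assuming sqlite3 in place of Presto and a simplified, hypothetical connect_visits table that holds only visit_id. It runs two separate queries, using COUNT(*) for the NULL rows because COUNT(visit_id) skips them, and, as an alternative not taken from the answer, gets both numbers in a single pass with COUNT(*) - COUNT(visit_id):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical table: only the column being counted.
conn.executescript("""
CREATE TABLE connect_visits (visit_id INTEGER);
INSERT INTO connect_visits VALUES (1), (2), (NULL), (NULL), (NULL);
""")

# Two separate queries: COUNT(*) is needed for the NULL rows,
# because COUNT(visit_id) skips them.
nulls = conn.execute(
    "SELECT COUNT(*) FROM connect_visits WHERE visit_id IS NULL").fetchone()[0]
non_nulls = conn.execute(
    "SELECT COUNT(visit_id) FROM connect_visits").fetchone()[0]

# Alternative single pass: COUNT(*) - COUNT(visit_id) equals the number of NULLs.
nulls_one_pass, non_nulls_one_pass = conn.execute(
    "SELECT COUNT(*) - COUNT(visit_id), COUNT(visit_id) FROM connect_visits"
).fetchone()

print(nulls, non_nulls, nulls_one_pass, non_nulls_one_pass)  # 3 2 3 2
```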

I set up an example with NULL values on http://sqlfiddle.com/ showing the results of the examples above.

  • I can't use coalesce together with distinct in the inner query; it gives an error. And when I use coalesce only in the inner query and put the COUNT in the outer query, I get the same result as before; nothing changes :( SELECT Count(distinct ct) AS ct, p.profile_id, DAY FROM connect_profiles p left JOIN (SELECT profile_id, coalesce(visit_id,0) as ct, Dates.DAY FROM connect_visits v right Join Dates on Dates.DAY = cast(v.visit_created_at as date) Where web_site_id in ('10','11') and Metadata like '%logged%' ) Countquery