Query SQL returning duplicate records

Asked

Viewed 962 times

1

I have two tables (adwords_performance and analytics_transaction) and I need to make a SELECT adding up the columns adwords_performance.clicks, adwords_performance.impressions and analytics_transaction.transaction_revenue.

When I do the SELECT it returns me wrong values in adwords_performance.clicks and adwords_performance.impressions.

SELECT
adwords_performance.date,
Sum(adwords_performance.clicks) AS clicks,
Sum(adwords_performance.impressions) AS impressions,
Sum(analytics_transaction.transaction_revenue) AS revenue
FROM
adwords_performance
LEFT JOIN analytics_transaction 
ON analytics_transaction.date = adwords_performance.date AND
analytics_transaction.adwords_id = adwords_performance.adwords_id AND
analytics_transaction.adwords_campaign_id = adwords_performance.adwords_campaign_id
WHERE
adwords_performance.date = '2015-03-01' AND
adwords_performance.adwords_id = '0123456789'
GROUP BY
adwords_performance.date

Upshot:

+------------+--------+-------------+---------+
| date       | clicks | impressions | revenue |
+------------+--------+-------------+---------+
| 2015-03-01 |    465 |       21017 | 1937.71 |
+------------+--------+-------------+---------+

Result that was to be returned:

+------------+--------+-------------+---------+
| date       | clicks | impressions | revenue |
+------------+--------+-------------+---------+
| 2015-03-01 |    364 |       18577 | 1937.71 |
+------------+--------+-------------+---------+

Tables:

mysql> describe adwords_performance;
+--------------------------------+---------------------------------+------+-----+---------+----------------+
| Field                          | Type                            | Null | Key | Default | Extra          |
+--------------------------------+---------------------------------+------+-----+---------+----------------+
| id                             | int(20) unsigned                | NO   | PRI | NULL    | auto_increment |
| adwords_id                     | bigint(255) unsigned            | NO   |     | NULL    |                |
| adwords_campaign_id            | bigint(255) unsigned            | NO   |     | NULL    |                |
| campaign_network_type          | varchar(255)                    | NO   |     | NULL    |                |
| date                           | date                            | NO   |     | NULL    |                |
| week                           | date                            | NO   |     | NULL    |                |
| week_day                       | varchar(25)                     | NO   |     | NULL    |                |
| clicks                         | int(255)                        | NO   |     | NULL    |                |
| impressions                    | int(255)                        | NO   |     | NULL    |                |
| cost                           | int(255)                        | NO   |     | NULL    |                |
| av_position                    | decimal(2,1)                    | NO   |     | NULL    |                |
| clicks_converted               | int(255)                        | NO   |     | NULL    |                |
| estimated_converted            | int(255)                        | NO   |     | NULL    |                |
| estimated_converted_rate       | decimal(10,2) unsigned zerofill | NO   |     | NULL    |                |
| clicks_converted_assisted      | int(255)                        | NO   |     | NULL    |                |
| impressions_converted_assisted | int(255)                        | NO   |     | NULL    |                |
| lost_budget                    | varchar(255)                    | NO   |     | NULL    |                |
| clicks_invalid                 | int(255)                        | NO   |     | NULL    |                |
+--------------------------------+---------------------------------+------+-----+---------+----------------+


mysql> describe analytics_transaction;
+----------------------+----------------------+------+-----+---------+----------------+
| Field                | Type                 | Null | Key | Default | Extra          |
+----------------------+----------------------+------+-----+---------+----------------+
| id                   | int(15) unsigned     | NO   | PRI | NULL    | auto_increment |
| analytics_id         | bigint(255) unsigned | NO   | MUL | NULL    |                |
| date                 | date                 | NO   |     | NULL    |                |
| time                 | int(2)               | NO   |     | NULL    |                |
| source               | varchar(255)         | NO   |     | NULL    |                |
| medium               | varchar(255)         | NO   |     | NULL    |                |
| city                 | varchar(255)         | NO   |     | NULL    |                |
| adwords_id           | bigint(255) unsigned | NO   |     | NULL    |                |
| adwords_campaign_id  | bigint(255) unsigned | NO   |     | NULL    |                |
| transaction_id       | bigint(255) unsigned | NO   | MUL | NULL    |                |
| transaction_revenue  | double(10,2)         | NO   |     | NULL    |                |
| transaction_shipping | double(10,2)         | NO   |     | NULL    |                |
| transaction_tax      | double(10,2)         | NO   |     | NULL    |                |
| product_quantity     | int(10)              | NO   |     | NULL    |                |
+----------------------+----------------------+------+-----+---------+----------------+
  • I didn’t find anything wrong in your select. What you did to believe that the expected result is what you expected?

  • The problem is in JOIN with the data fields. To have the expected result, the tables on the right (these connected by JOIN) should have a single record for each record of the main table. As in the table on the right you have more than one occurrence for the same date, JOIN by the date field will not work as you expect. You need to find another way to connect these tables other than by the date field.

  • The adwords_performance table should be aggregated in a virtual table or view and this and that unite with the other tables , the aggregated values should be in the grouping.

  • @Caffé I understand, would you have in mind any way to link these tables with the structure I mentioned above tables? I can’t think of any other way. I performed a test by changing the 'date' field to an 'id' and concatenating the two tables with JOIN and still appearing the same error!

  • @Jeffersonsilva I do two separate SELECT ones in the adwords_performance table and the other in analytics_transaction "I do it to really see the select this returning the correct values" and it returns me that value as I quoted above 364 clicks and 18577

  • @Joãogabrielpellissari I don’t have any JOIN suggestions at the moment because I haven’t analyzed their tables much (this takes time). If you can create a minimum checkable example in sqlfiddle for sure we’ll help you solve the problem quickly. http://sqlfiddle.com/

  • @Caffé As you asked, I did a query involving all the tables I use http://sqlfiddle.com/#! 9/48bec/4

  • The problem is that there is more than one record on analytics_transaction for each adwords_performance. I put an answer with the method I found most effective to solve the problem

Show 3 more comments

1 answer

2

The problem is that your table analytics_transaction has more than one record per adwords_performance by multiplying the clicks. You can solve this by making a subquery to get the desired results from this table, delete Join. However, since it is necessary that all fields are in the GROUP BY you must use a select based on the outcome of query previous. The result is as follows::

SELECT x.date,
       SUM(x.clicks) as clicks,
       SUM(x.impressions) as impressions,
       SUM(x.revenue) as revenue
  FROM (SELECT ap.date,
               ap.clicks AS clicks,
               ap.impressions AS impressions,
               (SELECT SUM(at.transaction_revenue)
                 FROM analytics_transaction at
                WHERE at.adwords_id = ap.adwords_id
                  AND at.adwords_campaign_id = ap.adwords_campaign_id
                  AND at.date = at.date) AS revenue
          FROM adwords_performance ap
         WHERE ap.date = '2015-03-01'
           AND ap.adwords_id = '0123456789') x
 GROUP BY x.date

Browser other questions tagged

You are not signed in. Login or sign up in order to post.