Return all equal items from different groups

Asked

Viewed 850 times

20

-------------------
- TABLE           -
-------------------
ID | GRUPO | OBJETO
---|-------|-------
1  | 1     | 1
2  | 1     | 2
   |       |
3  | 2     | 1
4  | 2     | 2
   |       |
5  | 3     | 1
6  | 3     | 2
7  | 3     | 3
   |       |
5  | 4     | 1
6  | 4     | 3
   |       |
7  | 5     | 1

I need to make a query to return all the groups it contains exactly the same objects. I’ve used subquery with IN but the objects 1 and 2 are also in the group 3.

For example, if I consult using where GRUPO = 1, I need you to return only a total of two records (group 1 and group 2).

The objects of group 1 (1 and 2), are also in group 2. But group 3 contains one object in addition, group 4 contains two objects but one is different from group 1, and group 5 contains only one object; therefore, groups 3, 4 and 5 should not be considered.

There are some relationships but they don’t make much difference in the case.


  • Group 1 bought: orange and apple
  • Group 2 bought: orange and apple
  • Group 3 bought: orange, apple and banana

If I consult by GRUPO = 1, need I return ONLY AND ONLY who else bought ONLY orange and apple. I don’t care which groups orange and apple are, so IN doesn’t work.

4 answers

23


I think the following solution returns the result you want.

This query returns all groups that have exactly the same elements as group 1.

SELECT T.GRUPO
FROM   TESTE T
LEFT JOIN 
(
  SELECT DISTINCT T1.OBJECTO,
         (SELECT COUNT(DISTINCT T2.OBJECTO) 
            FROM TESTE T2
           WHERE T2.GRUPO = T1.GRUPO) TOTAL
  FROM TESTE T1
  WHERE T1.GRUPO = 1   --filtras aqui o grupo
) SS
  ON SS.OBJECTO = T.OBJECTO
GROUP BY T.GRUPO
HAVING COUNT(DISTINCT T.OBJECTO) = COUNT(DISTINCT CASE WHEN SS.OBJECTO IS NOT NULL THEN SS.OBJECTO END)
   AND COUNT(DISTINCT T.OBJECTO) = MAX(TOTAL)

The logic behind this query is the following:

The subquery

SELECT DISTINCT T1.OBJECTO,
       (SELECT COUNT(DISTINCT T2.OBJECTO) 
          FROM TESTE T2
         WHERE T2.GRUPO = T1.GRUPO) TOTAL
FROM TESTE T1
WHERE T1.GRUPO = 1

aims to return all objects in group 1. This is the basis for the process. Initially, this instruction only returned objects, it was necessary to change to return also the total number of objects.

With this result we have everything necessary to look for the other groups that have exactly the same elements. This is done through the following instructions:

HAVING COUNT(DISTINCT T.OBJECTO) = COUNT(DISTINCT CASE WHEN SS.OBJECTO IS NOT NULL THEN SS.OBJECTO END)
AND COUNT(DISTINCT T.OBJECTO) = MAX(TOTAL)

The first means, "all members of the group (COUNT(DISTINCT T.OBJECT)) have to be in group 1 (COUNT(DISTINCT CASE WHEN SS.OBJECT IS NOT NULL THEN SS.OBJECT END))"

Finally, the instruction COUNT(DISTINCT OBJECT) = MAX(TOTAL) guarante that the group has to possess the same number of elements as group 1. This instruction is fundamental to exclude groups that, as in your example, have only orange. Orange is in Group 1 but is fantando the apple.

MAX is required because in the HAVING instruction we can only use aggregation functions or constants.

If you want to get the objects for each of the groups (the objects will always be the same), just use the previous query to filter the results, for example like this

SELECT T.GRUPO,
       T.OBJECTO
FROM   TESTE T
INNER JOIN
(
    SELECT T.GRUPO
    FROM   TESTE T
    LEFT JOIN 
    (
      SELECT DISTINCT T1.OBJECTO,
            (SELECT COUNT(DISTINCT T2.OBJECTO) 
               FROM TESTE T2
              WHERE T2.GRUPO = T1.GRUPO) TOTAL
      FROM TESTE T1
     WHERE T1.GRUPO = 1    --filtras aqui o grupo
    ) SS
      ON SS.OBJECTO = T.OBJECTO
    GROUP BY T.GRUPO
    HAVING COUNT(DISTINCT T.OBJECTO) = COUNT(DISTINCT CASE WHEN SS.OBJECTO IS NOT NULL THEN SS.OBJECTO END)
       AND COUNT(DISTINCT T.OBJECTO) = MAX(TOTAL)
) X
  ON T.GRUPO = X.GRUPO
ORDER BY 1, 2

The Sqlfiddle

  • It worked wonderfully well. I could give a slight comment on the low side, just to not be completely lost?

  • 2

    Of course, I did. I edited the answer to include an explanation. If you need any more details warns.

  • Just about the use of HAVING, COUNT e CASE... Makes a combination with the total of object with the objects distinct from group 1, that’s it?

  • 1

    Yes, using having is to ensure that all objects in the group are in group 1. I edited the answer, let us know if you need any extra details.

  • 1

    Show, I understood the application better now. I had already turned the SOEN but I found nothing that could help. Your answer was straight to the point.

  • 1

    I’m just waiting for the time to offer reward to mark your response. I opened a META explaining.

Show 1 more comment

5

Solution:

select A.GRUPO
from TABELA A join TABELA B on B.OBJETO = A.OBJETO
where B.GRUPO = 1
group by A.GRUPO
having count(A.OBJETO) = (select count(OBJETO) from TABELA where GRUPO = 1)
and count(A.OBJETO) = (select count(OBJETO) from TABELA where GRUPO = A.GRUPO)

This query works and seems to me quite simple.

Explaining the line of reasoning:

With the first part of the query (from select up to the Where) i return the records that contain at least one of the group 1 objects.

For better visualization, see this first part of the slightly modified query:

select A.ID, A.GRUPO, A.OBJETO
from TABELA A join TABELA B on B.OBJETO = A.OBJETO
where B.GRUPO = 1

The result of this query would be:

-------------------
ID | GRUPO | OBJETO
---|-------|-------
1  |  1    |    1
2  |  1    |    2
   |       |
3  |  2    |    1
4  |  2    |    2
   |       |
5  |  3    |    1
6  |  3    |    2
-7 |  3    |    3
   |       |
8  |  4    |    1
-9 |  4    |    3
   |       |
10 |  5    |    1

The marked records (-) were left out of the result, since its object does not correspond to any of the group 1 objects.

So see that so far I return only the groups that have some object of group 1, and also their respective object.

Let’s now visualize the grouping of this:

select A.GRUPO, count(A.OBJETO)
from TABELA A join TABELA B on B.OBJETO = A.OBJETO
where B.GRUPO = 1
group by A.GRUPO

Upshot:

---------------------
GRUPO | COUNT(OBJETO)
------|--------------
  1   |     2
  2   |     2
  3   |     2
  4   |     1
  5   |     1

Now I will filter to keep only the groups whose totalization of objects is equal to the total of objects in group 1, adding the clause having:

select A.GRUPO, count(A.OBJETO)
from TABELA A join TABELA B on B.OBJETO = A.OBJETO
where B.GRUPO = 1
group by A.GRUPO
-- novo:
having count(A.OBJETO) = (select count(OBJETO) from TABELA where GRUPO = 1) 

Upshot

---------------------
GRUPO | COUNT(OBJETO)
------|--------------
  1   |     2
  2   |     2
  3   |     2

Group 3 has to leave because although the Count of it has stayed at 2 (because it actually has two objects matching the group 1 objects), in fact it has 3 objects at the base (one of them does not coincide with any group 1 object and so was outside the Count).

Then I add one more condition in the having, determining that only groups whose total number of objects in the base is equal to the total of objects in group 1:

select A.GRUPO, count(A.OBJETO)
from TABELA A join TABELA B on B.OBJETO = A.OBJETO
where B.GRUPO = 1
group by A.GRUPO
having count(A.OBJETO) = (select count(OBJETO) from TABELA where GRUPO = 1)
-- novo:
and count(A.OBJETO) = (select count(OBJETO) from TABELA where GRUPO = A.GRUPO) 

Finally, I remove the Count of select because I do not wish it in the visualization of the result, and I arrive at the query that was presented at the beginning of the answer.

See on SQL Fiddle.

  • 1

    It passed very close, but the problem is not the equivalent quantity, because, for example, group 1 has object (1 and 2) and group 2 has object (1 and 3)... Both have the object 1 in common and the same amount equal, but does not have the same objects.

  • 1

    @Papacharlie I get it. Solved.

  • 1

    @Papacharlie What’s up? Works like a Charm :D

  • 1

    I’m still testing and adapting...

3

The only way I know how to do it is with GROUP_CONCAT, and as it is not possible to do Join with calculated fields will be required two Subqueries (can even create an view):

SELECT t2.*
FROM (
    SELECT grupo
    , GROUP_CONCAT(objeto ORDER BY objeto) objetos
    FROM tabela GROUP BY grupo
) t1 INNER JOIN (
    SELECT grupo
    , GROUP_CONCAT(objeto ORDER BY objeto) objetos
    FROM tabela GROUP BY grupo
) t2 on t1.objetos = t2.objetos
WHERE t1.grupo = 1

I made a SQL Fiddle demonstrating the operation with group 3, which has 3 objects.

  • 1

    But in that case, t1.objetos = t2.objetos would not return if it were the comparison of 3 objects - or am I mistaken? And GROUP_CONCAT has the constraint of the limit, that would have to manipulate...

  • 1

    "Regardless" of the amount, it will work with 3 or more, because of the order by inside grupo_concat, but it really is subject to the limit

  • 1

    It didn’t work, he told group 3 too.

-6

Follows the SQL

SELECT count(distinct grupo,objecto) as tabela FROM tabela WHERE objecto in (1,2) and grupo IN (1,2) group by objecto.

Make sure that’s what you want. Next time try to be clearer.

  • 5

    Take a test and find that in does not return records that contains exactly the same objects!

Browser other questions tagged

You are not signed in. Login or sign up in order to post.