SELECT condition WHERE with multiple values in the same column

Asked

Viewed 35,616 times

12

I have two tables in my bank and perform JOIN with them. A table is a list of people and the other characteristic list of that person. By logic there is only one person and each person can have several characteristics, so two tables. So when I make one JOIN appears the same person a few times only with its feature on the side. EX:

+-----+---------+----------------+
|  ID |  PESSOA | CARACTERÍSTICA |
+-----+---------+----------------+
|  1  | Carlos  | Alto           |
+-----+---------+----------------+
|  1  | Carlos  | Divertido      |
+-----+---------+----------------+
|  1  | Carlos  | Inteligente    |
+-----+---------+----------------+
|  2  | Iago    | Baixo          |
+-----+---------+----------------+
|  2  | Iago    | Divertido      |
+-----+---------+----------------+
|  2  | Iago    | Esperto        |
+-----+---------+----------------+
|  3  | Artur   | Divertido      |
+-----+---------+----------------+
|  3  | Artur   | Inteligente    |
+-----+---------+----------------+

If I do this SELECT I get an empty result:

SELECT 
    p.*, c.* 
FROM 
    pessoas AS p LEFT JOIN perfil AS c ON p.pid = c.perfil_pessoa 
WHERE 
    c.caracteristica = 'Divertido' AND c.caracteristica = 'Inteligente'

When in fact I would like the following result:

+-----+---------+----------------+
|  ID |  PESSOA | CARACTERÍSTICA |
+-----+---------+----------------+
|  1  | Carlos  | Alto           |
+-----+---------+----------------+
|  1  | Carlos  | Divertido      |
+-----+---------+----------------+
|  1  | Carlos  | Inteligente    |
+-----+---------+----------------+
|  3  | Artur   | Divertido      |
+-----+---------+----------------+
|  3  | Artur   | Inteligente    |
+-----+---------+----------------+

That is, to have as a result, every person who is 'Divertida' and 'Inteligente'.

If in the clause WHERE of SELECT me use IN or OR the result is also not what I hope, by the example given, the SELECT return all results, because all people have as characteristic "Fun":

+-----+---------+----------------+
|  ID |  PESSOA | CARACTERÍSTICA |
+-----+---------+----------------+
|  1  | Carlos  | Alto           |
+-----+---------+----------------+
|  1  | Carlos  | Divertido      |
+-----+---------+----------------+
|  1  | Carlos  | Inteligente    |
+-----+---------+----------------+
|  2  | Iago    | Baixo          |
+-----+---------+----------------+
|  2  | Iago    | Divertido      |
+-----+---------+----------------+
|  2  | Iago    | Esperto        |
+-----+---------+----------------+
|  3  | Artur   | Divertido      |
+-----+---------+----------------+
|  3  | Artur   | Inteligente    |
+-----+---------+----------------+

The purpose is to build filters using people’s characteristics.

  • 1

    The absence of the profile table model makes the response difficult.

  • 1

    It does not directly answer the question, but why not declare characteristic as a SETin the table persons? More information on: https://dev.mysql.com/doc/refman/8.0/en/set.html

5 answers

14


You want to bring the person and all its characteristics if they meet the filter (which will be cumulative).

If you are using the version 8 from Mysql, This is gonna be like a walk in the park.

According to the documentation, this version supports Commom Table Expressions.

Free translation with own highlights

One common table Expression (CTE) is a resultset nominee temporary that only exists within the scope of a single sql statement, therefore can be referenced only in it.

The applications and variations of Ctes are numerous. Your case is certainly one of them.

Note - Tables and column names have been simplified in the following examples

See how we could do:

WITH candidatos AS 
 ( 
     SELECT DISTINCT c.id_pessoa 
     FROM caracteristica c
     WHERE c.caracteristica in ('Alto','Magro') // Insere aqui a lista de qualidades procuradas
     GROUP BY c.id_pessoa
     HAVING count(Distinct c.caracteristica) >= 2 // Aqui a quantidade de qualidades filtradas
 )
SELECT p.nome, c.caracteristica
FROM candidatos filtro 
    JOIN pessoa p ON o.id = filtro.id_pessoa
    JOIN característica c ON c.id_pessoa = p.id

Unfortunately I couldn’t find a Mysql 8 fiddle online to enrich the answer with a verifiable example

It seems a consultation clean enough for me and I really hope you can alplicate such a solution in your scenario for the good of all and the nation’s general happiness.

On the other hand...

If you are using a Mysql version < 8.0

If you are using previous versions, I was able to reach some alternatives that might solve your problem, but continue in search of better alternatives is recommended.

Option 1 - Using some nested selects:

SELECT P.nome, C.caracteristica
FROM PESSOA P
    JOIN CARACTERISTICA C on C.id_pessoa = P.id
WHERE Exists ( SELECT 1
               FROM CARACTERISTICA C2
                    RIGHT JOIN ( SELECT distinct cr.caracteristica
                                 FROM caracteristica cr
                                 WHERE Cr.CARACTERISTICA in ('Alto', 'Magro') ) c1 ON c2.caracteristica = c1.caracteristica
              WHERE c2.id_pessoa = p.id 
              GROUP BY C2.id_pessoa
              HAVING count(c2.caracteristica) >= ( SELECT Count(distinct cr.caracteristica)
                                                   FROM caracteristica cr
                                                   WHERE Cr.CARACTERISTICA in ('Alto', 'Magro') ) )

* There’s still a problem here in the subselects I use a Distinct in the characteristics to obtain a subset with the values Alto and Magro. If you included as a filter a feature that no one had (Feio, for example), this condition would simply be 'ignored' in the result. I think this would not be the expected/ideal behavior.

* This example is available in SQL Fiddle

Option 2 - -Using temporary table:

Another way would be to use a temporary table and insert into it the desired characteristics. The consultation would be less ugly (but still far from ideal), so:

SELECT P.nome, C.caracteristica
FROM PESSOA P
  JOIN CARACTERISTICA C on C.id_pessoa = P.id
  JOIN ( SELECT c2.id_pessoa, count(c2.caracteristica)
         FROM filtro f
             LEFT JOIN caracteristica c2 on c2.caracteristica = f.caracteristica
         GROUP BY c2.id_pessoa
         HAVING COUNT(c2.caracteristica) = (SELECT COUNT(*) FROM filtro) ) pf on pf.id_pessoa = p.id

* This example is also available in SQL Fiddle

About the results you had in your attempt

When you declare the way you had done (No Where of the consultation itself c.caracteristica = 'Divertido' AND c.caracteristica = 'Inteligente') you were getting the empty result because each feature had to be equal to 'Smart'' And equal to 'Fun', which is conceptually impossible, because the feature will always be one or another.

Using IN or OR (also in the Where of the consultation itself, as was suggested) you would end up bringing only the characteristics filtered and any person who had at least one of them.

I hope I’ve helped.

  • 1

    you understood perfectly what I want to do, only I made a test here following your example, and yet it returned all the results that has one feature or another, not necessarily the result that has both selected characteristics.

  • Now it worked! Thank you very much, I will apply to the project.

  • 'seek alternatives is well recommended', would you have some to tell me? I researched this type of filter on the internet and found nothing that could guide me to an ideal scenario.

  • 1

    I believe that there is no resource already implemented for this problem of Relational Division, the solutions will most likely be variations of what I have already offered, but perhaps more efficient. But I will keep this in mind if I find a canonical solution I come back here to share it. The community itself here can offer it in a while.

  • Ah! This article discusses the problem and presents some implementations. It is worth giving a check!

  • 1

    I used the first solution, it really worked perfectly. In fact the command was very complex, but it works. Thank you.

  • 1

    Diego, can you give me one more help? I put more information in the chat with the new obstacle I found

  • @ivanveloso do not know the fullsearch mechanism but in the documentation I saw that there are some restrictions, like when they quote 'words that are present in 50% or more of the Rows are considered common and do not match'. Maybe that’s the case with you use it in boolean mode (MATCH (p.product_name) AGAINST ('idem' IN BOOLEAN MODE)), that has no restriction. I can’t say for sure.

Show 3 more comments

10

It has ways and ways of doing, which the best, depends on how the darlings.

For your case I would probably prefer to have a table of characteristics, one of people, and a third to make relationships. Regardless of this, there are some alternatives to your current modeling:

Using JOIN:

This is a relatively simple and objective way - to make a JOIN for each feature:

SELECT 
    a.caracteristica,
    b.caracteristica,
    p.pessoa
FROM 
    perfil AS a
    LEFT JOIN perfil AS b ON a.perfil_pessoa = b.perfil_pessoa
    LEFT JOIN pessoa AS p ON a.perfil_pessoa = p.pid
WHERE 
    a.caracteristica = 'Divertido' AND
    b.caracteristica = 'Inteligente'
;

See working on SQL Fiddle.

Although you need a number of JOINs equivalent to the number of characteristics, the use of LEFT already allows an efficient Planner to cascade away items that do not meet any of the characteristics.


Using SUM, IF and HAVING to filter the item count:

The "secret" here is to compare the SUM with the number of items searched for (kind of inspired by the solution of the colleague @Bruno):

SELECT
  GROUP_CONCAT(caracteristica ORDER BY caracteristica) AS caracteristicas,
  pessoa
FROM
  pessoa
  JOIN perfil ON perfil_pessoa = pid
GROUP BY
  pid
HAVING
  SUM(IF(caracteristica IN ('Divertido','Inteligente'),1,0))=2
;

See working on SQL Fiddle.

But this option has a danger: If there are two 'Fun' entries for the same person, it will not work. Be careful to ensure uniqueness of options in this case (or use a DISTINCT). See the advanced solution below:


"Advanced" version of HAVING:

This version uses the same technique as the previous one, but with bit mathematics, ensuring that all terms are found independent of position, listing order in the search or number of occurrences.

Using the FIND_IN_SET we begin to detect in which position of the list is the term searched, setting a bit using the aggregation function BIT_AND:

SELECT
  GROUP_CONCAT(caracteristica ORDER BY caracteristica) AS caracteristicas,
  pessoa
FROM
  pessoa
  JOIN perfil ON perfil_pessoa = pid
GROUP BY
  pid
HAVING
  BIT_OR(POW(2,FIND_IN_SET(caracteristica, 'Divertido,Inteligente')))&6=6
;

See working on SQL Fiddle.

The most complex part is to determine the &6=6 at the end of the query. The FIND_IN_SET returns values in the range of 0 to the number of SET items passed in string. Zero means "not found".

To set the bits we use the POW(2,...). That is, if nothing is found, the bit will be set 0b00000001. If the first term is found, the bit will be set 0b00000010, and so on.

We happen to want the first and the second terms, which gives the binary value 0b00000110, which is precisely 6. We have to ignore the first bit, as it will be set if a term other than the searched one is returned (and the first bit will not be set if the query find the exact search terms).

Thus, making &6, We get the value 6 with or without the last bit set. Remember: word not found is bit zero (value 1). Term 1 is bit 2 (2 in decimal). Term 2 is bit 3 (4 to decimal). Therefore 2 + 4 = 6.

What if it were 3 terms? Same logic, 2 + 4 + 8 = 14, then the expression of HAVING would be &14=14. If it were 4 terms? 2 + 4 + 8 + 16 = 30, leaving &30=30 in the HAVING.

Do I have to calculate "in hand"? No, just use the following formula:

bitfield = ( 2 ^ num_termos - 1 ) * 2

This can be useful in the client language to generate the query.


Using GROUP_CONCAT:

This query is versatile when generating filters in the sense of not needing to JOINs extra (however, in practice, have several JOINs should not be a real problem in this context):

SELECT
  pessoa, caracteristicas
FROM
  (
    SELECT
      GROUP_CONCAT(caracteristica) AS caracteristicas,
      pessoa
    FROM
      pessoa JOIN perfil ON perfil_pessoa = pid
    GROUP BY
      pid
  ) c
WHERE
  CONCAT(',',caracteristicas,',') LIKE "%,Divertido,%" AND
  CONCAT(',',caracteristicas,',') LIKE "%,Inteligente,%"
;

Realize that the query would be much simpler without the CONCAT(',', but it is a good habit when you seek multiple strings, to prevent a LIKE "%amigo%" take a amigo-da-onça by mistake.

The "advantage" here is that you only touch the WHERE, regardless of the number of features.

The downside is that the search is done with LIKE and does not use indexes.

See working on SQL FIDDLE.

  • 1

    +1 I mainly like the first alternative

  • 1

    @Thank you!. Indeed, the 1st is technically more elegant. It would be more readable if I switched to RIGHT JOIN (the joins would be the conditions, and the "person" the first SELECT member), but I ended up leaving more similar to what I want the query Planner do. I like options to show various useful features for other situations.

  • @Bruno appealed, see HAVING with binary :D

  • Excellent! I believe that all these options can be played on other sgbd too, the cards are on the table =)

  • @Diegorafaelsouza later improves. It has some functions that are not universal, but have a substitute. For example, IF turns IIF on some Dbms (I believe that in T-SQL it is IIF, for example)

  • Yes, when I said that resources can be played translatable, convertible or at least have a equivalent resource in others. In the case of CTE I proposed many banks do not have.

  • @Diegorafaelsouza I don’t know if everyone has this binary aggregation part of option 3, in fact. The rest I believe is very portable indeed

  • 1

    @Diegorafaelsouza grateful for the encouragement, in me occurring more interesting alternatives, complement the answer.

Show 3 more comments

4

This is a (des)known problem called "Relational Division" (I’m trying to find sources in Portuguese but it’s complicated) in which the goal is to find records that meet a set of criteria. There are already some similar questions here at Sopt, for example

In this particular case, it is intended to identify persons who possess (among others) the attributes "Slim" and "Tall".

Based on the response of Diego Rafael Souza, here is a version with a slightly different behavior.

SELECT C.ID_PESSOA, 
       P.NOME,
       C.CARACTERISTICA
  FROM PESSOA P
 INNER JOIN CARACTERISTICA C
    ON C.ID_PESSOA = P.ID
 WHERE C.ID_PESSOA IN
 (
    SELECT ID_PESSOA
      FROM CARACTERISTICA C
     WHERE CARACTERISTICA IN ('Magro', 'Alto')
     GROUP BY ID_PESSOA
    HAVING COUNT(DISTINCT CARACTERISTICA) = 2
 );

In this case note that, unlike the first two alternatives proposed, only people will be returned who are both "Slim" and "Tall". The main difference is in this instruction:

HAVING count(c2.caracteristica) >= ( SELECT Count(distinct cr.caracteristica)
                                       FROM caracteristica cr
                                      WHERE Cr.CARACTERISTICA in ('Alto', 'Magro') ) )

If in your database there is no person who has the attribute "Slim", will be included in the result set all people who are "Tall" and "Fat", for example.

You can confirm the following fiddle.

In your case, you really need to force the number of features to be 2, in the form HAVING COUNT(DISTINCT CARACTERISTICA) = 2 or, as Diego did in option three, using a temporary table and counting the number of existing records.

If you need a more generic solution, use the temporary table or variable table is a good idea and at the same time should have a good performance:

CREATE TABLE FILTRO(f varchar(30));

INSERT INTO FILTRO(f) VALUES('Alto'),('Magro');

SELECT C.ID_PESSOA, 
       P.NOME,
       C.CARACTERISTICA
  FROM PESSOAS P
 INNER JOIN perfil C
    ON C.ID_PESSOA = P.ID
 WHERE C.ID_PESSOA IN
 (
    SELECT ID_PESSOA
      FROM PERFIL 
     WHERE CARACTERISTICA IN ('Magro', 'Alto')
     GROUP BY ID_PESSOA
    HAVING COUNT(DISTINCT CARACTERISTICA) = (SELECT COUNT(1) FROM FILTRO)
 );
  • Great reference to the question here of Sopt, I had looked before and had not found.

1

I am posting on this date, only to help those who need a more simplified consultation to meet the same situation.

Single table = person

SQL=

   select  P.nome, P.caracteristica
    from 
       pessoa P
      ,(SELECT id from pessoa where caracteristica='Alto') as C1
      ,(SELECT id from pessoa where caracteristica='Inteligente') as C2
    where 
      P.id=C1.id 
      AND P.id=C2.id
      

Each feature has to be created a new subconsulta Example: ,(SELECT id from pessoa where caracteristica='Divertido') as C3 and in cloister where include AND P.id=C3.id

NEW SQL=

   select  P.nome, P.caracteristica
    from 
       pessoa P
      ,(SELECT id from pessoa where caracteristica='Alto') as C1
      ,(SELECT id from pessoa where caracteristica='Inteligente') as C2
      ,(SELECT id from pessoa where caracteristica='Divertido') as C3
    where 
      P.id=C1.id 
      AND P.id=C2.id
      AND P.id=C3.id

1

  • The absence of the profile table model makes the answer difficult, but:

FROM people AS p LEFT JOIN profile AS c ON p.pid = c.perfil_person

  • One of the bugs is just above, c.perfil_pessoa has to be c.id for the id of the p.id to receive the profile id.

see:

FROM people p left Join caracteristica c on c.id = p.id

Where Voce can use IN

FROM people p left Join caracteristica c on c.id = p.id WHERE c.feature IN ('Fun', 'Smart')

Browser other questions tagged

You are not signed in. Login or sign up in order to post.