Optimization in SQL

Considering these two tables in the database:

Product Table:

| id  | nome      |
|-----|-----------|
| aaa | Produto A |
| bbb | Produto B |
| ccc | Produto C |

Attributes Table:

| id_produto | atributo | valor   |
|------------|----------|---------|
| aaa        | cor      | azul    |
| aaa        | tamanho  | M       |
| bbb        | cor      | preto   |
| bbb        | tamanho  | P       |
| ccc        | cor      | amarelo |
| ccc        | tamanho  | G       |

and the following SQL query:

select
    p.nome,
    c.valor,
    t.valor
from
    Produto p,
    Atributos c,
    Atributos t
where
    p.id = c.id_produto and
    p.id = t.id_produto and
    c.atributo = 'cor' and
    t.atributo = 'tamanho'

Is there any way to write this SELECT without referencing the Atributos table twice?

Edit #1

Expected result:

| nome      | cor     | tamanho |
|-----------|---------|---------|
| Produto A | azul    | M       |
| Produto B | preto   | P       |
| Produto C | amarelo | G       |

Note: I cannot change the table structure. The real database has many different attribute types (they are generated dynamically) and thousands of records.

  • Are you using letters in the primary key?

  • Is it actually slow?

  • All data is received via integration, and yes, the primary key can contain letters. The point is not the database structure, but rather how the SELECT is written.

  • I believe what you want is to transpose the rows of the atributos table for each produto. Look into PIVOT.

  • How would I get this result using PIVOT?

  • I posted an answer based on your example model. Apply it to your real scenario and check whether the performance improvement holds there. I believe it will, because with the current approach every attribute you want in the result adds another copy of the atributos table to the Cartesian product.

4 answers

2


It is worth checking whether this pays off performance-wise in your real case, but for the example presented it would be the following:

SELECT NOME,
       COR,
       TAMANHO
FROM (
    -- One row per product/attribute pair; ATRIBUTOS is read only once
    SELECT
        P.NOME     AS NOME,
        A.ATRIBUTO AS ATRIBUTO,
        A.VALOR    AS VALOR
    FROM PRODUTO P
        JOIN ATRIBUTOS A ON A.ID_PRODUTO = P.ID
) AS FONTE
PIVOT (MIN(VALOR) FOR ATRIBUTO IN (COR, TAMANHO)) AS PVT

For your example snippet, running both queries in the same batch, the first approach (joining the attributes table twice) accounts for 71% of the estimated cost and the PIVOT form for 29%, that is, well under half the effort. I believe the same will hold in your real case, but be sure to measure.
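If PIVOT is not available on your engine, conditional aggregation is a common equivalent that also reads the attributes table only once. Below is a minimal sketch, assuming SQLite through Python's `sqlite3` module purely so the example is self-contained (SQLite has no PIVOT); the lowercase table names mirror the question, and the check against the original self-join is only there to show both forms return the same rows.

```python
import sqlite3

# In-memory SQLite database loaded with the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE produto (id TEXT PRIMARY KEY, nome TEXT);
    CREATE TABLE atributos (id_produto TEXT, atributo TEXT, valor TEXT);
    INSERT INTO produto VALUES
        ('aaa', 'Produto A'), ('bbb', 'Produto B'), ('ccc', 'Produto C');
    INSERT INTO atributos VALUES
        ('aaa', 'cor', 'azul'),    ('aaa', 'tamanho', 'M'),
        ('bbb', 'cor', 'preto'),   ('bbb', 'tamanho', 'P'),
        ('ccc', 'cor', 'amarelo'), ('ccc', 'tamanho', 'G');
""")

# Original approach: atributos is joined once per attribute.
self_join = """
    SELECT p.nome, c.valor, t.valor
    FROM produto p
    JOIN atributos c ON c.id_produto = p.id AND c.atributo = 'cor'
    JOIN atributos t ON t.id_produto = p.id AND t.atributo = 'tamanho'
    ORDER BY p.nome
"""

# Single pass: atributos appears once; each CASE picks out one attribute.
single_pass = """
    SELECT p.nome,
           MAX(CASE WHEN a.atributo = 'cor'     THEN a.valor END) AS cor,
           MAX(CASE WHEN a.atributo = 'tamanho' THEN a.valor END) AS tamanho
    FROM produto p
    JOIN atributos a ON a.id_produto = p.id
    GROUP BY p.id, p.nome
    ORDER BY p.nome
"""

self_join_rows = conn.execute(self_join).fetchall()
single_pass_rows = conn.execute(single_pass).fetchall()
for row in single_pass_rows:
    print(row)
```

The same `MAX(CASE ...)` pattern should work on SQL Server as well, but whether it beats the self-join on the real data is something only the actual execution plan can tell you.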

0

No, because two different rows are being looked up for each product.
The best solution here would be to redesign the tables so that attributes are not matched by string comparison (which, even with indexed strings, would be slower than declaring each attribute as its own typed column):

Product Table:

| id  | nome      |
|-----|-----------|
| aaa | Produto A |
| bbb | Produto B |
| ccc | Produto C |

Attributes Table:

| id_produto | tamanho | cor     |
|------------|---------|---------|
| aaa        | M       | azul    |
| bbb        | P       | preto   |
| ccc        | G       | amarelo |

  • I cannot change the database structure; these attributes are generated dynamically and may not be the same for different products.

  • Any other solution will also make the table appear twice in the query. But the important question is: what is the main reason behind this? Can you add an EXPLAIN to see whether there is a real need for optimization?

  • The cost of this query is 76364714160; I'm using a more simplified representation here.

  • Then you need to remodel your database, either by adding a new index or by asking for a full remodel. Query-level optimizations only help in very specific cases, and since in this case you are reading the entire database, there is nothing you can do in the query itself.
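To make the index and EXPLAIN suggestions from these comments concrete, here is a rough sketch, again assuming SQLite via Python's `sqlite3` only so it can be run anywhere. The index name `ix_atributos_lookup` and its column order are assumptions for illustration, not something taken from the real database, and on SQL Server you would inspect the execution plan instead of `EXPLAIN QUERY PLAN`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE produto (id TEXT PRIMARY KEY, nome TEXT);
    CREATE TABLE atributos (id_produto TEXT, atributo TEXT, valor TEXT);
    INSERT INTO produto VALUES
        ('aaa', 'Produto A'), ('bbb', 'Produto B'), ('ccc', 'Produto C');
    INSERT INTO atributos VALUES
        ('aaa', 'cor', 'azul'),    ('aaa', 'tamanho', 'M'),
        ('bbb', 'cor', 'preto'),   ('bbb', 'tamanho', 'P'),
        ('ccc', 'cor', 'amarelo'), ('ccc', 'tamanho', 'G');
    -- Hypothetical index: lookup columns first, valor last so the index
    -- alone can answer the query. Validate the column order on real data.
    CREATE INDEX ix_atributos_lookup
        ON atributos (id_produto, atributo, valor);
""")

query = """
    SELECT p.nome, c.valor, t.valor
    FROM produto p
    JOIN atributos c ON c.id_produto = p.id AND c.atributo = 'cor'
    JOIN atributos t ON t.id_produto = p.id AND t.atributo = 'tamanho'
    ORDER BY p.nome
"""

# Each plan row is (id, parent, notused, detail); keep the readable detail.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
for step in plan:
    print(step)

rows = conn.execute(query).fetchall()
```

The index does not change the result, only how the rows are located, so comparing the plan before and after creating it is the part that matters.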

0

You can build a WITH clause over the attributes table, keyed by id, that matches up the attributes you need, and then join it with the product table. Use the "MATERIALIZE" hint to optimize performance. You can enable parallelism as well.

That way you don't duplicate information. If needed, I can try to explain it better.

  • It would be interesting for you to include the mentioned query in your reply
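Since the query itself was not posted, here is one possible reading of the WITH clause suggestion, not necessarily what the answerer had in mind. It is a sketch run on SQLite through Python's `sqlite3` so it is self-contained; the CTE name `attrs` is made up for illustration, and the MATERIALIZE hint and parallelism settings are engine-specific, so they are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE produto (id TEXT PRIMARY KEY, nome TEXT);
    CREATE TABLE atributos (id_produto TEXT, atributo TEXT, valor TEXT);
    INSERT INTO produto VALUES
        ('aaa', 'Produto A'), ('bbb', 'Produto B'), ('ccc', 'Produto C');
    INSERT INTO atributos VALUES
        ('aaa', 'cor', 'azul'),    ('aaa', 'tamanho', 'M'),
        ('bbb', 'cor', 'preto'),   ('bbb', 'tamanho', 'P'),
        ('ccc', 'cor', 'amarelo'), ('ccc', 'tamanho', 'G');
""")

# The CTE collapses atributos to one row per product before the join,
# so the product table is joined against a single derived table.
query = """
    WITH attrs AS (
        SELECT id_produto,
               MAX(CASE WHEN atributo = 'cor'     THEN valor END) AS cor,
               MAX(CASE WHEN atributo = 'tamanho' THEN valor END) AS tamanho
        FROM atributos
        WHERE atributo IN ('cor', 'tamanho')
        GROUP BY id_produto
    )
    SELECT p.nome, attrs.cor, attrs.tamanho
    FROM produto p
    JOIN attrs ON attrs.id_produto = p.id
    ORDER BY p.nome
"""

cte_rows = conn.execute(query).fetchall()
for row in cte_rows:
    print(row)
```

The `WHERE atributo IN (...)` filter keeps unrelated attribute types out of the aggregation, which may matter when the real table holds many of them.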

0

Here's an example with PIVOT; see if it fits your case:

PIVOT script:

SELECT
    pvt.nome,
    pvt.cor,
    pvt.tamanho
FROM
    (
    SELECT
        p.nome,
        a.atributo,
        a.valor
    FROM #produto p
    INNER JOIN #atributos a ON p.id = a.id_produto
    GROUP BY p.nome, a.atributo, a.valor
    ) AS x
PIVOT (MAX(valor) FOR atributo IN ([cor], [tamanho])) AS pvt
ORDER BY 1;

Result:

| nome      | cor     | tamanho |
|-----------|---------|---------|
| Produto A | azul    | M       |
| Produto B | preto   | P       |
| Produto C | amarelo | G       |

Script for database creation (SQL Server):

create table #produto(
    id varchar(50) primary key,
    nome varchar(50)
);

create table #atributos(
    id_produto varchar(50) references #produto(id),
    atributo varchar(50) not null,
    valor varchar(50) not null
);

insert into #produto values
('aaa', 'Produto A'),
('bbb', 'Produto B'),
('ccc', 'Produto C')

insert into #atributos values
('aaa', 'cor', 'azul'),
('aaa', 'tamanho', 'M'),
('bbb', 'cor', 'preto'),
('bbb', 'tamanho', 'P'),
('ccc', 'cor', 'amarelo'),
('ccc', 'tamanho', 'G')
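
Because the question says the attributes are generated dynamically, the hard-coded list `[cor],[tamanho]` will eventually need to be built at run time. The following is a minimal sketch of that idea, assuming SQLite via Python's `sqlite3` and conditional aggregation in place of PIVOT; the helper `build_pivot_query` is hypothetical. On SQL Server the usual counterpart is dynamic SQL assembled with `QUOTENAME` and run through `sp_executesql`.

```python
import sqlite3


def build_pivot_query(attributes):
    """Return (sql, params) with one output column per attribute name."""
    columns = []
    params = []
    for name in attributes:
        # Quote the alias as an identifier, doubling any embedded quotes.
        alias = '"' + name.replace('"', '""') + '"'
        # The attribute value itself is bound as a parameter, not inlined.
        columns.append(
            f"MAX(CASE WHEN a.atributo = ? THEN a.valor END) AS {alias}"
        )
        params.append(name)
    sql = (
        "SELECT p.nome, " + ", ".join(columns)
        + " FROM produto p JOIN atributos a ON a.id_produto = p.id"
        + " GROUP BY p.id, p.nome ORDER BY p.nome"
    )
    return sql, params


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE produto (id TEXT PRIMARY KEY, nome TEXT);
    CREATE TABLE atributos (id_produto TEXT, atributo TEXT, valor TEXT);
    INSERT INTO produto VALUES
        ('aaa', 'Produto A'), ('bbb', 'Produto B'), ('ccc', 'Produto C');
    INSERT INTO atributos VALUES
        ('aaa', 'cor', 'azul'),    ('aaa', 'tamanho', 'M'),
        ('bbb', 'cor', 'preto'),   ('bbb', 'tamanho', 'P'),
        ('ccc', 'cor', 'amarelo'), ('ccc', 'tamanho', 'G');
""")

# Discover the attribute names at run time instead of hard-coding them.
attributes = [
    r[0]
    for r in conn.execute(
        "SELECT DISTINCT atributo FROM atributos ORDER BY atributo"
    )
]
sql, params = build_pivot_query(attributes)
dynamic_rows = conn.execute(sql, params).fetchall()
for row in dynamic_rows:
    print(row)
```

Products that lack a given attribute simply get NULL in that column, which fits the note that attributes may differ between products.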
