How to not add zeroed values - SQL

Asked

Viewed 188 times

-1

I have several columns and some of them are practically zeroed, like every 100 rows have a value. I’m adding these columns with SUM(column1), SUM(column2)... but when adding up these columns I don’t want you to take these values zeroed, so I left it that way:

Select 
    SUM(coluna1),
    SUM(coluna2),
    [...]
From tabela
Where 
    ano_mes = 202004 
    And ( coluna1 != 0
    Or  coluna2 != 0
    Or coluna3 != 0 
    Or coluna4 != 0 
    Or coluna5 != 0 
    Or coluna6 != 0 
    Or coluna7 != 0 )

I’ve also done with Not In('0') for each column, but the same way is appearing the zeroed values, which I believe is adding too, taking more time of the query, how can I improve this?

NOTE: I gave an example without containing the names of the tables and columns, because I cannot. It is also accompanied by other columns that are not adding up, only to identify the information of the values.

  • 3

    But zero is not the neutral element in addition? It will never interfere with the result.

  • You with the or is taking out lines that have at least one of the columns without being zero. There like, if in a column row 1 is a non-zero value and all other columns are?

  • @Jorgemendes Let’s assume: Row 1 Column 1 = 2 / Column 2 = 0 / Column 3 = 0 / Column 4 = 1 Row 1 Column 1 = 2 / Column 2 = 0 / Column 3 = 0 / Column 4 = 1 In the result I want it to bring all the columns with the information and the sum of column 1 (4) and Column 4(2). But each row changes the values of each column, today it takes 13mins to query perform, need to take at least 7mins.

  • If you want to "take at least 7mins" and "today it takes 13mins" then it fully meets your requirements.

  • This seems to me to be an xy problem. https://pt.meta.stackoverflow.com/questions/499/o-que%C3%A9-o-problema-xy

  • @anonimo https://answall.com/questions/456269/melhorar-perfomece-da-procedure-sql?noredirect=1#comment869873_456269 this is my problem

  • Look, have I seen your other link and you really need to do the analysis in so many groups or are you using group by just to keep the columns? That if you are only doing group_by so it may be simpler to create the sums in another table and join the two with a Join.

  • As @anonimo spoke about xy problem, I find it useful to try to look for answers on how to get to the final format you want and not about your specific method. Can you do an Edit by commenting on it? Making it clear how you want the result to be?

  • @Tatianelucia Can you add information about the table? For example, what it stores, what is the primary key of the index clustered, as is the process of adding lines in the table, what is the approximate number of lines etc? Add this information there in the text of your question.

Show 4 more comments

1 answer

1


OPTION 1

The performance of your query can be improved if there is index with the ano_mes column as the primary key and containing the columns column1 to column7 and also the other identification columns. Something like that:

-- código #2 v2
CREATE nonclustered INDEX I2_tabela on tabela (ano_mes) include (coluna1, ..., coluna7, colunas_identificação);

However, it is necessary to be careful with the creation of random indexes, especially when the list of columns of the INCLUDE clause is extensive. It is that indexes increase the workload of SQL Server and also occupy physical space. It is a win-lose.


OPTION 2

Another option is to add column calculated in the table

-- código #3
ALTER TABLE tabela
    add temValor as cast ( (sign(coluna1) + sign(coluna2) + sign(coluna3) + sign(coluna4) + sign(coluna5) + sign(coluna6) + sign(coluna7)) as tinyint) persisted;

and after creating the following index:

-- código #4 v2
CREATE nonclustered INDEX I2_tabela on tabela (ano_mes, temValor) include (coluna1, ..., coluna7, colunas_identificação);

In the consultations it is enough to use the following predicate in the clause WHERE

-- código #5 v2
SELECT colunas_identificação, ...
  from ...
  where ano_mes = 202004 
        and temValor > 0
  group by colunas_identificação;

In this option the reading will also be efficient and has the advantage of making the SQL query simpler in terms of writing. However, it is necessary to be careful with the creation of random indexes, especially when the list of columns of the INCLUDE clause is extensive. It is that indexes increase the workload of SQL Server and also occupy physical space. It is a win-lose.

This option considers that the values of column 1 to column 7 are always 0 or positive value.


OPTION 3

The two options above are simple to implement but may have high cost in physical space. As you quote "every 100 lines has a value", then the use of filtered index can be a way to reduce the physical space.

The filtered index creation code could be something like

-- código #6
CREATE nonclustered INDEX I2_tabela on tabela (ano_mes) include (coluna1, ..., coluna7)
     where (T.coluna1 <> 0 or ... or T.coluna7 <> 0);

However, it is not possible to use the OR operator in the WHERE clause for the creation of a filtered index. It is also not allowed to use calculated column. But there is an outline solution, which is to add the temValue column to the table and use trigger procedure to update the contents of that column.

-- código #7 v2
ALTER TABLE tabela ADD temValor tinyint default null;
go

CREATE TRIGGER ajusta_temValor on tabela
after INSERT, UPDATE as
UPDATE T
  set temvalor= case when (T.coluna1 <> 0 or ... or T.coluna7 <> 0) then 1 else 0 end
  from INSERTED as I
       inner join tabela as T on T.chave = I.chave;
go        

CREATE nonclustered INDEX I2_tabela on tabela (ano_mes) include (coluna1, ...coluna7, colunas_identificação)
     where temValor > 0;
go

Ready!

The SQL query looks like this:

-- código #8 v2
SELECT colunas_identificação, sum(coluna1), ..., sum(coluna7)
  from tabela
  where ano_mes = 202004 and temValor > 0
  group by colunas_identificação;

OPTION 4

It is possible to further reduce the physical space occupied by the filtered index by not including column1, ... column7.

-- código #9
ALTER TABLE tabela ADD temValor tinyint default null;
go

CREATE TRIGGER ajusta_temValor on tabela
after INSERT, UPDATE as
UPDATE T
  set temvalor= case when (T.coluna1 <> 0 or ... or T.coluna7 <> 0) then 1 else 0 end
  from INSERTED as I
       inner join tabela as T on T.chave = I.chave;
go        

CREATE nonclustered INDEX I2_tabela on tabela (ano_mes) 
     where temValor > 0;
go

In this case the consultation shall be constructed as follows::

-- código #10
SELECT colunas_identificação, sum(coluna1), ..., sum(coluna7)
  from tabela with (index = I2_tabela)
  where ano_mes = 202004 and temValor > 0
  group by colunas_identificação;

To the review the implementation plan code #10 perceives the presence of the operator "Key Search". That is, you win on one side (smaller physical space occupied by the index) but lose on the other side (additional access to the table to obtain the contents of the other columns).


REMARKS

Options 1 and 2 immediately solve the reported problem but have one negative factor: the physical space occupied by the new index. In another topic you mention that there are about 50 million rows and that there are 21 columns to be treated. That is, it will be an index that will occupy a lot of physical space, maybe next to the table. I do not think good options.

Option 3 may decrease the physical space occupied by the index, if it really is a sparse matrix. To evaluate whether it pays to implement option 3 one should rather evaluate how many lines will be indexed.

Option 4 should also be evaluated before it is implemented, both as to the physical space occupied by the new index and (and mainly) to the increase of readings in the table, because of the operator "Key search".

Another option is to evaluate whether it compensates for partitioning the table by the ano_mes column. It can be an interesting solution, because it would allow you to directly access the block of line referring to the desired year/month.

The possibility of changing the primary key of the index clustered for the ano_mes column and the impacts of that change.

The possibility of using a type index columnstore filtered should be evaluated if the SQL Server version is 2016 or newer.

Finalizing, to propose efficient solutions it is necessary to have information on the context. What has been exposed on the case is vague.


Talk to who gave you the task about the options available.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.