How to delete duplicated lines sequentially

Asked

Viewed 569 times

2

I need help deleting the line that is duplicated sequentially, but not the duplicate amount of the same record in the table, example lines 6 and 7, are status repeated sequentially, so I would have to delete line 7, keeping lines 6 and 9 intact. Below is an example.

declare @Pedidos as table (Cliente int, Status int, DataHora datetime )

Insert into @Pedidos (Cliente,Status,DataHora)Values(565510,0 ,'03-11-2017 18:10:56:500')
Insert into @Pedidos (Cliente,Status,DataHora)Values(565510,2 ,'03-11-2017 18:10:57:410')
Insert into @Pedidos (Cliente,Status,DataHora)Values(565510,1 ,'03-11-2017 18:10:54:923')
Insert into @Pedidos (Cliente,Status,DataHora)Values(565510,3 ,'03-11-2017 18:26:50:513')
Insert into @Pedidos (Cliente,Status,DataHora)Values(565510,4 ,'03-11-2017 18:27:06:143')
Insert into @Pedidos (Cliente,Status,DataHora)Values(565510,6 ,'03-11-2017 20:00:01:523')
Insert into @Pedidos (Cliente,Status,DataHora)Values(565510,18,'03-11-2017 20:10:05:563')
Insert into @Pedidos (Cliente,Status,DataHora)Values(565510,6 ,'03-11-2017 19:55:26:983')
Insert into @Pedidos (Cliente,Status,DataHora)Values(565510,6 ,'03-11-2017 20:15:06:563')

select ROW_NUMBER() OVER(ORDER BY DataHora ASC) as linha , * from @pedidos order by DataHora asc

Upshot

linha|cliente|status|datahora
1    |565510 |1     |2017-11-03 18:10:54.923
2    |565510 |0     |2017-11-03 18:10:56.500
3    |565510 |2     |2017-11-03 18:10:57.410
4    |565510 |3     |2017-11-03 18:26:50.513
5    |565510 |4     |2017-11-03 18:27:06.143
6    |565510 |6     |2017-11-03 19:55:26.983
7    |565510 |6     |2017-11-03 20:00:01.523
8    |565510 |18    |2017-11-03 20:10:05.563
9    |565510 |6     |2017-11-03 20:15:06.563
  • What matters then is whether the cliente and the status are equal in sequence?

  • Let’s say your line 9 was at position 8, as you would decide which one would be deleted?

  • 1

    Your question is very strange and confused. I believe it should be reformulated.

  • it is not the same, but it may help you: https://answall.com/questions/166042/comordeletar-todos-duplicates

  • Voce can do the validation at the time of the Insert making Voce not have this problem , by making the Voce Insert valid from the last position is equal to what Voce is trying to insert

  • @R.Saints what interests me are the status, I need to remove this status 6 of 20:00:01 which was created repeated after the same status 6 of 19:55:26, and what I have to differentiate one from the other is the date and time field.

  • @Danielmedeiros thanks for the tip and has been made the treatment, but I need to clean this subject who was in the status of requests, so I need a way to delete the dirt without affecting the other status.

  • Thanks for the @Rovannlinhalis tip, but this situation would help me if I wanted to leave just one status line, but I need to keep track, and a request to go back and forth to the same status.

  • @A.Santos the requests can pass to status 1, pass to 2 and return to 1 and go again to 2, what can not occur is to pass from status 1 to 1 in sequence, this treatment was performed, but the dirt remained. rsrsrs. Therefore I need to remove this "dirt" from a considerable number of requests.

  • That’s why I say if there are three lines in sequence with equal status how you intend to treat it?

  • @R.Santos If there are three equal lines in sequence, you will have to remain only one.

Show 6 more comments

2 answers

1


I arrived at the following result, where I selected all the records that should be deleted.

with tabela as (
Select ROW_NUMBER() OVER(ORDER BY DataHora ASC) as linha, 
p.Cliente,
p.Status,
p.DataHora
from Pedidos p)
, Lixos as (
select 
tx.* 
from tabela t
inner join tabela tx on tx.linha = t.linha+1 and t.status = tx.status  and t.Cliente = tx.Cliente
)

Select * from Lixos;

And finally we can delete the wrong records:

with tabela as (
Select ROW_NUMBER() OVER(ORDER BY DataHora ASC) as linha, 
p.Cliente,
p.Status,
p.DataHora
from Pedidos p)
, Lixos as (
select 
tx.* 
from tabela t
inner join tabela tx on tx.linha = t.linha+1 and t.status = tx.status  and t.Cliente = tx.Cliente
)

delete p from Pedidos p
          inner join Lixos l
           on l.Cliente = P.Cliente 
           and l.Status = P.Status
           and l.DataHora = P.DataHora;


Select ROW_NUMBER() OVER(ORDER BY DataHora ASC) as linha, 
p.Cliente,
p.Status,
p.DataHora
from Pedidos p ORDER BY DataHora ASC;

Results:

Current records:

inserir a descrição da imagem aqui

Records to be deleted:

inserir a descrição da imagem aqui

Result after exclusion:

inserir a descrição da imagem aqui

I put in Sqlfiddle, where I list all the records, and then which should be deleted. Obs. I inserted other test lines for the case of 3 equal status in a row: http://sqlfiddle.com/#! 6/2a0d0/6

Finally, Sqlfiddle with the delete command: http://sqlfiddle.com/#! 6/499f3e/2

Note. It would be appropriate that the records have a key Primary.

0

Well... if I understand the question correctly, and without regard to performance:

Edit Great: I noticed from the comments that

if there are three equal lines in sequence, only one must remain.

It did not take this into account in the original code. Large edits to working.

declare @Pedidos as table (Cliente int, Status int, DataHora datetime )

Insert into @Pedidos (Cliente,Status,DataHora)Values(565510,0 ,'03-11-2017 18:10:56:500')
Insert into @Pedidos (Cliente,Status,DataHora)Values(565510,2 ,'03-11-2017 18:10:57:410')
Insert into @Pedidos (Cliente,Status,DataHora)Values(565510,1 ,'03-11-2017 18:10:54:923')
Insert into @Pedidos (Cliente,Status,DataHora)Values(565510,3 ,'03-11-2017 18:26:50:513')
Insert into @Pedidos (Cliente,Status,DataHora)Values(565510,4 ,'03-11-2017 18:27:06:143')
Insert into @Pedidos (Cliente,Status,DataHora)Values(565510,6 ,'03-11-2017 20:00:01:523')
Insert into @Pedidos (Cliente,Status,DataHora)Values(565510,18,'03-11-2017 20:10:05:563')
Insert into @Pedidos (Cliente,Status,DataHora)Values(565510,6 ,'03-11-2017 19:55:26:983')
Insert into @Pedidos (Cliente,Status,DataHora)Values(565510,6 ,'03-11-2017 20:15:06:563');

1) List all orders:

WITH todosOsPedidosOrdenados as 
(
    select ROW_NUMBER() OVER(ORDER BY DataHora ASC) as linha , * from @pedidos
)

2) Count the number of duplicates:

, contagem as 
(
    select COUNT(*) as duplicates, cLIENTE, status  FROM todosOsPedidosOrdenados  GROUP BY Cliente, Status
)
, duplicados as 
(
    select * from contagem where duplicates > 1
)

3) Understand what request we will want keep up

, pedidoAManter as
(
    SELECT 
        todosOsPedidosOrdenados.Cliente, 
        todosOsPedidosOrdenados.Status, 
        min(todosOsPedidosOrdenados.DataHora) as dataHoraAManter
    FROM 
        duplicados LEFT JOIN todosOsPedidosOrdenados 
        ON todosOsPedidosOrdenados.Cliente = duplicados.Cliente 
            AND todosOsPedidosOrdenados.Status = duplicados.Status
    group by todosOsPedidosOrdenados.Cliente, todosOsPedidosOrdenados.status
)

4) Understand which request we will remove, and choose only the most recent (guaranteed by the top 1 and the order by)

, pedidoARemover as
(
    SELECT  todosOsPedidosOrdenados.Cliente, 
        todosOsPedidosOrdenados.Status,
        todosOsPedidosOrdenados.DataHora
    FROM todosOsPedidosOrdenados LEFT JOIN pedidoAManter 
    ON  todosOsPedidosOrdenados.Cliente = pedidoAManter.Cliente 
        AND todosOsPedidosOrdenados.Status = pedidoAManter.Status 
    WHERE
        todosOsPedidosOrdenados.DataHora <> pedidoAManter.dataHoraAManter
)

5) Store the result

select * INTO #pedidoARemover_ FROM pedidoARemover;

6) Dealing with delete

delete a
from @Pedidos a
    left Join #pedidoARemover_ b
    on  a.Cliente = b.Cliente
        and a.Status = b.Status
        AND a.DataHora = b.DataHora
WHERE
    b.Cliente IS NOT NULL

7) See the result

SELECT * FROM @Pedidos
order by status asc

8) Discarding the temporary table

DROP TABLE #pedidoARemover_;

Final code (copy-Paste for SQL Management Studio should work)

declare @Pedidos as table (Cliente int, Status int, DataHora datetime )

Insert into @Pedidos (Cliente,Status,DataHora)Values(565510,0 ,'03-11-2017 18:10:56:500')
Insert into @Pedidos (Cliente,Status,DataHora)Values(565510,2 ,'03-11-2017 18:10:57:410')
Insert into @Pedidos (Cliente,Status,DataHora)Values(565510,1 ,'03-11-2017 18:10:54:923')
Insert into @Pedidos (Cliente,Status,DataHora)Values(565510,3 ,'03-11-2017 18:26:50:513')
Insert into @Pedidos (Cliente,Status,DataHora)Values(565510,4 ,'03-11-2017 18:27:06:143')
Insert into @Pedidos (Cliente,Status,DataHora)Values(565510,6 ,'03-11-2017 20:00:01:523')
Insert into @Pedidos (Cliente,Status,DataHora)Values(565510,18,'03-11-2017 20:10:05:563')
Insert into @Pedidos (Cliente,Status,DataHora)Values(565510,6 ,'03-11-2017 19:55:26:983')
Insert into @Pedidos (Cliente,Status,DataHora)Values(565510,6 ,'03-11-2017 20:15:06:563');

SELECT ROW_NUMBER() OVER(ORDER BY DataHora ASC) as linha , * from @pedidos;

WITH todosOsPedidosOrdenados as 
    (
        select ROW_NUMBER() OVER(ORDER BY DataHora ASC) as linha , * from @pedidos
    )
, contagem as 
    (
        select COUNT(*) as duplicates, cLIENTE, status  FROM todosOsPedidosOrdenados  GROUP BY Cliente, Status
    )

, duplicados as 
    (
        select * from contagem where duplicates > 1
    )
, pedidoAManter as
    (
        SELECT 
            todosOsPedidosOrdenados.Cliente, 
            todosOsPedidosOrdenados.Status, 
            min(todosOsPedidosOrdenados.DataHora) as dataHoraAManter
        FROM 
            duplicados LEFT JOIN todosOsPedidosOrdenados 
            ON todosOsPedidosOrdenados.Cliente = duplicados.Cliente 
                AND todosOsPedidosOrdenados.Status = duplicados.Status
        group by todosOsPedidosOrdenados.Cliente, todosOsPedidosOrdenados.status
    )
, pedidoARemover as
    (
        SELECT  todosOsPedidosOrdenados.Cliente, 
            todosOsPedidosOrdenados.Status,
            todosOsPedidosOrdenados.DataHora
        FROM todosOsPedidosOrdenados LEFT JOIN pedidoAManter 
        ON  todosOsPedidosOrdenados.Cliente = pedidoAManter.Cliente 
            AND todosOsPedidosOrdenados.Status = pedidoAManter.Status 
        WHERE
            todosOsPedidosOrdenados.DataHora <> pedidoAManter.dataHoraAManter
    )

select * INTO #pedidoARemover_ FROM pedidoARemover;


delete a
from @Pedidos a
left Join #pedidoARemover_ b
on  a.Cliente = b.Cliente
and a.Status = b.Status
AND a.DataHora = b.DataHora
WHERE
    b.Cliente IS NOT NULL


SELECT * FROM #pedidoARemover_;

SELECT ROW_NUMBER() OVER(ORDER BY DataHora ASC) as linha , * from @pedidos;
DROP TABLE #pedidoARemover_;

Screenshot of the result:

Tabela inicial - linhas a remover - tabela final.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.