Delete older duplicate data on MS SQL server

Viewed 264 times

0

I have a table with duplicate data. Duplicates are identified by an ID: each row has the identifier and a date. How can I delete the rows for any ID that has more than one record, keeping only the most recent record?

The database is MS SQL Server. For the list below, the script should delete the 4 oldest rows:

ID            DATA
17081618585 | 18.02.02 18:42:41
17081618585 | 18.02.02 19:30:41
17081618585 | 18.02.02 20:42:41
17081618585 | 18.02.02 20:42:41
17081618585 | 18.02.02 22:42:42
  • So in this case only 17081618585 | 18.02.02 22:42:42 would remain?

1 answer

4

A simple approach is to use a CTE (Common Table Expression) to find the most recent date for each ID. After that, run a DELETE joined to that CTE to remove the older records.

See the code below:

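-- Most recent date for each id
with cte as (
    select id, max(data) as max_data
    from #teste
    group by id
)
-- Delete every row whose date differs from the most recent date of its id
delete t
from #teste t
inner join cte
    on t.id = cte.id
   and t.data <> cte.max_data;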

The CTE selects the maximum date per ID. The DELETE then runs against the table in question, and the join condition specifies that the date of each row to be deleted must differ from the date returned by the CTE.
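One thing the approach above leaves open: if two rows of the same ID share the most recent date, both survive. A possible variant, offered only as a sketch, numbers the rows per ID with ROW_NUMBER() and deletes everything after the first. The table definition and the sample values are assumptions: the column types are guessed, and the question's dates are read as year.month.day.

```sql
-- Hypothetical setup mirroring the question's sample data (types are assumed).
CREATE TABLE #teste (id BIGINT NOT NULL, data DATETIME NOT NULL);

INSERT INTO #teste (id, data) VALUES
    (17081618585, '2018-02-02T18:42:41'),
    (17081618585, '2018-02-02T19:30:41'),
    (17081618585, '2018-02-02T20:42:41'),
    (17081618585, '2018-02-02T20:42:41'),
    (17081618585, '2018-02-02T22:42:42');

-- Number the rows of each id, newest first; rn = 1 is the row to keep.
WITH ranked AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY id ORDER BY data DESC) AS rn
    FROM #teste
)
DELETE FROM ranked
WHERE rn > 1;

-- Check what is left.
SELECT id, data FROM #teste;
```

With this sample data, only the 22:42:42 row should be left. When several rows tie for the most recent date, which one of them is kept is arbitrary.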

One detail: for this to work, the column that stores the date must be an actual date type in SQL Server, that is, datetime or a related type. If it is a varchar column, the MAX in the CTE compares the values as text, which does not necessarily follow chronological order, so the wrong rows may be kept.
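A small illustration of why the column type matters. The two values below are hypothetical text dates in d.m.yyyy form, not taken from the question:

```sql
-- MAX over text values compares character by character, not as dates.
SELECT MAX(v) AS max_as_text
FROM (VALUES ('9.1.2018'), ('10.1.2018')) AS t(v);
-- Returns '9.1.2018', because '9' sorts after '1', even though 10.1.2018 is the later date.
```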

  • 1

    Excellent solution! After that, it would be good to create a primary key constraint on the ID (for example) to avoid this situation in the future, since the table should not have duplicate IDs.
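The constraint suggested in the comment above might look like the following sketch. The names dbo.teste and PK_teste are hypothetical (the answer works on a temp table, #teste), and it assumes the id column is declared NOT NULL and that no duplicate IDs remain, otherwise the statement fails.

```sql
-- Hypothetical permanent table and constraint name.
ALTER TABLE dbo.teste
    ADD CONSTRAINT PK_teste PRIMARY KEY (id);
```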
