How to delete all duplicates except one?

Asked

Viewed 7,016 times

30

I have a table that has duplicate records. They shouldn’t be duplicated.

Some records duplicated two, five and even thirty times by accident.

I need to remove these duplicate records, but I need to leave one. So for example, if I have a repeat value 30 times, I need to delete 29 times.

My scenario is more or less this:

SELECT * FROM creditos WHERE status = 0

The result is like this:

id | usuario_id | status

1  |   1         | 0
2  |   2         | 0
3  |   5         | 0
4  |   5         | 0
5  |   5         | 0
6  |   15        | 0
7  |   15        | 0
8  |   15        | 0
9  |   15        | 0
10 |   17        | 0

But I need you to stay that way:

id |  usuario_id | status
1  |   1         | 0
2  |   2         | 0
5  |   5         | 0
9  |   15        | 0
10 |   17        | 0

Note: I just want to delete duplicates if the status is 0, but I want to leave one.

How to do a query that deletes duplicates, but always leaving a record?

  • Uses the having.

  • With the HAVING will delete everything, no?

  • What version of Mysql?

5 answers

27


Could get the smallest id grouping by the user and removing all others.

It would look something like this:

delete from creditos where status = 0 and id not in
( SELECT * FROM 
    (select min(id) from creditos group by usuario_id) AS temp_tab
);

Note that the status = 0 is in the DELETE and not in the sub-query. If you want id returned with the status = 0, add this WHERE also in the sub-consultation.

You can also see this and other forms in these answers:

  • I think you would have to trade min() for max() and it would be better if it were not exists instead of not in.

  • @Marconciliosouza I believe the min() or the max() will according to need. The only difference would be whether it would remain the first record or the last. About the exists() and the not in() would make a difference if the return were null or a matter of time. However, I don’t think that’s the case with the question.

21

One way is to add an index with the column you want to keep as single by adding the clause IGNORE that will delete possible errors and warnings and delete lines that do not obey the created index:

ALTER IGNORE TABLE creditos
ADD UNIQUE INDEX idx_creditos (usuario_id, status);

ALTER TABLE

...

IGNORE is the MySQL Extension to standard SQL. It Controls how ALTER TABLE Works if there are Duplicates on Unique Keys in the new table or if warnings occur when Strict mode is enabled. If IGNORE is not specified, the copy is aborted and Rolled back if Duplicate-key errors occur. If IGNORE is specified, only the first Row is used of Rows with Duplicates on a Unique key, The other Conflicting Rows are Deleted. Incorrect values are truncated to the Closest matching acceptable value.

...

Or in free translation:

...

IGNORE is an extension of MySQL to SQL standard. It controls how the ALTER TABLE works if there are duplicates in unique keys in the new table or if any warning happens when the strict mode is enabled. If the IGNOREis not specified, the copy is aborted and undone if duplicate key errors occur. If IGNORE if specified, only the first line is used for lines with duplicated keys in single keys. The other conflicting lines will be deleted. Incorrect values are "truncated" to the nearest acceptable value.

...


Another method is to use the DELETE to delete duplicate lines.

First you have to link the table to itself according to the desired column. You can do this directly in the clause FROM or with a JOIN:

FROM creditos c1,
     creditos c2
...
WHERE c1.usuario_id = c2.usuario_id
  AND c1.status = c2.status

or

...
FROM creditos c1
     INNER JOIN c2 ON c1.usuario_id = c2.usuario_id
                  AND c1.status = c2.status

Soon after this indicate what will be the criterion to delete the record, for example, if you want to select and delete the records with higher id. Add the clause AND (considering the first example):

...
AND c1.id > c2.id

For the second example:

...
WHERE c1.id > c2.id

If you want to select and delete children’s id, respectively we would have:

...
AND c1.id < c2.id

and

...
WHERE c1.id < c2.id

Now we will add the attribute that will filter the record, according to your example:

...
AND c.status = 0

Applying the steps described above

If you want to select and delete the minor record id:

DELETE c1
  FROM creditos c1,
       creditos c2
 WHERE c1.id > c2.id
   AND c1.usuario_id = c2.usuario_id
   AND c1.status = c2.status
   AND c1.status = 0;

Resulting in (see working on DB Fiddle):

| id  | usuario_id | status |
| --- | ---------- | ------ |
| 1   | 1          | 0      |
| 2   | 2          | 0      |
| 3   | 5          | 0      |
| 6   | 15         | 0      |
| 10  | 17         | 0      |

Or using INNER JOIN:

DELETE c1
  FROM creditos c1
 INNER JOIN creditos c2 ON c1.usuario_id = c2.usuario_id
                       AND c1.status = c2.status
 WHERE c1.id > c2.id
   AND c1.status = 0;

Resulting in (see working on DB Fiddle):

| id  | usuario_id | status |
| --- | ---------- | ------ |
| 1   | 1          | 0      |
| 2   | 2          | 0      |
| 3   | 5          | 0      |
| 6   | 15         | 0      |
| 10  | 17         | 0      |

If you want to select and delete the major record id:

DELETE c1
  FROM creditos c1,
       creditos c2
 WHERE c1.id < c2.id
   AND c1.usuario_id = c2.usuario_id
   AND c1.status = c2.status
   AND c1.status = 0;

Resulting in (see working on DB Fiddle):

| id  | usuario_id | status |
| --- | ---------- | ------ |
| 1   | 1          | 0      |
| 2   | 2          | 0      |
| 5   | 5          | 0      |
| 9   | 15         | 0      |
| 10  | 17         | 0      |

Or using INNER JOIN:

DELETE c1
  FROM creditos c1
 INNER JOIN creditos c2 ON c1.usuario_id = c2.usuario_id
                       AND c1.status = c2.status
 WHERE c1.id < c2.id
   AND c1.status = 0;

Resulting in (see working on DB Fiddle):

| id  | usuario_id | status |
| --- | ---------- | ------ |
| 1   | 1          | 0      |
| 2   | 2          | 0      |
| 5   | 5          | 0      |
| 9   | 15         | 0      |
| 10  | 17         | 0      |

Observing: To test the data before erasing, you can use SELECT c1.* instead of DELETE c1 in the examples cited above.


References:

  • where c1.status = c2.status.. then I can put and c1.status = 0?

  • If you want to delete only the 0 code yes, but you have to put the and c2.status = 0 also to connect the records

9

In a simpler and more practical way: I hope to help you

with cte as 
( 
   select usuario_id ,  
          row_number ()  over ( partition  by usuario_id order  by usuario_id )  as r
    from creditos where status = 0

 ) 
delete cte where r >  1 ;
  • 1

    I only had to communicate that this only works for more recent versions of MySQL

3

In Mysql 8.0 there is the possibility to use RANK()

SELECT
    id, usuario_id, status
FROM (
  SELECT 
    id, usuario_id, status,
    RANK() OVER (PARTITION BY usuario_id ORDER BY id DESC) AS intRow
  FROM
    Table1
  WHERE
    status = 0
) T
WHERE
    intRow = 1

https://www.db-fiddle.com/f/7V6HBDsCmd4m9uPiz9cnWY/0

To delete records with repeated ID you could use a code like this:

DELETE FROM Table1
WHERE
    id IN (
      SELECT
          id
      FROM (
            SELECT id, RANK() OVER (PARTITION BY usuario_id ORDER BY id DESC)  AS intRow 
            FROM Table1 WHERE status = 0
      ) T WHERE intRow > 1
    );

https://www.db-fiddle.com/f/7V6HBDsCmd4m9uPiz9cnWY/2

0

Not questioning the technical merit of the proposed solutions, but when I face it I simply do something like this:

        SELECT  DISTINCT
                COLUNA1
               ,COLUNA2
               ,COLUNA3
        INTO #TEMP
        FROM TABELA
        WHERE COLUNA = 'XPTO'

        DELETE FROM TABELA
        WHERE COLUNA = 'XPTO'


        INSERT  TABELA (
                        COLUNA1
                       ,COLUNA2
                       ,COLUNA3
                       )
        SELECT  COLUNA1
               ,COLUNA2
               ,COLUNA3
        FROM #TEMP

It is not a matter of knowledge or laziness to think, it is just a matter of quickly solving a specific situation and ending the problem

  • Then you’ll do it one by one?

Browser other questions tagged

You are not signed in. Login or sign up in order to post.