Duplicate record in the database

Question

Duplicate record in the database

Asked 5 years, 11 months ago

Viewed 205 times

1

Good evening folks. I am in need of a help to remove some duplicate table records from my Mysql 5.7 database

By running the following query I can identify which records are duplicated:

SELECT source_id, COUNT(*) TOTAL 
FROM news 
GROUP BY source_id 
HAVING COUNT(*) > 1

I wonder how I can do to delete duplicate records and leave only the original after the SELECT command above.

1

And how do you identify which of the records with duplicate source_id is the original? If it is not allowed to duplicate it is not better to put this field as primary key or as UNIQUE?

– anonimo

2019/08/22 at 22:11
Use the SELECT DISTINCT

– Thiago Fernandes

2019/08/23 at 01:35
1

If the table has a sequential ID it can simply Join the table with itself: JOIN ON a.source_id = b.source_id AND id_sequencial != MIN( id_sequencial ) effectively filtering the "originals" with smaller id_unico. the returned id_unico will be the ones that should be deleted (test before, of course). If you don’t have ID id_unico, there’s a good chance you’ll have more serious problems than deletion. You can do everything in one operation, but test with a SELECT (subquery with Join) before switching to DELETE. Do not forget the most important, which is to fix the application so that the problem even happens.

– Bacco

2019/08/23 at 09:39

1 answer

Browser other questions tagged mysql sql mysqli

You are not signed in. Login or sign up in order to post.

by Eduardo Xavier • **560** points · Answer 1 · 2019-08-23T07:43:03+00:00

You will need to fulfill a series of steps.

Create a temporary table named news_tmp
Make a SELECT DISTINCT in the news table to take only one record and ignore duplicates. In select, put the columns except the primary key column.

Then run a INSERT INTO thus:

INSERT INTO news_temp (campoA, campoB)
  SELECT DISTINCT campoA, campoB
  FROM news;

DROP TABLE news;
RENAME TABLE news_temp TO news;

Unfortunately your primary key must be disregarded. If you need to maintain the value of the primary key, you will need to do an algorithm with another programming language.