How to delete duplicates between multiple tables

Asked

Viewed 283 times

4

I have three tables with structure similar to this:

CREATE TABLE itens_chao (
    id int PRIMARY KEY AUTO_INCREMENT,
    pos_x int,
    pos_y int,
    nome varchar(255),
    quantidade int,
    numero_serie int(11)
);

CREATE TABLE itens_bolsa (
    id int PRIMARY KEY AUTO_INCREMENT,
    player_id int,
    nome varchar(255),
    quantidade int,
    numero_serie int(11)
);

CREATE TABLE itens_banco (
    id int PRIMARY KEY AUTO_INCREMENT,
    player_id int,
    banco int,
    nome varchar(255),
    quantidade int,
    numero_serie int(11)
);

and with those values:

INSERT INTO itens_chao (pos_x, pos_y, nome, quantidade, numero_serie) VALUES (10, 5, 'Pedra diamante', 3, 17881); --repetido apenas em uma tabela
INSERT INTO itens_chao (pos_x, pos_y, nome, quantidade, numero_serie) VALUES (10, 5, 'Pedra diamante', 3, 17881); --repetido apenas em uma tabela
INSERT INTO itens_chao (pos_x, pos_y, nome, quantidade, numero_serie) VALUES (10, 5, 'Pedra ametista', 3, 17831);
INSERT INTO itens_chao (pos_x, pos_y, nome, quantidade, numero_serie) VALUES (10, 5, 'Moeda ouro', 1, 17833); -- repetido nas 3 tabelas!
INSERT INTO itens_chao (pos_x, pos_y, nome, quantidade, numero_serie) VALUES (10, 5, 'Moeda ouro', 1, 17833); -- repetido nas 3 tabelas!

INSERT INTO itens_bolsa (player_id, nome, quantidade, numero_serie) VALUES (1, 'Moeda prata', 3, 17860); -- repetido em 2 tabelas
INSERT INTO itens_bolsa (player_id, nome, quantidade, numero_serie) VALUES (2, 'Moeda ouro', 3, 17833); -- repetido nas 3 tabelas!

INSERT INTO itens_banco (player_id, banco, nome, quantidade, numero_serie) VALUES (3, 1, 'Moeda prata', 3, 17860); -- repetido em 2 tabelas
INSERT INTO itens_banco (player_id, banco, nome, quantidade, numero_serie) VALUES (3, 1, 'Moeda ferro', 3, 17899);
INSERT INTO itens_banco (player_id, banco, nome, quantidade, numero_serie) VALUES (4, 2, 'Moeda ouro', 1, 17833); -- repetido nas 3 tabelas!

Some items have the "numero_serie" field repeated in the tables, and I would like to delete the repeated ones in all tables, keeping only one. The closest I came was with:

DELETE S1, S2 FROM itens_bolsa S1, itens_chao S2 WHERE S1.numero_serie = S2.numero_serie;

That deletes all repeats from the two selected tables, but does not maintain a value.

  • Does that answer your question? https://answall.com/questions/166042/comordeletar-todos-duplicados-exceto-um

  • Do not see how bad my criticism, but a modeling error disturbs too much in the future! This makes the bench "sit" in cases of many duplicates. And relational database was not created for unnecessary "repeats" like this one. Why use relational database without normalization? That’s the question you never want to shut up! Licking your elbow with SQL is possible. Anything is possible with SQL. Some way there is.

3 answers

4

One way is to add an index with the column you want to keep as single by adding the clause IGNORE that will delete possible errors and warnings and delete lines that do not obey the created index:

ALTER IGNORE TABLE itens_chao
ADD UNIQUE INDEX idx_itens_chao(numero_serie);

ALTER TABLE

...

IGNORE is the MySQL Extension to standard SQL. It Controls how ALTER TABLE Works if there are Duplicates on Unique Keys in the new table or if warnings occur when Strict mode is enabled. If IGNORE is not specified, the copy is aborted and Rolled back if Duplicate-key errors occur. If IGNORE is specified, only the first Row is used of Rows with Duplicates on a Unique key, The other Conflicting Rows are Deleted. Incorrect values are truncated to the Closest matching acceptable value.

...

Or in free translation:

...

IGNORE is an extension of MySQL to SQL standard. It controls how the ALTER TABLE works if there are duplicates in unique keys in the new table or if any warning happens when the strict mode is enabled. If the IGNOREis not specified, the copy is aborted and undone if duplicate key errors occur. If IGNORE if specified, only the first line is used for lines with duplicated keys in single keys. The other conflicting lines will be deleted. Incorrect values are "truncated" to the nearest acceptable value.

...

Observing: Your structure seems to me a little too complex for what you want to do. You could just have a table with a column tipo the most.

  • Very interesting that way, I erased then created Unique, thanks @sorack.

4


Gabriel, follows a suggestion using 3 Deletes, one for each table:

delete t
from itens_banco as t
left join itens_banco as ba 
  on t.numero_serie = ba.numero_serie and t.id > ba.id
left join itens_chao as ch 
  on t.numero_serie = ch.numero_serie
left join itens_bolsa as bo
  on t.numero_serie = bo.numero_serie
where
  ba.id is not null or
  ch.id is not null or
  bo.id is not null;
  
delete t
from itens_bolsa as t
left join itens_bolsa as bo
  on t.numero_serie = bo.numero_serie and t.id > bo.id
left join itens_chao as ch 
  on t.numero_serie = ch.numero_serie
where
  bo.id is not null or
  ch.id is not null;

delete t
from itens_chao as t
left join itens_chao as ch
  on t.numero_serie = ch.numero_serie and t.id > ch.id
where
  ch.id is not null;

I hope it helps

  • A small addendum: the Where clause of the last Delete can be removed if Left Join is changed to Inner Join

3

How your tables have id AUTO_INCREMENT, a third option is possible, without creating index or doing Join that would, group by the duplicate field "numero_serie" and mater the smallest "ID".

Take this example:

SELECT 
      numero_serie,
       min(id) primeiro_id
FROM 
     itens_chao
GROUP BY
     numero_serie

The result of this query will be:

| numero_serie | primeiro_id |
| ------------ | ----------- |
| 17831        | 3           |
| 17833        | 4           |
| 17881        | 1           |

That is, the records with an ID that you don’t want to delete. Note that I used MIN() to bring the lowest value, that is, the first that was inserted, because the ID column is AUTO_INCREMENT, but if you prefer to have the last one that was inserted, or "the most up-to-date", just switch to MAX().

Dai just delete the records whose ID is not in this query, that is, all duplicates. For this we can put the previous query in a subquery together with DELETE:

delete from itens_chao 
where id not in  (
   select subquery.primeiro_id
    from (
        select min(id) primeiro_id
          from itens_chao
         group by numero_serie) subquery
  );

You can see it working here: https://www.db-fiddle.com/

In summary: group by duplicate field, and bring the first ID, then delete all records whose ID is not in that subquery.

If the table does not have a sequential ID it would have to change the logic in the subquery, but the concept is always the same: make a query that brings the ID you want to keep and delete from others.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.