How to delete duplicate lines in Postgres database?

Asked

Viewed 160 times

0

In my bank I have a table that has a list of (carrier and waste specifications) where the carriers can have several waste, I checked that the data are duplicated and are represented in this way:

code_specification | carrier code | specifications

           1                    |             2        |           1        
           2                    |             2        |           5       
           3                    |             5        |           2        
           4                    |             5        |           5       
           5                    |             5        |           2       
           6                    |             8        |           1       
           7                    |             9        |           5        
           8                    |             9        |           5       
           9                    |            10        |           1        
           10                   |            10        |           3    
           11                   |            10        |           1    
           12                   |            10        |           1    
           13                   |            10        |           5

I made a query to tell me the duplicate line quantity and it worked but with doubts to remove these duplicate data from the bank.

SELECT codigotransportadora,especificacoesresi, count(*) FROM transportadorasespecificacoesresiduos
WHERE
codigotransportadora <> 0
AND especificacoesresi<> 0
GROUP BY codigotransportadora,especificacoesresi
HAVING COUNT(*) > 1
 ORDER BY codigotransportadora)

Outcome of the consultation :

carrier code | specifications resi| Count

     5           |             2     |      2        
     9           |             5     |      2      
     10          |             1     |      3      

I’m racking my brain trying to create a script to delete these duplicates...

  • put in http://sqlfiddle.com/ if possible, and what is the condition to delete ? 5|2 has the codigotrans_especificacoes_residuos like 3 and 5, which one erases ?

  • in the result of the query shows the amount of duplicated, so looking for a way to delete and leave only one record.

  • I know, but the column codigotrans_especificacoes_residuos is not duplicated, which one will be deleted ?

  • whatever makes will not impact on my application, however only need to have a record.

2 answers

0

One possibility is to use the DISTINCT ON clause to create an auxiliary table without duplicates and then replace the original table with the auxiliary.

CREATE TABLE ter AS SELECT DISTINCT ON(codigotransportadora, especificacoesresi), *  FROM transportadorasespecificacoesresiduos;

0


Select duplicate records in a temporary table

with duplicadas as 
(SELECT 
codigotransportadora,
especificacoesresiduos, 
count(*) FROM tabela
GROUP BY codigotransportadora,especificacoesresiduos
HAVING COUNT(*) > 1
ORDER BY codigotransportadora)

Delete from the table where it exists in the temporary table and keep at least one (in this case the highest value of codigotrans_especificacoes_residuos):

delete from tabela 
where exists (select 1 
              from duplicadas 
              where duplicadas.codigotransportadora = tabela.codigotransportadora 
              and duplicadas.especificacoesresiduos = tabela.especificacoesresiduos)
and tabela.codigotrans_especificacoes_residuos <> (select 
                                                   max(t.codigotrans_especificacoes_residuos) 
                                                   from tabela t
                                                   where t.codigotransportadora = tabela.codigotransportadora
                                                   and t.especificacoesresiduos = tabela.especificacoesresiduos);

Check on the Sqlfiddle: http://sqlfiddle.com/#! 15/560712/5

Other option (Using Window functions - Rank)

Rank in a temporary table:

with rank as (
select
codigotrans_especificacoes_residuos,
especificacoesresiduos,
codigotransportadora,
rank() over (Partition by especificacoesresiduos,codigotransportadora order by codigotrans_especificacoes_residuos desc) as i
from tabela)

Delete where rank is greater than 1:

delete from tabela 
where exists (select 1 from rank 
              where rank.codigotrans_especificacoes_residuos = tabela.codigotrans_especificacoes_residuos
              and rank.especificacoesresiduos = tabela.especificacoesresiduos
              and rank.codigotransportadora = tabela.codigotransportadora
              and rank.i > 1);

Note: The temporary table code (With) has to be executed in the same command as delete.

Check on the Sqlfiddle:

http://sqlfiddle.com/#! 15/560712/10

  • Our bro, worked out the option using "Other option (Using Window functions - Rank)" I was impressed, I’m very grateful for the help.

  • you have to mark as an answer to your question, if you have difficulties you can make the [tour] of the community to know how it works =]

  • 1

    beauty, ready.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.