Is there any way to select a comparison of duplicate records?

Asked

Viewed 125 times

0

I have a table with 57 fields, due to an oversight in the past did not create the REQUEST field as Unique Key Due to the use by more than one user on a screen of the system he ended up allowing the duplicity of some records and over time the system took a record another time to update (this will have more problems to hit)but some records are exactly the same and the only thing that differentiates them is a field that I always create in the tables called codkey, which is a sequence that way my table is like this:

Data example:

codkey  pedido campo1 campo2 campo3 etc
  1      123   valor1 valor2 valor3
  2      123   valor1 valor2 valor3
  3      123   valor1 valor2 valor3
  4      123   valor1 valor2 valor3  

I know I can make a select by putting the comparison of each field in the Where, but there are 57 fields and this is giving me a time cost and uncertainty if the result will be reliable

Example:

select * from tabela a
inner join mv_tabela b on(b.campo1 = a.campo1)
  and(b.campo2 = a.campo2)
  and(b.campo3 = a.campo3)
where (a.pedido = b.pedido)

My question is: Is there a command in SQL that makes this easy?

I am using Postgresql

  • Why do you keep all this basic? Why not delete the repeated ones and start using the only one?

  • 1

    @Since this duplicity has gotten in the way of some updates because it was updated at another time, I have information on some records that I need to preserve, I’ll have to see what I have to preserve update the newest record to then delete the duplicate that will be obsolete, it will be a manual job, there’s no way, are 256 records, to analyze line by line, but duplicates are more than 2 thousand, so I want to eliminate these duplicates that are exactly the same to make it less laborious

  • Only consider duplicates if the 57 are equal ? Make a cursor or select if the line is equal to the previous delete. One thing and compare these 57 https://www.portugal-a-programar.pt/forums/topic/63846-resolver/

  • The only different field is the codkey which is the primary key, I will look for cursors, I am not very good at Postgresql functions.

1 answer

1

You can do it using the Window Function ROW_NUMBER.

In this code, I created a column to make a sort of "sort" of records. When the column rownum is equal to 1, means it is the first single record of the result.

Uniqueness is configured by the columns specified in the clause PARITION BY. Inevitably, you would have to inform the 57 columns in this clause, but I think it would not be difficult.

CREATE TABLE tabeladuplicidade 
(
   codkey INT NOT NULL,
   pedido INT,
   campo1 TEXT,
   campo2 TEXT,
   campo3 TEXT
);
ALTER TABLE tabeladuplicidade ADD CONSTRAINT pk_tabdup_codkey PRIMARY KEY(codkey);

INSERT INTO tabeladuplicidade VALUES 
     (1, 123, 'valor1', 'valor2', 'valor3')
   , (2, 123, 'valor1', 'valor2', 'valor3')
   , (3, 123, 'valor1', 'valor2', 'valor3')
   , (4, 123, 'valor1', 'valor2', 'valor3')
   , (5, 456, 'valor1', 'valor2', 'valor3');

WITH q AS 
(
   SELECT 
      ROW_NUMBER() OVER (PARTITION BY pedido, campo1, campo2, campo3 ORDER BY codkey) rownum
      , *
   FROM tabeladuplicidade
)
SELECT * FROM q
WHERE q.rownum > 1 -- Se for maior que um, é duplicidade

Browser other questions tagged

You are not signed in. Login or sign up in order to post.