Remove duplicate records from a table

Asked

Viewed 76 times

0

Having the following tables without keys still defined:

Aeroplane (code_aviao, code_company, year, type_engine, type_airplane, status)

Flight ( Flight code, Flight number, plane code, code_company)

In some cases, about 400 aircraft table records, a plane is associated with two airlines and I do not want that, I want each plane to have only one airline. In the following scenario:

Codeplane | Codecompany

N107AS | AS

N107AS | EV

The same plane is associated with two different airlines. It turns out that in the Flight table, when it comes to the N107AS plane there are 8000 records for when it is associated with the company AS and 1000 for the company EV.

What I want is to remove the companies that check fewer records so that each plane has only one airline and that is the one that has the most records in the flight schedule.

This is for the 400 cases. Is there an automated way to do it or will I have to delete them individually?

  • Don’t you have (or don’t know who) control of the system that generates this data? Because perhaps the right thing would be to correct how they are generated and not to be ruling out duplicities.

  • It’s old data and it’s in CSV files. The goal is not to analyze data but to measure the performance of some databases. I only came up with this conflict because in my final table vião the attribute code_aviao should be the primary key and that way it does not give. At the same time I want to have as many records as possible on the flight chart

  • There is how to create an SQL that generates another SQL that is exactly what you need, but for this you will have to post here the relational structure of the models. If it is not impossible to create some solution to your problem.

1 answer

1

First a query will be required to return the number of flights of an airplane, grouped by company:

select 
    v.cod_companhia t_comp, 
    count(v.cod_voo) as t_voo,
    v.cod_aviao as t_avi
  from 
    voo v
  where
    v.cod_aviao = 'N107AS'
  group by
    v.cod_companhia;

The result of this consultation will be the number of flights on a given aeroplane, grouped by company:

resultado da consulta 1

Having this result, we will include this consultation within a new consultation, which will return us the company that has less flight for this plane, given the total amount of flights:

select t_comp, t_avi, min(t_voo)
from
  (select 
    v.cod_companhia t_comp, 
    count(v.cod_voo) as t_voo,
    v.cod_aviao as t_avi
  from 
    voo v
  where
    v.cod_aviao = 'N107AS'
  group by
    v.cod_companhia) g_voo;

resultado da consulta 2

With the result of this consultation, it is already possible to determine which airline has the fewest flights on a particular aircraft. From this, it is possible to implement a delete based on this consultation.

Is not the solution, but it may help us to arrive at a.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.