Remove duplicated data and return the highest value date field

Asked

Viewed 827 times

1

We are migrating signatures from an Access database to Mysql, but before migrating I need to process some parts of the data. I need:

  • Remove duplicate id’s
  • Return the highest value of the date field for these id’s

In the librecalc itself you can remove duplicate id’s, but I need to compare the date field of both duplicate records, because I want to use the larger one. So

example:

id, data
2,  05/07/2016
2,  08/12/2018

Notice that they are duplicated, but I need to return the longest date between these 2 fields, which, in the example, would be 08/12/2018. In the end, there would be only one record with id 2 and the largest date filtered, 08/12/2018.

Could you do this with SQL? If not,

Could anyone explain to me the logic to do this in PHP? I tried to make a script in PHP that does this, but it hasn’t worked for all cases.

My approach in PHP:

<?php

$file = file("access_old2.csv");
$new = fopen("access_formatado.csv", "w");

for ($i=0; $i < count($file); $i++) { 
print "---------------------";

$explode = explode("|", $file[$i]);
$explode[10] = strtotime(str_replace("/", "-", $explode[10])); // $explode[10] é o campo data
$explode_next = explode("|", $file[$i+1]);
$explode_next[10] = strtotime(str_replace("/", "-", $explode_next[10])); // $explode_next[10] é o campo data do próximo laço

if ($explode[0] == $explode_next[0]) { // $explode[0] é a id | não grava os dados repetidos e recupera a maior data de assinatura

    $date_atual = $explode[10];
    $date_next = $explode_next[10];

    if ($date_atual !== $date_next) {
        echo "diferente";

        print("<pre><br>"); //hack

        var_dump($date_next);
        var_dump($date_atual);
        var_dump(date("Y-m-d", max($date_atual, $date_next)));

        print("</pre><br>"); //hack

        $explode[10] = max($date_atual, $date_next);
        $explode_next[10] = max($date_atual, $date_next);
    }
    continue;
}

$explode[10] = date("Y-m-d", $explode[10]);
$explode_next[10] = date("Y-m-d", $explode_next[10]);

if ($explode[8] == "SP") { // coloca um zero a esquerda dos ceps de SP
    $explode[6] = str_pad($explode[6], 8, "0", STR_PAD_LEFT);
}

$explode[11] = "ASSCH";
$explode[6] = substr($explode[6], 0, 5)."-".substr($explode[6], 5, 8);

if ($explode[9] == "") {
    $explode[9] = "BR";
}

print("<pre><br>"); //hack
var_dump($explode);
print("</pre><br>"); //hack

fputcsv($new, $explode, "|", '"');


}


fclose($new);

In this script, if the current loop id $explode[0] is equal to the id of the next loop, it should return the max($data_atual, $data_next). It works for some, but not for others. It has to do with the order of records, but I’ve been in this code for a day and I haven’t made any progress. (I believe to be an error in the logic of the script).

TL;DR

  • SQL to remove duplicates and leave the longest date between duplicates
  • My script does not work for some cases.

2 answers

2


If the column is set to date type, it is possible to do this directly in Mysql (it should work in Access as well):

SELECT id, MAX(some_date) 
FROM old_table
GROUP BY id;

That one query will group the largest date by id. Next you could move to a new table, already with Primary key definite:

INSERT INTO new_table
SELECT id, MAX(some_date) 
FROM old_table
GROUP BY id;

See working on SQL Fiddle.

0

I think this might help:

$formato = 'd/m/Y'; // define o formato para dd/mm/yyyy
$data1 = DateTime::createFromFormat($formato, '05/07/2016'); // define data 1
$data2 = DateTime::createFromFormat($formato, '08/12/2018'); // define data 2

if($data1 > $data2){  // checa se data1 é maior que data2
    echo '$data1 é maior: ';
    echo date_format($data1, $formato) ;
}else{
    echo '$data2 é maior: ';
    echo date_format($data2, $formato) ;
}

Browser other questions tagged

You are not signed in. Login or sign up in order to post.