1
We are migrating signatures from an Access database to Mysql, but before migrating I need to process some parts of the data. I need:
- Remove duplicate id’s
- Return the highest value of the date field for these id’s
In the librecalc itself you can remove duplicate id’s, but I need to compare the date field of both duplicate records, because I want to use the larger one. So
example:
id, data
2, 05/07/2016
2, 08/12/2018
Notice that they are duplicated, but I need to return the longest date between these 2 fields, which, in the example, would be 08/12/2018
. In the end, there would be only one record with id 2 and the largest date filtered, 08/12/2018
.
Could you do this with SQL? If not,
Could anyone explain to me the logic to do this in PHP? I tried to make a script in PHP that does this, but it hasn’t worked for all cases.
My approach in PHP:
<?php
$file = file("access_old2.csv");
$new = fopen("access_formatado.csv", "w");
for ($i=0; $i < count($file); $i++) {
print "---------------------";
$explode = explode("|", $file[$i]);
$explode[10] = strtotime(str_replace("/", "-", $explode[10])); // $explode[10] é o campo data
$explode_next = explode("|", $file[$i+1]);
$explode_next[10] = strtotime(str_replace("/", "-", $explode_next[10])); // $explode_next[10] é o campo data do próximo laço
if ($explode[0] == $explode_next[0]) { // $explode[0] é a id | não grava os dados repetidos e recupera a maior data de assinatura
$date_atual = $explode[10];
$date_next = $explode_next[10];
if ($date_atual !== $date_next) {
echo "diferente";
print("<pre><br>"); //hack
var_dump($date_next);
var_dump($date_atual);
var_dump(date("Y-m-d", max($date_atual, $date_next)));
print("</pre><br>"); //hack
$explode[10] = max($date_atual, $date_next);
$explode_next[10] = max($date_atual, $date_next);
}
continue;
}
$explode[10] = date("Y-m-d", $explode[10]);
$explode_next[10] = date("Y-m-d", $explode_next[10]);
if ($explode[8] == "SP") { // coloca um zero a esquerda dos ceps de SP
$explode[6] = str_pad($explode[6], 8, "0", STR_PAD_LEFT);
}
$explode[11] = "ASSCH";
$explode[6] = substr($explode[6], 0, 5)."-".substr($explode[6], 5, 8);
if ($explode[9] == "") {
$explode[9] = "BR";
}
print("<pre><br>"); //hack
var_dump($explode);
print("</pre><br>"); //hack
fputcsv($new, $explode, "|", '"');
}
fclose($new);
In this script, if the current loop id $explode[0]
is equal to the id of the next loop, it should return the max($data_atual, $data_next)
. It works for some, but not for others. It has to do with the order of records, but I’ve been in this code for a day and I haven’t made any progress. (I believe to be an error in the logic of the script).
TL;DR
- SQL to remove duplicates and leave the longest date between duplicates
- My script does not work for some cases.