Help with select and repeat records

Asked

Viewed 106 times

4

I have a postgresql database with a table like this:

id_usuario | dt_atualizacao | atualizacao | outros_dados
123        | 01-01-2001     | 0           | abc
123        | 01-01-2005     | 1           | abe
123        | 01-01-2012     | 1           | acd
123        | 01-01-2018     | 1           | acc
124        | 01-01-2017     | 0           | acj
124        | 01-01-2018     | 1           | agy
125        | 01-01-2018     | 0           | awe

What happens is this: in this application, when registering a user for the first time, it leaves the field atualizacao = 0 and put the current date.

but when an update is made in this register (let’s say I changed the field outros_dados), instead of updating the current record, it creates a new record with new date and field atualização = 1.

What I need is a select that takes all user id, without repeating it and being the most current one. In the above example, I would need the following result:

id_usuario | dt_atualizacao | atualizacao | outros_dados
123        | 01-01-2018     | 1           | acc
124        | 01-01-2018     | 1           | agy
125        | 01-01-2018     | 0           | awe

Any tips on how to do it?

BS: before anyone comes here asking "where is the code", I already inform that I have no idea how to generate a select so, so I did not post.

2 answers

7


The table structure is not exactly "great".

If what you want is to have an update history then you should, as you rightly say, @Dudaskank, is to change the data type of DATE for TIMESTAMP to be able to store the time, minute and second of the update and thus be able to pick up the last update in a simpler way and "clean".

The best way to get the latest updates, taking into account the change in column type dt_atualizacao, is using the operator DISTINCT ON:

SELECT      DISTINCT ON (id_usuario) id_usuario
        ,   dt_atualizacao
        ,   outros_dados
FROM        historico
ORDER BY    id_usuario
        ,   dt_atualizacao DESC

The table Historico will be your table with updates.

  • is that it’s not my system. I’m just pulling information from it, so I can’t change it

  • 1

    Table not a column of sequential numbering? Or any other where a correct ordering can be done?

  • worse than not. is a huge database (more than 700 tables) that link to each other, and I’m having to figure out which one has the information I need. for the time being with this one (and posted only an example of her)... but thanks, I will use your example code to see if I can get something here

  • 1

    The code will work, but it may not return the 100% correct result. Returns to last date, but may not necessarily be the last update (because we have no way of knowing!). I added only the question of atualizacao = 1.

  • but I think picking up the last date works yes, because there is never more than one update on the same day. I will be this afternoon

  • 1

    So ok, in principle it will work at 100%!

  • 2

    In this case, just remove the clause where, because keeping it will not appear the line with the 125 as initially desired.

  • worked correctly, I’m just adapting here and linking with other tables. if you can, clarifies me two things: 1. what’s the difference between distinct and distinct on? 2. has some way of not ordering first by the field of distinct on? need to order by another field. hug

  • 1

    The DISTINCT only removes duplicates of the result, while the DISTINCT ON finds the first record in a group (which is your case). The field you put in DISTINCT ON is the one where you want to get the first group registration, if you change that the results will be different! But you can "play" a little bit with that, see where it goes.

  • 1

    @Italo Rodrigo, if you need to order in another field, then use my answer. Further information on DISTINCT and DISTINCT ON can be found in the documentation: https://www.postgresql.org/docs/10/static/sql-select.html#SQL-DISTINCT

Show 5 more comments

3

Some points to note:

  1. Instead of using a field like date, that seems to only keep the date in this example, use a timestamp or similar, because if there is more than one update on the same date, you will be able to differentiate between them which one.
  2. I believe that column is not necessary atualizacao. Just you pick the biggest date for determined id to find out which the latest.

Update 29/08/2018 13:49

As the author reported that only one update per day is what happens in practice, even using date will work smoothly.

I also created a SQL Fiddle with the example data, both with my solution and with the solution of the colleague @João Martins. For some reason, which I don’t understand much to tell the truth, my query ended up being faster, so the tip is (but his is much easier to read by a human, if you don’t need more speed is also another tip).


For the query, try something like this, where historico is the table name and dt_atualizacao the field with the date and time of that record:

SELECT * FROM historico a WHERE dt_atualizacao=(SELECT MAX(dt_atualizacao) FROM historico b WHERE b.id_usuario=a.id_usuario);
  • is that it’s not my system. I’m just pulling information from it, so I can’t change it

Browser other questions tagged

You are not signed in. Login or sign up in order to post.