Erase everything except the smallest one per day

Asked

Viewed 213 times

4

I have a space problem and there are many useless things being stored, but they are useful during the same day (you will understand).

Real problem:

There are more than 2 million "lines" and this is occupying more than 1.1GB, I am using a high availability server, anyway, this does not allow to change the capacity easily, besides having costs

Solution to the problem:

Erase data that is not the smallest based on the day, but the date is DATETIME and I need to match DATE. This process would occur every day, to delete the data of the previous day (but, as there are already data, would have to delete all old ones too).

Explanation of how it works:

The site saves several product data several times a day. Imagine the product with the id "999".

These data ("duplicated") are useful during the same day (that is, all data today should exist), so I can not simply check if there is already a larger today and prevent the insertion, as it should have this "duplication".

There is a:

      id | Prod | Preco | Data
(id auto) | 999 | 99.99 | 2015-01-25 12:31:00
(id auto) | 999 | 89.99 | 2015-01-25 11:00:00
(id auto) | 999 | 78.99 | 2015-01-25 03:37:00
(id auto) | 999 | 98.99 | 2015-01-24 11:30:00
(id auto) | 999 | 74.99 | 2015-01-24 17:37:00
(id auto) | 999 | 79.99 | 2015-01-23 02:39:00

I wish there was only:

      id | Prod | Preco | Data
(id auto) | 999 | 78.99 | 2015-01-25 03:37:00
(id auto) | 999 | 74.99 | 2015-01-24 17:37:00
(id auto) | 999 | 79.99 | 2015-01-23 02:39:00

What I tried to think:

I need something that selects everything that is not the smallest and that is on the same day.

This is extremely strange, but says the real idea:

$PegaIDMenor = SELECT id FROM tabela WHERE Preco = (SELECT MIN(Preco) FROM tabela WHERE Data LIKE '2015-01-25%' AND Prod = '999') AND Data LIKE '2015-01-25%' AND Prod = '999' LIMIT 1

// Exemplo apenas, logico que terá que existir funções de query!

$EliminaMaiores = DELETE FROM tabela WHERE id != ' $id ' AND Data LIKE '2015-01-25%' AND Prod = '999'

I thought the date go changing in loop and checking, but I believe there must be some better function, only using Mysql to perform the same task, and better performance.

I think it’s clear what I need. Remember, I don’t need SELECT but DELETE!

  • use http:/sqlfiddle.com/ to create a test

  • You want to delete everything that is not the lowest price of the specific day?

  • Yes, but not for a specific day, but always, every day. Per day each product should remain only one data, the one with the lowest price.

  • The first problem is that you’re in DATETIME and it is necessary to select by DATE (function of LIKE is extremely slow, even with index). The other problem is to locate which is the smallest per day and per product, so that it is not excluded. Another idea I had, but I can’t build it would be to take the ID of MIN() for GROUP BY prod, so would have the ID of all the lowest values per product (adding the WHERE Data LIKE would have a specific day). Then DELETE using NOT IN of these earlier Ids using WHERE Data Like. Using PHP to create a data loop, deleting date by date.

  • If you achieve something of the specific day would already be valid, because only create a loop to "change the specific day".

  • Why don’t you use DAY(Data) ? The function DAY returns the specified day, you could compare for example, WHERE DAY(Data) = 26

  • This can also be used, had read about it. I don’t know which would be better (in a matter of speed), because DAY would pick up and "convert" every day.

Show 2 more comments

2 answers

2


I don’t know about performing, but sql will delete all prices that are not the lowest price of each product on smaller days than today. Some Where should improve performance.

DELETE `precos`
FROM `precos` 
left join ( 
    SELECT max(id) id
    from precos
    inner join (
        SELECT date(Data) d, Prod pid, min(Preco) p FROM `precos` 
        WHERE 1
        group by date(Data), Prod
    ) mp on date(precos.data) = mp.d and precos.Prod = mp.pid and mp.p = precos.Preco
    group by date(Data), Prod
) save_data on save_data.id = precos.id
where 
save_data.id is null
and date(Data) < CURDATE();

I recommend not running in production before some hahaha tests. Good luck

1

I suggest you do the following.

Create a table to assist processing with select below

select produto, date(datetime), min(preco) from tabela group by produto, date(datetime)

This way you will have in a table all the prices you want to stay Then make a left to delete

delete pc from preco pc left join auxiliar aux on pc.produto = aux.produto and date(pc.datetime)=aux.date and pc.preco = aux.preco where aux.produto is null

when you use the left command, anything that has no relation to the other table will be equal to null. This way, when you use the Where product from the auxiliary table equal to null, Voce removes all lines that do not reference.

Before you run DELETE, do SELECT and see if the result is the lines you want to actually delete

Browser other questions tagged

You are not signed in. Login or sign up in order to post.