Faster: delete + insert or select + update?

I am building an app that will receive data from several different ERPs every day.

The problem is that on the next day an ERP may send me the same data again, whether it changed or not, plus some new data. I need to insert the new data, update the existing records if anything changed, and delete the old data that will no longer be used.

There are about 15,000 records per day from each company, and I expect the application to be used by roughly 200 companies on average.

When I think about it, one option is to check the incoming records one by one, see whether anything changed, and update them if so (select + update).

The other option is to delete all the information every day and insert it again (delete + insert).
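
To make the comparison concrete, here is a minimal sketch of the two alternatives, assuming a hypothetical `registros` table keyed by `id_registro` (the real schema is not shown here):

```sql
-- Option 1: select + update (check each record, update only if it changed)
SELECT conteudo FROM registros WHERE id_registro = 123;
-- compare in the application; only if something changed:
UPDATE registros SET conteudo = 'novo valor' WHERE id_registro = 123;

-- Option 2: delete + insert (wipe the company's batch and reload everything)
DELETE FROM registros WHERE id_empresa = 1;
INSERT INTO registros (id_registro, id_empresa, conteudo)
VALUES (123, 1, 'novo valor');
```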

What is the fastest way to handle this volume of data every day? If someone has similar experience or knows a different solution, I am also interested.

Thank you!

  • I think the most appropriate thing is to implement both alternatives and test their performance. One solution may be more suitable in one situation but not in another.

  • Are these ~15,000 records all received at a single time, or do they arrive throughout the day? Will the system need to be accessible during the update? Can the update take place outside office hours? In my experience, 15,000 records are not the issue; what matters is the size of the database you will be updating and how much you expect it to grow. I think you could explain the context of your problem in more detail so that it is possible to offer a better answer/suggestion.

  • Instead of doing a select followed by an update, wouldn't it be better to use INSERT ... ON DUPLICATE KEY UPDATE? I would run a performance test first, but I believe you would gain speed by skipping the select, because that way it always tries to insert the record and only updates the ones that already exist in the database (see the sketch after these comments). Take a look: https://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html.

  • I am also building a system like this and would like to know which approach is the most efficient so I don't run into problems later. I have seen that deletes can be expensive on indexed tables, so I was thinking of adding an N/C (Normal/Cancelled) flag to each record and updating instead of deleting. But then two other questions came up: is the update actually faster, and doesn't the bloat from keeping unused records in the tables hurt selects? I thought of this solution considering that there are tables with millions of records whose performance is still good.

  • Use INSERT ... ON DUPLICATE KEY UPDATE, which does this work for you.
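
A minimal sketch of the upsert suggested in the comments above, assuming a hypothetical `registros` table whose primary (or unique) key identifies each incoming record:

```sql
-- One statement per incoming record: no prior SELECT is needed.
-- Assumes a PRIMARY KEY or UNIQUE index on the identifying column, here id_registro.
INSERT INTO registros (id_registro, id_empresa, conteudo)
VALUES (123, 1, 'valor recebido hoje')
ON DUPLICATE KEY UPDATE conteudo = VALUES(conteudo);
```

If the key does not exist yet, the row is inserted; if it already exists, only the UPDATE part runs.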

2 answers

1

My friend, I have a few remarks for you:

  1. I do not recommend deleting the data, because if one day the company fails to send the data, or sends it late, you will end up with a window of data that is unavailable or inconsistent with the previous day's information.
  2. I think you need to specify more clearly how you know whether incoming data is new or repeated, that is, what the record's identifier is, e.g. id_empresa + id_tipo_informacao + data_informacao.
  3. Once you have answered the previous question, you can run a test: change the table's primary key to a composite key made up of the fields that uniquely identify a record, and use the MySQL REPLACE command, which does this insert-or-update task automatically for you.
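
A minimal sketch of item 3, using the example identifier from item 2 (table and column names are illustrative):

```sql
-- Composite primary key formed by the fields that uniquely identify a record
-- (drop the existing primary key first if the table already has one)
ALTER TABLE registros
    ADD PRIMARY KEY (id_empresa, id_tipo_informacao, data_informacao);

-- REPLACE inserts the row, or deletes the existing row with the same key and inserts the new one
REPLACE INTO registros (id_empresa, id_tipo_informacao, data_informacao, conteudo)
VALUES (1, 10, '2019-05-20', 'valor atualizado');
```

Keep in mind that when the key already exists, REPLACE works as a delete followed by an insert, which is why the comments on the question also suggest INSERT ... ON DUPLICATE KEY UPDATE as an alternative.
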
  • This does not provide an answer to the question. To criticize or request clarification from an author, leave a comment below their post. - From review

  • Thanks for the feedback. What should I do then? Can I delete my answer?

  • Marcio, all good? You don't need to remove your answer. That was my position in review, but other users will also analyze your answer: if it is satisfactory it will remain; if not, it will be removed automatically.

-2

This is a very common procedure in debt-collection offices for transmitting large customer lists. We call it a "Load".

Every day, two different systems can talk to each other, sending and receiving this data, while one of them runs an attempt to synchronize with the source database. The procedure is as follows:

The database that performs the synchronization must have two tables with identical layouts: one for input data and one for history data. Both must have all the fields needed for the information plus two control fields: the first is the name of the data's origin, e.g. "empresa_origin"; the second is the data entry date, e.g. "data_input". The date field acts as the comparison parameter between the history table and the newly arrived data.

The routine is to insert the incoming records into the input table and then compare them with the history table in order to separate out the new content. This way it is possible to tell the records apart.

Records found to be different should be inserted into the history table, and after the procedure the input table should be cleared for the next data load.

Using the "empresa_origin" and "data_input" fields it is possible to differentiate the data from all sources and thus achieve greater integrity of the information received. You also get a "Load" report from the number of records received each day for each origin and date.
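
A minimal sketch of that routine; "tabela_entrada", "tabela_historico" and "chave_registro" are assumed names, while "empresa_origin" and "data_input" are the control fields described above:

```sql
-- 1. Load the daily batch into the input (staging) table
INSERT INTO tabela_entrada (empresa_origin, data_input, chave_registro, conteudo)
VALUES ('ERP_X', CURDATE(), 123, 'valor recebido');

-- 2. Copy into the history table only the rows that are new or different
INSERT INTO tabela_historico (empresa_origin, data_input, chave_registro, conteudo)
SELECT e.empresa_origin, e.data_input, e.chave_registro, e.conteudo
FROM tabela_entrada e
LEFT JOIN tabela_historico h
       ON h.empresa_origin = e.empresa_origin
      AND h.chave_registro = e.chave_registro
      AND h.conteudo       = e.conteudo
WHERE h.chave_registro IS NULL;

-- 3. Clear the input table for the next load
TRUNCATE TABLE tabela_entrada;
```

Counting the rows in the history table grouped by "empresa_origin" and "data_input" then gives the daily "Load" report per origin and date.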
