How to handle large volumes of data in a database?

I have a process to implement in my system that, at a given time, will check a relatively large number of records and, when a record is not found, save it in the MySQL database. In either case, the operation must also be recorded as a movement record.

I am not finding a way to do this without overloading the server. I am developing in PHP, and so far the only approach I have come up with is a loop that reads each "line" and checks whether it exists; if it does not exist, I write it, recover the ID and write to another table. If it already exists, I only recover the ID and save it in the other table.

This way I would have to make 1 query + 1 write + 1 read (to recover the newly saved ID) + 1 write for each record. Considering that it will be common for each operation to do this around 3000 times, it becomes unviable. In addition, it will be common to have more than one user running this same process at the same time.
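
To illustrate, this is roughly the loop I described, as a simplified sketch (table and column names are illustrative, using mysqli):

<?php
// Simplified sketch of the current approach: 4 round trips per record.
// $mysqli is an open connection; $serials holds the ~3000 serials of one operation.
foreach ($serials as $serial) {
    // 1. query: does this serial already exist in table "A"?
    $stmt = $mysqli->prepare('SELECT id FROM A WHERE serial = ?');
    $stmt->bind_param('s', $serial);
    $stmt->execute();
    $row = $stmt->get_result()->fetch_assoc();

    if ($row) {
        $id = $row['id'];                 // serial exists: just recover the ID
    } else {
        // 2. write: register the new serial in "A"...
        $ins = $mysqli->prepare('INSERT INTO A (serial) VALUES (?)');
        $ins->bind_param('s', $serial);
        $ins->execute();
        // 3. read: ...and recover the newly saved ID
        $id = $mysqli->insert_id;
    }

    // 4. write: record the movement in table "B"
    $mov = $mysqli->prepare('INSERT INTO B (id_produto) VALUES (?)');
    $mov->bind_param('i', $id);
    $mov->execute();
}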

What would be the most correct way to proceed in this case?

[Additional information]

It is a product movement system. Each product has a "serial". I need to check each serial against table "A"; if it does not exist there, I register it, take the ID and insert it in table "B". If the serial is already registered in table "A", I just take its ID and insert it in table "B".

(figure illustrating the process: each serial is checked against table "A" and the movement is recorded in table "B")

  • Have you considered the possibility of using Events, Procedures and Triggers, so you avoid this redundancy and keep the whole rule in the database, away from PHP?

  • I believe there may be an optimization for your system. If it checks whether a certain ID exists in a table, it is because that ID has to be in another table, otherwise it would not know what to check. So, why not also register the new IDs in the table where the check needs to be done?

  • @Diegomachado, this is exactly the process: I check whether the ID exists in table "A"; if it exists, I take the ID (from table "A") and insert it in table "B". If it does not exist, I insert the record in table "A", retrieve the ID and insert it in table "B". These are product movement records. Each product has a serial, so I need to check serial by serial. If the serial is already registered (table "A"), I take the ID and insert it in table "B". And so on, as explained.

  • I believe you did not understand what I said. What I am trying to do is skip this verification step, but if that is not possible, fine, we will look at another way. What I meant was the following: why add the records to only one table and then have to check everything to see whether it exists in the other one too? I do not know how the system was built, but I believe it would be easier if the records were added to both tables at once; since the verification is done automatically, there would be no problem in registering them there right away. Do you understand what I meant?

  • @Diegomachado, I do not think I really understood you. The serial, which must be unique, will be written to only one table; it is its ID that gets moved around. The system that exists today works the way (I think) you described: I write the serial in more than one table. I am redesigning the system precisely to avoid this redundancy. I added a figure to the question to illustrate what I need. With all that said, I just want to know a way to avoid a per-row query inside a loop that can have thousands of repetitions.

  • I still can’t get a practical idea of how to solve this problem. Can anyone give me a direction?

  • Isn't the term you are looking for Transactions?

  • I think not, because I need to pass values from PHP to the database: a loop over the data to do the search. However, do you have any transaction-based suggestion that applies in this case?

  • As @Joker mentioned, Events and Triggers serve exactly this purpose: working with heavy, repetitive processes. Around 3000 queries per operation really is heavy, so you should find a way to process the data beforehand and only then insert it.

3 answers

I think you should not, and do not need to, record the SERIAL in the input and output tables. In that case, to do the insert you would use a Stored Procedure, since you would be passing information to two tables in the database.

Then the solution I propose:

Tables:

CREATE TABLE `testes`.`produto` (
  `id` INT NOT NULL AUTO_INCREMENT,
  `descricao` VARCHAR(45) NULL,
  `serial` BIGINT NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE INDEX `serial_UNIQUE` (`serial` ASC));

CREATE TABLE `testes`.`entrada` (
  `identrada` INT NOT NULL AUTO_INCREMENT,
  `id_produto` INT NULL,
  `data` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`identrada`),
  INDEX `fk_produto_idx` (`id_produto` ASC),
  CONSTRAINT `fk_produto`
    FOREIGN KEY (`id_produto`)
    REFERENCES `testes`.`produto` (`id`)
    ON DELETE SET NULL
    ON UPDATE NO ACTION);

Stored Procedure:

DELIMITER $$

CREATE DEFINER=`root`@`localhost` PROCEDURE `grava_item_entrada`(IN serial_produto BIGINT(20))
BEGIN
    DECLARE ID_PROD INT DEFAULT NULL;

    -- look up the product by its serial
    SET ID_PROD := (SELECT p.id FROM produto p WHERE p.serial = serial_produto);

    -- unknown serial: register the product and recover the new ID
    IF (ID_PROD IS NULL) THEN
        INSERT INTO produto (serial) VALUES (serial_produto);
        SET ID_PROD := LAST_INSERT_ID();
    END IF;

    -- record the movement
    INSERT INTO entrada (id_produto) VALUES (ID_PROD);
END$$

DELIMITER ;

Then just call it from PHP and it should work, for example:

// $serial holds the product's serial coming from PHP
$mysqli->query("CALL grava_item_entrada($serial)");
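
For a whole batch of serials, a sketch of what the call loop could look like (assuming an array $serials and the same $mysqli connection; grouping the calls in one transaction is optional but reduces commit overhead):

<?php
// Sketch: one procedure call per serial, reusing a single prepared statement.
$serial = 0;
$stmt = $mysqli->prepare("CALL grava_item_entrada(?)");
$stmt->bind_param("i", $serial);   // bound by reference, re-read on each execute

$mysqli->begin_transaction();      // group the ~3000 calls into one commit
foreach ($serials as $serial) {
    $stmt->execute();
}
$mysqli->commit();
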
  • Excellent suggestion! I am going to make a few minor adjustments to suit my situation, but I think this will solve it, or at least ease the load on the server a little.

Your problem appears to be non-scalable; that is, if all the records you have in the table have different keys, there is no way to optimize this search, because that is already the job of the database and its algorithms for deciding, searching, inserting and so on.

Unless you partition your table. This can be a good way out when there is no way to create more specific, optimizable indexes, which seems to be your case. With partitioning the tables become smaller, all partitions sharing the same structure, which makes it easier to find the data you are looking for in a large volume of data.
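
Just as an illustration of what that could look like (a sketch, not tied to the asker's exact tables): MySQL requires every unique key, including the primary key, to contain the partitioning column, so this hypothetical movement table uses a composite primary key and is partitioned by year:

<?php
// Hypothetical table, partitioned by year of the movement date.
$mysqli->query("
    CREATE TABLE movimento (
        id   INT NOT NULL AUTO_INCREMENT,
        data DATE NOT NULL,
        PRIMARY KEY (id, data)
    )
    PARTITION BY RANGE (YEAR(data)) (
        PARTITION p2016 VALUES LESS THAN (2017),
        PARTITION p2017 VALUES LESS THAN (2018),
        PARTITION pmax  VALUES LESS THAN MAXVALUE
    )
");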

I will give a more elegant, simple and easy solution. First, create the tables:

CREATE TABLE `tab_produto` (
  `id_produto` int(11) NOT NULL AUTO_INCREMENT,
  `serial` varchar(45) DEFAULT NULL,
  PRIMARY KEY (`id_produto`),
  UNIQUE KEY `serial_UNIQUE` (`serial`)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=utf8;

CREATE TABLE `tab_entrada` (
  `id_entrada` int(11) NOT NULL AUTO_INCREMENT,
  `id_produto` int(11) DEFAULT NULL,
  `dt_emissao` date DEFAULT NULL,
  PRIMARY KEY (`id_entrada`),
  KEY `fk_tab_produto_idx` (`id_produto`),
  CONSTRAINT `fk_produto` FOREIGN KEY (`id_produto`) REFERENCES `tab_produto` (`id_produto`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8;

Well, the important thing so far is the UNIQUE constraint on the serial field; that ensures you will not have duplicate products. As for the relationship, I do not think it is necessary to explain it, since I believe you already understand why it is needed.

Now for the inserts; see how simple it is:

insert ignore into tab_produto (serial) values ('123');

insert into tab_entrada (id_produto, dt_emissao)
values ( (select id_produto from tab_produto where serial = '123'), curdate() );

Assuming you wanted to insert the serial 123 (quoted, since the column is a varchar): first you do the INSERT IGNORE; if the serial already exists it is simply ignored, without generating an error, and no duplicate is created because the field is UNIQUE.

Then comes the insert into the input table (the same would apply to an output table): it does the INSERT already looking up the ID of the same serial passed in the first insert. The trick here is doing the INSERT with a SELECT in a single statement.
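
Since the asker's operations arrive in batches of about 3000 serials, both statements can also be batched from PHP, so the whole lot costs two round trips instead of two per record. A sketch, assuming an array $serials and an open $mysqli connection:

<?php
// Batch version of the two inserts above; $serials and $mysqli are assumed.
$values = array();
$list   = array();
foreach ($serials as $s) {
    $s        = $mysqli->real_escape_string($s);
    $values[] = "('$s')";
    $list[]   = "'$s'";
}

// One INSERT IGNORE registers every serial that is not yet known.
$mysqli->query('insert ignore into tab_produto (serial) values ' . implode(',', $values));

// One INSERT ... SELECT records a movement for every serial in the batch.
$mysqli->query('insert into tab_entrada (id_produto, dt_emissao)
                select id_produto, curdate()
                from tab_produto
                where serial in (' . implode(',', $list) . ')');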

I will leave a Sqlfiddle; note that I even "forced" a duplicate insert into tab_produto so the test is close to what can happen in your environment, and then inserted 2 records into tab_entrada just for the demo.


Another solution to think about is one where the client performs only 1 insert per input (or output) and a BEFORE INSERT trigger takes care of the insert into tab_produto; a sketch follows below.

This saves round trips between the client and MySQL, but I think the cost for the DB would be the same.
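
A rough sketch of that trigger idea, assuming (hypothetically) that tab_entrada also receives the serial as a column, so the trigger can resolve or create the product row by itself:

<?php
// Hypothetical: requires a `serial` column on tab_entrada.
// mysqli sends this as a single statement, so no DELIMITER handling is needed.
$mysqli->query("
    CREATE TRIGGER trg_entrada_bi BEFORE INSERT ON tab_entrada
    FOR EACH ROW
    BEGIN
        -- register the product if its serial is new (duplicates are ignored)
        INSERT IGNORE INTO tab_produto (serial) VALUES (NEW.serial);

        -- resolve the product ID for the row being inserted
        SET NEW.id_produto = (
            SELECT id_produto FROM tab_produto WHERE serial = NEW.serial
        );
    END
");

The client then does a single insert per movement, for example: $mysqli->query("insert into tab_entrada (serial, dt_emissao) values ('123', curdate())");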
