What is the best way to insert a large number of records into the database?


Hello, I am building an application where I constantly need to read a spreadsheet, take its data and insert it into the database (MySQL).

The point is that these spreadsheets will always have at least 55,000 (fifty-five thousand) records.

What I have to do for every record:

  1. Run a query to check whether it already exists in the database
  2. If it exists, do an UPDATE
  3. If it does not exist, do an INSERT

For now I am only checking whether the record exists in the database, and it is already taking forever. Here is the code:

set_time_limit(0);

include_once '../../db/conexao.php';
include_once '../../ClassesPhpExcel/PHPExcel/IOFactory.php';
$objReader = new PHPExcel_Reader_Excel5();
$caminho = array('C:','Users','brayan','Documents','LN','estrutura_ecn.xls');
$objPHPExcel = $objReader->load(join(DIRECTORY_SEPARATOR, $caminho));
$sheetData = $objPHPExcel->getActiveSheet()->toArray(null,true,true,true);

unset($sheetData['1']); // discard the spreadsheet's header row

$count = 0;

foreach ($sheetData as $value){

    try {
        $criteria = array(  'select' => 'COUNT(codigo) as codigo',
            'condition'=>'cod_produto ='.$value['A']);

        $existe = Connection::findAllByAttributes('produto', $criteria, false);

        if($existe[0]->codigo == 0){

            // insert the record

        }else{

            // do the update

        }

        $count++;
    }catch (PDOException $e){
        echo $e->getMessage()."<br/>";
    }
}

I was wondering if there is a more efficient way to do these inserts and updates, so it does not take so long?

Thanks in advance...

  • Yes, there is a way to make inserts and updates more efficient using bulk operations; see this answer: http://answall.com/questions/9332/problemas-com-performant-utilizando-update/9344#9344

3 answers


The bulk approach quoted by Cantoni (and the links two or three levels deep from what he posted) is a much faster way to do this. With a little creativity you can (just as an example) build two arrays, one for the inserts and one for the updates: the inserts go straight in, and the updates go into a temporary table which is then applied with an UPDATE and an INNER JOIN.
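
For illustration, a minimal SQL sketch of that idea, reusing the question's produto table and assuming a hypothetical staging table produto_temp with illustrative columns campo_x and campo_y:

-- new records: send many rows in a single multi-row INSERT instead of one by one
INSERT INTO produto (cod_produto, campo_x, campo_y)
VALUES (101, 'a', 'b'),
       (102, 'c', 'd'),
       (103, 'e', 'f');

-- existing records: load the changed rows into the staging table,
-- then apply them all at once with an UPDATE + INNER JOIN
UPDATE produto p
INNER JOIN produto_temp t ON t.cod_produto = p.cod_produto
SET p.campo_x = t.campo_x,
    p.campo_y = t.campo_y;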

Before trying the bulk approach, you can check whether this already solves your problem: it is possible to use prepared statements, which are made to be executed repeatedly.

Example, assuming $connection is a PDO object:

$stmtSel = $connection->prepare("SELECT cod FROM tab WHERE cod = :cod");
$stmtUpd = $connection->prepare("UPDATE tab SET c1 = :c1, c2 = :c2 WHERE cod = :cod");
$stmtIns = $connection->prepare("INSERT INTO tab (cod, c1, c2) VALUES (:cod, :c1, :c2)");

foreach ($dados as $linha){
  $filtro = array('cod' => $linha[0]);
  $stmtSel->execute($filtro);
  $existe = ($stmtSel->fetch(PDO::FETCH_ASSOC) !== FALSE);
  $stmtSel->closeCursor();
  if ($existe) {
    $valores = array(
      'c1' => $linha[1],
      'c2' => $linha[2],
      'cod' => $linha[0] 
    );
    $stmtUpd->execute($valores);
  } else {
    $valores = array(
      'cod' => $linha[0],
      'c1' => $linha[1],
      'c2' => $linha[2]
    );
    $stmtIns->execute($valores);
  }
}

This assumes a table with the fields cod, c1 and c2. I cannot remember whether the order of the query parameters must be respected, so I built the $valores array separately for each branch of the if. The closeCursor call is there in case the database requires the query to be released before another one is executed.

Executing prepared statements with an array like this is my favorite approach; some people prefer to use bindParam.

If your database makes heavy use of indexes, you may not be able to improve much. In MySQL, InnoDB tables, which accept foreign keys, are usually slower. If that is not a problem for you, switching to MyISAM can make them faster, but you lose foreign keys.

I worked on a system whose keys were designed in a way that only performed well on Oracle; SQL Server was so-so and anything else was very slow. There was no way to make it better.

References in the PHP manual:
http://php.net/manual/en/pdo.prepare.php
http://php.net/manual/en/pdostatement.execute.php
http://php.net/manual/en/pdostatement.bindparam.php
http://php.net/manual/en/pdostatement.fetch.php
http://php.net/manual/en/pdostatement.closecursor.php

  • I am studying the subject before putting it into practice; I will post here whether it worked or not. Thanks in advance!


On the optimization question (the best way to insert a large amount of data into MySQL), I believe you could change your database's storage engine, as long as you do not need many database-level controls; I assume the values you import will only be manipulated by code and will probably not be constantly modified.

I have systems running with 40 GB (others with 96 GB) of data in MySQL, but to achieve 'maximum' optimization on large data volumes I changed the engine to MyISAM. Because it has fewer controls (no rollback, among other things), both inserting and querying data become faster; let's say you trade a full database engine (foreign key checks and so on, InnoDB) for something that works almost purely with files.
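
As a rough sketch of what that change looks like (table name taken from the question; note that converting drops InnoDB-only features such as foreign keys and transactions, so check that first and back up the data):

-- check the current engine of the table
SHOW TABLE STATUS LIKE 'produto';

-- convert the table to MyISAM
ALTER TABLE produto ENGINE=MyISAM;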

Do the test; for me it was the only way to get high performance with large loads.


As in a data-load system, we usually import everything first and only then compare the data. One suggestion I can give you: create a staging table in the database just to import all the data at once, without comparing anything, and then do the comparison. You can then run a query in the database similar to this one:

INSERT IGNORE INTO produto (cod_produto, campo_x, campo_y, campo_z)
SELECT TEMP.cod_produto, TEMP.campo_x, TEMP.campo_y, TEMP.campo_z
FROM    produto_importado as TEMP
LEFT OUTER JOIN produto as TABELA_ATUAL 
ON (TABELA_ATUAL.cod_produto = TEMP.cod_produto)
WHERE  TABELA_ATUAL.cod_produto IS NULL;
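
The answer does not show how the staging table itself is created; a minimal sketch, assuming it should simply mirror the structure of produto:

-- create the staging table once, copying the structure of the real table
CREATE TABLE produto_importado LIKE produto;

-- empty it before each new import of the spreadsheet
TRUNCATE TABLE produto_importado;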

If you prefer to use a prepared statement, you can do that too, but then you will run into PHP's execution time limit, which can kill the script; I believe it is enough to set the execution time to 0 (unlimited):

ini_set('max_execution_time', 0);

$mysqli = new mysqli("localhost", "root", "senha", "seu_banco");
if ($mysqli->connect_errno) {
    echo "MySQL connection error: (" . $mysqli->connect_errno . ") " . $mysqli->connect_error;
}

$file_path = "c:\\seu_arquivo.xls";

$SQL = "LOAD DATA INFILE ?
        INTO TABLE produtos_importado
        FIELDS TERMINATED BY '\t'
        LINES TERMINATED BY '\n'
        (cod_produto, campo_x, campo_y, campo_z)";

if (!($stmt = $mysqli->prepare($SQL))) {
    echo "Prepare failed: (" . $mysqli->errno . ") " . $mysqli->error;
}
if (!$stmt->bind_param("s", $file_path)) {
    echo "Binding the parameter failed: (" . $stmt->errno . ") " . $stmt->error;
}
if (!$stmt->execute()) {
    echo "Execution failed: (" . $stmt->errno . ") " . $stmt->error;
}

You can also do something like this directly in MySQL:

Select the data you want to check into a file:

SELECT * INTO OUTFILE "C:\\planilha.xls" 
     FIELDS TERMINATED BY '\t' 
     LINES TERMINATED BY '\n' 
FROM seu_banco.produto_export;

And you can load it into the temporary table like this:

LOAD DATA LOCAL INFILE "C:\\planilha.xls"
INTO TABLE produto_export
FIELDS TERMINATED BY '\t' 
LINES TERMINATED BY '\n' 
IGNORE 1 LINES (cod_produto, campo_x, campo_y, campo_z)

I believe even Excel itself allows you to export/import; take a look here (I am not sure): http://dev.mysql.com/doc/mysql-for-excel/en/mysql-for-excel-import.html http://dev.mysql.com/doc/mysql-for-excel/en/mysql-for-excel-export.html

  • Hello, thanks for the help, but I could not get SELECT ... INTO OUTFILE and LOAD DATA INFILE to work; it returns the following message: Can't create/write to file. I then created a "temporary" table where I insert everything and then do the rest of the work inside the database, but it is still taking about 1:20 (an hour and twenty minutes)

  • Could it be a permission issue on the folder? Try chmod 777 /var/www, or on the folder the file is in.

  • Yes, I checked, but it kept behaving the same.

  • But has it improved at all? 55,000 records is a lot of data, especially when you have to check each one...

  • One thing I would do is convert the spreadsheet data to CSV format; that would greatly reduce the memory needed to import the data... XLS is much heavier.
