Optimize MySQL table insert - Java

I have a loop performing millions of sequential INSERTs into a single MySQL table.

I would like to know whether it is possible to parallelize the inserts, or to use some other technique that improves insertion performance.

Code:

public static java.sql.Connection getConexaoMySQL() {
    // 'connection' and 'status' are static fields of the class
    try {
        String driverName = "com.mysql.jdbc.Driver";
        Class.forName(driverName);
        String serverName = "localhost";
        String mydatabase = "tweets";
        String url = "jdbc:mysql://" + serverName + "/" + mydatabase;
        String username = "root";
        String password = "admin";
        connection = DriverManager.getConnection(url, username, password);
        if (connection != null) {
            status = "Database ---> connected successfully!";
        } else {
            status = "Database ---> could not connect";
        }
        return connection;
    } catch (ClassNotFoundException e) {
        System.out.println("The specified driver was not found.");
        return null;
    } catch (SQLException e) {
        System.out.println("Could not connect to the database.");
        return null;
    }
}

public static void insert(List<TweetsDB> list) {
    for (TweetsDB x : list) {

        preparedStmt.setString(1, x.getCandidate());
        preparedStmt.setString(2, x.getIDTweet());
        preparedStmt.setString(3, x.getIDUser());
        preparedStmt.setString(4, x.getUserScreenName());
        preparedStmt.setString(5, x.getUserName());
        preparedStmt.setString(6, x.getRetweets());
        preparedStmt.setTimestamp(7, x.getDate());
        preparedStmt.setString(8, x.getText());
        preparedStmt.setString(9, x.getHashtags());

        // execute the prepared statement
        preparedStmt.execute();
    }
}
  • I don't have MySQL here to test and write a detailed answer for you, but some alternatives are: INSERT ... ON DUPLICATE KEY UPDATE; LOAD DATA INFILE (bulk-loads data from a file, for example); and addBatch(), executing the batch afterwards. Does it have to be JDBC/JPA, or could you call the command line, for example?

  • As Bruno commented, I usually solve these problems (especially when loading data from spreadsheets/flat files) using LOAD DATA LOCAL INFILE.

  • I am just reading from a .csv file and writing to the table. I have also tried importing with LOAD DATA LOCAL INFILE and through Workbench, but I could not get the data into the right columns. Example of the .csv: https://goo.gl/Fa3Fij
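One way to map CSV fields to specific columns (the part that was failing above) is to list the target columns at the end of the LOAD DATA LOCAL INFILE statement. Below is a sketch that only builds the SQL string; the file path, table name, and column names are assumptions for illustration, not taken from the question:

```java
import java.util.List;

public class LoadDataSql {

    // Builds a LOAD DATA LOCAL INFILE statement for a comma-separated CSV.
    // The trailing column list tells MySQL which table column each CSV
    // field maps to, in order.
    public static String buildLoadData(String csvPath, String table, List<String> columns) {
        return "LOAD DATA LOCAL INFILE '" + csvPath + "' "
             + "INTO TABLE " + table + " "
             + "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' "
             + "LINES TERMINATED BY '\\n' "
             + "IGNORE 1 LINES "   // skip the CSV header row, if present
             + "(" + String.join(", ", columns) + ")";
    }
}
```

The resulting string would be run with a plain `Statement.execute(...)`; note that the server and Connector/J must both permit local infile loading (in recent Connector/J versions this is the `allowLoadLocalInfile=true` URL parameter).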

2 answers


If you have a loop and are using Java, you can split the task across threads, up to the number of connections your database's connection pool allows.

If you need to track the status of each individual insert, this is the best option.

But if you can guarantee there are no errors, then in addition to threads you can build a kind of buffer that sends, say, 10k inserts at a time...


First run the query SHOW VARIABLES;

Look for the connection-limit variable; it should be max_connections.

This is the limit of connections that the database can handle at the same time.


Then spawn the threads in your Java code:

final int maxPool = 150;


public void executa() {
    final List<List<String>> listaTodosInserts = divideInserts(maxPool, "arquivoCSV");

    for (List<String> listaInserts : listaTodosInserts) {
        insert(listaInserts);
    }
}


public void insert(final List<String> listaInserts) {
    new Thread() {
        public void run() {
            for (String insert : listaInserts) {
                // preparedStmt.execute(insert);
            }
        }
    }.start();
}
  • I edited my question with the code. Do you know how I could implement these threads?

  • I get it; so for each routine I have to open a separate connection that corresponds to the thread, right? So 151 connections are possible.

  • Exactly. You can create a method that spawns a new thread and pass it the records you want to run as a parameter. I'll try to come up with an example.

  • Is there a way to track the processing of each thread in a separate console in Eclipse?

  • If you use Log4j you can write each thread's progress to a different file.

  • Did you manage to solve it?

  • I implemented a solution using batching and it improved a bit.


Batching is faster because you don't execute the statement inside the for loop on every iteration; instead, call preparedStatement.addBatch() inside the loop and execute the whole batch after it:

preparedStatement = conn.prepareStatement(comandoSQL);
for (ProdutoPrecoVenda produtoPrecoVenda : produto.getProdutosPrecoVenda()) {

    preparedStatement.setQueryTimeout(30);
    preparedStatement.setLong(1, produtoPrecoVenda.getTabelaPreco().getId()); // price table id
    preparedStatement.setLong(2, produto.getId()); // product id

    preparedStatement.addBatch();

}
try {
    preparedStatement.executeBatch();
    preparedStatement.clearBatch();
} catch (Exception e) {
    System.out.println("ERROR INSERTING PRODUCT SALE PRICE: " + produto.getCodigoPrincipal());
    throw e;
} finally {
    preparedStatement.close();
}
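For MySQL specifically, a JDBC batch is only rewritten into true multi-row INSERT statements when the Connector/J URL parameter rewriteBatchedStatements=true is set; without it, executeBatch() still sends one statement per row. A small sketch of building such a URL (the host and database names below are just the question's values):

```java
public class BatchUrl {

    // rewriteBatchedStatements=true makes Connector/J rewrite a JDBC batch
    // into multi-row INSERTs, which is usually much faster for bulk loads.
    public static String batchFriendlyUrl(String server, String database) {
        return "jdbc:mysql://" + server + "/" + database
             + "?rewriteBatchedStatements=true";
    }
}
```

The connection would then be opened with DriverManager.getConnection(batchFriendlyUrl("localhost", "tweets"), username, password) and used with addBatch()/executeBatch() exactly as in the answer above.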
