What is the best strategy to load and persist large volumes of data with Spring?

Viewed 419 times

1

I need to implement a feature in a Spring and Hibernate project that updates some information in every record of my table. The idea is to load the records into the application, process the data based on some values of those records, and then persist this whole mass of data.

The values to be updated differ for each record and depend on the data already stored in that record.

The table contains about 200,000 records, and I would like to know the best strategy to load, process, and persist this data without creating bottlenecks in my application or in the database.

  • Do you really need to load the records into the application, process them, and then update them? Couldn't you, for example, write a stored procedure that performs this update, at most called by the application?

  • Not in this case, because of the business rules I must apply in this update, which would require a very complex query. I will need to use relationships, date operations, splits, iterations...

3 answers

4


Spring has a suitable tool for this, called Spring Batch.

Loosely translated:

Many applications in the enterprise domain require bulk data processing to perform business operations in mission-critical environments. These operations include automated, complex processing of large volumes of information, most of the time without user interaction. They typically include time-based events (e.g., month-end calculations, notices, etc.), the periodic application of complex business rules processed repeatedly over a large amount of data (e.g., determination of insurance benefits, custom promotions), and the integration of information from internal and external systems, which typically requires formatting, validation, and record processing.

We recently used this tool in a project to migrate data from one system to another. We loaded a large volume of data from files exported from the old database and performed various processing steps on it: correcting information, checking which information already existed in the database, moving images from one directory to another, inserting the information into the current database, logging the problematic records to a file, etc.

It can be scary at first, but it's very simple to use.
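The core idea behind Spring Batch is chunk-oriented processing: items are read one at a time, each one goes through a processor (where the business rule lives), and the results are written a chunk at a time, typically one transaction per chunk, so 200,000 records never sit in memory or in a single transaction at once. The following is a minimal, framework-free Java sketch of that pattern, not Spring Batch's actual API. The class name ChunkJob, the chunk size, and the "skip and remember the bad record" handling (mirroring the log of problematic records mentioned above) are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Framework-free sketch of chunk-oriented processing: read -> process -> write per chunk.
// Hypothetical names; this is NOT the Spring Batch API, only the idea behind it.
public class ChunkJob<I, O> {
    private final Iterator<I> reader;              // reads one item at a time
    private final Function<I, O> processor;        // business rule applied to each item
    private final Consumer<List<O>> writer;        // writes a whole chunk at once
    private final int chunkSize;
    private final List<I> skipped = new ArrayList<>(); // items whose processing failed

    public ChunkJob(Iterator<I> reader, Function<I, O> processor,
                    Consumer<List<O>> writer, int chunkSize) {
        this.reader = reader;
        this.processor = processor;
        this.writer = writer;
        this.chunkSize = chunkSize;
    }

    public void run() {
        List<O> chunk = new ArrayList<>(chunkSize);
        while (reader.hasNext()) {
            I item = reader.next();
            try {
                chunk.add(processor.apply(item));
            } catch (RuntimeException e) {
                skipped.add(item);   // a real job would log the problematic record here
                continue;
            }
            if (chunk.size() == chunkSize) {
                writer.accept(chunk);              // one write (e.g., one transaction) per chunk
                chunk = new ArrayList<>(chunkSize);
            }
        }
        if (!chunk.isEmpty()) {
            writer.accept(chunk);                  // flush the last, partial chunk
        }
    }

    public List<I> getSkipped() {
        return skipped;
    }

    public static void main(String[] args) {
        List<List<Integer>> writes = new ArrayList<>();
        ChunkJob<Integer, Integer> job = new ChunkJob<>(
                List.of(1, 2, 3, 4, -5, 6, 7).iterator(),
                x -> {
                    if (x < 0) throw new IllegalArgumentException("bad record " + x);
                    return x * 10;
                },
                writes::add,
                3);
        job.run();
        System.out.println(writes);            // [[10, 20, 30], [40, 60, 70]]
        System.out.println(job.getSkipped());  // [-5]
    }
}
```

In Spring Batch itself, these three roles correspond to its ItemReader, ItemProcessor, and ItemWriter abstractions, and the framework adds restartability, skip/retry policies, and job metadata on top.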

  • It did scare me. I think I'll only feel comfortable with it once I actually build an application with it.

0

Another strategy would be to bring this data into memory page by page (loading it into Java to do all the processing), apply the updates, and then do a batch insert into another table. You can then drop the old table. This process is faster than updating the rows in place in the database. If you choose this approach, I recommend backing up the table that will be dropped. If you don't want to deal with a programming language, there are data-processing tools such as PENTAHO (which does much more than data processing). I hope it helps. :)
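The page-then-batch-insert flow can be sketched as below, assuming the table has a numeric primary key id. To page through the old table it uses keyset pagination (something like WHERE id > lastId ORDER BY id LIMIT n, with syntax varying by database), a technique swapped in here, not mentioned in the answer, because it avoids the growing cost of large OFFSET values on a 200,000-row table. The Row record, fetchPageAfter, and the in-memory "tables" are hypothetical stand-ins for the real SELECT and INSERT; with JDBC, each batch would typically be sent with PreparedStatement.addBatch() and executeBatch().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Sketch: read the old table page by page (keyset pagination), apply the business rule
// in Java, and write each page to a new table as one batch. In-memory lists stand in for
// the real tables; all names are hypothetical.
public class CopyToNewTable {

    /** Hypothetical row: primary key plus one value to be recalculated. */
    public record Row(long id, String value) {}

    /** Stand-in for: SELECT id, value FROM old_table WHERE id > ? ORDER BY id LIMIT ? */
    static List<Row> fetchPageAfter(List<Row> oldTable, long lastId, int pageSize) {
        List<Row> page = new ArrayList<>();
        for (Row r : oldTable) {               // assumes oldTable is sorted by id
            if (r.id() > lastId) {
                page.add(r);
                if (page.size() == pageSize) break;
            }
        }
        return page;
    }

    /** Copies every row through the business rule; returns the number of batches written. */
    public static int copy(List<Row> oldTable, List<Row> newTable,
                           UnaryOperator<Row> businessRule, int pageSize) {
        long lastId = Long.MIN_VALUE;
        int batches = 0;
        List<Row> page;
        while (!(page = fetchPageAfter(oldTable, lastId, pageSize)).isEmpty()) {
            List<Row> batch = new ArrayList<>(page.size());
            for (Row r : page) {
                batch.add(businessRule.apply(r));     // per-row processing in Java
            }
            newTable.addAll(batch);                   // stand-in for one executeBatch()
            batches++;
            lastId = page.get(page.size() - 1).id();  // next page starts after the last key seen
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Row> oldTable = new ArrayList<>();
        for (long id = 1; id <= 5; id++) oldTable.add(new Row(id, "v" + id));
        List<Row> newTable = new ArrayList<>();
        int batches = copy(oldTable, newTable, r -> new Row(r.id(), r.value().toUpperCase()), 2);
        System.out.println(batches + " batches, " + newTable.size() + " rows"); // 3 batches, 5 rows
        System.out.println(newTable.get(4));                                    // Row[id=5, value=V5]
    }
}
```

Because each page is identified by the last key seen rather than by an offset, every page query can use the primary-key index directly, and a failed run can resume from the last id written.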

-1

In this case, when you have many records to show on the screen, you can use pagination, which lets you control how many records are returned in each GET request. Here is an example from an application I built:

In the controller:

    @RequestMapping(value = "/page", method = RequestMethod.GET)
    public ResponseEntity<Page<CategoriaDTO>> findAll(
            @RequestParam(value = "page", defaultValue = "0") Integer page,
            @RequestParam(value = "linesPerPage", defaultValue = "24") Integer linesPerPage,
            @RequestParam(value = "orderBy", defaultValue = "nome") String orderBy,
            @RequestParam(value = "direction", defaultValue = "ASC") String direction) {
        // Fetch one page of entities, then convert each entity to a DTO
        Page<Categoria> list = service.findPage(page, linesPerPage, orderBy, direction);
        Page<CategoriaDTO> listDTO = list.map(obj -> new CategoriaDTO(obj));

        return ResponseEntity.ok().body(listDTO);
    }

In the service:

    public Page<Categoria> findPage(Integer page, Integer linesPerPage, String orderBy, String direction) {
        // Direction.fromString accepts "ASC"/"DESC" case-insensitively; Direction.valueOf is case-sensitive
        PageRequest pageRequest = PageRequest.of(page, linesPerPage, Direction.fromString(direction), orderBy);
        return repo.findAll(pageRequest);
    }

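Nothing needs to be done in the repository! As long as it extends Spring Data's JpaRepository (or PagingAndSortingRepository), findAll(Pageable) is already inherited.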

  • You talk about loading data onto the screen, but the focus of the question is batch persistence. The point of loading the records into the application (in the asker's words) is to allow the data to be processed on the application side, not in the DBMS. So what does your answer add to the problem presented, or to its context?
