Problem reading a .xlsx file that is too big in Java


I am working on a Java web application where I have a method that should read a .xlsx file using Apache POI:

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Iterator;

import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public static void xlsx(String arquivo) throws IOException {
    // try-with-resources closes the stream and the workbook even if an exception occurs
    try (FileInputStream file = new FileInputStream(new File(arquivo));
         XSSFWorkbook workbook = new XSSFWorkbook(file)) {
        XSSFSheet sheet = workbook.getSheetAt(0);
        Iterator<Row> rowIterator = sheet.iterator();
        while (rowIterator.hasNext()) {
            Row row = rowIterator.next();
            Iterator<Cell> cellIterator = row.cellIterator();
            while (cellIterator.hasNext()) {
                Cell celula = cellIterator.next();
                /* here each cell is read and each
                   field is handled appropriately. */
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
        // rethrow with the original exception as the cause (e, not e.getCause())
        throw new IOException("Error processing file.", e);
    }
}

The method works correctly; however, it will likely have to process files with thousands of rows of records, roughly 25,000 to 300,000 lines. While processing a very large file I get the following exception:

(http-localhost-127.0.0.1-8080-4) Servlet.service() for servlet RestServlet threw exception: org.jboss.resteasy.spi.UnhandledException: java.lang.OutOfMemoryError: Java heap space

I would like to know how I can avoid this type of error. Is there a way to read and process the .xlsx file 1,000 lines at a time, or some other solution?

  • How much memory is the JVM using? Try increasing it: java -Xmx6g seuprograma (this starts the JVM with a 6-gigabyte heap)

  • @Kyllopardiun thanks for the help; I believe what you suggested worked. I configured the JVM as you indicated, but my dev machine has only 4 GB of RAM. I believe memory will not be the problem once the module is in production, but I cannot test and guarantee that. Do you know if it is possible to read the file in blocks, for example a thousand lines at a time? If you have any references, thank you.

  • Reading files in blocks does exist; it is called a memory-mapped file (Memory Mapped File). I suggest you research it, for example here: http://javarevisited.blogspot.com.br/2012/01/memorymapped-file-and-io-in-java.html. You will also need an implementation to read those blocks (perhaps breaking them into groups of tags would solve it, if it is a single array). A minimal sketch follows these comments.

  • Thank you, Leornardobosquett. I will look into the reference you sent me. Thanks!
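A minimal sketch of the memory-mapped reading suggested in the comment above, using java.nio (the file name dados.bin is a placeholder). Note that a .xlsx file is a compressed ZIP container, so a mapped buffer exposes raw compressed bytes, not cell values; the sketch only shows the general technique:

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedRead {
    public static void main(String[] args) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile("dados.bin", "r");
             FileChannel channel = raf.getChannel()) {
            // map the whole file; the OS pages it in on demand instead of loading it all at once
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            while (buffer.hasRemaining()) {
                byte b = buffer.get(); // process the raw bytes in whatever block size you need
            }
        }
    }
}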

1 answer

1. Read the full file:

Increase the heap size of the JVM:

java -Xmx80m seuprograma // 80 megabytes (just to show the syntax)
java -Xmx6g seuprograma  // 6 gigabytes

Note that the JVM heap is limited to approximately 2 GB on computers with a 32-bit architecture.

2. Large input and partial reading

To handle a large input while keeping only a small part of it in memory, the best option I know of is Apache POI's SXSSF API.

SXSSFWorkbook wb = new SXSSFWorkbook(-1); // -1 disables automatic flushing; rows stay in memory until flushed manually
// Then do the flushing manually:
if (rownum % NOR == 0) {
    ((SXSSFSheet) sh).flushRows(NOR); // keeps the last NOR rows in memory and "discards" the rest to disk
    // ...
}

For more details, I suggest you look at and understand the example at the link posted above; a more complete sketch of the same pattern follows below.
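A minimal self-contained sketch of the manual-flush pattern above (the row count, cell contents, and output file name saida.xlsx are placeholders), following the usage shown in Apache POI's SXSSF documentation:

import java.io.FileOutputStream;

import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.xssf.streaming.SXSSFSheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

public class SxssfFlushDemo {
    public static void main(String[] args) throws Exception {
        final int NOR = 1000; // number of rows to keep in memory
        SXSSFWorkbook wb = new SXSSFWorkbook(-1); // -1 disables automatic flushing
        SXSSFSheet sh = wb.createSheet();
        for (int rownum = 0; rownum < 100000; rownum++) {
            Row row = sh.createRow(rownum);
            row.createCell(0).setCellValue("valor " + rownum);
            if (rownum % NOR == 0 && rownum > 0) {
                sh.flushRows(NOR); // keep the last NOR rows; earlier rows go to a temp file
            }
        }
        try (FileOutputStream out = new FileOutputStream("saida.xlsx")) {
            wb.write(out);
        }
        wb.dispose(); // remove the temporary files backing the streamed rows
    }
}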

  • Thanks for your help; I will test this implementation with SXSSF.

  • Can you tell me how I can do this with CSV files as well? (A CSV sketch follows these comments.)

  • @Ericosouza Did that answer solve your problem? As far as I know, SXSSFWorkbook is designed for streaming writing, not streaming reading; the link provided in the answer only talks about writing. (A streaming-read sketch follows these comments.)
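On the CSV question above: a CSV file is plain text, so it can simply be read line by line with constant memory use. A minimal sketch (the file name dados.csv and the ";" separator are assumptions):

import java.io.BufferedReader;
import java.nio.file.Files;
import java.nio.file.Paths;

public class CsvRead {
    public static void main(String[] args) throws Exception {
        try (BufferedReader reader = Files.newBufferedReader(Paths.get("dados.csv"))) {
            String linha;
            while ((linha = reader.readLine()) != null) {
                String[] campos = linha.split(";"); // one record at a time; memory use stays constant
            }
        }
    }
}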
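On streaming reads of .xlsx: since SXSSF only streams writes, the reading counterpart in Apache POI is the event API, where XSSFReader hands each sheet's XML to a SAX handler so the whole workbook is never held in memory. A minimal sketch (the file name and the per-row handling are placeholders; real code would also resolve shared strings to get cell values):

import java.io.InputStream;
import java.util.Iterator;

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

public class XlsxStreamRead {
    public static void main(String[] args) throws Exception {
        OPCPackage pkg = OPCPackage.open("arquivo.xlsx");
        try {
            XSSFReader reader = new XSSFReader(pkg);
            XMLReader parser = XMLReaderFactory.createXMLReader();
            parser.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes attrs) {
                    if ("row".equals(localName)) {
                        // one <row> element at a time; accumulate and process in blocks of 1,000 here
                    }
                }
            });
            // each sheet is streamed through the handler, never fully loaded
            Iterator<InputStream> sheets = reader.getSheetsData();
            while (sheets.hasNext()) {
                try (InputStream sheet = sheets.next()) {
                    parser.parse(new InputSource(sheet));
                }
            }
        } finally {
            pkg.close();
        }
    }
}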
