What is the best (fastest) way to read a file from a web server?

9

I need to read a file from a web server, but storing its content in a byte array is taking too long. Does anyone know a faster way to do this? My code follows. Thanks in advance.

try {
        url = new URL(surl);

        urlConnection = (HttpURLConnection) url.openConnection();
        InputStream input = new BufferedInputStream(urlConnection.getInputStream());    
        int b = input.read();
        List<Byte> bytes = new LinkedList<Byte>();
        while (b != -1) {
            bytes.add((byte) b);
            b = input.read();
        }
        byte[] array = new byte[bytes.size()];


        // HERE IS THE PROBLEM, IT IS TAKING TOO LONG!
        for (int i = 0; i < bytes.size(); i++) {
            array[i] = bytes.get(i).byteValue();
        }


        String str = new String(array);
        myreturn = str;

    }

2 answers

11


Reading files quickly

Java has several classes for reading files: with and without buffering, with random access, with thread safety, and with memory mapping. Some of these are much faster than others.

FileInputStream with byte reads

FileInputStream opens a file by name or by File object. Its read() method reads the file one byte at a time.

FileInputStream uses synchronization to make it thread-safe.

FileInputStream f = new FileInputStream(name);
int b;
long checkSum = 0L;
while ((b = f.read()) != -1) {
    checkSum += b;
}

FileInputStream with byte array reading

FileInputStream performs an I/O operation on every read, and it synchronizes on all method calls to make it thread-safe. To reduce this overhead, read multiple bytes at once into a byte array buffer.

FileInputStream f = new FileInputStream(name);
byte[] barray = new byte[SIZE];
long checkSum = 0L;
int nRead;
while ((nRead = f.read(barray, 0, SIZE)) != -1) {
    for (int i = 0; i < nRead; i++) {
        checkSum += barray[i];
    }
}

BufferedInputStream with byte reads

BufferedInputStream handles the buffering of a FileInputStream for you. It wraps an input stream, creates an internal byte array (usually 8 KB), and fills it with large reads from the stream. Its read() method then returns the next byte from the buffer.

BufferedInputStream uses synchronization to be thread-safe.

BufferedInputStream f = new BufferedInputStream(
    new FileInputStream(name));
int b;
long checkSum = 0L;
while ((b = f.read()) != -1) {
    checkSum += b;
}

BufferedInputStream with byte array reading

BufferedInputStream synchronizes on all method calls to make it thread-safe. To reduce synchronization and method call overhead, make fewer read() calls by reading multiple bytes at once.

BufferedInputStream f = new BufferedInputStream(
    new FileInputStream(name));
byte[] barray = new byte[SIZE];
long checkSum = 0L;
int nRead;
while ((nRead = f.read(barray, 0, SIZE)) != -1) {
    for (int i = 0; i < nRead; i++) {
        checkSum += barray[i];
    }
}

RandomAccessFile with byte reads

RandomAccessFile opens a file by name or by File object. It can read, write, or both, at any position you choose within the file. Its read() method reads the next byte from the current file position.

RandomAccessFile is thread-safe.

RandomAccessFile f = new RandomAccessFile(name, "r"); // "r" opens the file read-only
int b;
long checkSum = 0L;
while ((b = f.read()) != -1) {
    checkSum += b;
}

RandomAccessFile with byte array reading

Like FileInputStream, RandomAccessFile performs an I/O operation on every read and synchronizes on every method call to be thread-safe. To reduce this overhead, make fewer method calls by reading bytes into an array and then reading from the array.

RandomAccessFile f = new RandomAccessFile(name, "r"); // "r" opens the file read-only
byte[] barray = new byte[SIZE];
long checkSum = 0L;
int nRead;
while ((nRead = f.read(barray, 0, SIZE)) != -1) {
    for (int i = 0; i < nRead; i++) {
        checkSum += barray[i];
    }
}

FileChannel with ByteBuffer and byte gets

FileInputStream and RandomAccessFile can return a FileChannel for lower-level I/O operations. The read() method of FileChannel fills a ByteBuffer created with the allocate() method of the ByteBuffer class. The get() method of ByteBuffer retrieves the next byte from the buffer.

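ByteBuffer is not thread-safe. FileChannel itself is documented as safe for use by multiple threads, but a buffer shared between threads still needs external synchronization.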

FileInputStream f = new FileInputStream(name);
FileChannel ch = f.getChannel();
ByteBuffer bb = ByteBuffer.allocate(SIZE);
long checkSum = 0L;
int nRead;
while ((nRead = ch.read(bb)) != -1) {
    if (nRead == 0) {
        continue;
    }
    bb.position(0);
    bb.limit(nRead);
    while (bb.hasRemaining()) {
        checkSum += bb.get();
    }
    bb.clear();
}

FileChannel with ByteBuffer and byte array gets

To reduce the overhead of one method call per byte, get an array of bytes at a time. The array and the ByteBuffer may have different sizes.

FileInputStream f = new FileInputStream(name);
FileChannel ch = f.getChannel();
ByteBuffer bb = ByteBuffer.allocate(BIGSIZE);
byte[] barray = new byte[SIZE];
long checkSum = 0L;
int nRead, nGet;
while ((nRead = ch.read(bb)) != -1) {
    if (nRead == 0) {
        continue;
    }
    bb.position(0);
    bb.limit(nRead);
    while(bb.hasRemaining()) {
        nGet = Math.min(bb.remaining(), SIZE);
        bb.get(barray, 0, nGet);
        for (int i = 0; i < nGet; i++) {
            checkSum += barray[i];
        }
    }
    bb.clear( );
}

FileChannel with array-backed ByteBuffer and byte array access

A ByteBuffer created with the allocate() method uses internal storage to hold the bytes. Instead, call the wrap() method to wrap a ByteBuffer around your own byte array. This lets you access the array directly after each read, reducing method call overhead and data copying.

FileInputStream f = new FileInputStream(name);
FileChannel ch = f.getChannel();
byte[] barray = new byte[SIZE];
ByteBuffer bb = ByteBuffer.wrap(barray);
long checkSum = 0L;
int nRead;
while ((nRead = ch.read(bb)) != -1) {
    for (int i = 0; i < nRead; i++) {
        checkSum += barray[i];
    }
    bb.clear();
}

FileChannel with direct allocation of ByteBuffer

A ByteBuffer created with the allocateDirect() method may directly use storage deeper in the JVM or the operating system. This can reduce copying of data into your application's array, avoiding some overhead.

FileInputStream f = new FileInputStream(name);
FileChannel ch = f.getChannel();
ByteBuffer bb = ByteBuffer.allocateDirect(SIZE);
long checkSum = 0L;
int nRead;
while ((nRead = ch.read(bb)) != -1) {
    bb.position(0);
    bb.limit(nRead);
    while (bb.hasRemaining()) {
        checkSum += bb.get( );
    }
    bb.clear();
}

FileChannel with direct allocation of ByteBuffer and search by byte array

And of course, you can get arrays of bytes to reduce method call overhead. The buffer size can differ from the array size.

FileInputStream f = new FileInputStream(name);
FileChannel ch = f.getChannel();
ByteBuffer bb = ByteBuffer.allocateDirect(BIGSIZE);
byte[] barray = new byte[SIZE];
long checkSum = 0L;
int nRead, nGet;
while ((nRead = ch.read(bb)) != -1) {
    if (nRead == 0) {
        continue;
    }
    bb.position(0);
    bb.limit(nRead);
    while(bb.hasRemaining()) {
        nGet = Math.min(bb.remaining(), SIZE);
        bb.get(barray, 0, nGet);
        for (int i = 0; i < nGet; i++) {
            checkSum += barray[i];
        }
    }
    bb.clear();
}

FileChannel with MappedByteBuffer and byte gets

The map() method of the FileChannel class can return a MappedByteBuffer that maps part or all of the file into the application’s memory space. This allows more direct access to the file without an intermediate buffer. Call the get() method of MappedByteBuffer to retrieve the next byte.

FileInputStream f = new FileInputStream(name);
FileChannel ch = f.getChannel();
MappedByteBuffer mb = ch.map(FileChannel.MapMode.READ_ONLY,
    0L, ch.size());
long checkSum = 0L;
while (mb.hasRemaining()) {
    checkSum += mb.get();
}

FileChannel with MappedByteBuffer and byte array reading

And get arrays of bytes to reduce method call overhead.

FileInputStream f = new FileInputStream(name);
FileChannel ch = f.getChannel();
MappedByteBuffer mb = ch.map(FileChannel.MapMode.READ_ONLY,
    0L, ch.size());
byte[] barray = new byte[SIZE];
long checkSum = 0L;
int nGet;
while (mb.hasRemaining()) {
    nGet = Math.min(mb.remaining(), SIZE);
    mb.get(barray, 0, nGet);
    for (int i = 0; i < nGet; i++) {
        checkSum += barray[i];
    }
}

FileReader and BufferedReader

Both classes read characters instead of bytes. Because they must decode the bytes into characters, they take longer than any of the strategies shown above.
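As a rough illustration of the character-based style, here is a minimal sketch in the same checksum pattern as the earlier snippets. The class name CharCheckSum and the 8192-char buffer are assumptions chosen for the example, and a StringReader stands in for a FileReader so the code runs without a file on disk.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class CharCheckSum {
    // Sums the character codes from any Reader, reading chars in chunks.
    // The buffer size (8192) is an arbitrary choice for this sketch.
    static long checkSum(Reader source) throws IOException {
        long sum = 0L;
        char[] carray = new char[8192];
        try (BufferedReader r = new BufferedReader(source)) {
            int nRead;
            while ((nRead = r.read(carray, 0, carray.length)) != -1) {
                for (int i = 0; i < nRead; i++) {
                    sum += carray[i];
                }
            }
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        // For a real file, pass new FileReader(name) instead of the StringReader.
        System.out.println(checkSum(new StringReader("abc"))); // prints 294
    }
}
```

Note that the sum here is over decoded characters, not raw bytes, so for non-ASCII text it will differ from the byte-based checksums above.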

Fastest

If we had to pick the fastest strategy, it would be one of these:

  • FileChannel with MappedByteBuffer and byte array reading.
  • FileChannel with direct allocation of ByteBuffer and search by byte array.

4

TL;DR

The fastest way depends on the goal of the program. If the idea is to load everything into memory, it is enough to use a more efficient read method.

Reading a file into a String

The fastest way I know to load a local file into a String in memory is as simple as this:

String conteudo = new String(Files.readAllBytes(Paths.get("meu.txt")));

However, this does not work for remote files accessed via HTTP.

Reading a URL into a String

In this case, the fastest approach is to keep using the InputStream, but with a better method for reading the bytes.

As reported in other places, the most efficient way is to use the sun.misc.IOUtils.readFully() method, like this:

    InputStream input = new URL("http://www.textfiles.com/humor/mel.txt").openStream();
    String conteudo = new String(IOUtils.readFully(input, -1, true));

Risks and alternatives

Of course, relying on an internal, proprietary class of the JDK is not always a good idea. The method may change or cease to exist in a future version.

The good news is that it is easy to replace with an alternative. One is the Apache Commons IO library, whose IOUtils.toString() method also does the job in one step:

String conteudo = IOUtils.toString(input, "UTF-8");

The Google Guava library does something similar with the ByteStreams.toByteArray() method:

String conteudo = new String(ByteStreams.toByteArray(input));

Since Java 9, no additional library is required: the InputStream class gained bulk methods such as readAllBytes() and transferTo().
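One of the bulk methods that arrived with Java 9 is InputStream.readAllBytes(). The sketch below shows how it might be used; the class name ReadAllBytesExample and the choice of UTF-8 are assumptions, and a ByteArrayInputStream stands in for the stream returned by URL.openStream() so the example runs without a network connection.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ReadAllBytesExample {
    // Reads the whole stream into a String and closes the stream.
    // UTF-8 is assumed here; use the charset the server actually sends.
    static String readToString(InputStream input) throws IOException {
        try (InputStream in = input) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for: new URL(surl).openStream()
        InputStream input = new ByteArrayInputStream(
            "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(readToString(input)); // prints hello
    }
}
```

Passing the charset explicitly avoids depending on the platform default, which new String(byte[]) would otherwise use.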

Considerations

First, there is no need to use exactly the fastest method, because the performance bottleneck will almost certainly be the download of the file. So I would recommend using a library rather than the fastest method based on the JDK's internal class.

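Second, the current implementation is slow because it uses resources inefficiently: it reads everything into a list, copies it all into an array, and then copies it once more into a String. That takes at least three times more memory than necessary. In addition, each call to get(i) on a LinkedList walks the list to reach position i, so the copy loop takes time proportional to the square of the file size.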
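Without any library, the list-based loop can be avoided by reading in chunks into a ByteArrayOutputStream. This is only a sketch: the class name ChunkedRead and the 8192-byte chunk size are assumptions, and a ByteArrayInputStream stands in for urlConnection.getInputStream().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ChunkedRead {
    // Reads the stream to the end, one chunk at a time, with no per-byte boxing.
    static byte[] readFully(InputStream input) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192]; // chunk size is an arbitrary choice
        int nRead;
        while ((nRead = input.read(chunk)) != -1) {
            out.write(chunk, 0, nRead);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the HTTP response stream.
        InputStream input = new ByteArrayInputStream(
            "some content".getBytes(StandardCharsets.UTF_8));
        byte[] array = readFully(input);
        System.out.println(new String(array, StandardCharsets.UTF_8));
    }
}
```

Because read(byte[]) already fetches many bytes per call, wrapping the stream in a BufferedInputStream is not needed here.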

Third, we often don’t need to load the entire file into memory. If you write the content to a file right after this routine, it would be more efficient to read and write at the same time. A very simple way is to use the IOUtils.copy routine from the Apache library.
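The read-and-write-at-the-same-time idea can also be sketched with the standard library alone. The code below is not the Apache routine, just a plain loop with the same purpose; the class name StreamToFile, the temporary file, and the 8192-byte chunk size are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class StreamToFile {
    // Copies input to output chunk by chunk and returns the number of bytes copied.
    // Only one chunk is held in memory at any time.
    static long copy(InputStream input, OutputStream output) throws IOException {
        byte[] chunk = new byte[8192]; // chunk size is an arbitrary choice
        long total = 0L;
        int nRead;
        while ((nRead = input.read(chunk)) != -1) {
            output.write(chunk, 0, nRead);
            total += nRead;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the HTTP response stream.
        InputStream input = new ByteArrayInputStream(new byte[100000]);
        Path target = Files.createTempFile("download", ".bin");
        try (InputStream in = input;
             OutputStream out = Files.newOutputStream(target)) {
            System.out.println("bytes copied: " + copy(in, out));
        } finally {
            Files.deleteIfExists(target);
        }
    }
}
```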

And watch out for imports, because several libraries have classes called IOUtils. In this example alone we saw two.
