Reading files quickly
Java offers several classes for reading files: with or without buffering, with random access, with thread safety, and with memory mapping. Some of these are much faster than others.
FileInputStream with byte reads
FileInputStream opens a file by name or by File object. Its read() method reads the file one byte at a time. FileInputStream uses synchronization to make it thread-safe.
FileInputStream f = new FileInputStream(name);
int b;
long checkSum = 0L;
while ((b = f.read()) != -1) {
    checkSum += b;
}
FileInputStream with byte array reads
FileInputStream performs an I/O operation on every read and synchronizes on every method call to stay thread-safe. To reduce this overhead, read multiple bytes at once into a byte array buffer.
FileInputStream f = new FileInputStream(name);
byte[] barray = new byte[SIZE];
long checkSum = 0L;
int nRead;
// read() returns the number of bytes placed in barray, or -1 at end of file.
while ((nRead = f.read(barray, 0, SIZE)) != -1) {
    for (int i = 0; i < nRead; i++) {
        checkSum += barray[i];
    }
}
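One detail is easy to miss when comparing the two loops above: read() returns each byte as an int in the range 0..255, while elements of a byte array are signed (-128..127). For files containing bytes of 0x80 or higher, the two loops therefore produce different sums. The sketch below shows one way to make them agree by masking with 0xFF. The class name UnsignedChecksum and the SIZE value are illustrative assumptions, not part of the original code.

```java
import java.io.FileInputStream;
import java.io.IOException;

final class UnsignedChecksum {
    static final int SIZE = 8192; // assumed buffer size, chosen for illustration

    // Byte-at-a-time: read() already returns each byte as 0..255.
    static long byByte(String name) throws IOException {
        try (FileInputStream f = new FileInputStream(name)) {
            long checkSum = 0L;
            int b;
            while ((b = f.read()) != -1) {
                checkSum += b;
            }
            return checkSum;
        }
    }

    // Array reads: mask each signed byte so it contributes 0..255 as well.
    static long byArray(String name) throws IOException {
        try (FileInputStream f = new FileInputStream(name)) {
            byte[] barray = new byte[SIZE];
            long checkSum = 0L;
            int nRead;
            while ((nRead = f.read(barray, 0, SIZE)) != -1) {
                for (int i = 0; i < nRead; i++) {
                    checkSum += barray[i] & 0xFF;
                }
            }
            return checkSum;
        }
    }
}
```

If the checksum is only a stand-in workload for timing, the mask is optional; it matters once the results of different strategies are compared with each other.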
BufferedInputStream with byte reads
BufferedInputStream handles the buffering for you. It wraps an input stream such as a FileInputStream, creates an internal byte array (usually 8 KB), and fills that array with larger reads from the underlying stream. Its read() method then returns each byte from the buffer. BufferedInputStream uses synchronization to be thread-safe.
BufferedInputStream f = new BufferedInputStream(
    new FileInputStream(name));
int b;
long checkSum = 0L;
while ((b = f.read()) != -1) {
    checkSum += b;
}
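The internal buffer size does not have to be left at its default. As a minimal sketch, the two-argument constructor BufferedInputStream(InputStream, int) accepts an explicit size. The class name BufferedSizeDemo, the method name checksum, and the bufferSize parameter are assumptions made for this example.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;

final class BufferedSizeDemo {
    // Sums the bytes of a file, reading through a buffer of the given size.
    // bufferSize must be greater than zero.
    static long checksum(String name, int bufferSize) throws IOException {
        try (BufferedInputStream f = new BufferedInputStream(
                new FileInputStream(name), bufferSize)) {
            long checkSum = 0L;
            int b;
            while ((b = f.read()) != -1) {
                checkSum += b;
            }
            return checkSum;
        }
    }
}
```

Calling checksum(name, 64 * 1024), for instance, trades a little memory for fewer reads from the underlying stream. Whether that helps depends on the file and the platform, so it is worth measuring.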
BufferedInputStream with byte array reads
BufferedInputStream synchronizes on all method calls to be thread-safe. To reduce the synchronization and method call overhead, make fewer read() calls by reading multiple bytes at a time.
BufferedInputStream f = new BufferedInputStream(
    new FileInputStream(name));
byte[] barray = new byte[SIZE];
long checkSum = 0L;
int nRead;
while ((nRead = f.read(barray, 0, SIZE)) != -1) {
    for (int i = 0; i < nRead; i++) {
        checkSum += barray[i];
    }
}
RandomAccessFile with byte reads
RandomAccessFile opens a file by name or by File object. It can read, write, or both, at any position you choose within the file. Its read() method reads the next byte from the current file position. RandomAccessFile is thread-safe.
// The constructor requires an access mode; "r" opens the file read-only.
RandomAccessFile f = new RandomAccessFile(name, "r");
int b;
long checkSum = 0L;
while ((b = f.read()) != -1) {
    checkSum += b;
}
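Reading from a chosen position, which is what sets RandomAccessFile apart from the stream classes, can be sketched with seek(). The class name SeekDemo and the method checksumFrom are hypothetical names used only for this illustration.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

final class SeekDemo {
    // Sums the bytes from the given offset to the end of the file.
    static long checksumFrom(String name, long offset) throws IOException {
        try (RandomAccessFile f = new RandomAccessFile(name, "r")) {
            f.seek(offset); // move the file pointer before reading
            long checkSum = 0L;
            int b;
            while ((b = f.read()) != -1) {
                checkSum += b;
            }
            return checkSum;
        }
    }
}
```

An offset at or beyond the end of the file is allowed; the first read() then returns -1 and the sum is zero.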
RandomAccessFile with byte array reads
Like FileInputStream, RandomAccessFile performs an I/O operation on every read and synchronizes on every method call to be thread-safe. To reduce this bottleneck, make fewer method calls by reading bytes into an array and processing them from the array.
// The constructor requires an access mode; "r" opens the file read-only.
RandomAccessFile f = new RandomAccessFile(name, "r");
byte[] barray = new byte[SIZE];
long checkSum = 0L;
int nRead;
while ((nRead = f.read(barray, 0, SIZE)) != -1) {
    for (int i = 0; i < nRead; i++) {
        checkSum += barray[i];
    }
}
FileChannel with a ByteBuffer and byte gets
FileInputStream and RandomAccessFile can both return a FileChannel for lower-level I/O. FileChannel's read() method fills a ByteBuffer created with ByteBuffer's allocate() method. ByteBuffer's get() method then retrieves the next byte from the buffer. ByteBuffer is not thread-safe. (FileChannel itself is documented as safe for use by multiple threads, but a buffer shared between them is not.)
FileInputStream f = new FileInputStream(name);
FileChannel ch = f.getChannel();
ByteBuffer bb = ByteBuffer.allocate(SIZE);
long checkSum = 0L;
int nRead;
while ((nRead = ch.read(bb)) != -1) {
    if (nRead == 0) {
        continue;
    }
    bb.position(0);
    bb.limit(nRead);
    while (bb.hasRemaining()) {
        checkSum += bb.get();
    }
    bb.clear();
}
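The position(0) and limit(nRead) pair above switches the buffer from being filled to being drained. ByteBuffer's flip() method does the same thing in one call: it sets the limit to the current position and the position to zero. A minimal sketch of the same loop using flip() follows; the class name FlipDemo and the size parameter are assumptions for this example.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

final class FlipDemo {
    // Sums the bytes of a file through a heap ByteBuffer of the given size.
    // size must be greater than zero, otherwise read() can never make progress.
    static long checksum(String name, int size) throws IOException {
        try (FileInputStream f = new FileInputStream(name);
             FileChannel ch = f.getChannel()) {
            ByteBuffer bb = ByteBuffer.allocate(size);
            long checkSum = 0L;
            while (ch.read(bb) != -1) {
                bb.flip();                 // limit = bytes read, position = 0
                while (bb.hasRemaining()) {
                    checkSum += bb.get();
                }
                bb.clear();                // ready to be filled again
            }
            return checkSum;
        }
    }
}
```

Because the buffer is cleared before every read, the position after a read equals the number of bytes read, so flip() and the explicit position/limit calls are interchangeable here.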
FileChannel with a ByteBuffer and byte array gets
To reduce the overhead of one method call per byte, retrieve an array of bytes at a time. The array and the ByteBuffer may have different sizes.
FileInputStream f = new FileInputStream(name);
FileChannel ch = f.getChannel();
ByteBuffer bb = ByteBuffer.allocate(BIGSIZE);
byte[] barray = new byte[SIZE];
long checkSum = 0L;
int nRead, nGet;
while ((nRead = ch.read(bb)) != -1) {
    if (nRead == 0) {
        continue;
    }
    bb.position(0);
    bb.limit(nRead);
    while (bb.hasRemaining()) {
        nGet = Math.min(bb.remaining(), SIZE);
        bb.get(barray, 0, nGet);
        for (int i = 0; i < nGet; i++) {
            checkSum += barray[i];
        }
    }
    bb.clear();
}
FileChannel with an array ByteBuffer and byte array access
A ByteBuffer created with allocate() uses its own internal storage for the bytes. Instead, call wrap() to create a ByteBuffer around your own byte array. The array can then be accessed directly after each read, which avoids both the method call overhead and a data copy.
FileInputStream f = new FileInputStream(name);
FileChannel ch = f.getChannel();
byte[] barray = new byte[SIZE];
ByteBuffer bb = ByteBuffer.wrap(barray);
long checkSum = 0L;
int nRead;
while ((nRead = ch.read(bb)) != -1) {
    for (int i = 0; i < nRead; i++) {
        checkSum += barray[i];
    }
    bb.clear();
}
FileChannel with a direct ByteBuffer and byte gets
A ByteBuffer created with allocateDirect() can use storage managed directly by the JVM or the operating system. This can reduce copying of data up into your application's array, saving some overhead.
FileInputStream f = new FileInputStream(name);
FileChannel ch = f.getChannel();
ByteBuffer bb = ByteBuffer.allocateDirect(SIZE);
long checkSum = 0L;
int nRead;
while ((nRead = ch.read(bb)) != -1) {
    bb.position(0);
    bb.limit(nRead);
    while (bb.hasRemaining()) {
        checkSum += bb.get();
    }
    bb.clear();
}
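The three ways of creating a ByteBuffer used so far (allocate(), wrap(), and allocateDirect()) differ in where the bytes live. As a small illustration, the buffer can be asked about itself with isDirect() and hasArray(). The class name BufferKinds and the helper describe are hypothetical names for this sketch.

```java
import java.nio.ByteBuffer;

final class BufferKinds {
    // Reports whether a buffer is direct and whether it exposes a backing array.
    static String describe(ByteBuffer bb) {
        return "direct=" + bb.isDirect() + ", hasArray=" + bb.hasArray();
    }

    public static void main(String[] args) {
        byte[] barray = new byte[16];
        ByteBuffer heap = ByteBuffer.allocate(16);
        ByteBuffer wrapped = ByteBuffer.wrap(barray);
        ByteBuffer direct = ByteBuffer.allocateDirect(16);

        System.out.println("allocate:       " + describe(heap));
        System.out.println("wrap:           " + describe(wrapped));
        System.out.println("allocateDirect: " + describe(direct));

        // wrap() shares the caller's array instead of copying it.
        System.out.println("wrap shares array: " + (wrapped.array() == barray));
    }
}
```

A buffer that reports hasArray() as false has to be drained with get(), which is why the direct-buffer loop above reads bytes through the buffer rather than from an array.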
FileChannel with a direct ByteBuffer and byte array gets
And, of course, you can get arrays of bytes to reduce the method call overhead. The buffer size and the array size may differ.
FileInputStream f = new FileInputStream(name);
FileChannel ch = f.getChannel();
ByteBuffer bb = ByteBuffer.allocateDirect(BIGSIZE);
byte[] barray = new byte[SIZE];
long checkSum = 0L;
int nRead, nGet;
while ((nRead = ch.read(bb)) != -1) {
    if (nRead == 0) {
        continue;
    }
    bb.position(0);
    bb.limit(nRead);
    while (bb.hasRemaining()) {
        nGet = Math.min(bb.remaining(), SIZE);
        bb.get(barray, 0, nGet);
        for (int i = 0; i < nGet; i++) {
            checkSum += barray[i];
        }
    }
    bb.clear();
}
FileChannel with a MappedByteBuffer and byte gets
FileChannel's map() method can return a MappedByteBuffer that maps part or all of the file into the application's memory space. This gives more direct access to the file without an intermediate buffer. Call MappedByteBuffer's get() method to retrieve the next byte.
FileInputStream f = new FileInputStream(name);
FileChannel ch = f.getChannel();
// MapMode is a nested class of FileChannel, so it is referenced through the type.
MappedByteBuffer mb = ch.map(FileChannel.MapMode.READ_ONLY,
    0L, ch.size());
long checkSum = 0L;
while (mb.hasRemaining()) {
    checkSum += mb.get();
}
FileChannel with a MappedByteBuffer and byte array gets
And again, get arrays of bytes to reduce the method call overhead.
FileInputStream f = new FileInputStream(name);
FileChannel ch = f.getChannel();
// MapMode is a nested class of FileChannel, so it is referenced through the type.
MappedByteBuffer mb = ch.map(FileChannel.MapMode.READ_ONLY,
    0L, ch.size());
byte[] barray = new byte[SIZE];
long checkSum = 0L;
int nGet;
while (mb.hasRemaining()) {
    nGet = Math.min(mb.remaining(), SIZE);
    mb.get(barray, 0, nGet);
    for (int i = 0; i < nGet; i++) {
        checkSum += barray[i];
    }
}
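Mapping the whole file in one call only works up to a point: the size argument of map() may not exceed Integer.MAX_VALUE, so files of 2 GB or more have to be mapped in pieces. The following is a minimal sketch of one way to do that, not a tuned implementation. The class name ChunkedMapDemo and the chunkSize parameter are assumptions introduced for this example.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

final class ChunkedMapDemo {
    // Sums the bytes of a file by mapping it one chunk at a time.
    // chunkSize must be between 1 and Integer.MAX_VALUE.
    static long checksum(String name, long chunkSize) throws IOException {
        if (chunkSize < 1 || chunkSize > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("chunkSize out of range: " + chunkSize);
        }
        try (FileInputStream f = new FileInputStream(name);
             FileChannel ch = f.getChannel()) {
            long size = ch.size();
            long checkSum = 0L;
            for (long pos = 0; pos < size; pos += chunkSize) {
                long len = Math.min(chunkSize, size - pos); // last chunk may be shorter
                MappedByteBuffer mb = ch.map(FileChannel.MapMode.READ_ONLY, pos, len);
                while (mb.hasRemaining()) {
                    checkSum += mb.get();
                }
            }
            return checkSum;
        }
    }
}
```

In practice a chunk size of a few hundred megabytes is a plausible starting point, but the best value depends on the platform and available address space.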
FileReader and BufferedReader
Both classes read characters instead of bytes. They must decode the bytes into characters, so they take longer than any of the strategies shown above.
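For comparison with the byte-oriented loops, a character-oriented read can be sketched as follows. Here read() returns a decoded character (as an int) rather than a raw byte. The class name ReaderDemo is an assumption, and FileReader(String) decodes with the platform's default charset, so the sum is only comparable to the byte checksums for plain ASCII files.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

final class ReaderDemo {
    // Sums the character codes of a text file, decoding with the default charset.
    static long checksum(String name) throws IOException {
        try (BufferedReader r = new BufferedReader(new FileReader(name))) {
            long checkSum = 0L;
            int c;
            while ((c = r.read()) != -1) {
                checkSum += c;
            }
            return checkSum;
        }
    }
}
```

The decoding step happens inside the reader on every refill of its buffer, which is the extra work the paragraph above refers to.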
Fastest
If we had to pick the fastest strategy, it would be one of these:
FileChannel with a MappedByteBuffer and byte array gets.
FileChannel with a direct ByteBuffer and byte array gets.
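Which strategy wins on a given machine is best checked by measurement. The sketch below is a deliberately simple timing helper, not a rigorous benchmark: it ignores JIT warm-up and file-system caching, both of which can change the ranking. The names TimingDemo, ChecksumSource, and timeNanos are hypothetical and exist only for this illustration.

```java
import java.io.IOException;

final class TimingDemo {
    // Any of the reading strategies, wrapped as "file name in, checksum out".
    interface ChecksumSource {
        long checksum(String name) throws IOException;
    }

    // Runs the strategy several times and returns the best elapsed time in nanoseconds.
    static long timeNanos(ChecksumSource source, String name, int runs) throws IOException {
        if (runs < 1) {
            throw new IllegalArgumentException("runs must be at least 1");
        }
        long best = Long.MAX_VALUE;
        long sink = 0L; // accumulates results so the work cannot be skipped
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            sink += source.checksum(name);
            best = Math.min(best, System.nanoTime() - start);
        }
        if (sink == Long.MIN_VALUE) {
            System.out.println(sink); // practically never true; keeps sink in use
        }
        return best;
    }
}
```

Each strategy from this article can be passed in as a lambda or method reference, and the reported times compared for the same file.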