I wrote this little Java program to download the images from a thread on an imageboard:
import java.io.*;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class FourChanThreadImageDownloader {
    private static void usage() {
        System.out.println("java FourChanThreadImageDownloader <url> <folder>");
        System.exit(1);
    }
    public static void main(String[] args) {
        if(args.length < 2)
            usage();
        final String url = args[0];
        final String targetDirName = args[1];
        final Pattern imageUrlSyntax =
            Pattern.compile("(//i\\.4cdn\\.org/\\w+/\\d+\\.(?:jpg|webm|gif|png))");
        boolean successfull = false;
        final File targetDir = new File(targetDirName);
        if(!targetDir.exists()) {
            System.out.println("Creating destination directory");
            if(!targetDir.mkdir()) {
                System.out.println("Could not create target directory");
                System.exit(1);
            }
        } else if(!targetDir.canWrite()) {
            System.out.println("Cannot put downloaded images inside destination folder:" +
                " you do not have permission to write in this directory.");
            System.exit(1);
        }
        try {
            final URL fourChan = new URL(url);
            final Reader inputReader = new InputStreamReader(fourChan.openStream());
            System.out.println("Connecting OK, trying to download images");
            final BufferedReader bufReader = new BufferedReader(inputReader);
            final StringBuilder pageContent = new StringBuilder();
            int c;
            while((c = bufReader.read()) != -1)
                pageContent.append((char) c);
            final Matcher finder = imageUrlSyntax.matcher(pageContent.toString());
            while(finder.find()) {
                final String currImage = "http:" + finder.group();
                final String imageName = currImage.split("/")[4];
                final BufferedInputStream targetStream =
                    new BufferedInputStream(new URL(currImage).openStream());
                final ByteArrayOutputStream recipient = new ByteArrayOutputStream();
                System.out.println("Downloading: " + currImage);
                int d;
                int notifyCounter = 0;
                while((d = targetStream.read()) != -1) {
                    recipient.write(d);
                    if(notifyCounter == 8196) {
                        System.out.print('.');
                        notifyCounter = 0;
                    }
                    notifyCounter++;
                }
                System.out.println("\nImage successfully downloaded");
                final File nextImage = new File(targetDir, imageName);
                final BufferedOutputStream image =
                    new BufferedOutputStream(new FileOutputStream(nextImage));
                System.out.println("Saving image in " + nextImage.toString());
                image.write(recipient.toByteArray());
                image.close();
                recipient.close();
                successfull = true;
            }
        } catch(MalformedURLException e) {
            System.out.println("Mistyped URL");
            successfull = false;
        } catch(IOException e) {
            if(e instanceof FileNotFoundException) {
                System.err.println("Cannot download images: Thread not found (404)");
                return;
            }
            System.err.println("An error occurred: " + e.getMessage());
            successfull = false;
        }
        if(successfull)
            System.out.println("Success");
    }
}
So far so good: it works perfectly. The problem is that my task manager reports the Java process running this program using around 80 MiB of memory (it oscillates between that and as much as 110 MiB).
Why does Java consume so much memory?
How can I avoid excessive memory consumption in Java?
What are the best practices for using memory more efficiently in Java?
What in my code causes such high memory consumption?
This consumption doesn't seem normal to me, since I noticed that other, much bigger Java programs such as Apache Tomcat, I2P and Elasticsearch hold a steady consumption of about 100 MiB when they are running idle with nobody using them.
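A point that may help when reading the task manager: it shows the memory of the whole process (the JVM itself, its JIT compiler, thread stacks, loaded class metadata and the heap the JVM has committed), while the program's own objects live only in the heap. The sketch below, with a hypothetical class name `HeapReport`, uses only the standard `Runtime` methods to print heap figures that can be compared with what the task manager shows.

```java
// Hypothetical helper class: reports Java heap usage only, not the whole process.
public class HeapReport {
    private static final long MIB = 1024L * 1024L;

    /** Heap bytes currently occupied by objects (live, or garbage not yet collected). */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    static void printHeap(String label) {
        Runtime rt = Runtime.getRuntime();
        // totalMemory() = heap currently committed; maxMemory() = upper limit the heap may grow to
        System.out.printf("%s: used=%d MiB, committed=%d MiB, max=%d MiB%n",
                label,
                usedHeapBytes() / MIB,
                rt.totalMemory() / MIB,
                rt.maxMemory() / MIB);
    }

    public static void main(String[] args) {
        printHeap("start");
        // Allocate and drop 16 arrays of 1 MiB each, similar to discarded I/O buffers.
        for (int i = 0; i < 16; i++) {
            byte[] chunk = new byte[(int) MIB]; // becomes garbage at the end of each iteration
        }
        printHeap("after allocations");
        System.gc(); // only a request; the JVM is free to ignore it
        printHeap("after System.gc()");
    }
}
```

The JVM may keep committed heap after a collection instead of handing it back to the operating system, so the task-manager figure can stay high even when `used` drops. The maximum heap size can be capped with the standard `-Xmx` option, e.g. `java -Xmx32m FourChanThreadImageDownloader <url> <folder>`; the process as a whole will still use more than that.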
Edit:
I recently made some small changes (optimizations, let's say) that noticeably improved my little program's memory usage. It now consumes between 30 MiB and 60 MiB. The changes are simple:
import java.io.*;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class FourChanThreadImageDownloader {
    private static void usage() {
        System.out.println("java FourChanThreadImageDownloader <url> <folder>");
        System.exit(1);
    }
    public static void main(String[] args) {
        if(args.length < 2)
            usage();
        final String url = args[0];
        final String targetDirName = args[1];
        final Pattern imageUrlSyntax =
            Pattern.compile("(//i\\.4cdn\\.org/\\w+/\\d+\\.(?:jpg|webm|gif|png))");
        boolean successfull = false;
        final File targetDir = new File(targetDirName);
        if(!targetDir.exists()) {
            System.out.println("Creating destination directory");
            if(!targetDir.mkdir()) {
                System.out.println("Could not create target directory");
                System.exit(1);
            }
        } else if(!targetDir.canWrite()) {
            System.out.println("Cannot put downloaded images inside destination folder:" +
                " you do not have permission to write in this directory.");
            System.exit(1);
        }
        final URL fourChan;
        try {
            fourChan = new URL(url);
        } catch(MalformedURLException e) {
            System.err.println("Mistyped URL");
            return;
        }
        try(final Reader inputReader = new InputStreamReader(fourChan.openStream());
            final BufferedReader bufReader = new BufferedReader(inputReader)) {
            System.out.println("Connecting OK, trying to download images");
            final StringBuilder pageContent = new StringBuilder();
            int c;
            while((c = bufReader.read()) != -1)
                pageContent.append((char) c);
            final Matcher finder = imageUrlSyntax.matcher(pageContent.toString());
            while(finder.find()) {
                final String currImage = "http:" + finder.group();
                final String imageName = currImage.split("/")[4];
                try(final BufferedInputStream targetStream =
                        new BufferedInputStream(new URL(currImage).openStream());
                    final ByteArrayOutputStream recipient = new ByteArrayOutputStream()) {
                    System.out.println("Downloading: " + currImage);
                    int d;
                    int notifyCounter = 0;
                    while((d = targetStream.read()) != -1) {
                        recipient.write(d);
                        if(notifyCounter == 8196) {
                            System.out.print('.');
                            notifyCounter = 0;
                        }
                        notifyCounter++;
                    }
                    System.out.println("\nImage successfully downloaded");
                    final File nextImage = new File(targetDir, imageName);
                    // The file stream is a resource too, so it is closed even if writeTo() throws
                    try(final BufferedOutputStream image =
                            new BufferedOutputStream(new FileOutputStream(nextImage))) {
                        System.out.println("Saving image in " + nextImage.toString());
                        recipient.writeTo(image);
                        image.flush();
                    }
                    successfull = true;
                }
            }
        } catch(IOException e) {
            if(e instanceof FileNotFoundException)
                System.out.println("Cannot download images: Thread not found (404)");
            else
                System.out.println("An error occurred: " + e.getLocalizedMessage());
            successfull = false;
        }
        if(successfull)
            System.out.println("Success");
    }
}
It is not very different from the original program; the changes were:
- Buffers and streams were moved into try-with-resources blocks to guarantee they are closed.
- toByteArray() was replaced by writeTo(), so the ByteArrayOutputStream no longer makes a full copy of the image before it is written to the file.
- The stream that writes the image to the file is now properly flushed with flush() and closed.
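The toByteArray()/writeTo() change can be seen in isolation. `toByteArray()` allocates a new array holding a copy of everything written so far, so each image briefly exists twice in memory; `writeTo(OutputStream)` passes the stream's internal buffer directly to the target stream instead. A minimal sketch, where the class `SaveBuffer` and its method `save` are hypothetical names:

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class SaveBuffer {
    /**
     * Writes the bytes accumulated in 'recipient' to 'target'.
     * writeTo() hands over the internal buffer without copying it,
     * whereas toByteArray() would first allocate a full copy.
     */
    static void save(ByteArrayOutputStream recipient, File target) throws IOException {
        // try-with-resources closes (and therefore flushes) the file stream,
        // even if writeTo() throws.
        try (OutputStream image = new BufferedOutputStream(new FileOutputStream(target))) {
            recipient.writeTo(image);
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream recipient = new ByteArrayOutputStream();
        recipient.write(new byte[] {1, 2, 3, 4});
        File target = File.createTempFile("image", ".bin");
        target.deleteOnExit();
        save(recipient, target);
        System.out.println(target.length()); // prints 4
    }
}
```

Note that the whole image still has to fit inside the `ByteArrayOutputStream`; `writeTo()` only avoids the second copy.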
Despite the answers here, the reason my little program consumed so much memory was simply bad programming. I didn't use the proper methods and didn't manage the resources I used correctly, which the code above now does. That made the program fill memory with images that had already been downloaded and with thousands of unclosed buffers, on top of using expensive methods such as toByteArray(), as Maniero pointed out. Basically: several memory leaks on every iteration of the loop.
Well, the lesson of this question is more or less the following: know your language's standard library, know data structures and algorithms, and be aware of what happens "behind the scenes" of your code. In this case, my mistake was letting memory leaks happen.
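In the spirit of knowing the standard library: the intermediate `ByteArrayOutputStream` is not strictly needed, because `java.nio.file.Files.copy(InputStream, Path, CopyOption...)` (Java 7+) copies a stream straight into a file, in chunks, without holding the whole image in memory. This is a different technique from the one in the program above, shown only as a sketch; `StreamToFile`, `saveStream` and `download` are hypothetical names. `Files.copy` does not close the input stream, so the caller does that with try-with-resources.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class StreamToFile {
    /**
     * Copies 'in' into 'target' and returns the number of bytes copied.
     * REPLACE_EXISTING (an assumption) overwrites an existing file,
     * as FileOutputStream does in the original program.
     */
    static long saveStream(InputStream in, Path target) throws IOException {
        return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }

    /** Hypothetical helper: downloads one image URL directly into a file. */
    static long download(String imageUrl, Path target) throws IOException {
        // Files.copy leaves the input stream open, so close it here.
        try (InputStream in = new URL(imageUrl).openStream()) {
            return saveStream(in, target);
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("image", ".bin");
        target.toFile().deleteOnExit();
        long copied = saveStream(new ByteArrayInputStream(new byte[] {1, 2, 3}), target);
        System.out.println(copied); // prints 3
    }
}
```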
Java was not designed for writing programs that use memory as frugally as possible (Java is not C, let's say), but that is no reason to program as if memory were infinite, nor to expect miracles. It is important to recognize when high memory consumption is unusual, while still keeping good practices in mind.
The Java Virtual Machine (JVM) itself is already quite big, and the garbage collector (GC) only "recycles" memory when it starts running short of it. So things you have already used and discarded (those I/O buffers) can stay in memory for a long time afterwards, especially if everything is done in a single method (a generational GC should free memory faster if each image, with all of its buffers, is read in a separate method, but that is not guaranteed).
– mgibsonbr
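The per-method approach from the comment above could look roughly like this: a hypothetical `copyWithProgress` method that moves one image from an `InputStream` to a file through a fixed 8 KiB buffer (the size is an assumption), printing about one dot per 8 KiB like the original progress counter. The buffer and both streams are local to the method, so they become unreachable as soon as it returns; when the GC actually reclaims them is still up to the JVM.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class PerImageCopy {
    private static final int CHUNK = 8192; // assumed buffer size: 8 KiB

    /** Copies one image to 'target' and returns the number of bytes written. */
    static long copyWithProgress(InputStream in, File target) throws IOException {
        byte[] buffer = new byte[CHUNK]; // memory per image stays constant
        long total = 0;
        long sinceDot = 0;
        // Both streams are closed when the method leaves this block, even on error.
        try (InputStream source = in;
             OutputStream out = new FileOutputStream(target)) {
            int n;
            while ((n = source.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
                sinceDot += n;
                if (sinceDot >= CHUNK) { // roughly one dot per 8 KiB received
                    System.out.print('.');
                    sinceDot -= CHUNK;
                }
            }
        }
        System.out.println();
        return total;
    }

    public static void main(String[] args) throws IOException {
        File target = File.createTempFile("image", ".bin");
        target.deleteOnExit();
        long n = copyWithProgress(new ByteArrayInputStream(new byte[20000]), target);
        System.out.println(n + " bytes saved"); // prints "20000 bytes saved" after the dots
    }
}
```

Inside the program's loop it would replace the `ByteArrayOutputStream` block, e.g. `copyWithProgress(new URL(currImage).openStream(), new File(targetDir, imageName));`.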
I noticed that. I tried running a "Hello world" with a Thread.sleep(10000) just to see how much a simple Hello world consumes. It reached 20 MiB.
– Sid
@Sid I was reading your question, and you're talking about bad programming. I even put that in the conclusion of my answer, and in the middle of it I had already shown that this was happening; I was just being diplomatic by not saying so directly :)
– Maniero