Why does Java consume so much memory?

29

I wrote this little Java program to download the images from a thread on an imageboard:

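import java.io.*;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FourChanThreadImageDownloader {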
    private static void usage() {
        System.out.println("java FourChanThreadImageDownloader <url> <folder>");
        System.exit(1);
    }

    public static void main(String[] args) {
        if(args.length < 2)
            usage();

        final String url = args[0];
        final String targetDirName = args[1];
        final Pattern imageUrlSyntax =
                Pattern.compile("(//i\\.4cdn\\.org/\\w+/\\d+\\.(?:jpg|webm|gif|png))");
        boolean successfull = false;

        final File targetDir = new File(targetDirName);
        if(!targetDir.exists()) {
            System.out.println("Creating destination directory");
            if(!targetDir.mkdir()) {
                System.out.println("Could not create target directory");
                System.exit(1);
            }
        } else if(!targetDir.canWrite()) {
            System.out.println("Cannot put downloaded images inside destination folder:" +
                    " you have not permission to write in this directory.");
            System.exit(1);
        }
        try {
            final URL fourChan = new URL(url);
            final Reader inputReader = new InputStreamReader(fourChan.openStream());

            System.out.println("Connecting OK, trying to download images");

            final BufferedReader bufReader = new BufferedReader(inputReader);
            final StringBuilder pageContent = new StringBuilder();

            int c;
            while((c = bufReader.read()) != -1)
                pageContent.append((char) c);

            final Matcher finder = imageUrlSyntax.matcher(pageContent.toString());
            while(finder.find()) {
                final String currImage = "http:" + finder.group();
                final String imageName = currImage.split("/")[4];
                final BufferedInputStream targetStream =
                    new BufferedInputStream(new URL(currImage).openStream());
                final ByteArrayOutputStream recipient = new ByteArrayOutputStream();
                System.out.println("Downloading: " + currImage);

                int d;
                int notifyCounter = 0;
                while((d = targetStream.read()) != -1) {
                    recipient.write(d);
                    if(notifyCounter == 8196) {
                        System.out.print('.');
                        notifyCounter = 0;
                    }
                    notifyCounter++;
                }
                System.out.println("\nImage successfully downloaded");
                final File nextImage = new File(targetDir, imageName);
                final BufferedOutputStream image =
                    new BufferedOutputStream(new FileOutputStream(nextImage));
                System.out.println("Saving image in " + nextImage.toString());
                image.write(recipient.toByteArray());
                image.close();
                recipient.close();
                successfull = true;
            }
        } catch(MalformedURLException e) {
            System.out.println("Mistyped URL");
            successfull = false;
        } catch(IOException e) {
            if(e instanceof FileNotFoundException) {
                System.err.println("Cannot download images: Thread not found (404)");
                return;
            }
            System.err.println("An error occurred: " + e.getMessage());
            successfull = false;
        }

        if(successfull)
            System.out.println("Success");
    }
}

So far so good: it works perfectly. The problem is that my task manager says the Java process running this program consumes around 80 MiB of memory (it oscillates between that and as much as 110 MiB).

  • Why does Java consume so much memory?

  • How can I avoid excessive memory consumption in Java?

  • What are the best practices for using memory more efficiently in Java?

  • What in my code causes such high memory consumption?

This consumption doesn’t seem normal to me, since I noticed that other, "big" Java programs such as Apache Tomcat, I2P and Elasticsearch maintain a constant consumption of about 100 MiB when running idle.

Edit:

Recently I made small changes (optimizations, let’s say) that miraculously improved the memory usage of my little program. Now it consumes between 30 MiB and 60 MiB. At first glance, the changes are simple:

import java.io.*;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FourChanThreadImageDownloader {

    private static void usage() {
        System.out.println("java FourChanThreadImageDownloader <url> <folder>");
        System.exit(1);
    }

    public static void main(String[] args) {
        if(args.length < 2)
            usage();

        final String url = args[0];
        final String targetDirName = args[1];
        final Pattern imageUrlSyntax =
                Pattern.compile("(//i\\.4cdn\\.org/\\w+/\\d+\\.(?:jpg|webm|gif|png))");
        boolean successfull = false;

        final File targetDir = new File(targetDirName);
        if(!targetDir.exists()) {
            System.out.println("Creating destination directory");
            if(!targetDir.mkdir()) {
                System.out.println("Could not create target directory");
                System.exit(1);
            }
        } else if(!targetDir.canWrite()) {
            System.out.println("Cannot put downloaded images inside destination folder:" +
                " you have not permission to write in this directory.");
            System.exit(1);
        }

        final URL fourChan;
        try {
            fourChan = new URL(url);
        } catch(MalformedURLException e) {
            System.err.println("Mistyped URL");
            return;
        }

        try(final Reader inputReader = new InputStreamReader(fourChan.openStream());
            final BufferedReader bufReader = new BufferedReader(inputReader)) {
            System.out.println("Connecting OK, trying to download images");

            final StringBuilder pageContent = new StringBuilder();

            int c;
            while((c = bufReader.read()) != -1)
                pageContent.append((char) c);

            final Matcher finder = imageUrlSyntax.matcher(pageContent.toString());
            while(finder.find()) {
                final String currImage = "http:" + finder.group();
                final String imageName = currImage.split("/")[4];

                try(final BufferedInputStream targetStream =
                    new BufferedInputStream(new URL(currImage).openStream());
                final ByteArrayOutputStream recipient = new ByteArrayOutputStream()) {
                    System.out.println("Downloading: " + currImage);

                    int d;
                    int notifyCounter = 0;
                    while((d = targetStream.read()) != -1) {
                        recipient.write(d);
                        if(notifyCounter == 8196) {
                            System.out.print('.');
                            notifyCounter = 0;
                        }
                        notifyCounter++;
                    }

                    System.out.println("\nImage successfully downloaded");
                    final File nextImage = new File(targetDir, imageName);
                    final BufferedOutputStream image =
                        new BufferedOutputStream(new FileOutputStream(nextImage));
                    System.out.println("Saving image in " + nextImage.toString());
                    recipient.writeTo(image);
                    image.flush();
                    image.close();
                    successfull = true;
                }
            }
        } catch(IOException e) {
            if(e instanceof FileNotFoundException)
                System.out.println("Cannot download images: Thread not found (404)");
            else
                System.out.println("An error occurred: " + e.getLocalizedMessage());
            successfull = false;
        }

        if(successfull)
            System.out.println("Success");
    }
}

It is not very different from the original program. The changes were:

  • Buffers and streams were moved into try-with-resources to ensure they are closed.
  • The toByteArray() call was replaced by writeTo(), so that the object no longer copies the entire image before writing it to the file.
  • The stream that writes the image to the file is now properly closed, with a flush() to make sure the write completes.
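
A further step in the same direction, sketched here under assumptions rather than taken from the question: instead of collecting each image in a ByteArrayOutputStream and writing it out afterwards, the stream can be copied straight to the file, so that only a small transfer buffer is held in memory at a time. Files.copy(InputStream, Path, CopyOption...) from java.nio.file does this. The names StreamToFileSketch and save are hypothetical, and an in-memory stream stands in for new URL(currImage).openStream() so the sketch runs without a network connection.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class StreamToFileSketch {

    // Copies the stream to the target file and returns the number of bytes written.
    // try-with-resources closes the stream even if the copy fails.
    static long save(InputStream source, Path target) throws IOException {
        try (InputStream in = source) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for an image download: 100,000 bytes held in memory.
        byte[] fakeImage = new byte[100_000];
        Path target = Files.createTempFile("image", ".bin");
        long written = save(new ByteArrayInputStream(fakeImage), target);
        System.out.println("Wrote " + written + " bytes to " + target);
        Files.delete(target);
    }
}
```

With this shape the size of the image no longer determines how much heap each download needs.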

Despite the answers here, the reason my little program consumed so much memory was this: bad programming. I didn’t use the proper methods and I didn’t try to manage the resources I used correctly, which is no longer the case in the code above. This caused the program to fill memory with images that had already been downloaded and with thousands of unclosed buffers, in addition to the use of expensive methods such as toByteArray(), as Maniero pointed out. Basically: several memory leaks in every iteration of the loop.

Well, the lesson of this question is more or less the following: know the standard library of your language, know data structures and algorithms, and be aware of what happens "behind the scenes" of your code. In this case, my mistake was letting memory leaks happen.

Java is not meant for writing programs that use memory efficiently (Java is not C, let’s say), but that is no reason to program as if memory were infinite, just as it is no reason to expect miracles. It is important to recognize when a case of high consumption is abnormal, and to keep good practices in mind regardless.

  • 4

    The Java Virtual Machine (JVM) itself is already quite big, and the garbage collector (GC) only "recycles" memory when memory runs short. So things you have created and already discarded (such as these I/O buffers) can remain in memory for a long time after being used, especially if everything is done in a single function (the generational GC should free memory sooner if each image, with all its buffers, is read in a separate function, but that is not guaranteed).

  • I noticed. I tried running a "Hello world" with a Thread.sleep(10000) just to see how much a simple Hello world consumes. It reached 20 MiB.

  • @Sid I was reading your edit and saw that you talk about bad programming. I even put that in the conclusion of my answer, and the middle of it already showed that this was happening; I was just being diplomatic in not saying it directly :)

1 answer

33


The task manager is not a reliable tool to check how much memory an application is consuming.

There are several factors behind Java’s high memory consumption:

  1. Java is a platform, not just a simple native executable. The Java runtime is large because it has to manage the entire platform, and it is common for it to load code that the application never actually uses. This is probably the main reason "small" programs consume a lot of memory.
  2. Java code carries a lot of metadata to support execution. This increases consumption, though only slightly.
  3. On top of the Java bytecode itself, there is the memory consumed by the native code generated by the JIT compiler, which can run several times. This is just one example of something that adds a little more load to what is already heavily loaded. And the JIT process is complex.
  4. Java prefers most data types to be objects accessed by reference, which tends to increase memory consumption in proportion to the number of objects created (this will change somewhat in future versions, but the preference will remain).
  5. There is a somewhat excessive overhead on each object allocated on the heap: an extra cost proportional to the number of objects.
  6. Some types have even greater overhead. A String, for example, can consume around 40 bytes even if it doesn’t hold a single character, and this can multiply because of the next item.
  7. Some types are immutable and generate excess copies depending on how they are used. Not every programmer is aware of when copies occur, and this can get out of hand in certain cases.
  8. Some types are more complex than they need to be, containing rarely useful information and deeply nested compositions. Overusing these types can make a difference.
  9. The GC does not release memory immediately after an object is no longer needed. What’s more, it preallocates a good amount of memory to work with generations, even if that memory is not yet in use. So there is an artificial memory consumption (not that this is necessarily bad). This has improved a little, but only a little.
  10. Threads have their own extra cost. In a sense, a thread’s consumption is close to that of a process. There is some saving compared to a process, but in certain situations it is small. This is not unique to Java, but in Java the cost is slightly higher.
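
Item 7 can be made concrete with String. The following is a minimal sketch (the class and method names are hypothetical): += on a String does not modify the existing object but builds a new one, so a loop that concatenates this way leaves a discarded copy behind on every iteration, whereas a StringBuilder appends into the same growing buffer.

```java
class ImmutableCopiesSketch {

    // Builds "0,1,2,..." with +=; every iteration creates a new String object.
    static String joinWithConcat(int n) {
        String result = "";
        for (int i = 0; i < n; i++) {
            result += i + ",";
        }
        return result;
    }

    // Builds the same text with a single StringBuilder that is appended to in place.
    static String joinWithBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i).append(',');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String before = "abc";
        String after = before;
        after += "def";   // 'after' now refers to a new object; 'before' is untouched
        System.out.println(before);            // prints "abc"
        System.out.println(before == after);   // prints "false"
        System.out.println(joinWithConcat(5).equals(joinWithBuilder(5))); // prints "true"
    }
}
```

Both methods return the same text; the difference is only in how many intermediate objects the GC has to clean up afterwards.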

It is true that some of these items don’t add much to memory consumption in larger applications, but others do great "damage". The data created at runtime tends to consume much more memory than the static data (the code, for example), at least in real applications that work with a good volume of data. Very simple applications of the hello world kind pay Java’s minimum cost, which you found to be around 20 MB.

In any case, the reported consumption is a little misleading because of the way the garbage collector works. In most of its collections it doesn’t bother to release memory back to the operating system. A collection does not necessarily mean memory is released.
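
One way to observe this from inside the program is to compare the heap that objects occupy with the heap the JVM currently holds, using Runtime.totalMemory() and Runtime.freeMemory(). This is only a sketch with a hypothetical class name; the numbers vary with the JVM, the collector and the machine, so no output is shown. The System.gc() call is there just to make the experiment observable and is not a recommendation.

```java
class HeapUsageSketch {

    // Heap currently occupied by objects (live or not yet collected), in bytes.
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Heap the JVM currently holds for itself, in bytes.
    static long committedHeap() {
        return Runtime.getRuntime().totalMemory();
    }

    public static void main(String[] args) {
        System.out.println("Start:     used=" + usedHeap() + " committed=" + committedHeap());

        byte[][] blocks = new byte[16][];
        for (int i = 0; i < blocks.length; i++) {
            blocks[i] = new byte[1024 * 1024];   // 16 MiB in total
        }
        System.out.println("Allocated: used=" + usedHeap() + " committed=" + committedHeap());

        blocks = null;   // the arrays are now unreachable
        System.gc();     // only a request; the JVM is free to ignore it
        System.out.println("After GC:  used=" + usedHeap() + " committed=" + committedHeap());
    }
}
```

If the committed figure stays high after the used figure drops, that is the effect described above: the memory was collected but not handed back, and the task manager keeps counting it.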

Because of these characteristics, a very simple application consumes a lot. The trend is for larger applications to consume proportionally less relative to their total size. But we need to define what a simple application is. An application that loads possibly large images into memory is not a lightweight application, even if its code is very simple. The example application in the question seems to consume a lot of memory out of legitimate necessity.

Part of the "blame" for the consumption lies with the programmer using the language and platform, who doesn’t understand its inner workings well and/or doesn’t know enough about algorithms and, above all, data structures.

The use of heavy frameworks or overly complex architectures is also a problem. Many people either don’t know the cost of what they are using or doing, or don’t care about it. That is not quite the case for a hello world, but it often explains why a Java application is heavier than it should be.

On the other hand, some frameworks are better optimized and have their own data structures to save memory.

The company that makes this website has to use a number of techniques to save memory. The site is written in C#, which has more or less the same "problems" as Java. It is true that the language has better tools for making these optimizations, but it still takes work. One example is the new C# compiler, which needed so many new data structures to meet its memory and performance needs that this set has grown larger than the set of general-purpose data structures in the language’s standard library.

What to do?

There are specific techniques for each situation that bring consumption down, and that is the most effective approach, but even there the language does not help much.

There is still the option of tuning execution through parameters passed when launching the Java virtual machine. You need to be an optimization specialist to achieve a good result; an amateur can achieve the opposite. It is easy to get good results in a controlled test and worse ones in real use, so constant monitoring is needed in many cases.
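
Two of the most commonly used parameters are -Xms (initial heap size) and -Xmx (maximum heap size), for example java -Xmx64m FourChanThreadImageDownloader <url> <folder>, where the value 64m is arbitrary and chosen only for illustration. A small sketch (hypothetical class name) for checking which limits the running JVM actually received:

```java
import java.lang.management.ManagementFactory;
import java.util.List;

class JvmSettingsSketch {

    // Upper bound the heap may grow to, in MiB (controlled by -Xmx).
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    // Options passed to the JVM itself (not the program arguments).
    static List<String> jvmOptions() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
        System.out.println("JVM options: " + jvmOptions());
    }
}
```

Running it once without options and once with -Xmx64m shows how much headroom the default configuration gives the heap on a particular machine.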

Remember that each new consumes more memory. Of course other things consume memory too; your code has several examples of allocations without new. But new in particular tends to waste memory if you don’t understand what is going on, especially inside a loop with many iterations. Remember that even if you assign the object created with new to the same variable, a new object is created. What you store in the variable is the reference to the newly created object. Of course the old object will be released if there are no further references to it, but there is no telling when.
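
As an illustration of keeping allocations out of a loop, here is a minimal sketch (hypothetical names, in-memory streams standing in for the network and the file) that reads in blocks and reuses a single buffer for the whole transfer, instead of reading one byte at a time or creating a new array on each pass:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class BufferReuseSketch {

    // Copies 'in' to 'out' in blocks, reusing one buffer for the whole transfer.
    // Returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];   // allocated once, outside the loop
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);   // write only the bytes actually read
            total += read;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[50_000];
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(data), sink);
        System.out.println("Copied " + copied + " bytes");   // prints "Copied 50000 bytes"
    }
}
```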

Mastering profiling tools can be useful. For that, the functions need to be well organized. It’s no use doing everything in a single function, as your code does.

Did you know that if an exception occurs at certain points in your sample code, you will still have a memory leak, because the open resource will not be closed? This may be contributing to the excessive consumption.
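
The difference can be shown with a small sketch. TrackedResource is a hypothetical stand-in for a stream, and the exception is simulated; the point is only that a close() placed after the failing call is skipped, while try-with-resources closes the resource before the catch block runs.

```java
class CloseOnExceptionSketch {

    // A stand-in resource that remembers whether close() was called.
    static final class TrackedResource implements AutoCloseable {
        boolean closed = false;

        @Override
        public void close() {
            closed = true;
        }
    }

    static void failingWork() {
        throw new IllegalStateException("simulated I/O failure");
    }

    // The failure happens before close() is reached, so the resource stays open.
    static boolean closedWithoutTryWithResources() {
        TrackedResource r = new TrackedResource();
        try {
            failingWork();
            r.close();   // skipped when failingWork() throws
        } catch (IllegalStateException e) {
            // error handled, but the resource is still open
        }
        return r.closed;
    }

    // Same failure, but try-with-resources closes the resource automatically.
    static boolean closedWithTryWithResources() {
        TrackedResource r = new TrackedResource();
        try (TrackedResource res = r) {
            failingWork();
        } catch (IllegalStateException e) {
            // error handled, resource already closed
        }
        return r.closed;
    }

    public static void main(String[] args) {
        System.out.println(closedWithoutTryWithResources());   // prints "false"
        System.out.println(closedWithTryWithResources());      // prints "true"
    }
}
```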

Finally, start learning all the details of the platform: the inner workings of the virtual machine, the garbage collector, how all the library’s data structures work, and so on.

Have you noticed how complex it is? And how difficult it would be to list everything that can be done to save memory? In any case, these items already show that some things can be avoided to reduce consumption.

Some people will say that if you have to worry about memory, it is better to use another language. There comes a point where getting Java to save memory is so complicated that using modern C++ or Rust can be a better alternative.

And in fact, if your concern is memory consumption, it’s best not to choose Java. That was never the focus of the language/platform, so take advantage of the best Java has to offer, which is not memory efficiency.

Conclusion

If I had to sum up in one item why memory consumption is high in Java, I would say it is the language’s philosophy of making the programmer’s life easier. That has a cost. But in this specific case, part of the problem is the misuse of some resources; as the author himself said in the question, it is bad programming. And that is very common.

For specific questions, we can give more focused help in separate questions.

  • So could I conclude that Java was not made for small applications? Because a tool like mine consumes too much memory, whereas in a big program the memory that belongs to the platform rather than to the program itself becomes practically negligible, which must be the case with Tomcat and I2P.

  • And could I also conclude that breaking my main() method, which does the whole task in a single function, into smaller functions would help? Because if all the program’s variables are tied to the scope of main(), then any variable reassignment creates an object to be collected by the GC. Whereas if I isolate a responsibility in a method that is called several times, it would improve, because the variables are destroyed once the function’s scope ends?

  • Yes, Java really doesn’t do well with applications that need to be extremely small. But clearly part of the "blame" lies with the platform and part with the program that overuses things that consume more memory than they should. Breaking the program into smaller functions can help; it increases the odds. But the GC is not deterministic, so it’s hard to guarantee. And the worst thing anyone could do is try to force a collection.

  • In fact, the question is practically settled, but the "programming" side still intrigues me: how does my little tool consume the same amount of memory as really large programs?

  • 1

    Perhaps because your program manipulates images, which by their very nature are quite large. Also, looking at your code I see buffer after buffer, and in some cases you even call expensive methods such as toByteArray, which, if memory serves, copies the whole image. Processing multiple images in sequence, it’s no wonder you use a lot of memory, especially compared to a Tomcat that, for all its complexity, handles more text (HTML) than anything else...

  • @Sid what mgibsonbr said is important: in your case the consumption doesn’t seem so exaggerated, since you are using things that consume a lot of memory. It’s what I said: the data will consume a lot. In this specific case, everything I listed has little influence. Certainly if you wrote it in C++, for example, the consumption would be lower, but not that much lower. In this specific case you would have to use, or even create, structures that use memory more sparingly, gradually, and that benefit from the way the GC works. This can come at a processing cost.

  • (taking advantage of the thread being dug up, hehe) Java loads the entire rt.jar file (the library file) when the JRE runs something, right? This file usually.

  • there is one place where this performance will not be a cost, which is distributed environments, clusters, etc.; there, Java’s conveniences outweigh the performance "problem"

  • @Andre you may be talking about something I don’t know, but I would say that doesn’t make sense, or at least has no relation to what is posted here. Besides, the subject is memory, not performance.

  • when you talk about clustered servers with several gigabytes of memory, the execution of "badly written" code (where you didn’t bother to close resources, manage objects correctly on the heap, stack, etc.) will not be noticed in the same way

  • I worked on a project where we had 10 clustered WebLogic servers where each JVM had 48 GB of memory; whether or not I was managing the allocation of my objects in memory was not noticed in the same way as it would be on your local machine

  • the point is, in Java Enterprise the performance of poorly written programs goes unnoticed because of the beefy infrastructure

  • I did not mention clustered servers. But surely, if you have enough memory it doesn’t matter as much, though that doesn’t mean it doesn’t matter at all. It doesn’t matter for baseline consumption that is above normal. Actually, it might matter: I’ve seen a case where there was a cluster only because the wrong technology was used and the code was badly written; had it been done better, a single server would have handled it (that is, people spend a fortune because of a development mistake and don’t even realize it). If you have a resource leak, you can burn through 1 TB of memory quickly in some cases.
