Jan 29

After the two posts about garbage collection (basic and advanced) I started receiving questions about how the GC works and tips on how to use it properly. This post is devoted to that subject: it highlights some of the questions asked and answers them using a lot of information from a great presentation Sun gave on the subject, along with some knowledge acquired from the previous posts’ research material.

Small objects

The GC just loves small objects, which is an important point to remember because it affects most programs. Small objects are easy to allocate: in most GCs they go directly to Eden, which is considered “the fast path”, and in the concurrent GC they are served from one of the optimised linked lists it keeps for popular allocation sizes.

On the other hand, large objects are really disliked. They could take longer to allocate, since they might be too big for Eden and will go directly to the old generation area; they take longer to initialize (when setting fields to their default values such as null and zero); and they might cause fragmentation when deallocated in a partially-compacting GC such as the parallel compacting collector or a non-compacting GC such as the concurrent collector.

The greatest fear of most developers when allocating objects is intermediate results. The rule here is not to fear intermediate result objects: it’s better to allocate many short-lived, immutable objects, as most intermediate results are, than far fewer mutable objects that stay in memory for a longer time. Such mutable objects will make your code more obscure at best, or fragment your memory and confuse the GC at worst. On the other hand, don’t just allocate a lot of objects “for good measure”. Over-allocating might cause too-frequent GCs, which could hurt application performance, so be smart about what you’re allocating.
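To make the point about intermediate results concrete, here is a minimal sketch. The `Vec2` class and its methods are hypothetical names made up for this illustration; they do not come from the presentation. Each operation returns a fresh immutable object, and the intermediate one becomes garbage almost immediately, which is the allocation pattern described above as GC-friendly.

```java
// Hypothetical immutable value type, invented for this illustration.
final class Vec2 {
    private final double x;
    private final double y;

    Vec2(double x, double y) {
        this.x = x;
        this.y = y;
    }

    double x() { return x; }
    double y() { return y; }

    // Every operation returns a new object instead of mutating this one.
    Vec2 plus(Vec2 other) {
        return new Vec2(x + other.x, y + other.y);
    }

    Vec2 scale(double factor) {
        return new Vec2(x * factor, y * factor);
    }

    static Vec2 midpoint(Vec2 a, Vec2 b) {
        // a.plus(b) is a short-lived intermediate result: it is unreachable
        // as soon as scale() returns, so it should die young.
        return a.plus(b).scale(0.5);
    }
}
```

The alternative, one long-lived mutable accumulator shared between callers, saves an allocation but keeps an object alive for longer and makes the code harder to reason about.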

Non-uniform memory access

Try to keep your objects confined to a single thread as much as possible. This matters less for the GC than for overall memory performance, due to something called Non-Uniform Memory Access, or NUMA for short. The basic idea of NUMA is to improve processor performance by letting each processor work with a memory space dedicated to it. When the processor allocates and manipulates objects, it does so only in its dedicated memory and thus avoids conflicts with other processors. Since not all data can remain local, NUMA also allows sharing memory between processors, but this tends to hurt performance.

The Java VM tries to use this feature by allocating the objects a thread creates in that thread’s dedicated space – remember that in a multi-processor environment, the JVM tries to distribute threads across the available processors. Since the creating thread is usually the one using and manipulating the data it created, it is more efficient for it to keep doing so. However, if the data hops between threads, performance could decrease since the memory will be shared across processor memory banks.
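As a rough sketch of what “keeping objects confined to a single thread” can look like in code: each task below allocates, fills and reads its own buffer on the worker thread that runs it, and only the small result crosses threads. The class and method names are hypothetical, and whether this actually improves locality depends on the hardware and on how the JVM is configured, so treat it as an illustration of the pattern rather than a guaranteed optimisation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example class, invented for this illustration.
final class ThreadLocalWork {

    // The buffer is created, written and read by the same thread and never escapes.
    static long sumOfSquares(int from, int to) {
        long[] buffer = new long[to - from];
        for (int i = 0; i < buffer.length; i++) {
            long v = from + i;
            buffer[i] = v * v;
        }
        long sum = 0;
        for (long v : buffer) {
            sum += v;
        }
        return sum;
    }

    static long parallelSumOfSquares(int n, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            int chunk = (n + workers - 1) / workers; // ceiling division
            for (int w = 0; w < workers; w++) {
                final int from = Math.min(n, w * chunk);
                final int to = Math.min(n, from + chunk);
                Callable<Long> task = () -> sumOfSquares(from, to);
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get(); // only the result is handed between threads
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The contrasting design would allocate one big array on the main thread and let every worker read and write parts of it, which is exactly the “data hopping between threads” situation described above.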

Object pools

Object pools are a very special case. When I think of object pools I tend to think of C programming, where allocation was expensive and pools of objects were created to avoid constant allocation and deallocation. In older days, when Java GCs were not as highly optimized as they are today, that might also have been a reason to use object pools in Java – but today, that’s not an issue.

Also, allocating the majority of objects is faster than ever, even faster than in most C libraries. So there is no real reason to keep pools, as they only create problems, except where allocating or initializing an object is an expensive task, as with database connections or threads. Then again, these kinds of pools have already been implemented for us, hidden behind the JDBC interfaces and the concurrent framework – so why try to recreate them?

Just to list a few of the problems object pools create: first, an unused object takes up memory space for no reason; second, the GC must process the unused objects as well, wasting its time on them; and third, fetching an object from the pool usually requires synchronization, which is much slower than the unsynchronized allocation available natively.
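The problems listed above show up even in a tiny pool. The sketch below uses hypothetical classes (`NaiveBufferPool`, `Greeter`) written only for this illustration: the pool keeps idle objects alive, every acquire and release goes through a lock, and the caller has to remember to reset and return the object. The plain version does the same work with one short-lived allocation.

```java
import java.util.ArrayDeque;

// Hypothetical naive pool, shown only to illustrate the costs discussed above.
final class NaiveBufferPool {
    private final ArrayDeque<StringBuilder> free = new ArrayDeque<>();

    NaiveBufferPool(int size) {
        for (int i = 0; i < size; i++) {
            free.push(new StringBuilder(64)); // occupies memory even when never used
        }
    }

    // Every caller contends on the same lock.
    synchronized StringBuilder acquire() {
        StringBuilder sb = free.poll();
        return sb != null ? sb : new StringBuilder(64);
    }

    synchronized void release(StringBuilder sb) {
        sb.setLength(0); // state must be reset by hand before reuse
        free.push(sb);
    }
}

final class Greeter {
    // Pooled version: lock, borrow, use, reset, lock again, return.
    static String pooled(NaiveBufferPool pool, String name) {
        StringBuilder sb = pool.acquire();
        try {
            return sb.append("Hello, ").append(name).toString();
        } finally {
            pool.release(sb);
        }
    }

    // Plain version: one short-lived allocation, no locking, nothing to forget.
    static String plain(String name) {
        return new StringBuilder(64).append("Hello, ").append(name).toString();
    }
}
```

For genuinely expensive resources such as threads, the ready-made pools in `java.util.concurrent` are the better starting point, as the text says.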

Finalizable objects

Perhaps the best-known problem in memory management is finalizable objects, but I will still explain it briefly and give a good example of how to avoid it while still achieving the same goal.

Finalizable objects are objects which override the finalize() method. This method serves as the cleanup method for objects which deal with resources that need to be managed outside of the JVM, such as files, database connections and sockets. For example, the finalize method of the FileInputStream class calls the close method if it hasn’t been called before. While this might remind everyone of destructors in C++, this technique is not a destructor. Unlike with destructors, the time at which the object’s finalize method gets called is undefined, as the object waits in a queue until the JVM can tend to it and finalize it.

The way it works is this: when a finalizable object is allocated it is marked as such. When the application has no more references to it, the GC enqueues it in the object finalization queue. The JVM has a thread dedicated to removing elements from this queue and calling the finalize method on them; however, to keep the data integrity of the object, the GC does not reclaim it and traverses its tree as a live object! Only after the object’s finalize method has been called can the object and the objects it references be reclaimed.

You don’t even need to implement the finalize method to be affected by it, either: if you extend a finalizable object, the extending object is marked as finalizable as well. The best example is Frame (and JFrame which extends it), which uses finalize to dispose of the resources allocated by the OS for the window. The best solution for this problem is using delegation instead of inheritance; however, in the case of Frame the problem seems to have been solved in Java 6, where the finalize method is not implemented anymore (probably improving the performance of many user interfaces).

There is a really great article about the subject written by Tony Printezis. In his final code example Tony brings a really great solution to the problem of cleaning up after the class without the finalize method – instead, he uses weak references. I’d like to explain that last example, as I think it’s extremely important. First, a class diagram to understand the classes and interfaces involved.

[Figure: class diagram of the classes and interfaces involved]

Notice that while Tony didn’t use it, I made NativeImage and Image implement Closeable, a new interface in Java 5 for IO classes that need to be closed. Typical code using a Closeable class looks like this:


Image image = null;
try {
  image = new Image(...);
  // do something with image
} catch (...) {
  // deal with exceptions gracefully
} finally {
  // image is still null if the constructor threw an exception
  if (image != null) {
    image.close();
  }
}

When Image calls the close method, NativeImage disposes of the object gracefully and in a timely manner. On the other hand, if the developer forgets to call the close method, the GC will collect the image and the native resource will never be freed. The way Tony’s code solves this problem (again, using weak references and not the finalize method!) is depicted with the following diagram:

[Figure: diagram of how the cleanup works]

As you can see, all objects are cleared and the native data is safely freed for later use, taking only a single GC cycle and removing a lot of strain from the application.
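For readers who prefer code to diagrams, here is a minimal sketch of the general technique. It is not Tony’s actual code: the names `TrackedImage`, `ImageRef` and `NativeHandle` are hypothetical stand-ins, and the “native resource” is just a counter so that frees can be observed. The idea is that a weak reference subclass carries the handle that has to be freed, a static set keeps those reference objects alive, and a reference queue tells us which images the GC has already found unreachable.

```java
import java.io.Closeable;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

// Stand-in for a native resource (assumption: a real one would wrap an OS handle).
final class NativeHandle {
    static final AtomicInteger open = new AtomicInteger();
    private boolean freed;

    NativeHandle() {
        open.incrementAndGet();
    }

    // Idempotent, so an explicit close() followed by queue cleanup is harmless.
    synchronized void free() {
        if (!freed) {
            freed = true;
            open.decrementAndGet();
        }
    }
}

// Weak reference to the image, carrying the handle that must be freed.
// It holds no strong reference to the image itself.
final class ImageRef extends WeakReference<TrackedImage> {
    final NativeHandle handle;

    ImageRef(TrackedImage image, NativeHandle handle, ReferenceQueue<TrackedImage> queue) {
        super(image, queue);
        this.handle = handle;
    }
}

final class TrackedImage implements Closeable {
    private static final ReferenceQueue<TrackedImage> QUEUE = new ReferenceQueue<>();
    // Keeps the ImageRef objects reachable until their handle has been freed.
    private static final Set<ImageRef> LIVE =
            Collections.synchronizedSet(new HashSet<ImageRef>());

    private final ImageRef ref;

    TrackedImage() {
        drainQueue(); // clean up after images that were dropped without close()
        ref = new ImageRef(this, new NativeHandle(), QUEUE);
        LIVE.add(ref);
    }

    // Timely path: the developer remembered to close the image.
    @Override
    public void close() {
        ref.handle.free();
        ref.clear();
        LIVE.remove(ref);
    }

    // Fallback path: free handles of images the GC has found unreachable.
    static void drainQueue() {
        ImageRef r;
        while ((r = (ImageRef) QUEUE.poll()) != null) {
            r.handle.free();
            LIVE.remove(r);
        }
    }

    static int liveCount() {
        return LIVE.size();
    }
}
```

Because the image itself has no finalize method, the GC can reclaim it in the cycle that discovers it is unreachable; only the small reference object lingers until the queue is drained. When that discovery happens is up to the GC, so calling close() remains the reliable way to free the resource promptly.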

Memory leaks

While the GC does a great job at removing unreachable objects, it doesn’t help against memory leaks caused by sloppy code which keeps references to unused objects. The following list contains the common trouble-makers and some solutions:

  • Objects defined in a higher scope than they should be might stay alive longer than expected. Always define objects in the lowest scope possible for them.
  • Listeners for observable objects which were not removed after their task was done will stay alive, receive events and generally spend processor and memory resources for no good reason. Always make sure that listeners are removed from their observable when they’re not needed anymore.
  • Exceptions might change the control flow, which might skip the code you wrote to remove those pesky listeners. Always use the finally clause when removing references to listeners or other types of objects from usually persistent collections.
  • Instances of inner classes contain an implicit reference to their outer class. You must be aware of this behavior, and if you don’t use the outer class, define the inner class as static.
  • Sometimes keeping additional information is required for certain types, but the class cannot be extended or extending it would be bad design. For these cases, a Map instance is usually used to map between the object and its extended metadata. The kept objects usually should remove themselves from the map when their use is over, which is often forgotten. Luckily, WeakHashMap keeps the keys as weak references and it should be used for such metadata.
  • And obviously the use of the finalize method, which might be extremely slow and delay the reclaiming of memory, or even do worse and resurrect the finalized object!
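Two of the points above, removing listeners in a finally clause and declaring nested classes static when they don’t need the outer instance, can be sketched as follows. `EventSource`, `ListenerScope` and `Counter` are hypothetical names made up for this example, not part of any real library.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical observable, invented for this illustration.
final class EventSource {
    interface Listener {
        void onEvent(String event);
    }

    private final List<Listener> listeners = new CopyOnWriteArrayList<>();

    void addListener(Listener listener) { listeners.add(listener); }
    void removeListener(Listener listener) { listeners.remove(listener); }
    int listenerCount() { return listeners.size(); }

    void fire(String event) {
        for (Listener listener : listeners) {
            listener.onEvent(event);
        }
    }
}

final class ListenerScope {

    // Static nested class: it carries no hidden reference to an enclosing instance.
    static final class Counter implements EventSource.Listener {
        int seen;

        @Override
        public void onEvent(String event) {
            seen++;
        }
    }

    // Registers a listener only for the duration of the work.
    static int countEvents(EventSource source, Runnable work) {
        Counter counter = new Counter();
        source.addListener(counter);
        try {
            work.run();
            return counter.seen;
        } finally {
            // Runs even if work.run() throws, so the listener never lingers.
            source.removeListener(counter);
        }
    }
}
```

Without the finally clause, an exception thrown by the work would leave the counter registered, and the long-lived source would keep it (and everything it references) reachable.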

You can get good analysis data using tools like jhat for reachability analysis or jmap for class histograms.

This is a longer post than normal, so I hope you stayed to the end. Please write to me of things you’ve encountered to enrich my knowledge on the subject!


13 Responses to “GC Tips and Memory Leaks”

  1. Web 2.0 Announcer Says:

    GC Tips and Memory Leaks

    [...]After the two posts about garbage collection (basic and advanced) I started receiving questions regarding the works of the GC, and tips on how to use it properly. This post will be devoted to that subject, with some of the questions asked brought …

  2. David Says:

    This is a great piece of information. This caught my eye because I am currently working a gig attempting to get some metrics on a suite of web apps and services running under Tomcat and JBoss. Everything is pretty much legacy FOSS. I am trying to use JMeter for the data acquisition and analysis because JConsole, jmap and jhat are not available on the hosting servers, since all the JDKs are 1.3.x and 1.4.x. Do you have any better suggestions?

  3. Avah Says:

    David: I don’t know many profiling solutions. I know that JProfiler did a good job for us a while ago, and it works with earlier JVMs, so it might suit you as well…

  4. Kris G. Says:

    Right, so which objects are large and which are considered small? ;)

  5. Avah Says:

    Kris: I couldn’t find that out, but I can refer you to a great document discussing all the knobs you can use within the GC. The document is here, and generally the subject is called either “GC tuning” or “GC ergonomics”.

  6. Garbage Collection - The comic panel Says:

    [...] Garbage Collection set of posts (Generations, Parallel and Concurrent, Tips and Memory Leaks) are ones that I am personally very proud of. First, they were very interesting to write, as the [...]

  7. Garbage collectors « Java Village Says:

    [...] a short intro, we pick up the pace with a slightly more demanding article, and finish with a best practice. For fun, you can end with this one [...]

  8. Java Logger Memory Leaks Says:

    [...] influence Since it’s a “heavy” back-end system, the GC could affect its performance if code was written in an non-optimal way. I decided to take a look at how hard it was working, [...]

  9. Simple solution to resource collection Says:

    [...] a task the GC doesn’t do. That’s because the GC does something else for us called object finalization, where resources usually clean themselves [...]

  10. KR Says:

    How does it affect performance if I unnecessarily override the finalize method in all my classes? Are all the objects marked as finalizable put into a queue, which takes a lot of memory by allocating space in the queue and hence affects performance? Please correct me if my understanding is wrong.

  11. Tweaking the producer-consumer model Says:

    [...] avoid using the finalize method and how to do so efficiently. If you haven’t, I recommend you do so now. It could make the rest of the post much easier to understand. So, knowing we want to use the GC to [...]

  12. Avah Says:

    Hello!! i need to learn java, where can i start??

  13. Javin @ eclipse remote debugging Says:

    I am not sure about heap fragmentation, though. There is an opinion that frequent Java garbage collection causes heap fragmentation, but Java is intelligent enough to compact the heap, and I believe this behavior is available in the latest JRE.

    Javin
    How Garbage collection works in Java