Tuesday, September 7, 2010

Monitoring Memory in Java

There are a couple of options for measuring the amount of memory a program uses. The simplest one, which does not require any tools and works even with production systems, is the Verbose GC log.

Verbose GC Log

The Verbose GC log is enabled when the JVM process is started. There are a couple of switches that can be used:

1.        -verbose:gc — prints basic information about GC to the standard output

2.        -XX:+PrintGCTimeStamps — prints the times that GC executes

3.        -XX:+PrintGCDetails — prints statistics about different regions of memory in the JVM

4.        -Xloggc:<file> — logs the results of GC in the given file

The following is an example of the output generated for Tomcat running in the default configuration with all of the previous switches enabled:

1.854: [GC 1.854: [DefNew: 570K->62K(576K), 0.0012355 secs] 2623K->2175K(3980K), 0.0012922 secs]
1.871: [GC 1.871: [DefNew: 574K->55K(576K), 0.0009810 secs] 2687K->2229K(3980K), 0.0010752 secs]
1.881: [GC 1.881: [DefNew: 567K->30K(576K), 0.0007417 secs] 2741K->2257K(3980K), 0.0007947 secs]
1.890: [GC 1.890: [DefNew: 542K->64K(576K), 0.0012155 secs] 2769K->2295K(3980K), 0.0012808 secs]

The most important set of numbers is the second before->after(total) group on each line (e.g., 2623K->2175K(3980K) in the top line). It gives the heap occupancy before GC, the occupancy after GC, and the total heap size in parentheses. The value after the -> shows that we are using around 2200K of memory at the end of each GC cycle.

This trace is not an indication of a memory leak. It shows only a short-term trend, with less than a second between samples, whereas a leak shows up in long-term trends. If the Verbose GC log showed that the program was using around 2200K of memory after running for two days, and after running for 10 days it was using 2GB of memory (even after GC had just run), we could then conclude that there's a memory leak.

All the information that needs to be collected in order to determine if a memory leak exists can be found in the results of the Verbose GC logs.
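To watch the long-term trend, the after-GC heap figure can be pulled out of each log line and compared over time. The following is a minimal sketch, assuming the log lines look like the sample output above; the class name GcLogLine and its method are hypothetical, and the log format varies with the collector and the JDK version, so the pattern may need adjusting.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GcLogLine {
    // Matches the last "before->after(total)" group on a line. In the sample
    // output above, that group is the whole-heap figure (an assumption about
    // the log format, which differs between collectors and JDK versions).
    private static final Pattern HEAP =
        Pattern.compile("(\\d+)K->(\\d+)K\\((\\d+)K\\), [\\d.]+ secs\\]\\s*$");

    /** Returns the heap occupancy after GC in KB, or -1 if the line does not match. */
    static long heapAfterKb(String line) {
        Matcher m = HEAP.matcher(line);
        return m.find() ? Long.parseLong(m.group(2)) : -1;
    }
}
```

Feeding every line of the file given to -Xloggc through heapAfterKb and plotting the results over several days gives the trend that matters: a floor that keeps rising after each GC is the warning sign.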

Monitoring the Java Process

The following approach works for any Java process, including standalone clients as well as application servers like JBoss and servlet containers like Tomcat. It is based on starting the Java process with JMX monitoring enabled and attaching with the JMX monitoring tools. We'll use Tomcat in the following example.

To start Tomcat or any other Java process with JMX monitoring enabled, use the following options when starting the JVM:

·        -Dcom.sun.management.jmxremote — enables JMX monitoring

·        -Dcom.sun.management.jmxremote.port=<port> — controls the port for JMX monitoring

Note that if you're on a production system, you'll most likely want to secure your JVM before running it with these parameters. For that, you can specify these additional options:

·        -Dcom.sun.management.jmxremote.ssl

·        -Dcom.sun.management.jmxremote.authenticate

Once started, you can use JConsole or VisualVM to attach to the process. Note that later JDK 6 versions include VisualVM.
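JConsole and VisualVM are the usual choice, but the same JMX connection can also be opened from code. The sketch below reads the current heap usage through the standard MemoryMXBean. The class name JmxHeapReader, the default host, and the port 9010 are assumptions for illustration; use the port you passed to jmxremote.port. It also assumes the target JVM was started with SSL and authentication turned off, which is only sensible on a test machine.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

class JmxHeapReader {
    /** Reads the used heap size in bytes through any MBean server connection. */
    static long usedHeapBytes(MBeanServerConnection conn) {
        try {
            MemoryMXBean memory = ManagementFactory.newPlatformMXBeanProxy(
                conn, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);
            return memory.getHeapMemoryUsage().getUsed();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical defaults; pass the real host and JMX port as arguments.
        String host = args.length > 0 ? args[0] : "localhost";
        String port = args.length > 1 ? args[1] : "9010";
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            System.out.println("Used heap (bytes): "
                + usedHeapBytes(connector.getMBeanServerConnection()));
        } finally {
            connector.close();
        }
    }
}
```

Calling usedHeapBytes at regular intervals gives roughly the same curve that JConsole draws on its Memory tab.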

Heap Dump

A heap dump is a list of the objects in the memory of the JVM, together with the content of the memory occupied by those objects. It preserves the value of any attributes of the objects, including references to other objects. In other words, a heap dump gives you a complete picture of the memory.

There are multiple tools that allow you to dump heap in a Java process:

·        If you're using JDK 6, you can use a tool called jmap on any platform.

·        If you're using JDK 5, the situation is slightly more complex:

        ·        If you're running UNIX (Linux, Solaris, OS X), you can use jmap.

        ·        If you're using JDK 5 update 14 or later, you can use the -XX:+HeapDumpOnCtrlBreak option when starting the JVM, then use the CTRL+BREAK key combination on Windows (or CTRL+\ on UNIX) to dump the heap.

        ·        If you're running Windows and using a JDK 5 release earlier than update 14, you'll soon wish you weren't. Trying to reproduce the problem with a more recent JDK is probably the best bet here.

Some tools like VisualVM and memory profilers allow you to initiate a heap dump from the GUI, but you don't need any fancy tools here; jmap will do just fine. Because it is the most widely available option, we'll use jmap in the next example.

Before you dump heap, be sure to keep the following issues in mind:

·        Programs in the JVM are paused for the duration of the heap dump, which might take anywhere from ten seconds to several minutes. Many enterprise applications (and users of those applications) don't take kindly to pauses that long, which may cause various timeouts to expire. So don't try this at home or in production (unless the application is already a goner)!

·        Heap dumps are saved on disk, and the files might be fairly large. A good rule is to make sure that you have at least twice the size of the physical memory free on the disk before you initiate a memory dump.

With those final words of caution out of the way, you should now be ready to run the following command:

        jmap -dump:live,format=b,file=FILENAME PID

Note that the -F option, which will dump non-responsive programs, might be useful on UNIX systems, but is not available on Windows. Note also that JDK 6 includes the option -XX:+HeapDumpOnOutOfMemoryError, which dumps the heap whenever an OutOfMemoryError is thrown. This can be a useful option, but keep in mind that it has the potential to consume significant amounts of disk space.
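On HotSpot JVMs, a dump in the same binary format can also be requested from inside the running process through the HotSpotDiagnosticMXBean, which is handy when you can't run jmap against the machine. The following is a sketch only: the class name HeapDumper is hypothetical, the getPlatformMXBean call requires Java 7 or later, and the cautions about pauses and disk space apply just the same.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;

class HeapDumper {
    /**
     * Writes a binary heap dump of the current JVM to the given file.
     * Passing live = true dumps only reachable objects, like the "live"
     * suboption of jmap. The file must not already exist, and recent JDKs
     * expect the file name to end in .hprof.
     */
    static void dump(String file, boolean live) {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            bean.dumpHeap(file, live);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A call such as HeapDumper.dump("/tmp/app.hprof", true), where the path is just an example, could be wired to an admin-only action in the application.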

You now have a heap dump in the file FILENAME and are ready to analyze it.

What's In "Leaked" Memory?

With the heap dump complete, we can now take a look at the memory and find out what's really causing the memory leak.

Suppose that objects are holding references to each other as illustrated by the picture below. For the sake of easy calculation, let's assume that each object is 100 bytes, so that all of them together occupy 600 bytes of memory.

Now, suppose that the program holds a reference to object A for a prolonged period of time. As a result, objects B, C, D, E, and F are all ineligible for garbage collection, and we have the following amount of memory leaking:

·        100 bytes for object A

·        500 bytes for objects B, C, D, E and F that are retained due to the retention of object A

So, holding a reference to object A causes a memory leak of 600 bytes. The shallow heap of object A is 100 bytes (object A itself), and the retained heap of object A is 600 bytes.
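The shallow versus retained arithmetic can be reproduced with a few lines of code. The sketch below models the six objects as a small graph and computes the retained heap of an object as the total size of everything that would become unreachable if that object went away. The exact edges are an assumption (the text only spells out the chain from A through C and D to F), and the class name RetainedHeapDemo and the ROOT node standing in for the program's reference are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class RetainedHeapDemo {
    static final int OBJECT_SIZE = 100; // bytes per object, as in the text

    // Outbound references. Assumed layout: only A -> C -> D -> F is given.
    static final Map<String, List<String>> REFS = new HashMap<>();
    static {
        REFS.put("ROOT", Arrays.asList("A")); // the program holding A
        REFS.put("A", Arrays.asList("B", "C"));
        REFS.put("B", Arrays.asList("E"));
        REFS.put("C", Arrays.asList("D"));
        REFS.put("D", Arrays.asList("F"));
    }

    /** Objects reachable from ROOT when the object named skip is ignored. */
    static Set<String> reachable(String skip) {
        Set<String> seen = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push("ROOT");
        while (!todo.isEmpty()) {
            String cur = todo.pop();
            if (cur.equals(skip) || !seen.add(cur)) continue;
            for (String next : REFS.getOrDefault(cur, Collections.emptyList())) {
                todo.push(next);
            }
        }
        seen.remove("ROOT");
        return seen;
    }

    /** Bytes that would be freed if obj were no longer referenced. */
    static int retainedBytes(String obj) {
        Set<String> freed = reachable(null);
        freed.removeAll(reachable(obj));
        return freed.size() * OBJECT_SIZE;
    }
}
```

With this layout, retainedBytes("A") is 600 while the shallow size of A stays at 100, and retainedBytes("C") is 300 because C keeps D and F alive.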

Although objects A through F are all leaked, the real cause of the memory leak is the program holding a reference to object A. So how can we fix the root cause of this leak? If we first identify that object F is leaked, we can follow the reference chain back through objects D, C and A to find the cause of the memory leak. However, there are some complications to this "follow the reference chain" process:

·        Reference chains can be really long, so manually following them can be time consuming

·        An object is sometimes retained by more than one object, and there can even be circles involved as shown in the picture below:

If we start following inbound references from object F in this example, we have to choose between following object C or object D. In addition, there's the possibility of getting caught in a circle by repeatedly following the path between objects D, E and B. On this small diagram it's easy to see that the root cause is holding object A, but when you're dealing with a situation that involves hundreds of thousands of objects (as any self-respecting memory leak does) you quickly realize that manually following the reference chain can be very complex and time consuming.
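The usual way to avoid going around in circles is to remember which objects have already been visited. The sketch below walks inbound references breadth-first from the leaked object until it reaches the holder. The edges are an assumed reading of the description above (F is held by C and D, and D, E and B form the circle), and the class name InboundPath and the "Cause of Leak" label are illustrative only.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class InboundPath {
    // Inbound references: object -> objects that hold a reference to it.
    // Assumed layout, including the circle B -> D -> E -> B.
    static final Map<String, List<String>> INBOUND = new HashMap<>();
    static {
        INBOUND.put("F", Arrays.asList("C", "D"));
        INBOUND.put("C", Arrays.asList("A"));
        INBOUND.put("D", Arrays.asList("B"));
        INBOUND.put("B", Arrays.asList("A", "E"));
        INBOUND.put("E", Arrays.asList("D"));
        INBOUND.put("A", Arrays.asList("Cause of Leak"));
    }

    /** Returns a shortest chain from root down to leaked, or an empty list. */
    static List<String> pathFromRoot(String leaked, String root) {
        // Maps each visited object to the object we reached it from, so it
        // doubles as the visited set that stops us looping around the circle.
        Map<String, String> cameFrom = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        cameFrom.put(leaked, null);
        queue.add(leaked);
        while (!queue.isEmpty()) {
            String cur = queue.remove();
            if (cur.equals(root)) {
                List<String> path = new ArrayList<>();
                for (String s = cur; s != null; s = cameFrom.get(s)) path.add(s);
                return path;
            }
            for (String holder : INBOUND.getOrDefault(cur, Collections.emptyList())) {
                if (!cameFrom.containsKey(holder)) {
                    cameFrom.put(holder, cur);
                    queue.add(holder);
                }
            }
        }
        return Collections.emptyList();
    }
}
```

Starting from F, the walk ends at the holder through A and C without ever revisiting D, E or B. Heap analysis tools do essentially this kind of search for you, on graphs with hundreds of thousands of nodes.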

This is where some shortcuts can come in handy:

·        If we had a tool that allowed us to play a "what would happen if I remove this reference" type of guessing game, we could run experiments that help locate the cause. For example, we could see that if we removed the reference from "Cause of Leak" to A in the diagram above, objects A through F would all be freed. Some tools (like Quest's JProbe) have this capability.

·        If the memory leak is large and we have a tool that allows us to sort objects by retained heap, we'll get an even greater head start because the objects with the largest retained heap are usually the cause of large memory leaks.
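The "what would happen if I remove this reference" game boils down to two reachability passes: one over the graph as it is, and one with the reference in question taken out. The sketch below plays it on an assumed version of the diagram with the circle between B, D and E. The class name WhatIfRemoval is hypothetical, and this is not a description of how JProbe or any other tool implements the feature.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class WhatIfRemoval {
    static final String ROOT = "Cause of Leak";

    /** Outbound references; an assumed layout with the circle B -> D -> E -> B. */
    static Map<String, List<String>> graph() {
        Map<String, List<String>> refs = new HashMap<>();
        refs.put(ROOT, Arrays.asList("A"));
        refs.put("A", Arrays.asList("B", "C"));
        refs.put("B", Arrays.asList("D"));
        refs.put("C", Arrays.asList("F"));
        refs.put("D", Arrays.asList("E", "F"));
        refs.put("E", Arrays.asList("B"));
        return refs;
    }

    /** Everything reachable from ROOT, including ROOT itself. */
    static Set<String> reachable(Map<String, List<String>> refs) {
        Set<String> seen = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(ROOT);
        while (!todo.isEmpty()) {
            String cur = todo.pop();
            if (!seen.add(cur)) continue;
            for (String next : refs.getOrDefault(cur, Collections.emptyList())) {
                todo.push(next);
            }
        }
        return seen;
    }

    /** Objects that become garbage if the reference from -> to is removed. */
    static Set<String> freedByRemoving(String from, String to) {
        Map<String, List<String>> refs = graph();
        Set<String> freed = reachable(refs);
        List<String> pruned =
            new ArrayList<>(refs.getOrDefault(from, Collections.emptyList()));
        pruned.remove(to);
        refs.put(from, pruned);
        freed.removeAll(reachable(refs));
        return new TreeSet<>(freed); // sorted for readable output
    }
}
```

Removing the reference from "Cause of Leak" to A frees all six objects, removing only C's reference to F frees nothing because D still holds F, and removing A's reference to B frees the whole B, D, E circle even though those three still point at each other.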

Now that we understand what memory leaks are and how to trace them back to their cause, let's look at the tools that help with analyzing heap dumps.

Tools for Dealing with Heap Dumps

Tools often provide a few extra helpful features:

·        Present a better summary of heap statistics.

·        Sort objects by retained heap. In other words, some tools can tell you the memory usage of an object and all other objects that are referenced by it, as well as list the objects referenced by other objects. This makes it much faster to diagnose the cause of a memory leak.

·        Function on machines that have less memory than the size of the heap dump. For example, they'll allow you to analyze a 16GB heap dump from your server on a machine with only 1GB of physical memory.

VisualVM is a nice tool that gives you just enough to resolve memory leaks, and it shows heap dumps and relations between objects in graphical form.

Feature-wise, one step above VisualVM is the Eclipse Memory Analyzer Tool (MAT), which includes a lot of additional options. Although it's still in the incubation phase as of publication of this article, MAT is free and we've found it to be extremely useful.

Commercial products like JProfiler, YourKit, and JProbe are also excellent tools for debugging memory leaks. These applications include a few options that go above and beyond VisualVM and MAT, but they're certainly not necessary to successfully debug memory leaks.
