Understand JVM and JIT Compiler — Part 4

Hello people!

In the previous part of this series, we learned how the JVM works under the hood, what JIT compilation is, and what the Code Cache is and how to track it. Now we will see how to tune the Code Cache!

Increasing the Code Cache Size

When code is compiled to tier 4, it is eventually added to the Code Cache. The problem is that the Code Cache has a limited size, so if we have a huge number of methods compiled to tier 4, some code will eventually be removed from the Code Cache to make space for new code, and later that removed code may be compiled and added again, evicting yet another code block.

In other words, in large applications with a lot of methods (code blocks) that could be compiled to level four over time, some methods might be moved into the Code Cache, then moved out, then moved back again, and so on.

But Julio, how can I tell if my Code Cache is full?

Actually, it is really simple. As always, the JVM will print a message to our application’s console telling us that the Code Cache is full. The message looks similar to this:

Java HotSpot(TM) 64-Bit Server VM warning: CodeCache is full. Compiler has been disabled.
Java HotSpot(TM) 64-Bit Server VM warning: Try increasing the code cache size using -XX:ReservedCodeCacheSize=

It means that we potentially have methods that could be added to the Code Cache, but all the methods currently inside it are active (being used), so there is no code to release.

Important: This warning won’t stop our application!

And how can we solve it? I think you have already guessed what we are going to do to improve the performance of our application, and you are right!

When we are in this scenario, changing the default size of the Code Cache can bring a significant improvement in application performance.

Nice Julio, but how can I see info about my current code cache?

Another clever question my friend! And again the answer is simple!

We can easily see information about our Code Cache by enabling the JVM option -XX:+PrintCodeCache.

If you are running on Java 9+, the output will show three separate CodeHeap segments, while on Java 8 and below it will show a single CodeCache summary with its size, usage, and bounds. (I will explain this difference later in this same article.)

Now there is one caveat: there is a limit on the Code Cache size, and it depends on the Java version that we are using.

If you are using Java 7 or below with a 64-bit JVM, the max size will be 48 megabytes; if you are using Java 7 or below with a 32-bit JVM, the max size will be 32 megabytes.

If you are using Java 8 or above, the max size will be 240 megabytes, but if you disable tiered compilation with the option -XX:-TieredCompilation, the default size is 48 megabytes.

Now that we know the max sizes, we can start tuning the Code Cache by changing these values.

To do that, we have 3 different flags and they are:

  • -XX:InitialCodeCacheSize
  • -XX:ReservedCodeCacheSize
  • -XX:CodeCacheExpansionSize

The -XX:InitialCodeCacheSize is the size of the Code Cache when the application starts. By default, this value is really low, around 160 kilobytes.

The -XX:ReservedCodeCacheSize is the maximum size of the Code Cache.

And the -XX:CodeCacheExpansionSize flag dictates how the Code Cache grows: how much extra space is added each time the Code Cache needs to expand.

We can use bytes, kilobytes, megabytes, or gigabytes.

If we want to use bytes, we can just put the plain number, for example -XX:ReservedCodeCacheSize=50331648.

Now, if we want to use kilobytes we add “k” or “K” after the number, for megabytes we add “m” or “M”, and for gigabytes we add “g” or “G”.

But Julio, I really like JITWatch and I already know how to tune the Code Cache, but I would like to track a remote application. How can I do that?

It is simpler than it looks! We also have a great and amazing tool called JConsole, and it is already part of the JDK.

Below, I’ll show how to use it to track the Code Cache and other JVM aspects, like memory usage, threads, and so on, of a remote application.

To start JConsole, go to your JDK folder, open the bin folder, and you will find an executable file called “jconsole”.

If your PATH includes the JDK’s bin folder, you can just type jconsole.

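For example (the hostname and port in the second command are placeholders for your own remote JVM, which must have JMX enabled):

```shell
# Launch JConsole with the connection dialog:
jconsole

# Or connect straight to a remote JVM exposing JMX on host:port
# (hostname and port here are placeholders):
jconsole myserver.example.com:9010
```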
So, doing that, you will see the JConsole connection window.

As you can see, you can select either a local process or a remote process.

So, once connected, we need to go to the Memory tab.

If you are using Java 8, you will see an option called Memory Pool “Code Cache”; simply click it and you will see a chart of the application’s Code Cache activity.

Now, if you are using Java 9+, you will instead see separate Code Cache segments called CodeHeaps, such as Memory Pool “CodeHeap ‘non-nmethods’”, Memory Pool “CodeHeap ‘profiled nmethods’”, and Memory Pool “CodeHeap ‘non-profiled nmethods’”.

Do you remember the different results of the “-XX:+PrintCodeCache” flag? It is time to explain why!

Basically, in Java 9 we had JEP (JDK Enhancement Proposal) 197, which was responsible for splitting the Code Cache into 3 different segments (also called code heaps): “non-method”, “profiled”, and “non-profiled”.

The “non-method” code heap contains non-method code such as compiler buffers and bytecode interpreters. This code type stays in the Code Cache forever. This code heap has a fixed size of 3 MB, and the remaining Code Cache is distributed evenly between the profiled and non-profiled code heaps.

The “profiled” code heap contains lightly optimized, profiled methods with a short lifetime.

And the “non-profiled” code heap contains fully optimized, non-profiled methods with a potentially long lifetime.

Ok Julio, but why did they do that?

According to the official page of the JEP 197, the goals are:

  • Separate non-method, profiled, and non-profiled code
  • Shorter sweep times due to specialized iterators that skip non-method code
  • Improve execution time for some compilation-intensive benchmarks
  • Better control of JVM memory footprint
  • Decrease fragmentation of highly-optimized code
  • Improve code locality because code of the same type is likely to be accessed close in time
  • Better iTLB and iCache behavior
  • Establish a base for future extensions
  • Improved management of heterogeneous code; for example, Sumatra (GPU code) and AOT compiled code
  • Possibility of fine-grained locking per code heap
  • Future separation of code and metadata (see JDK-7072317)

If you want to read more about this, I encourage you to check the official JEP 197 page.

So, returning to JConsole on JDK 9+, you will see a separate chart for each CodeHeap segment.

Now you can click there and analyze your specific CodeHeap and also check other JVM activities.

Amazing Julio, I can check the activity now, but I would like to know how I can tune my application further.

Before I answer this question I would like to talk a bit about the difference between 32-bit JVM and 64-bit JVM.

The first thing that you need to know is whether your OS (Operating System) is 32-bit or 64-bit.

If you have a 32-bit OS, you must choose a 32-bit JVM, but if you have a 64-bit OS, you can choose either 32-bit JVM or 64-bit JVM.

Ok Julio, but why would I choose the 32-bit JVM if I have a 64-bit OS?

To answer this first I’ll explain the differences between a 32-bit JVM and a 64-bit JVM.

The first and most important difference between those JVMs is the maximum heap size supported.

The maximum total process size (that includes the heap, permgen, and the native code and native memory the JVM uses) for the 32-bit JVM is 4G, and for the 64-bit JVM depends on the OS that this JVM is running.

So, if you have an application that needs less than 3 GB of heap memory to run, the 32-bit JVM will typically be faster, and the reason is that each pointer to an object in memory is smaller, so handling these pointers is quicker.

Now let’s imagine that our application uses a lot of large numeric types like longs and doubles; in this case the 32-bit JVM will be slower, because manipulating 64-bit values with 32-bit registers takes extra work.

Another interesting difference is the JIT compilation process.

We have two different types of applications, the ephemeral applications (also called Client applications) and the webserver applications (also called Server applications). Basically the difference is regarding the lifetime of the application. The client applications have a short lifetime and the server applications have a long lifetime.

For ephemeral applications, start-up time is really important because these applications won’t run for very long, so the JIT compiler won’t have time to make tiered compilation worth it (probably no method will run enough times to become eligible for tier 4). That’s why in the 32-bit JVM we only have the client compiler (C1).

For web server applications, JIT compilation quality matters more than start-up time, because the JIT compiler will have time to profile the bytecode, promote hot methods to tier 4, and then add that code to the Code Cache.

Now let’s imagine that for some reason, you need to use a 64-bit JVM (there is no 32-bit JVM available for your OS or any other situation), but you want to have the benefit of the client application since your application has a short lifetime.

Theoretically, there is a way to tell the JVM which JIT compiler we want to use, but in practice, it doesn’t work exactly like that.

We have 3 flags: “-server”, “-client”, “-d64”.

The “-client” flag says that the JIT Compiler should only use the 32-bit client compiler.

The “-server” flag says that the JIT Compiler should only use the 32-bit server compiler.

The “-d64” flag says that the JIT Compiler should use the 64-bit server compiler.

According to Scott Oaks, the argument specifying which compiler to use is not rigorously followed. If you have a 64-bit JVM and specify “-client”, the application will use the 64-bit server compiler anyway. If you have a 32-bit JVM and you specify “-d64”, you will get an error that the given instance does not support a 64-bit JVM.

But even in this situation, the use of the -client flag can be beneficial since the startup time can be faster because less code analysis occurs in advance.

But Julio, I read in your other articles that the HotSpot JVM uses C1 first and then C2, I mean, both of them?

Yes, that is right, and I’ll talk a little bit about Java history here.

Tiered compilation (the ability to use the C1 compiler for compilation levels 1 to 3 and C2 for level 4) was added only in Java 7 (where you need to enable it explicitly), and in Java 8 it became the default. This means that in earlier Java versions (7 and below), C1 and C2 were mutually exclusive.

And that’s why, sometimes, we can get some performance improvements by adding the “-server” or “-client” flag, by disabling tiered compilation (which I don’t recommend for most cases) using the flag “-XX:-TieredCompilation”, or by specifying at which level tiered compilation should stop (avoiding C2, for example). We can do that using the flag “-XX:TieredStopAtLevel=<LEVEL>”.

Also, there are 2 things that we can do to get better performance.

The first one is checking how many threads are available for the compilation process.

To change this number we can use the flag “-XX:CICompilerCount=<NUMBER>”.

The second one is checking the threshold for native compilation, in other words, how many times a method should run before being compiled into native code (tier 4).

To change this we can use the flag “-XX:CompileThreshold=<NUMBER>”.

But here we have a particularity. The default value of this flag is 10,000, and it is not simply the number of executions of a specific method; it is actually the number of interpreted method invocations before the method is (re)compiled.

According to official Oracle documentation, CompileThreshold relates to the number of method invocations needed before the method is compiled. OnStackReplacePercentage relates to the number of backward branches taken in a method before it gets compiled, and is specified as a percentage of CompileThreshold. When a method’s combined number of backward branches and invocations reaches or exceeds CompileThreshold * OnStackReplacePercentage / 100, the method is compiled. Note that there is also an option called BackEdgeThreshold, but it currently does nothing. Use OnStackReplacePercentage instead. Larger values for these options decrease compilations. Setting the options larger than their defaults defers when a method gets compiled (or recompiled), possibly even preventing a method from ever getting compiled. Usually, setting these options to larger values also reduces performance (methods are interpreted), so it is important to monitor both performance and code cache usage when you adjust them.

For the client JVM, tripling the default values of these options is a good starting point. For the server JVM, CompileThreshold is already set fairly high, so it probably does not need to be adjusted further.

Now we know what we should do to improve the performance of our applications, but before changing your entire JVM configuration in production, I strongly suggest that you invest serious effort in testing each configuration. To facilitate these tests, you can use the Java Flight Recorder.

So, this series ends here. If you are reading this message, I would like to thank you and say that your feedback is really important to me!

I hope you enjoyed this series and that your reading was as pleasant as it was for me to write these articles.

And if you really want to learn more about Java performance, I strongly recommend buying one of the most incredible books on the subject, Java Performance: The Definitive Guide, written by the awesome Scott Oaks.

Thank you again and see you in the next articles!
