JVM Stuck Threads

Published: 2025-01-26

Symptom

On a Tomcat/JVM application designed to run as multiple instances across several servers behind a load balancer, we noticed an issue after monthly patching. Following server and application restarts, a few days would pass before certain instances began exhibiting unusually high CPU usage at random.

Typically, each application instance on a single server would operate within a CPU usage range of less than 1% to 2%, with occasional spikes during expected application activity—normal behavior for Windows applications and executables. However, after patching, we observed instances becoming 'stuck' at around 25% CPU usage.

Detailed View

The screenshot below highlights the Tomcat process (ID 5732) and one of its threads (ID 5984). The significantly higher CPU time indicates that the thread had been stuck at this usage level for an extended period and, as our testing confirmed, it would remain in this state indefinitely.

More Information

PowerShell Monitoring

To gain a better understanding of why this was happening, it was important to first determine when. For this, a PowerShell script was written using WMI to monitor when these threads would become 'stuck'.

A more complete version of the code can be found here: Thread Monitor. The core of it, roughly, is two WMI queries:

# Get all Tomcat threads that are over 95% processor time (this includes the parent process id)
$stuckThreads = Get-CimInstance -Query (
    "SELECT * " +
    "FROM Win32_PerfFormattedData_PerfProc_Thread " +
    "WHERE PercentProcessorTime > 95 AND Name LIKE '%Tomcat%'")

foreach ($thread in $stuckThreads) {
    $processId = $thread.IDProcess

    # Get the command line for the process id to identify the instance
    $commandLine = (Get-CimInstance -Query (
        "SELECT CommandLine " +
        "FROM Win32_Process " +
        "WHERE ProcessId = $processId")).CommandLine
}

#SAMPLE DATA CAPTURE

Start time: 20240508070002
Start time: 20240508073003
Start time: 20240508080003
Start time: 20240508083003
Start time: 20240508090003
Start time: 20240508093003
Start time: 20240508100003
Start time: 20240508103003
Start time: 20240508110003 Instance: Instance25 t-CPU %: 99 
Start time: 20240508113003 Instance: Instance25 t-CPU %: 100 
Start time: 20240508120003 Instance: Instance25 t-CPU %: 100 
Start time: 20240508123003 Instance: Instance25 t-CPU %: 99 
Start time: 20240508130003 Instance: Instance25 t-CPU %: 100 
Start time: 20240508133003 Instance: Instance25 t-CPU %: 100 
Start time: 20240508140003 Instance: Instance25 t-CPU %: 98 
Start time: 20240508143003 Instance: Instance25 t-CPU %: 99
Start time: 20240508150003 Instance: Instance25 t-CPU %: 98
Start time: 20240508153003 Instance: Instance25 t-CPU %: 101

Thread Information

Digging deeper, we needed to understand what the thread within the Tomcat process was doing. For this we used a tool called jstack to take a thread dump of the process.

With a process ID of 5732, the command looked something like this:

C:\Program Files\Java\jdk1.8.0_281\bin>jstack -l 5732 >> "c:\Users\<username>\Desktop\threadinfo.txt"

Sample output of jstack:

"main" #1 prio=5 os_prio=0 tid=0x00007f9c28001000 nid=0x1f03 waiting on condition [0x00007f9c28fef000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at com.example.MyClass.run(MyClass.java:42)
        at java.lang.Thread.run(Thread.java:748)

"Thread-1" #2 prio=5 os_prio=0 tid=0x00007f9c28002000 nid=0x1f04 runnable [0x00007f9c28ff0000]
   java.lang.Thread.State: RUNNABLE
        at java.io.FileInputStream.readBytes(Native Method)
        at java.io.FileInputStream.read(FileInputStream.java:255)

The Native ID

Looking through the output of jstack we saw all the threads, but to determine which one was driving a CPU core to 100%, we needed to convert the nid (Native ID) from hexadecimal to decimal. The nid reported by jstack is the operating-system thread ID, which Windows tools display in decimal.

This was done by taking all the nids and using Excel to convert them all at once.
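
The same conversion can also be done in PowerShell. A minimal sketch, assuming the thread dump was saved to the file used in the jstack command above:

# Convert a single nid from hex to decimal
[Convert]::ToInt32("1760", 16)    # returns 5984

# Or pull every nid out of the saved thread dump and convert them in one pass
Select-String -Path "c:\Users\<username>\Desktop\threadinfo.txt" -Pattern 'nid=0x([0-9a-f]+)' |
    ForEach-Object { [Convert]::ToInt32($_.Matches[0].Groups[1].Value, 16) }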

Eventually, we determined that nid 0x1760 corresponded to thread id 5984. The stuck thread was a C2 compiler thread.

"C2 CompilerThread1" #7 daemon prio=9 os_prio=2 tid=0x000000001561f800 nid=0x1760 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

Problem

What Is A C2 Compiler?

The C2 compiler, also known as the Java HotSpot Server Compiler, is one of the Just-In-Time (JIT) compilers used in the Java Virtual Machine (JVM). It is designed to compile Java bytecode into highly optimized native machine code to maximize performance during runtime.

The JVM decides when to compile code based on runtime profiling data collected about the behavior of the application. This decision-making process revolves around the concept of hotspots, where the JVM identifies methods or code paths that are executed frequently or are performance-critical.
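
These compilation decisions can be observed directly: adding the diagnostic option below to the JVM arguments makes the JVM log every method it compiles, along with the compilation tier, to standard output. The last method logged before a compiler thread becomes stuck is a reasonable candidate for the code that is tripping it up.

-XX:+PrintCompilation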

Something In The Code

There must be something (hotspot code) within the application source that causes the compiler to hang when it tries to optimize it further.

A Solution

How The App Is Started

The Java Virtual Machine (JVM) has a wide range of parameters that allow control of its behavior, from memory management and garbage collection to debugging and many other settings for fine-tuning performance.

Here is a sample:

-Dcatalina.base=D:\Instance00
-Dcatalina.home=D:\Tomcat8.0.47
-Djava.endorsed.dirs=D:\Tomcat8.0.47\endorsed
-Djava.io.tmpdir=D:\Instance00\temp
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
-Djava.util.logging.config.file=D:\Tomcat8.0.47\conf\logging.properties
-Xmx1536m
-Xms256m
-Xmn100m
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=D:\Instance00\logs\java_pid.hprof
-XX:ParallelGCThreads=4
-XX:+UseParallelGC
-XX:SurvivorRatio=8
-XX:TargetSurvivorRatio=75
-XX:MaxTenuringThreshold=15

Skip C2 Compilation

The ideal solution would be to determine exactly what is causing the hang and exclude it from compilation. This can be time-consuming, and it can be difficult to trace the problem down to the exact code. However, if the offending item can be identified, it can be excluded from optimization by using a startup parameter similar to the one outlined below.

-XX:CompileCommand=exclude,com.graphhopper.coll.GHLongIntBTree::put

Other options include limiting or disabling aggressive optimizations, reducing the number of active JIT compilation threads, limiting method inlining, or turning off C2 compilation altogether.
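
As illustrations of what those alternatives look like (the values here are examples, not tuned recommendations):

-XX:CICompilerCount=2
-XX:-Inline
-XX:TieredStopAtLevel=1

CICompilerCount caps the number of JIT compiler threads, -XX:-Inline disables method inlining, and TieredStopAtLevel stops tiered compilation at the given level (level 1 keeps only basic C1 compilation).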

In this particular instance we chose to turn off C2 compilation. Since there are multiple levels and the thread dump never showed C1 compiler threads getting stuck, we felt that full C1 compilation was sufficient and would be easy to test and implement.

The levels of compilation are as follows:

  • 0 Interpreted code
  • 1 Simple C1 compiled code
  • 2 Limited C1 compiled code
  • 3 Full C1 compiled code
  • 4 C2 compiled code

After months of testing, piloting on a few chosen production instances, and verifying no ill effects for end users, we were confident in our solution and implemented it on all our instances.

By using the -XX:TieredStopAtLevel=3 parameter we were able to stop JIT compilation at level 3. Continued monitoring showed no stuck threads, and the application runtime was more stable.
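
For example, appended to the startup options shown earlier:

-XX:TieredStopAtLevel=3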