How can thread stack monitoring help developers troubleshoot Java?

Bash

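1 Preface

A program on the author's production line frequently stopped working, and the developers could not find the cause. I happened to learn that Java ships the jstack tool for analyzing a program's thread stacks, so I wrote a monitoring script around it. The script helped the developers locate the problem in their own code, and the fault was resolved.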

2 Best practice

2.1 Create the monitoring script

mkdir -p ~/scripts/
vim ~/scripts/iapi-jstack.sh

Add the following content:

#!/bin/bash
# Dump the thread stacks of the iapi Java process and append per-state thread counts to a status log.

setTime=$(date +"%Y-%m-%d %H:%M:%S")
jastackLog="/var/log/iapi/jstack/jstack.$setTime.log"
jstackStatusLog=/var/log/iapi/jstackStatus.log

# Make sure both log directories exist (mkdir -p is a no-op when they already do)
mkdir -p "$(dirname "$jastackLog")" "$(dirname "$jstackStatusLog")"

# Core command: dump the stacks of the Java process owned by user iapi
sudo -u iapi /usr/java/jdk1.8.0_65/bin/jstack $(pgrep -u iapi java) > "$jastackLog"

# Append a timestamp and the number of threads in each state
echo "$setTime" >> "$jstackStatusLog"
echo "WAITING (parking): $(grep -c "Thread.State: WAITING (parking)" "$jastackLog")" >> "$jstackStatusLog"
echo "WAITING (on object monitor): $(grep -c "Thread.State: WAITING (on object monitor)" "$jastackLog")" >> "$jstackStatusLog"
echo "TIMED_WAITING (parking): $(grep -c "Thread.State: TIMED_WAITING (parking)" "$jastackLog")" >> "$jstackStatusLog"
echo "TIMED_WAITING (sleeping): $(grep -c "Thread.State: TIMED_WAITING (sleeping)" "$jastackLog")" >> "$jstackStatusLog"
echo "TIMED_WAITING (on object monitor): $(grep -c "Thread.State: TIMED_WAITING (on object monitor)" "$jastackLog")" >> "$jstackStatusLog"
echo "RUNNABLE: $(grep -c "Thread.State: RUNNABLE" "$jastackLog")" >> "$jstackStatusLog"
echo "BLOCKED (on object monitor): $(grep -c "Thread.State: BLOCKED (on object monitor)" "$jastackLog")" >> "$jstackStatusLog"
echo >> "$jstackStatusLog"

# Delete dump files whose status change time is more than 7 days old
find /var/log/iapi/jstack/ -type f -ctime +7 -name "jstack.*.log" -exec rm -f {} \;

The script above is built around the following core command:

sudo -u iapi /usr/java/jdk1.8.0_65/bin/jstack `pgrep -u iapi java` > "$jastackLog"

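The script works as follows:
– The core command "jstack" dumps the thread stacks of the PID returned by "pgrep -u iapi java" and writes the result to the log file named by "$jastackLog"
– The echo lines after the core command append the number of threads in each state to the log file named by "$jstackStatusLog"
– The last line removes dump files that are more than 7 days old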
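The core command assumes that "pgrep -u iapi java" returns exactly one PID. If the user iapi runs no Java process, or more than one, jstack would be called with no argument or with several. A minimal sketch of a guard, assuming a hypothetical helper named pick_single_pid that is not part of the original script:

```shell
# Hypothetical helper (not in the original script): print the PID only when
# the given pgrep output contains exactly one numeric line, otherwise fail.
pick_single_pid() {
    local pids="$1" count
    # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true"
    count=$(printf '%s\n' "$pids" | grep -c '^[0-9][0-9]*$' || true)
    if [ "$count" -ne 1 ]; then
        echo "expected exactly one java PID, found $count" >&2
        return 1
    fi
    printf '%s\n' "$pids" | grep '^[0-9][0-9]*$'
}

# Possible use inside the monitoring script (left commented out here):
# pid=$(pick_single_pid "$(pgrep -u iapi java)") || exit 1
# sudo -u iapi /usr/java/jdk1.8.0_65/bin/jstack "$pid" > "$jastackLog"
```

With this guard the script stops before writing an empty or misleading dump file.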
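In addition, the following command lists any thread states that the script does not count yet, so that they can be added to the script by hand: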

cat /var/log/iapi/jstack/jstack.*.log | grep "java.lang.Thread.State" | grep -v "Thread.State: WAITING (parking)" | grep -v "Thread.State: WAITING (on object monitor)" | grep -v "Thread.State: TIMED_WAITING (parking)" | grep -v "Thread.State: TIMED_WAITING (sleeping)" | grep -v "Thread.State: TIMED_WAITING (on object monitor)" | grep -v "Thread.State: RUNNABLE" | grep -v "Thread.State: BLOCKED (on object monitor)"
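A shorter way to see which states occur at all is to count every distinct "java.lang.Thread.State" line instead of excluding the known ones one by one. The sketch below assumes a hypothetical function name, summarize_states, and relies only on standard grep, sed, sort and uniq:

```shell
# Hypothetical helper: count every distinct thread state found in the
# given jstack dump files, most frequent first.
summarize_states() {
    grep -h "java.lang.Thread.State:" "$@" \
        | sed 's/^[[:space:]]*java\.lang\.Thread\.State: //' \
        | sort | uniq -c | sort -rn
}

# Possible use (left commented out here):
# summarize_states /var/log/iapi/jstack/jstack.*.log
```

Any state in the output that has no matching echo line in the monitoring script is a candidate for adding.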

2.2 Run the monitoring script

sh ~/scripts/iapi-jstack.sh
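The status log in section 2.3 has entries five minutes apart, which suggests the script is run by a scheduler rather than by hand. The original does not show how. A minimal sketch, assuming cron is used and that the entry goes into the crontab of the user who owns ~/scripts/ and is allowed to run sudo and write to /var/log/iapi/ (for example root, edited with "crontab -e"):

```
# Assumed schedule: run the monitoring script every 5 minutes
*/5 * * * * /bin/bash "$HOME/scripts/iapi-jstack.sh"
```

The interval is a trade-off: shorter intervals catch a hang sooner but produce more dump files for the cleanup line to remove.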

2.3 View the statistics

tail -f /var/log/iapi/jstackStatus.log

The output looks like this:

[...]
2020-10-29 02:00:01
WAITING (parking): 241
WAITING (on object monitor): 8
TIMED_WAITING (parking): 24
TIMED_WAITING (sleeping): 8
RUNNABLE: 18
BLOCKED (on object monitor): 0

2020-10-29 02:05:01
WAITING (parking): 241
WAITING (on object monitor): 8
TIMED_WAITING (parking): 25
TIMED_WAITING (sleeping): 7
RUNNABLE: 17
BLOCKED (on object monitor): 0
[...]
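To follow a single state over time, the status log can be reduced to one line per sample. The sketch below assumes the log layout shown above (a timestamp line followed by "STATE: count" lines) and a hypothetical function name, state_history:

```shell
# Hypothetical helper: print "timestamp count" for one thread state.
# $1 = state label exactly as written by the script, $2 = status log path
state_history() {
    awk -v state="$1" '
        # A line starting with a date opens a new sample; remember it
        /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] / { ts = $0; next }
        # Only lines that START with "<state>:" belong to this state
        index($0, state ":") == 1 {
            n = substr($0, length(state) + 2)
            gsub(/^[ \t]+/, "", n)
            print ts, n
        }
    ' "$2"
}

# Possible use (left commented out here):
# state_history "WAITING (parking)" /var/log/iapi/jstackStatus.log
```

Because the match is anchored at the start of the line, asking for "WAITING (parking)" does not also pick up "TIMED_WAITING (parking)".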

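Based on the status log output shown above:
– 241 threads are in the "WAITING (parking)" state in both samples; a count this high that does not change between samples suggests the threads may be stuck and is worth investigating
– The actual fault has to be found by the developers, who analyze the dump files written to the "/var/log/iapi/jstack/" directory together with the application code
An example command for viewing a dump:

cat /var/log/iapi/jstack/jstack.2020-10-29\ 02\:00\:01.log

A dump looks roughly like this: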

[...]
"Catalina-utility-1" #12 prio=1 os_prio=0 tid=0x00007f0e8084d800 nid=0x738e waiting on condition [0x00007f0e54133000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for   (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1088)
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
        at java.lang.Thread.run(Thread.java:745)
[...]
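When many threads share one state, a list of their names is often a quicker starting point than reading the whole dump. The sketch below assumes the dump layout shown above (a quoted thread name on the header line, followed by the "java.lang.Thread.State" line) and a hypothetical function name, threads_in_state:

```shell
# Hypothetical helper: print the names of all threads in a given state.
# $1 = state text, e.g. "WAITING (parking)", $2 = jstack dump file
threads_in_state() {
    awk -v state="$1" '
        # Thread header lines start with the quoted thread name
        /^"/ { header = $0 }
        # The state line follows its header; match on the full prefix
        index($0, "java.lang.Thread.State: " state) && header != "" {
            split(header, parts, "\"")   # parts[2] is the thread name
            print parts[2]
            header = ""
        }
    ' "$2"
}

# Possible use (left commented out here):
# threads_in_state "WAITING (parking)" "/var/log/iapi/jstack/jstack.2020-10-29 02:00:01.log"
```

The state is matched as a prefix, so passing just "WAITING" lists threads in both "WAITING (parking)" and "WAITING (on object monitor)", but not the TIMED_WAITING ones.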