Bash
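1 Preface
A program in my production environment frequently stopped working, and the developers could not pin down the cause. I happened to learn that Java ships the jstack tool for analyzing a program's thread stacks, so I wrote a monitoring script around it. The script helped the developers locate the problem in their own code, and the fault was resolved.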
2 Best Practice
2.1 Create the monitoring script
mkdir -p ~/scripts/
vim ~/scripts/iapi-jstack.sh
Add the following content:
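#!/bin/bash
# Timestamp used both in the status log and in the dump file name
setTime=$(date +"%Y-%m-%d %H:%M:%S")
jstackStatusLog=/var/log/iapi/jstackStatus.log
jastackLog="$(dirname "$jstackStatusLog")/jstack/jstack.$setTime.log"
javaUser="iapi"
javaHome="/usr/java/jdk1.8.0_65"
# Create the dump directory on the first run
if [ ! -d "$(dirname "$jastackLog")" ]; then
    mkdir -p "$(dirname "$jastackLog")"
fi
# PID of the Java process owned by $javaUser (assumes exactly one such process)
javaPid=$(pgrep -u "$javaUser" java)
# Dump all thread stacks, including lock information (-l)
sudo -u "$javaUser" "$javaHome/bin/jstack" -l "$javaPid" > "$jastackLog"
echo "$setTime" >> "$jstackStatusLog"
# Split on newlines only, because state names contain spaces
IFS=$'\n'
for i in $(grep "java.lang.Thread.State:" "$jastackLog" | sort -u | awk -F ':' '{print $2}' | sed 's/^[ \t]*//g'); do
    # Match the full "State: <name>" text so that "WAITING (parking)"
    # does not also count "TIMED_WAITING (parking)" lines
    echo "$i: $(grep -cF "java.lang.Thread.State: $i" "$jastackLog")" >> "$jstackStatusLog"
done
echo >> "$jstackStatusLog"
# Debugging aid: list the dump files changed within the last 7 days
#find "$(dirname "$jastackLog")" -type f -ctime -7 -name "jstack.*.log" -exec ls -l {} \;
# Delete dump files older than 7 days
find "$(dirname "$jastackLog")" -type f -ctime +7 -name "jstack.*.log" -exec rm -f {} \;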
The script above is built around the following core command:
sudo -u "$javaUser" "${JAVA_HOME}/bin/jstack" `pgrep -u "$javaUser" java` > "$jastackLog"
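The script works as follows:
– The core command "jstack" inspects the PID returned by "pgrep -u "$javaUser" java" and writes the result to the log file that "$jastackLog" points to
– The lines after the core command that echo into the log file that "$jstackStatusLog" points to are used to count the number of threads in each state
– The last line cleans up old log files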
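The counting step described above can also be tried on its own against an existing dump file. The following is a minimal sketch, not the author's script: the function name count_thread_states is hypothetical, and it assumes each state appears on a line of the form "java.lang.Thread.State: <state>", as in the jstack output shown later.

```shell
# count_thread_states is a hypothetical helper, not part of the original script.
# It prints "<state>: <count>" for every thread state found in a jstack dump.
count_thread_states() {
    dump="$1"
    # Keep only the text after "java.lang.Thread.State: ", then tally identical lines.
    sed -n 's/^[[:space:]]*java\.lang\.Thread\.State: //p' "$dump" \
        | sort \
        | uniq -c \
        | awk '{ n = $1; $1 = ""; sub(/^ /, ""); print $0 ": " n }'
}
```

Because the whole state name is compared, threads in "TIMED_WAITING (parking)" are not counted under "WAITING (parking)". It could be called as count_thread_states "/var/log/iapi/jstack/jstack.2020-10-29 02:00:01.log".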
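In addition, the following command lists any thread states that the script does not yet count, so that they can be added to the script by hand: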
cat /var/log/iapi/jstack/jstack.*.log | grep "java.lang.Thread.State" | grep -v "Thread.State: WAITING (parking)" | grep -v "Thread.State: WAITING (on object monitor)" | grep -v "Thread.State: TIMED_WAITING (parking)" | grep -v "Thread.State: TIMED_WAITING (sleeping)" | grep -v "Thread.State: TIMED_WAITING (on object monitor)" | grep -v "Thread.State: RUNNABLE" | grep -v "Thread.State: BLOCKED (on object monitor)"
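The chain of grep -v calls grows with every state that is added. As a sketch of a more compact equivalent, the known states can be kept in one newline-separated list and passed to a single fixed-string grep. The names known_states and list_unknown_states are hypothetical; the state list is copied from the command above.

```shell
# Known states, one fixed-string pattern per line (copied from the command above).
known_states='Thread.State: WAITING (parking)
Thread.State: WAITING (on object monitor)
Thread.State: TIMED_WAITING (parking)
Thread.State: TIMED_WAITING (sleeping)
Thread.State: TIMED_WAITING (on object monitor)
Thread.State: RUNNABLE
Thread.State: BLOCKED (on object monitor)'

# list_unknown_states is a hypothetical helper: it prints each state line
# found in the given dump files that matches none of the known states.
list_unknown_states() {
    # -h: omit file names; -v: invert the match; -F: treat patterns as fixed strings
    grep -h "java.lang.Thread.State" "$@" \
        | grep -vF -e "$known_states" \
        | sed 's/^[[:space:]]*//' \
        | sort -u
}
```

It could be called as list_unknown_states /var/log/iapi/jstack/jstack.*.log; empty output means every state seen so far is already covered.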
2.2 Run the monitoring script
bash ~/scripts/iapi-jstack.sh
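The sample output later in this article shows entries five minutes apart, which suggests the script runs on a schedule. The article does not say how this is done; a crontab entry is one common way, sketched here as an assumption. The use of cron, the five-minute interval, and an account that may run sudo non-interactively are all assumptions.

```
# Assumed schedule: run the monitoring script every 5 minutes
*/5 * * * * /bin/bash "$HOME/scripts/iapi-jstack.sh"
```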
2.3 View the statistics
tail -f /var/log/iapi/jstackStatus.log
The output looks like this:
[...]
2020-10-29 02:00:01
WAITING (parking): 241
WAITING (on object monitor): 8
TIMED_WAITING (parking): 24
TIMED_WAITING (sleeping): 8
RUNNABLE: 18
BLOCKED (on object monitor): 0

2020-10-29 02:05:01
WAITING (parking): 241
WAITING (on object monitor): 8
TIMED_WAITING (parking): 25
TIMED_WAITING (sleeping): 7
RUNNABLE: 17
BLOCKED (on object monitor): 0
[...]
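Based on the statistics above:
– 241 threads are in the "WAITING (parking)" state, far more than in any other state, so these threads may be behaving abnormally
– To find the specific error, the developers need to analyze the dumps written to the "/var/log/iapi/jstack/" directory and troubleshoot them against the application code
An example command for viewing a dump:
cat /var/log/iapi/jstack/jstack.2020-10-29\ 02\:00\:01.log
A dump looks roughly like this: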
[...]
"Catalina-utility-1" #12 prio=1 os_prio=0 tid=0x00007f0e8084d800 nid=0x738e waiting on condition [0x00007f0e54133000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1088)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:745)
[...]
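When hundreds of threads share one state, it helps to see which threads they are before reading individual stacks. The following is a minimal sketch under stated assumptions: threads_in_state is a hypothetical helper, and it assumes each thread entry starts with a line beginning with the quoted thread name, followed by its "java.lang.Thread.State:" line, as in the sample above.

```shell
# threads_in_state is a hypothetical helper.
# Usage: threads_in_state "<state>" <dump file>
# It prints the name of every thread whose state line equals the given state.
threads_in_state() {
    awk -v want="java.lang.Thread.State: $1" '
        # A line starting with a double quote opens a new thread entry; remember its name.
        /^"/ { name = $0; sub(/^"/, "", name); sub(/".*$/, "", name); next }
        # Compare the state line, ignoring leading whitespace, with the wanted state.
        { line = $0; sub(/^[ \t]+/, "", line) }
        line == want { print name }
    ' "$2"
}
```

For the sample above, threads_in_state "WAITING (parking)" on that dump would include Catalina-utility-1. Piping the result through sed 's/-[0-9]*$//' | sort | uniq -c | sort -rn groups the names by pool prefix, which shows which thread pool holds most of the waiting threads.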