
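1 Preface
A program in my production environment frequently stopped working, and the developers could not find the cause. I happened to learn that Java ships the jstack tool for analyzing a program's thread stacks, so I wrote a monitoring script around it. The script helped the developers locate the problem in their own code, and the fault was fully resolved.
2 Best Practice
2.1 Create the monitoring script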
mkdir -p ~/scripts/
vim ~/scripts/iapi-jstack.sh
Add the following content:
#!/bin/bash

# Timestamp shared by the status log entry and the dump file name
setTime=$(date +"%Y-%m-%d %H:%M:%S")
jstackStatusLog=/var/log/iapi/jstackStatus.log
jastackLog="$(dirname "$jstackStatusLog")/jstack/jstack.$setTime.log"
javaUser="iapi"
javaHome="/usr/java/jdk1.8.0_65"

# Create the dump directory if it does not exist yet
if [ ! -d "$(dirname "$jastackLog")" ]; then
  mkdir -p "$(dirname "$jastackLog")"
fi

# Dump the thread stacks of the java process owned by $javaUser
javaPid=$(pgrep -u "$javaUser" java)
sudo -u "$javaUser" "$javaHome/bin/jstack" -l "$javaPid" > "$jastackLog"

# Append the number of threads in each state to the status log
echo "$setTime" >> "$jstackStatusLog"
IFS=$'\n'
for i in $(grep "java.lang.Thread.State:" "$jastackLog" | sort -u | awk -F ':' '{print $2}' | sed 's/^[ \t]*//g'); do
  echo "$i: $(grep -cF "Thread.State: $i" "$jastackLog")" >> "$jstackStatusLog"
done
echo >> "$jstackStatusLog"

# Delete dump files older than 7 days (the commented line lists the recent ones instead)
#find "$(dirname "$jastackLog")" -type f -ctime -7 -name "jstack.*.log" -exec ls -l {} \;
find "$(dirname "$jastackLog")" -type f -ctime +7 -name "jstack.*.log" -exec rm -f {} \;
The script above is built around the following core command:
sudo -u "$javaUser" "${JAVA_HOME}/bin/jstack" `pgrep -u "$javaUser" java` > "$jastackLog"
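The core command assumes that pgrep returns exactly one PID. If no java process is running under that user, or more than one is, jstack receives an empty or multi-line argument and fails. A minimal sketch of a guard follows; the function name require_single_pid is hypothetical and is not part of the original script.

```shell
# Hypothetical helper (not in the original script): print the PID and
# succeed only when the argument is exactly one run of digits.
require_single_pid() {
  case "$1" in
    # Empty input, or anything containing a non-digit (including the
    # newline that separates several PIDs), is rejected
    ''|*[!0-9]*)
      echo "expected exactly one java PID, got: '$1'" >&2
      return 1
      ;;
    *)
      printf '%s\n' "$1"
      ;;
  esac
}

# Possible use inside the monitoring script (variables as defined there):
#   javaPid=$(require_single_pid "$(pgrep -u "$javaUser" java)") || exit 1
```

Exiting early in this way would keep the script from writing an empty dump file when the process is down.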
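The monitoring script works as follows:
– The core command "jstack" dumps the threads of the PID returned by "pgrep -u "$javaUser" java" and writes the result to the log file that "$jastackLog" points to
– The echo statements after the core command write to the log file that "$jstackStatusLog" points to; they record the number of threads in each state
– The last line removes old dump files (those older than 7 days)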
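The counting step in the second bullet can also be done in a single pass over the dump file. The sketch below is an alternative to the loop in the script, not the author's method; the function name count_thread_states is hypothetical.

```shell
# Hypothetical helper: print "<state>: <count>" for every thread state
# found in the jstack dump file given as $1, sorted by state name.
count_thread_states() {
  awk '
    /java\.lang\.Thread\.State:/ {
      # Keep only the text after "java.lang.Thread.State:"
      sub(/^.*java\.lang\.Thread\.State:[ \t]*/, "")
      count[$0]++
    }
    END { for (state in count) printf "%s: %d\n", state, count[state] }
  ' "$1" | sort
}

# Possible use inside the monitoring script (variables as defined there):
#   count_thread_states "$jastackLog" >> "$jstackStatusLog"
```

Like the loop in the script, this only reports states that actually occur in the dump, so a state with no threads is simply absent from the output.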
In addition, the following command lists any thread states that the script does not yet cover, so that they can be added to the script by hand:
cat /var/log/iapi/jstack/jstack.*.log | grep "java.lang.Thread.State" \
  | grep -v "Thread.State: WAITING (parking)" \
  | grep -v "Thread.State: WAITING (on object monitor)" \
  | grep -v "Thread.State: TIMED_WAITING (parking)" \
  | grep -v "Thread.State: TIMED_WAITING (sleeping)" \
  | grep -v "Thread.State: TIMED_WAITING (on object monitor)" \
  | grep -v "Thread.State: RUNNABLE" \
  | grep -v "Thread.State: BLOCKED (on object monitor)"
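The chain of grep -v filters above can be shortened by keeping the known states in one list and passing them to grep as fixed-string patterns. This is a sketch under that assumption; the names known_states and list_unknown_states are hypothetical, and the list is copied from the command above.

```shell
# States already covered, copied from the grep -v chain above
known_states='WAITING (parking)
WAITING (on object monitor)
TIMED_WAITING (parking)
TIMED_WAITING (sleeping)
TIMED_WAITING (on object monitor)
RUNNABLE
BLOCKED (on object monitor)'

# Hypothetical helper: print each distinct Thread.State line in the given
# dump files whose state is not listed in $known_states.
list_unknown_states() {
  patterns=$(mktemp) || return 1
  # One fixed-string pattern per known state, e.g. "Thread.State: RUNNABLE"
  printf '%s\n' "$known_states" | sed 's/^/Thread.State: /' > "$patterns"
  grep -h "java.lang.Thread.State" "$@" \
    | grep -vFf "$patterns" \
    | sed 's/^[[:space:]]*//' \
    | sort -u
  rm -f "$patterns"
}

# Possible use:
#   list_unknown_states /var/log/iapi/jstack/jstack.*.log
```

An empty result means every state seen in the dumps is already in the list.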
2.2 Run the monitoring script
bash ~/scripts/iapi-jstack.sh
2.3 View the statistics
tail -f /var/log/iapi/jstackStatus.log
The output looks like this:
[...]
2020-10-29 02:00:01
WAITING (parking): 241
WAITING (on object monitor): 8
TIMED_WAITING (parking): 24
TIMED_WAITING (sleeping): 8
RUNNABLE: 18
BLOCKED (on object monitor): 0

2020-10-29 02:05:01
WAITING (parking): 241
WAITING (on object monitor): 8
TIMED_WAITING (parking): 25
TIMED_WAITING (sleeping): 7
RUNNABLE: 17
BLOCKED (on object monitor): 0
[...]
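Based on the statistics above:
– 241 threads are in the "WAITING (parking)" state, far more than in any other state, so some threads may be behaving abnormally
– The exact fault has to be diagnosed by the developers, who analyze the dumps written to the "/var/log/iapi/jstack/" directory together with the application source code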
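The check in the first bullet can be automated by scanning the status log for samples in which one state exceeds a limit. The sketch below assumes the log layout produced by the script (a timestamp line followed by "state: count" lines); the function name find_state_spikes and the limit of 200 in the usage line are hypothetical.

```shell
# Hypothetical helper: print "<timestamp> <count>" for every sample in the
# status log ($1) where the state ($2) has more than $3 threads.
find_state_spikes() {
  awk -v state="$2" -v limit="$3" '
    # A line such as "2020-10-29 02:00:01" starts a new sample
    /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] / { stamp = $0; next }
    # A line such as "WAITING (parking): 241" carries one counter
    index($0, state ": ") == 1 {
      n = substr($0, length(state) + 3) + 0
      if (n > limit) print stamp, n
    }
  ' "$1"
}

# Possible use:
#   find_state_spikes /var/log/iapi/jstackStatus.log "WAITING (parking)" 200
```

Each timestamp it prints corresponds to a dump file of the same name, which narrows down which dumps are worth reading.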
An example command for viewing a dump:
cat /var/log/iapi/jstack/jstack.2020-10-29\ 02\:00\:01.log
The dump looks roughly like this:
[...]
"Catalina-utility-1" #12 prio=1 os_prio=0 tid=0x00007f0e8084d800 nid=0x738e waiting on condition [0x00007f0e54133000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1088)
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
        at java.lang.Thread.run(Thread.java:745)
[...]
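When a dump contains hundreds of threads, it can help to list only the threads in the state under suspicion before reading full stack traces. The sketch below relies on jstack separating thread entries with blank lines; the function name threads_in_state is hypothetical.

```shell
# Hypothetical helper: print the header line (thread name and ids) of every
# thread in the dump file ($1) whose entry contains the state given as $2.
threads_in_state() {
  awk -v RS='' -v want="java.lang.Thread.State: $2" '
    # With an empty RS, awk reads each blank-line-separated block as one record
    index($0, want) > 0 {
      split($0, lines, "\n")
      print lines[1]
    }
  ' "$1"
}

# Possible use:
#   threads_in_state "/var/log/iapi/jstack/jstack.2020-10-29 02:00:01.log" "WAITING (parking)"
```

For the sample above, the header line of "Catalina-utility-1" would be among the lines printed, since that thread is in the WAITING (parking) state.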