如何用Tigase监控Elasticsearch集群?
- By : Will
- Category : Bash, Elastic Stack, XMPP
1 前言
一个问题,一篇文章,一出故事。
笔者生产中有一套Elasticsearch集群,笔者为了能第一时间收到集群整体的警告信息,于是使用著名IM即Tigase作为接收警告信息的渠道。
如果你需要了解该集群的搭建,请参阅如下章节,
2 最佳实践
2.1 部署URL编码环境
2.2 创建备份脚本
mkdir ~/scripts/ vim ~/scripts/monitorElasticsearchStack.sh
加入如下配置,
#!/bin/bash
source /etc/profile
tigaseAppID="10086"
tigaseAppPassword="*********"
tigaseEmployeeID="will,jeff"
tigaseApiURL="https://tigase.cmdschool.org:8092/tigase/pushText"
errLog="/var/log/elasticsearchStackError.log"
stackLog="/var/log/elasticsearchStack.log"
esHost="http://elasticsearch.cmdschool.org:9200"
esUser="elastic"
esPasswd="elasticpwd"
maxDiskUsage="75%"
maxMemoryUsage="85%"
maxCpuUsage="85%"
maxShardsUsage="90%"
getClusterHealth() {
clusterStatus=$(curl -u $esUser:$esPasswd -s "$esHost/_cluster/health" | jq -r '.status')
echo "Cluster Status: $clusterStatus"
}
getClusterDiskUsage() {
diskTotal=$(curl -u $esUser:$esPasswd -s "$esHost/_cluster/stats" | jq -r '.nodes.fs.total_in_bytes')
diskFree=$(curl -u $esUser:$esPasswd -s "$esHost/_cluster/stats" | jq -r '.nodes.fs.free_in_bytes')
diskAvailable=$(curl -u $esUser:$esPasswd -s "$esHost/_cluster/stats" | jq -r '.nodes.fs.available_in_bytes')
diskUsage=$(echo "scale=2; ($diskTotal - $diskAvailable) / $diskTotal * 100" | bc)
echo "Cluster Disk Usage: $diskUsage%"
}
getClusterMemoryUsage() {
memTotal=$(curl -u $esUser:$esPasswd -s "$esHost/_cluster/stats" | jq -r '.nodes.jvm.mem.heap_max_in_bytes')
memUsed=$(curl -u $esUser:$esPasswd -s "$esHost/_cluster/stats" | jq -r '.nodes.jvm.mem.heap_used_in_bytes')
memUsage=$(echo "scale=2; $memUsed / $memTotal * 100" | bc)
echo "Cluster Mem Usage: $memUsage%"
}
getClusterCpuUsage() {
cpuUsage=($(curl -u $esUser:$esPasswd -s "$esHost/_nodes/stats/process" | jq '.nodes[].process.cpu.percent'))
totalUsage=0
nodeCount=${#cpuUsage[@]}
for usage in "${cpuUsage[@]}"; do
totalUsage=$((totalUsage + usage))
done
averageUsage=$((totalUsage / nodeCount))
echo "Cluster CPU Usage: $averageUsage%"
}
getClusterShardsUsage() {
nodeMaxShards=$(curl -u $esUser:$esPasswd -s "$esHost/_cluster/settings?include_defaults=true" | jq -r '.persistent.cluster.max_shards_per_node')
readarray -t nodeShards <<< $(curl -u $esUser:$esPasswd -s "$esHost/_cat/nodes?h=shards")
totalShards=0
nodeCount=${#nodeShards[@]}
for value in "${nodeShards[@]}"; do
totalShards=$((totalShards + value))
done
maxTotalShards=$((nodeMaxShards * nodeCount))
shardsUsage=$((totalShards * 100 / (nodeMaxShards * nodeCount)))
echo "Cluster Shares Usage: $shardsUsage%"
}
errorCounter=0
clusterMessage=$(getClusterHealth)
if [ "$clusterMessage" != "" ]; then
echo $(date +'%Y-%m-%d %H:%M:%S')" "$clusterMessage | tee -a "$stackLog"
if [ "$(echo $clusterMessage | grep 'green' | wc -l)" != "1" ]; then
echo `date +'%Y-%m-%d %H:%M:%S'`" $clusterMessage" >> "$errLog"
errorCounter=$(echo "$errorCounter + 1" | bc -l)
fi
fi
clusterMessage=$(getClusterDiskUsage)
if [ "$clusterMessage" != "" ]; then
echo $(date +'%Y-%m-%d %H:%M:%S')" "$clusterMessage | tee -a "$stackLog"
currentUsage=$(printf '%f\n' "`echo $clusterMessage | awk -F ':' '{print $2}' | sed -e 's/ //g' -e 's/%//g'`")
maxUsage=$(printf '%f\n' "`echo $maxDiskUsage | sed -e 's/%//g'`")
result=$(echo "$currentUsage > $maxUsage" | bc -l)
if [ "$result" -eq 1 ]; then
echo `date +'%Y-%m-%d %H:%M:%S'`" $clusterMessage" >> "$errLog"
errorCounter=$(echo "$errorCounter + 1" | bc -l)
fi
fi
clusterMessage=$(getClusterMemoryUsage)
if [ "$clusterMessage" != "" ]; then
echo $(date +'%Y-%m-%d %H:%M:%S')" "$clusterMessage | tee -a "$stackLog"
currentUsage=$(printf '%f\n' "`echo $clusterMessage | awk -F ':' '{print $2}' | sed -e 's/ //g' -e 's/%//g'`")
maxUsage=$(printf '%f\n' "`echo $maxMemoryUsage | sed -e 's/%//g'`")
result=$(echo "$currentUsage > $maxUsage" | bc -l)
if [ "$result" -eq 1 ]; then
echo `date +'%Y-%m-%d %H:%M:%S'`" $clusterMessage" >> "$errLog"
errorCounter=$(echo "$errorCounter + 1" | bc -l)
fi
fi
clusterMessage=$(getClusterCpuUsage)
if [ "$clusterMessage" != "" ]; then
echo $(date +'%Y-%m-%d %H:%M:%S')" "$clusterMessage | tee -a "$stackLog"
currentUsage=$(printf '%f\n' "`echo $clusterMessage | awk -F ':' '{print $2}' | sed -e 's/ //g' -e 's/%//g'`")
maxUsage=$(printf '%f\n' "`echo $maxCpuUsage | sed -e 's/%//g'`")
result=$(echo "$currentUsage > $maxUsage" | bc -l)
if [ "$result" -eq 1 ]; then
echo `date +'%Y-%m-%d %H:%M:%S'`" $clusterMessage" >> "$errLog"
errorCounter=$(echo "$errorCounter + 1" | bc -l)
fi
fi
clusterMessage=$(getClusterShardsUsage)
if [ "$clusterMessage" != "" ]; then
echo $(date +'%Y-%m-%d %H:%M:%S')" "$clusterMessage | tee -a "$stackLog"
currentUsage=$(printf '%f\n' "`echo $clusterMessage | awk -F ':' '{print $2}' | sed -e 's/ //g' -e 's/%//g'`")
maxUsage=$(printf '%f\n' "`echo $maxShardsUsage | sed -e 's/%//g'`")
result=$(echo "$currentUsage > $maxUsage" | bc -l)
if [ "$result" -eq 1 ]; then
echo `date +'%Y-%m-%d %H:%M:%S'`" $clusterMessage" >> "$errLog"
errorCounter=$(echo "$errorCounter + 1" | bc -l)
fi
fi
if [ "$errorCounter" -le "0" ]; then
exit 0
fi
logMessage='ES Cluster Log System:'$'\n'$(tail -n "$errorCounter" "$errLog" | awk -F ' ' '{for (i=3;i<=NF;i++)printf("%s ", $i);print ""}' | sed -e 's/Cluster //g')
encodeMessage=$(urlencode "$logMessage")
tigaseMessageBody="$encodeMessage"
curlData="appId=$tigaseAppID&password=$tigaseAppPassword&employee_id=$tigaseEmployeeID&body=$tigaseMessageBody"
curl -X POST -d "$curlData" "$tigaseApiURL" &> /dev/null
exit 0
– 参数“tigaseAppID”定义消息软件的应用号ID,范例为”10086″
– 参数“tigaseAppPassword”定义消息软件的应用号密码,范例为”*********”
– 参数“tigaseEmployeeID”定义消息软件中的员工工号,范例为”will”和“jeff”,使用逗号分隔
– 参数“tigaseApiURL”定义消息软件的接口地址,范例为”https://tigase.cmdschool.org:8092/api/pushText”
– 参数“errLog”定义集群错误日志的路径,范例为”/var/log/elasticsearchStackError.log”
– 参数“stackLog”定义集群资源使用情况日志的路径,范例为”/var/log/elasticsearchStack.log”
– 参数“esUser”定义Elasticsearch的管理员用户,范例为”elastic”
– 参数“esPasswd”定义Elasticsearch的管理员用户密码,范例为”elasticpwd”
– 参数“maxDiskUsage”定义Elasticsearch集群整体硬盘空间的警告最大值,范例为”75%”
– 参数“maxMemoryUsage”定义Elasticsearch集群整体内存空间的警告最大值,范例为”85%”
– 参数“maxCpuUsage”定义Elasticsearch集群整体CPU的警告最大值,范例为”85%”
– 参数“maxShardsUsage”定义Elasticsearch集群整体分片的警告最大值,范例为”90%”
2.3 创建脚本触发
crontab -e
加入如下配置,
*/1 * * * * sh ~/scripts/monitorElasticsearchStack.sh
基于以上脚本运行后,如果符合条件,消息软件将收到如下警告消息,
ES Cluster Log System: Status: yellow Disk Usage: 78.00% Mem Usage: 86.00% CPU Usage: 86%
没有评论