如何用Tigase监控Elasticsearch集群?
- By : Will
- Category : Bash, Elastic Stack, XMPP
1 前言
一个问题,一篇文章,一出故事。
笔者生产中有一套Elasticsearch集群,笔者为了能第一时间收到集群整体的警告信息,于是使用著名IM即Tigase作为接收警告信息的渠道。
如果你需要了解该集群的搭建,请参阅如下章节,
2 最佳实践
2.1 部署URL编码环境
2.2 创建备份脚本
mkdir ~/scripts/ vim ~/scripts/monitorElasticsearchStack.sh
加入如下配置,
#!/bin/bash source /etc/profile tigaseAppID="10086" tigaseAppPassword="*********" tigaseEmployeeID="will,jeff" tigaseApiURL="https://tigase.cmdschool.org:8092/tigase/pushText" errLog="/var/log/elasticsearchStackError.log" stackLog="/var/log/elasticsearchStack.log" esHost="http://elasticsearch.cmdschool.org:9200" esUser="elastic" esPasswd="elasticpwd" maxDiskUsage="75%" maxMemoryUsage="85%" maxCpuUsage="85%" getClusterHealth() { clusterStatus=$(curl -u $esUser:$esPasswd -s "$esHost/_cluster/health" | jq -r '.status') echo "Cluster Status: $clusterStatus" } getClusterDiskUsage() { diskTotal=$(curl -u $esUser:$esPasswd -s "$esHost/_cluster/stats" | jq -r '.nodes.fs.total_in_bytes') diskFree=$(curl -u $esUser:$esPasswd -s "$esHost/_cluster/stats" | jq -r '.nodes.fs.free_in_bytes') diskAvailable=$(curl -u $esUser:$esPasswd -s "$esHost/_cluster/stats" | jq -r '.nodes.fs.available_in_bytes') diskUsage=$(echo "scale=2; ($diskTotal - $diskAvailable) / $diskTotal * 100" | bc) echo "Cluster Disk Usage: $diskUsage%" } getClusterMemoryUsage() { memTotal=$(curl -u $esUser:$esPasswd -s "$esHost/_cluster/stats" | jq -r '.nodes.jvm.mem.heap_max_in_bytes') memUsed=$(curl -u $esUser:$esPasswd -s "$esHost/_cluster/stats" | jq -r '.nodes.jvm.mem.heap_used_in_bytes') memUsage=$(echo "scale=2; $memUsed / $memTotal * 100" | bc) echo "Cluster Mem Usage: $memUsage%" } getClusterCpuUsage() { cpuUsage=$(curl -u $esUser:$esPasswd -s "$esHost/_cluster/stats" | jq -r '.nodes.process.cpu.percent') echo "Cluster CPU Usage: $cpuUsage%" } errorCounter=0 clusterMessage=$(getClusterHealth) if [ "$clusterMessage" != "" ]; then echo $(date +'%Y-%m-%d %H:%M:%S')" "$clusterMessage | tee -a "$stackLog" if [ "$(echo $clusterMessage | grep 'green' | wc -l)" != "1" ]; then echo `date +'%Y-%m-%d %H:%M:%S'`" $clusterMessage" >> "$errLog" errorCounter=$(echo "$errorCounter + 1" | bc -l) fi fi clusterMessage=$(getClusterDiskUsage) if [ "$clusterMessage" != "" ]; then echo $(date +'%Y-%m-%d %H:%M:%S')" "$clusterMessage | tee -a "$stackLog" currentUsage=$(printf '%f\n' "`echo $clusterMessage | awk -F ':' '{print $2}' | sed -e 's/ //g' -e 's/%//g'`") maxUsage=$(printf '%f\n' "`echo $maxDiskUsage | sed -e 's/%//g'`") result=$(echo "$currentUsage > $maxUsage" | bc -l) if [ "$result" -eq 1 ]; then echo `date +'%Y-%m-%d %H:%M:%S'`" $clusterMessage" >> "$errLog" errorCounter=$(echo "$errorCounter + 1" | bc -l) fi fi clusterMessage=$(getClusterMemoryUsage) if [ "$clusterMessage" != "" ]; then echo $(date +'%Y-%m-%d %H:%M:%S')" "$clusterMessage | tee -a "$stackLog" currentUsage=$(printf '%f\n' "`echo $clusterMessage | awk -F ':' '{print $2}' | sed -e 's/ //g' -e 's/%//g'`") maxUsage=$(printf '%f\n' "`echo $maxMemoryUsage | sed -e 's/%//g'`") result=$(echo "$currentUsage > $maxUsage" | bc -l) if [ "$result" -eq 1 ]; then echo `date +'%Y-%m-%d %H:%M:%S'`" $clusterMessage" >> "$errLog" errorCounter=$(echo "$errorCounter + 1" | bc -l) fi fi clusterMessage=$(getClusterCpuUsage) if [ "$clusterMessage" != "" ]; then echo $(date +'%Y-%m-%d %H:%M:%S')" "$clusterMessage | tee -a "$stackLog" currentUsage=$(printf '%f\n' "`echo $clusterMessage | awk -F ':' '{print $2}' | sed -e 's/ //g' -e 's/%//g'`") maxUsage=$(printf '%f\n' "`echo $maxCpuUsage | sed -e 's/%//g'`") result=$(echo "$currentUsage > $maxUsage" | bc -l) if [ "$result" -eq 1 ]; then echo `date +'%Y-%m-%d %H:%M:%S'`" $clusterMessage" >> "$errLog" errorCounter=$(echo "$errorCounter + 1" | bc -l) fi fi if [ "$errorCounter" -le "0" ]; then exit 0 fi logMessage='ES Cluster Log System:'$'\n'$(tail -n "$errorCounter" "$errLog" | awk -F ' ' '{for (i=3;i<=NF;i++)printf("%s ", $i);print ""}' | sed -e 's/Cluster //g') encodeMessage=$(urlencode "$logMessage") tigaseMessageBody="$encodeMessage" curlData="appId=$tigaseAppID&password=$tigaseAppPassword&employee_id=$tigaseEmployeeID&body=$tigaseMessageBody" curl -X POST -d "$curlData" "$tigaseApiURL" &> /dev/null exit 0
– 参数“tigaseAppID”定义消息软件的应用号ID,范例为”10086″
– 参数“tigaseAppPassword”定义消息软件的应用号密码,范例为”*********”
– 参数“tigaseEmployeeID”定义消息软件中的员工工号,范例为”will”和“jeff”,使用逗号分隔
– 参数“tigaseApiURL”定义消息软件的接口地址,范例为”https://tigase.cmdschool.org:8092/api/pushText”
– 参数“errLog”定义集群错误日志的路径,范例为”/var/log/elasticsearchStackError.log”
– 参数“stackLog”定义集群资源使用情况日志的路径,范例为”/var/log/elasticsearchStack.log”
– 参数“esUser”定义Elasticsearch的管理员用户,范例为”elastic”
– 参数“esPasswd”定义Elasticsearch的管理员用户密码,范例为”elasticpwd”
– 参数“maxDiskUsage”定义Elasticsearch集群整体硬盘空间的警告最大值,范例为”75%”
– 参数“maxMemoryUsage”定义Elasticsearch集群整体内存空间的警告最大值,范例为”85%”
– 参数“maxCpuUsage”定义Elasticsearch集群整体CPU的警告最大值,范例为”85%”
2.3 创建脚本触发
crontab -e
加入如下配置,
*/1 * * * * sh ~/scripts/monitorElasticsearchStack.sh
基于以上脚本运行后,如果符合条件,消息软件将收到如下警告消息,
ES Cluster Log System: Status: yellow Disk Usage: 78.00% Mem Usage: 86.00% CPU Usage: 86%
没有评论