如何用Tigase监控Elasticsearch集群?

Bash

1 前言

一个问题,一篇文章,一出故事。
笔者生产中有一套Elasticsearch集群,笔者为了能第一时间收到集群整体的警告信息,于是使用著名IM即Tigase作为接收警告信息的渠道。
如果你需要了解该集群的搭建,请参阅如下章节,

如何部署带安全认证的Elasticsearch 8.x集群?

2 最佳实践

2.1 部署URL编码环境

如何用Python 3.x实现Linux的URL编码和URL解码?

2.2 创建备份脚本

mkdir ~/scripts/
vim ~/scripts/monitorElasticsearchStack.sh

加入如下配置,

#!/bin/bash

source /etc/profile

tigaseAppID="10086"
tigaseAppPassword="*********"
tigaseEmployeeID="will,jeff"
tigaseApiURL="https://tigase.cmdschool.org:8092/tigase/pushText"
errLog="/var/log/elasticsearchStackError.log"
stackLog="/var/log/elasticsearchStack.log"
esHost="http://elasticsearch.cmdschool.org:9200"
esUser="elastic"
esPasswd="elasticpwd"
maxDiskUsage="75%"
maxMemoryUsage="85%"
maxCpuUsage="85%"

getClusterHealth() {
  clusterStatus=$(curl -u $esUser:$esPasswd -s "$esHost/_cluster/health" | jq -r '.status')
  echo "Cluster Status: $clusterStatus"
}

getClusterDiskUsage() {
  diskTotal=$(curl -u $esUser:$esPasswd -s "$esHost/_cluster/stats" | jq -r '.nodes.fs.total_in_bytes')
  diskFree=$(curl -u $esUser:$esPasswd -s "$esHost/_cluster/stats" | jq -r '.nodes.fs.free_in_bytes')
  diskAvailable=$(curl -u $esUser:$esPasswd -s "$esHost/_cluster/stats" | jq -r '.nodes.fs.available_in_bytes')
  diskUsage=$(echo "scale=2; ($diskTotal - $diskAvailable) / $diskTotal * 100" | bc)
  echo "Cluster Disk Usage: $diskUsage%"
}

getClusterMemoryUsage() {
  memTotal=$(curl -u $esUser:$esPasswd -s "$esHost/_cluster/stats" | jq -r '.nodes.jvm.mem.heap_max_in_bytes')
  memUsed=$(curl -u $esUser:$esPasswd -s "$esHost/_cluster/stats" | jq -r '.nodes.jvm.mem.heap_used_in_bytes')
  memUsage=$(echo "scale=2; $memUsed / $memTotal * 100" | bc)
  echo "Cluster Mem Usage: $memUsage%"
}

getClusterCpuUsage() {
  cpuUsage=$(curl -u $esUser:$esPasswd -s "$esHost/_cluster/stats" | jq -r '.nodes.process.cpu.percent')
  echo "Cluster CPU Usage: $cpuUsage%"
}

errorCounter=0
clusterMessage=$(getClusterHealth)
if [ "$clusterMessage" != "" ]; then
        echo $(date +'%Y-%m-%d %H:%M:%S')" "$clusterMessage | tee -a "$stackLog"
        if [ "$(echo $clusterMessage | grep 'green' | wc -l)" != "1" ]; then
                echo `date +'%Y-%m-%d %H:%M:%S'`" $clusterMessage" >> "$errLog"
                errorCounter=$(echo "$errorCounter + 1" | bc -l)
        fi
fi

clusterMessage=$(getClusterDiskUsage)
if [ "$clusterMessage" != "" ]; then
        echo $(date +'%Y-%m-%d %H:%M:%S')" "$clusterMessage | tee -a "$stackLog"
        currentUsage=$(printf '%f\n' "`echo $clusterMessage | awk -F ':' '{print $2}' | sed -e 's/ //g' -e 's/%//g'`")
        maxUsage=$(printf '%f\n' "`echo $maxDiskUsage | sed -e 's/%//g'`")
        result=$(echo "$currentUsage > $maxUsage" | bc -l)
        if [ "$result" -eq 1 ]; then
                echo `date +'%Y-%m-%d %H:%M:%S'`" $clusterMessage" >> "$errLog"
                errorCounter=$(echo "$errorCounter + 1" | bc -l)
        fi
fi

clusterMessage=$(getClusterMemoryUsage)
if [ "$clusterMessage" != "" ]; then
        echo $(date +'%Y-%m-%d %H:%M:%S')" "$clusterMessage | tee -a "$stackLog"
        currentUsage=$(printf '%f\n' "`echo $clusterMessage | awk -F ':' '{print $2}' | sed -e 's/ //g' -e 's/%//g'`")
        maxUsage=$(printf '%f\n' "`echo $maxMemoryUsage | sed -e 's/%//g'`")
        result=$(echo "$currentUsage > $maxUsage" | bc -l)
        if [ "$result" -eq 1 ]; then
                echo `date +'%Y-%m-%d %H:%M:%S'`" $clusterMessage" >> "$errLog"
                errorCounter=$(echo "$errorCounter + 1" | bc -l)
        fi
fi

clusterMessage=$(getClusterCpuUsage)
if [ "$clusterMessage" != "" ]; then
        echo $(date +'%Y-%m-%d %H:%M:%S')" "$clusterMessage | tee -a "$stackLog"
        currentUsage=$(printf '%f\n' "`echo $clusterMessage | awk -F ':' '{print $2}' | sed -e 's/ //g' -e 's/%//g'`")
        maxUsage=$(printf '%f\n' "`echo $maxCpuUsage | sed -e 's/%//g'`")
        result=$(echo "$currentUsage > $maxUsage" | bc -l)
        if [ "$result" -eq 1 ]; then
                echo `date +'%Y-%m-%d %H:%M:%S'`" $clusterMessage" >> "$errLog"
                errorCounter=$(echo "$errorCounter + 1" | bc -l)
        fi
fi

if [ "$errorCounter" -le "0" ]; then
        exit 0
fi

logMessage='ES Cluster Log System:'$'\n'$(tail -n "$errorCounter" "$errLog" | awk -F ' ' '{for (i=3;i<=NF;i++)printf("%s ", $i);print ""}' | sed -e 's/Cluster //g')
encodeMessage=$(urlencode "$logMessage")
tigaseMessageBody="$encodeMessage"
curlData="appId=$tigaseAppID&password=$tigaseAppPassword&employee_id=$tigaseEmployeeID&body=$tigaseMessageBody"
curl -X POST -d "$curlData" "$tigaseApiURL" &> /dev/null
exit 0

– 参数“tigaseAppID”定义消息软件的应用号ID,范例为”10086″
– 参数“tigaseAppPassword”定义消息软件的应用号密码,范例为”*********”
– 参数“tigaseEmployeeID”定义消息软件中的员工工号,范例为”will”和“jeff”,使用逗号分隔
– 参数“tigaseApiURL”定义消息软件的接口地址,范例为”https://tigase.cmdschool.org:8092/api/pushText”
– 参数“errLog”定义集群错误日志的路径,范例为”/var/log/elasticsearchStackError.log”
– 参数“stackLog”定义集群资源使用情况日志的路径,范例为”/var/log/elasticsearchStack.log”
– 参数“esUser”定义Elasticsearch的管理员用户,范例为”elastic”
– 参数“esPasswd”定义Elasticsearch的管理员用户密码,范例为”elasticpwd”
– 参数“maxDiskUsage”定义Elasticsearch集群整体硬盘空间的警告最大值,范例为”75%”
– 参数“maxMemoryUsage”定义Elasticsearch集群整体内存空间的警告最大值,范例为”85%”
– 参数“maxCpuUsage”定义Elasticsearch集群整体CPU的警告最大值,范例为”85%”

2.3 创建脚本触发

crontab -e

加入如下配置,

*/1 * * * * sh ~/scripts/monitorElasticsearchStack.sh

基于以上脚本运行后,如果符合条件,消息软件将收到如下警告消息,

ES Cluster Log System:
Status: yellow 
Disk Usage: 78.00% 
Mem Usage: 86.00% 
CPU Usage: 86% 
没有评论

发表回复

Elastic Stack
如何配置logstash的持久队列?

1 前言 一个问题,一篇文章,一出故事。 昨天15:37:37~15:46:28运行于Microso …

Bash
如何用Tigase监控postfix smtp服务?

1 前言 一个问题,一篇文章,一出故事。 笔者生产中的smtp服务器最近因为负载均衡器的路由故障而导 …

Elastic Stack
如何重启Elasticsearch集群的节点?

1 前言 一个问题,一篇文章,一出故事。 由于笔者需要对Elasticsearch的机器进行硬件升级 …