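Requirements:

Send an alert whenever a container in a pod restarts.

Send an alert when a pod is not in the Running state.

Send an alert when a container in a pod is not ready (its ready flag is not true).

These three requirements overlap somewhat.

While a pod restarts, it necessarily passes through a non-Running state. So whenever the restart alert fires, the non-Running alert fires too, and because the containers of a non-Running pod are not ready, the container alert fires as well.

The alerts are therefore configured as:

Alert as soon as a pod restarts once.

Alert when the pod is not Running and the containers are not true (#3) and the pod has not been deleted.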
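The combined condition above can be sketched as a small bash function. This is only an illustration: `should_alert` is a hypothetical helper, it looks at a single sample rather than several, and it assumes that the numeric readiness item returns 0 when a container is not ready, which is what the trigger in the template below compares against.

```shell
#!/bin/bash
# Hypothetical helper mirroring the "No Running" alert condition.
# $1 = string returned by the "running" item (e.g. Running_true, Pending)
# $2 = number returned by the "running_true" item (assumed: 0 = not ready)
should_alert() {
    local running="$1" ready="$2"
    case "$running" in
        # Healthy pod or deleted pod: never alert
        *Running_true*|*"Pod deleted"*) return 1 ;;
    esac
    # Otherwise alert only when the containers are not ready
    [ "$ready" = "0" ]
}

should_alert "Running_false" 0 && echo "ALERT"
should_alert "Running_true" 1 || echo "OK"
should_alert "Pod deleted" 0 || echo "OK (deleted)"
```

The substring match in the `case` pattern behaves like the `str()` trigger function, which also checks whether the value contains the given text.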

Create a template on the Zabbix server

<?xml version="1.0" encoding="UTF-8"?>
<zabbix_export>
<version>3.2</version>
<date>2017-11-23T07:48:53Z</date>
<groups>
<group>
<name>OpenShift</name>
</group>
</groups>
<templates>
<template>
<template>OC Pods</template>
<name>OC Pods</name>
<description/>
<groups>
<group>
<name>OpenShift</name>
</group>
</groups>
<applications>
<application>
<name>restartCount</name>
</application>
<application>
<name>RunningStatus</name>
</application>
</applications>
<items/>
<discovery_rules>
<discovery_rule>
<name>OC Pods Discover</name>
<type>0</type>
<snmp_community/>
<snmp_oid/>
<key>oc.pod.discover</key>
<delay>300</delay>
<status>1</status>
<allowed_hosts/>
<snmpv3_contextname/>
<snmpv3_securityname/>
<snmpv3_securitylevel>0</snmpv3_securitylevel>
<snmpv3_authprotocol>0</snmpv3_authprotocol>
<snmpv3_authpassphrase/>
<snmpv3_privprotocol>0</snmpv3_privprotocol>
<snmpv3_privpassphrase/>
<delay_flex/>
<params/>
<ipmi_sensor/>
<authtype>0</authtype>
<username/>
<password/>
<publickey/>
<privatekey/>
<port/>
<filter>
<evaltype>0</evaltype>
<formula/>
<conditions/>
</filter>
<lifetime>7</lifetime>
<description/>
<item_prototypes>
<item_prototype>
<name>Pod {#POD_NAME} Restarts</name>
<type>0</type>
<snmp_community/>
<multiplier>0</multiplier>
<snmp_oid/>
<key>oc.pod.status[{#POD_NAME},restarts]</key>
<delay>30</delay>
<history>30</history>
<trends>0</trends>
<status>0</status>
<value_type>4</value_type>
<allowed_hosts/>
<units/>
<delta>0</delta>
<snmpv3_contextname/>
<snmpv3_securityname/>
<snmpv3_securitylevel>0</snmpv3_securitylevel>
<snmpv3_authprotocol>0</snmpv3_authprotocol>
<snmpv3_authpassphrase/>
<snmpv3_privprotocol>0</snmpv3_privprotocol>
<snmpv3_privpassphrase/>
<formula>1</formula>
<delay_flex/>
<params/>
<ipmi_sensor/>
<data_type>0</data_type>
<authtype>0</authtype>
<username/>
<password/>
<publickey/>
<privatekey/>
<port/>
<description/>
<inventory_link>0</inventory_link>
<applications>
<application>
<name>restartCount</name>
</application>
</applications>
<valuemap/>
<logtimefmt/>
<application_prototypes/>
</item_prototype>
<item_prototype>
<name>Pod {#POD_NAME} Running</name>
<type>0</type>
<snmp_community/>
<multiplier>0</multiplier>
<snmp_oid/>
<key>oc.pod.status[{#POD_NAME},running]</key>
<delay>30</delay>
<history>30</history>
<trends>0</trends>
<status>0</status>
<value_type>4</value_type>
<allowed_hosts/>
<units/>
<delta>0</delta>
<snmpv3_contextname/>
<snmpv3_securityname/>
<snmpv3_securitylevel>0</snmpv3_securitylevel>
<snmpv3_authprotocol>0</snmpv3_authprotocol>
<snmpv3_authpassphrase/>
<snmpv3_privprotocol>0</snmpv3_privprotocol>
<snmpv3_privpassphrase/>
<formula>1</formula>
<delay_flex/>
<params/>
<ipmi_sensor/>
<data_type>0</data_type>
<authtype>0</authtype>
<username/>
<password/>
<publickey/>
<privatekey/>
<port/>
<description/>
<inventory_link>0</inventory_link>
<applications>
<application>
<name>RunningStatus</name>
</application>
</applications>
<valuemap/>
<logtimefmt/>
<application_prototypes/>
</item_prototype>
<item_prototype>
<name>Pod {#POD_NAME} Running True</name>
<type>0</type>
<snmp_community/>
<multiplier>1</multiplier>
<snmp_oid/>
<key>oc.pod.status[{#POD_NAME},running_true]</key>
<delay>30</delay>
<history>30</history>
<trends>365</trends>
<status>0</status>
<value_type>3</value_type>
<allowed_hosts/>
<units/>
<delta>0</delta>
<snmpv3_contextname/>
<snmpv3_securityname/>
<snmpv3_securitylevel>0</snmpv3_securitylevel>
<snmpv3_authprotocol>0</snmpv3_authprotocol>
<snmpv3_authpassphrase/>
<snmpv3_privprotocol>0</snmpv3_privprotocol>
<snmpv3_privpassphrase/>
<formula>1</formula>
<delay_flex/>
<params/>
<ipmi_sensor/>
<data_type>0</data_type>
<authtype>0</authtype>
<username/>
<password/>
<publickey/>
<privatekey/>
<port/>
<description/>
<inventory_link>0</inventory_link>
<applications>
<application>
<name>RunningStatus</name>
</application>
</applications>
<valuemap/>
<logtimefmt/>
<application_prototypes/>
</item_prototype>
</item_prototypes>
<trigger_prototypes>
<trigger_prototype>
<expression>{OC Pods:oc.pod.status[{#POD_NAME},running].str(Running_true)}=0
and
{OC Pods:oc.pod.status[{#POD_NAME},running].str(Pod deleted)}=0
and
{OC Pods:oc.pod.status[{#POD_NAME},running_true].last(#5)}=0</expression>
<recovery_mode>0</recovery_mode>
<recovery_expression/>
<name>Pod {#POD_NAME} No Running</name>
<correlation_mode>0</correlation_mode>
<correlation_tag/>
<url/>
<status>0</status>
<priority>1</priority>
<description/>
<type>0</type>
<manual_close>1</manual_close>
<dependencies/>
<tags/>
</trigger_prototype>
<trigger_prototype>
<expression>{OC Pods:oc.pod.status[{#POD_NAME},restarts].str(Warning)}=1</expression>
<recovery_mode>1</recovery_mode>
<recovery_expression>{OC Pods:oc.pod.status[{#POD_NAME},restarts].str(Warning,#3)}=0</recovery_expression>
<name>Pod {#POD_NAME} restarted Warning</name>
<correlation_mode>0</correlation_mode>
<correlation_tag/>
<url/>
<status>0</status>
<priority>1</priority>
<description/>
<type>0</type>
<manual_close>1</manual_close>
<dependencies/>
<tags/>
</trigger_prototype>
</trigger_prototypes>
<graph_prototypes/>
<host_prototypes/>
</discovery_rule>
</discovery_rules>
<httptests/>
<macros/>
<templates/>
<screens/>
</template>
</templates>
</zabbix_export>

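Template file

The template creates one discovery rule with three item prototypes, which correspond to the three requirements above.

Zabbix agent

Append the following to the end of the configuration file: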

# vim zabbix_agentd.conf

UserParameter=oc.pod.discover,/data/app/zabbix/etc/oc_pod_discover.sh
UserParameter=oc.pod.status[*],/data/app/zabbix/etc/oc_pod_monitor.sh $1 $2

Discovery script

# vim oc_pod_discover.sh

#!/bin/bash
TOKEN="123456"
ENDPOINT="www.oc.domain.cn:8443"
WORKSPACE="/data/tmp/oc_monitor"
mkdir -p $WORKSPACE
# Fetch all pods and keep only the pod names
curl -k \
-H "Authorization: Bearer $TOKEN" \
-H 'Accept: application/json' \
https://$ENDPOINT/api/v1/pods 2>/dev/null > $WORKSPACE/all_pods.json
Pod_Name=(`cat $WORKSPACE/all_pods.json |jq -r '.items | .[] | .metadata | .name' |grep -v build |grep -v deploy`)
# Convert the names to the JSON format used by Zabbix discovery
printf "{\n"
printf '\t"data":[\n'
for ((i=0;i<${#Pod_Name[@]};i++))
do
printf '\t\t{\n'
# Index of the last element, which must not be followed by a comma
num=$(echo $((${#Pod_Name[@]}-1)))
if [ "$i" == ${num} ];
then
printf "\t\t\t\"{#POD_NAME}\":\"${Pod_Name[$i]}\"}\n"
else
printf "\t\t\t\"{#POD_NAME}\":\"${Pod_Name[$i]}\"},\n"
fi
done
printf "\t]\n"
printf "}\n"
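The JSON-building loop above has to treat the last element specially to avoid a trailing comma. A slightly simpler way to get the same effect is to print the separator before each element instead of after it. The sketch below assumes pod names contain no characters that need JSON escaping (Kubernetes pod names are limited to lowercase letters, digits, `-` and `.`); `build_lld_json` and the sample pod names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print Zabbix low-level discovery JSON
# for the pod names passed as arguments.
build_lld_json() {
    local sep="" name
    printf '{"data":['
    for name in "$@"; do
        # The separator is empty for the first element, "," afterwards
        printf '%s{"{#POD_NAME}":"%s"}' "$sep" "$name"
        sep=","
    done
    printf ']}\n'
}

build_lld_json app-1-abcde db-2-xyz12
# prints {"data":[{"{#POD_NAME}":"app-1-abcde"},{"{#POD_NAME}":"db-2-xyz12"}]}
```

In the discovery script this could be called as `build_lld_json "${Pod_Name[@]}"`; with an empty array it prints `{"data":[]}`, which is still valid JSON.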

Monitoring script

# vim oc_pod_monitor.sh

#!/bin/bash
TOKEN=""
ENDPOINT="www.oc.domain.cn:8443"
POD_NAME="$1"
Monitoring_type="$2"
WORKSPACE="/data/tmp/oc_monitor"
mkdir -p $WORKSPACE
# Look up the pod's namespace by pod name (all_pods.json is refreshed every 5 minutes)
NAMESPACE="`cat $WORKSPACE/all_pods.json |jq -r '.items |.[] |.metadata |.name,.namespace' |grep -A1 $POD_NAME |grep -v $POD_NAME`"
# Check whether the pod still exists
if [ ! -n "$NAMESPACE" ]; then
if [ "$Monitoring_type" = "running_true" ]; then
echo ""
exit
fi
echo "Pod deleted"
exit
fi
# Load the pod status data
if [ ! -f "$WORKSPACE/${POD_NAME}.status" ]; then
if [ "$Monitoring_type" = "running_true" ]; then
echo ""
exit
fi
echo "New Pod"
exit
fi
Pod_Status="`cat $WORKSPACE/${POD_NAME}.status`"
# Check whether the pod is in the Pending phase
Pending="`echo "$Pod_Status" |jq -r '.status |.phase'`"
if [ "$Pending" = "Pending" ]; then
if [ "$Monitoring_type" = "running_true" ]; then
echo ""
exit
fi
echo "Pending"
exit
fi
# Select which data to return
case $Monitoring_type in
restarts) # Check whether the pod has restarted
# Fetch the pod status and write it to a file so that all items can reuse it
curl -k \
-H "Authorization: Bearer $TOKEN" \
-H 'Accept: application/json' \
https://${ENDPOINT}/api/v1/namespaces/$NAMESPACE/pods/$POD_NAME/status 2>/dev/null > $WORKSPACE/${POD_NAME}.status
find /data/tmp/oc_monitor/ -type f -mtime + -name "*" -exec rm -f {} \;
# Extract only the restartCount values from the pod status
## Read the values from the previous run
A_line=`sed -n 1p $WORKSPACE/${POD_NAME}.restartCount`
B_line_null="`sed -n 2p $WORKSPACE/${POD_NAME}.restartCount`"
if [ ! -n "$B_line_null" ]; then # Handle pods that report two restartCount values
B_line="0"
else
B_line=`sed -n 2p $WORKSPACE/${POD_NAME}.restartCount`
fi
Last_state=`expr $A_line + $B_line`
## Read the values from the current run
echo "$Pod_Status" |jq -r '.status |.containerStatuses |.[] |.restartCount' > $WORKSPACE/${POD_NAME}.restartCount
A_line=`sed -n 1p $WORKSPACE/${POD_NAME}.restartCount`
B_line_null="`sed -n 2p $WORKSPACE/${POD_NAME}.restartCount`"
if [ ! -n "$B_line_null" ]; then # Handle pods that report two restartCount values
B_line="0"
else
B_line=`sed -n 2p $WORKSPACE/${POD_NAME}.restartCount`
fi
Current_state=`expr $A_line + $B_line`
# Compare the current restartCount with the previous one
if [ "$Current_state" -gt "$Last_state" ]; then
Restart_status="Warning restart_count=$Current_state"
else
Restart_status="Normal restart_count=$Current_state"
fi
echo "$Restart_status"
;;
running) # Report the pod phase and the container readiness as a string
if [ ! -n "$Pod_Status" ]; then
echo "New Pod"
exit
fi
running_status=`echo "$Pod_Status" |jq -r '.status |.phase'`
Container_status="`echo "$Pod_Status" |jq -r '.status |.containerStatuses |.[] |.ready' |grep false`"
if [ ! -n "$Container_status" ]; then
Container_status="_true"
else
Container_status="_false"
fi
echo "${running_status}${Container_status}"
;;
running_true) # Report the container readiness as a number
if [ ! -n "$Pod_Status" ]; then
echo "New Pod"
exit
fi
Container_status="`echo "$Pod_Status" |jq -r '.status |.containerStatuses |.[] |.ready' |grep false`"
if [ ! -n "$Container_status" ]; then
Container_status="true"
else
Container_status="false"
fi
if [ "$Container_status" = "true" ]; then
echo "1"
else
echo "0"
fi
;;
*)
echo "Error parameters"
exit
;;
esac
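The `restarts` branch above adds up at most two restartCount lines with `sed` and `expr`. For pods with any number of containers, the same total can be computed by reading the counts line by line. The following is a minimal sketch, not the script's original logic; `sum_restart_counts` is a hypothetical helper that expects one integer per line on standard input, as produced by the `jq` filter used above.

```shell
#!/bin/bash
# Hypothetical helper: add up one restartCount per line read from stdin.
sum_restart_counts() {
    local total=0 count
    # "|| [ -n ... ]" also picks up a final line without a trailing newline
    while read -r count || [ -n "$count" ]; do
        # Skip anything that is not a plain non-negative integer
        [[ "$count" =~ ^[0-9]+$ ]] && total=$((total + 10#$count))
    done
    echo "$total"
}

printf '1\n0\n2\n' | sum_restart_counts
# prints 3
```

With this helper, the previous and current totals could be obtained as `sum_restart_counts < $WORKSPACE/${POD_NAME}.restartCount` before and after the file is rewritten. Empty input yields 0.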
