[k8s] Prometheus + Alertmanager binary installation: simple email alerting
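- Goal: send an alert email with Alertmanager.
- The Prometheus components are installed from the official binary releases.
- We monitor one node's memory and send an email alert when usage exceeds 2% (a deliberately low threshold for testing).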
Environment preparation
Download the binaries from https://prometheus.io/download/
https://github.com/prometheus/prometheus/releases/download/v2.0.0/prometheus-2.0.0.linux-amd64.tar.gz
https://github.com/prometheus/alertmanager/releases/download/v0.12.0/alertmanager-0.12.0.linux-amd64.tar.gz
https://github.com/prometheus/node_exporter/releases/download/v0.15.2/node_exporter-0.15.2.linux-amd64.tar.gz
Extract the archives; the resulting layout:
/root/
├── alertmanager -> alertmanager-0.12.0.linux-amd64
├── alertmanager-0.12.0.linux-amd64
├── alertmanager-0.12.0.linux-amd64.tar.gz
├── node_exporter-0.15.2.linux-amd64
├── node_exporter-0.15.2.linux-amd64.tar.gz
├── prometheus -> prometheus-2.0.0.linux-amd64
├── prometheus-2.0.0.linux-amd64
└── prometheus-2.0.0.linux-amd64.tar.gz
Lab architecture
Configure Alertmanager
Create alert.yml
[root@n1 alertmanager]# ls
alertmanager alert.yml amtool data LICENSE NOTICE simple.yml
alert.yml defines who sends the alert, which events trigger it, who receives it, and how it is delivered.
cat alert.yml
global:
  smtp_smarthost: 'smtp.163.com:25'
  smtp_from: 'maotai@163.com'
  smtp_auth_username: 'maotai@163.com'
  smtp_auth_password: '123456'
templates:
  - '/root/alertmanager/template/*.tmpl'
route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 10m
  receiver: default-receiver
receivers:
  - name: 'default-receiver'
    email_configs:
      - to: 'maotai@foxmail.com'
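The route's group_by tells Alertmanager to put alerts whose alertname, cluster and service labels all match into one batch, which then goes out as a single email. Below is a minimal Python sketch of that bucketing. It is a simplified model and not Alertmanager's code. It assumes alerts are plain label dicts and treats a missing label as an empty value.

```python
from collections import defaultdict

# Same labels as route.group_by in alert.yml.
GROUP_BY = ["alertname", "cluster", "service"]


def group_alerts(alerts, group_by=GROUP_BY):
    """Bucket alerts by the values of the group_by labels (missing label -> "")."""
    groups = defaultdict(list)
    for alert in alerts:
        key = tuple(alert.get(label, "") for label in group_by)
        groups[key].append(alert)
    return dict(groups)


alerts = [
    {"alertname": "NodeMemoryUsage", "instance": "db1"},
    {"alertname": "NodeMemoryUsage", "instance": "db2"},
    {"alertname": "InstanceDown", "instance": "db1"},
]
for key, members in group_alerts(alerts).items():
    print(key, [a["instance"] for a in members])
```

Both NodeMemoryUsage alerts share one group key, so they would be reported together. InstanceDown forms its own group.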
- Once configured, start Alertmanager:
./alertmanager -config.file=./alert.yml
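Before Prometheus is connected, you can test SMTP delivery by pushing a hand-made alert straight into Alertmanager's HTTP API. This sketch makes three assumptions. Alertmanager listens on localhost:9093, which is the address prometheus.yml uses later. This 0.12 release serves the v1 endpoint /api/v1/alerts. The alert name TestEmail is hypothetical.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Assumption: local Alertmanager exposing the v1 alerts API.
AM_URL = "http://localhost:9093/api/v1/alerts"


def build_test_alert(alertname="TestEmail", instance="db1"):
    """Return the JSON body (a list of alerts) for the v1 alerts endpoint."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    alerts = [{
        "labels": {"alertname": alertname, "instance": instance, "severity": "warning"},
        "annotations": {"summary": f"{instance}: manual test alert"},
        "startsAt": now,
    }]
    return json.dumps(alerts).encode("utf-8")


def send_test_alert(body, url=AM_URL):
    """POST the alert and return (HTTP status, decoded JSON reply)."""
    req = urllib.request.Request(url, data=body, method="POST",
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status, json.loads(resp.read())
```

Usage: `send_test_alert(build_test_alert())`. Expect the email after group_wait, which is 30s in alert.yml.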
Configure Prometheus
Alerting rule file rule.yml (loaded by prometheus.yml)
Send an email alert when memory usage exceeds 2% (test threshold):
$ cat rule.yml
groups:
  - name: test-rule
    rules:
      - alert: NodeMemoryUsage
        expr: (node_memory_MemTotal - (node_memory_MemFree + node_memory_Buffers + node_memory_Cached)) / node_memory_MemTotal * 100 > 2
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "{{$labels.instance}}: High Memory usage detected"
          description: "{{$labels.instance}}: Memory usage is above 2% (current value is: {{ $value }})"
The key part is this expression:
(node_memory_MemTotal - (node_memory_MemFree+node_memory_Buffers+node_memory_Cached )) / node_memory_MemTotal * 100 > 2
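The expression takes the total memory and subtracts free, buffer and page-cache memory. What remains is memory actually held by processes, which the expression turns into a percentage of the total. The Python sketch below mirrors that arithmetic using made-up sample values.

```python
def memory_usage_percent(mem_total, mem_free, buffers, cached):
    """Mirror of (MemTotal - (MemFree + Buffers + Cached)) / MemTotal * 100."""
    used = mem_total - (mem_free + buffers + cached)
    return used / mem_total * 100


# Hypothetical sample values; the units cancel out in the ratio.
total = 8 * 1024**3        # 8 GiB
free = 2 * 1024**3         # 2 GiB
buffers = 512 * 1024**2    # 0.5 GiB
cached = 1536 * 1024**2    # 1.5 GiB

usage = memory_usage_percent(total, free, buffers, cached)
print(f"usage = {usage:.1f}%")          # usage = 50.0%
print("alert condition (> 2):", usage > 2)
```

Almost any real host is above 2%, so the test rule fires right away. A production rule would use a value like the 80% mentioned in the original description.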
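labels: attaches labels (here severity: warning) to the alerts this rule fires.
annotations: the human-readable alert content (summary and description).
Where do metric names such as node_memory_MemTotal, node_memory_MemFree, node_memory_Buffers and node_memory_Cached come from? See below.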
prometheus.yml configuration
Add a scrape job for node_exporter (named linux here).
Add the alerting rules: the rule_files section loads rule.yml.
$ cat prometheus.yml
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
alerting:
  alertmanagers:
    - static_configs:
        - targets: ["localhost:9093"]
rule_files:
  - /root/prometheus/rule.yml
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['192.168.14.11:9090']
  - job_name: linux
    static_configs:
      - targets: ['192.168.14.11:9100']
        labels:
          instance: db1
Once configured, start Prometheus and open its web UI. The node target now appears on the Targets page.
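Besides the web UI, the alert expression can also be evaluated through Prometheus's instant-query HTTP API, /api/v1/query. This sketch assumes Prometheus is reachable at 192.168.14.11:9090, the address of the prometheus scrape target above. Response parsing is split into its own function so you can try it on a sample body.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the Prometheus server from prometheus.yml.
PROM_URL = "http://192.168.14.11:9090"
# The rule expression without "> 2", so every node's usage is returned.
EXPR = ("(node_memory_MemTotal - (node_memory_MemFree + node_memory_Buffers"
        " + node_memory_Cached)) / node_memory_MemTotal * 100")


def query_url(expr, base=PROM_URL):
    """Build an instant-query URL with the expression URL-encoded."""
    return f"{base}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def parse_vector(body):
    """Map each series' instance label to its value in an instant-vector reply."""
    doc = json.loads(body)
    if doc.get("status") != "success":
        raise RuntimeError(doc.get("error", "query failed"))
    # Each result is {"metric": {labels}, "value": [timestamp, "value as string"]}.
    return {r["metric"].get("instance", ""): float(r["value"][1])
            for r in doc["data"]["result"]}


def query(expr=EXPR, base=PROM_URL):
    """Run the query against a live server and return {instance: value}."""
    with urllib.request.urlopen(query_url(expr, base), timeout=5) as resp:
        return parse_vector(resp.read())
```

With the static label in the linux job, `query()` returns a dict keyed by db1. Appending `> 2` to the expression keeps only the nodes that would trigger the rule.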
View the metrics exposed by node_exporter.
Open the Alerts page to see the current state of the alerting rule.
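The Alerts page shows each rule as inactive, pending or firing. Because the rule has `for: 1m`, a true expression first puts the alert in pending. It only fires, and only then reaches Alertmanager, after the condition has held for a full minute. The sketch below is a simplified simulation, not Prometheus's code. It assumes the 15s evaluation_interval from prometheus.yml.

```python
def alert_states(samples, for_seconds=60, interval=15):
    """Replay rule evaluations; samples[i] is True when the expr held at evaluation i."""
    states, active_since = [], None
    for i, condition in enumerate(samples):
        t = i * interval
        if not condition:
            active_since = None          # condition cleared: back to inactive
            states.append("inactive")
            continue
        if active_since is None:
            active_since = t             # first true evaluation starts the 'for' clock
        states.append("firing" if t - active_since >= for_seconds else "pending")
    return states


print(alert_states([False, True, True, True, True, True, False]))
# ['inactive', 'pending', 'pending', 'pending', 'pending', 'firing', 'inactive']
```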
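The metric names used in these expressions are listed here, provided the corresponding exporter is installed. Write your alert expressions using these names.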
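node_exporter serves its metrics at /metrics on port 9100 in Prometheus's plain-text exposition format. Below is a small Python sketch that lists the metric names you can use in rule expressions. It has some assumptions and limits. The URL is the node target from prometheus.yml, the sample text is made up, escaped characters inside label values are not decoded, and any trailing timestamp is ignored.

```python
import urllib.request


def parse_exposition(text):
    """Return {metric_name: [(labels_str, value), ...]} from exposition text."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):   # skip blanks and HELP/TYPE comments
            continue
        if "{" in line:
            end = line.rindex("}")             # last brace closes the label set
            name, labels = line[:end].split("{", 1)
            rest = line[end + 1:].split()
        else:
            name, *rest = line.split()
            labels = ""
        metrics.setdefault(name, []).append((labels, float(rest[0])))
    return metrics


def fetch_metric_names(url="http://192.168.14.11:9100/metrics"):
    """Download a node_exporter /metrics page and list its metric names."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return sorted(parse_exposition(resp.read().decode("utf-8")))


SAMPLE = """# HELP node_memory_MemTotal Memory information field MemTotal.
# TYPE node_memory_MemTotal gauge
node_memory_MemTotal 8.254459904e+09
node_cpu{cpu="cpu0",mode="idle"} 12345.67
"""
print(sorted(parse_exposition(SAMPLE)))  # ['node_cpu', 'node_memory_MemTotal']
```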
Check the received email.
WeChat alert configuration
global:
  resolve_timeout: 6m
  # The smarthost and SMTP sender used for mail notifications.
  smtp_smarthost: '172.16.100.14:25'
  smtp_from: 'svnbuild_yf@iflytek.com'
  smtp_auth_username: 'svnbuild_yf'
  smtp_auth_password: 'tag#write@2015313'
  smtp_require_tls: false
  # The auth token for Hipchat.
  hipchat_auth_token: '1234556789'
  # Alternative host for Hipchat.
  hipchat_api_url: 'https://hipchat.foobar.org/'
  wechat_api_url: "https://qyapi.weixin.qq.com/cgi-bin/"
  wechat_api_secret: "4tQroVeB0xUcccccccc65Yfkj2Nkt90a80MH3ayI"
  wechat_api_corp_id: "wxaf5acxxxx5f8eb98"
# The directory from which notification templates are read.
templates:
  - 'templates/*.tmpl'
# The root route on which each incoming alert enters.
route:
  # The labels by which incoming alerts are grouped together. For example,
  # multiple alerts coming in for cluster=A and alertname=LatencyHigh would
  # be batched into a single group.
  group_by: ['alertname']
  # When a new group of alerts is created by an incoming alert, wait at
  # least 'group_wait' before sending the initial notification.
  # This ensures that alerts for the same group that start firing shortly
  # after one another are batched together in the first notification.
  group_wait: 3s
  # After the first notification has been sent, wait 'group_interval' before
  # sending a batch of new alerts that started firing for that group.
  group_interval: 5m
  # If an alert has already been sent successfully, wait 'repeat_interval'
  # before resending it.
  repeat_interval: 1h
  # A default receiver
  receiver: ybyang2
  routes:
    - match:
        job: "11"
        #service: "node_exporter"
      routes:
        - match:
            status: yellow
          receiver: ybyang2
        - match:
            status: orange
          receiver: berlin
# Inhibition rules mute a set of alerts while another alert is firing.
# Here, alerts for a dependent service (e.g. mysql) are muted while the
# service it depends on (e.g. up) is already alerting for the same instance.
inhibit_rules:
  - source_match:
      service: 'up'
    target_match:
      service: 'mysql'
    # Apply inhibition only if the instance label is the same.
    equal: ["instance"]
  - source_match:
      service: "mysql"
    target_match:
      service: "mysql-query"
    equal: ['instance']
  - source_match:
      service: "A"
    target_match:
      service: "B"
    equal: ["instance"]
  - source_match:
      service: "B"
    target_match:
      service: "C"
    equal: ["instance"]
receivers:
  - name: 'ybyang2'
    email_configs:
      - to: 'ybyang2@iflytek.com'
        send_resolved: true
        html: '{{ template "email.default.html" . }}'
        headers: { Subject: "[mail] Testing Technology Dept. monitoring alert" }
  - name: "berlin"
    wechat_configs:
      - send_resolved: true
        to_user: "@all"
        to_party: ""
        to_tag: ""
        agent_id: "1"
        corp_id: "wxaf5a99ccccc5f8eb98"
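The Python sketch below models how this route tree and these inhibit_rules decide where an alert goes. It is a simplified model, not Alertmanager's code, and it assumes the following behavior:

- Only equality `match` is supported, and there is no `continue`.
- Child routes are checked in order, and the first match wins.
- A route without its own receiver inherits its parent's.
- An alert is inhibited when three things hold: it matches target_match, some firing alert matches source_match, and the two alerts agree on every `equal` label.

```python
ROUTE = {  # mirrors the route tree above
    "receiver": "ybyang2",
    "routes": [{
        "match": {"job": "11"},
        "routes": [
            {"match": {"status": "yellow"}, "receiver": "ybyang2"},
            {"match": {"status": "orange"}, "receiver": "berlin"},
        ],
    }],
}

INHIBIT_RULES = [  # (source_match, target_match, equal) from inhibit_rules
    ({"service": "up"}, {"service": "mysql"}, ["instance"]),
    ({"service": "mysql"}, {"service": "mysql-query"}, ["instance"]),
    ({"service": "A"}, {"service": "B"}, ["instance"]),
    ({"service": "B"}, {"service": "C"}, ["instance"]),
]


def matches(labels, match):
    """True if every key/value pair in `match` is present in `labels`."""
    return all(labels.get(k) == v for k, v in match.items())


def find_receiver(labels, route=ROUTE, inherited=None):
    """Depth-first: the first matching child route wins; otherwise this route handles it."""
    receiver = route.get("receiver", inherited)
    for child in route.get("routes", []):
        if matches(labels, child.get("match", {})):
            return find_receiver(labels, child, receiver)
    return receiver


def is_inhibited(target, firing, rules=INHIBIT_RULES):
    """True if some firing alert mutes `target` under one of the rules."""
    for source_match, target_match, equal in rules:
        if not matches(target, target_match):
            continue
        for source in firing:
            if matches(source, source_match) and all(
                    source.get(label) == target.get(label) for label in equal):
                return True
    return False


print(find_receiver({"job": "11", "status": "orange"}))  # berlin
print(find_receiver({"job": "11", "status": "red"}))     # ybyang2 (inherited)
print(is_inhibited({"service": "mysql", "instance": "db1"},
                   [{"service": "up", "instance": "db1"}]))  # True
```

In this model, inhibition is not transitive. A firing A alert mutes B, but it mutes C only if a B alert is itself firing.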