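• The goal of this exercise is to send an alert email with Alertmanager.
  • The environment uses the Prometheus components installed from release binaries.
  • We monitor the memory of a single node and send an alert email when usage exceeds 2% (a deliberately low threshold for testing).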

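For Kubernetes clusters, refer to the official Prometheus documentation.

Environment preparation

Download the binaries from https://prometheus.io/download/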

https://github.com/prometheus/prometheus/releases/download/v2.0.0/prometheus-2.0.0.linux-amd64.tar.gz
https://github.com/prometheus/alertmanager/releases/download/v0.12.0/alertmanager-0.12.0.linux-amd64.tar.gz
https://github.com/prometheus/node_exporter/releases/download/v0.15.2/node_exporter-0.15.2.linux-amd64.tar.gz

Extract the archives

/root/
├── alertmanager -> alertmanager-0.12.0.linux-amd64
├── alertmanager-0.12.0.linux-amd64
├── alertmanager-0.12.0.linux-amd64.tar.gz
├── node_exporter-0.15.2.linux-amd64
├── node_exporter-0.15.2.linux-amd64.tar.gz
├── prometheus -> prometheus-2.0.0.linux-amd64
├── prometheus-2.0.0.linux-amd64
└── prometheus-2.0.0.linux-amd64.tar.gz

Lab architecture

Configure Alertmanager

Create alert.yml

[root@n1 alertmanager]# ls
alertmanager alert.yml amtool data LICENSE NOTICE simple.yml

alert.yml defines who sends the notification, which events trigger it, who receives it, and how it is delivered.

cat alert.yml
global:
  smtp_smarthost: 'smtp.163.com:25'
  smtp_from: 'maotai@163.com'
  smtp_auth_username: 'maotai@163.com'
  smtp_auth_password: '123456'

templates:
  - '/root/alertmanager/template/*.tmpl'

route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 10m
  receiver: default-receiver

receivers:
  - name: 'default-receiver'
    email_configs:
      - to: 'maotai@foxmail.com'

Once the configuration is in place, start Alertmanager:

./alertmanager -config.file=./alert.yml
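
To check the email route before Prometheus is wired in, one option is to push a hand-made alert straight into Alertmanager. The following is a minimal sketch, not part of the original setup. It assumes Alertmanager listens on localhost:9093 and exposes the v1 endpoint /api/v1/alerts, as releases from this era did (newer releases use /api/v2/alerts instead). The function names and the label values are hypothetical.

```python
import json
import urllib.request
from datetime import datetime, timezone


def build_test_alert(alertname="TestAlert", instance="db1"):
    """Return a one-element alert list in the JSON shape the alerts endpoint accepts."""
    return [{
        "labels": {
            "alertname": alertname,
            "instance": instance,
            "severity": "warning",
        },
        "annotations": {"summary": f"{instance}: manual test alert"},
        # RFC 3339 timestamp marking when the alert started
        "startsAt": datetime.now(timezone.utc).isoformat(),
    }]


def send_alert(payload, url="http://localhost:9093/api/v1/alerts"):
    """POST the alert list to Alertmanager and return the HTTP status code.

    The URL is an assumption; adjust host, port, and API version to your setup.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

With Alertmanager running, calling send_alert(build_test_alert()) should queue one alert; the email goes out after group_wait (30s in the configuration above) has elapsed.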

Configure Prometheus

Alert rule file rule.yml (loaded by prometheus.yml)

Send an alert email when memory usage exceeds 2% (for testing)

$ cat rule.yml
groups:
  - name: test-rule
    rules:
      - alert: NodeMemoryUsage
        expr: (node_memory_MemTotal - (node_memory_MemFree+node_memory_Buffers+node_memory_Cached )) / node_memory_MemTotal * 100 > 2
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "{{$labels.instance}}: High Memory usage detected"
          description: "{{$labels.instance}}: Memory usage is above 2% (current value is: {{ $value }})"

The key part is this expression:

(node_memory_MemTotal - (node_memory_MemFree+node_memory_Buffers+node_memory_Cached )) / node_memory_MemTotal * 100 > 2

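labels attaches labels to alerts produced by this rule.

annotations holds the alert description, which becomes the content of the notification.

Where do the metric names node_memory_MemTotal, node_memory_Buffers, and node_memory_Cached come from? (This is covered later.)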
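
The arithmetic behind the expression can be checked by hand. The sketch below mirrors the PromQL formula in plain Python; the function names and the sample byte values are hypothetical, chosen only to make the numbers easy to follow. Note that the metric names in the rule match node_exporter 0.15.x; later node_exporter releases append a _bytes suffix to these names.

```python
def memory_usage_percent(total, free, buffers, cached):
    """Mirror the rule: (total - (free + buffers + cached)) / total * 100."""
    if total <= 0:
        raise ValueError("total must be positive")
    return (total - (free + buffers + cached)) / total * 100


def should_alert(usage_percent, threshold=2):
    """Apply the same comparison as the trailing '> 2' in the expression."""
    return usage_percent > threshold


# Hypothetical sample values in bytes: 8 GiB total, 6 GiB reclaimable
GIB = 1024 ** 3
usage = memory_usage_percent(total=8 * GIB, free=2 * GIB, buffers=1 * GIB, cached=3 * GIB)
print(usage)                # 25.0
print(should_alert(usage))  # True
```

Buffers and page cache are subtracted along with free memory because the kernel can reclaim them, so the result approximates memory actually held by applications.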

prometheus.yml configuration

  • Add a scrape job for node_exporter

  • Add the alert rules under rule_files, which loads rule.yml

$ cat prometheus.yml
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["localhost:9093"]

rule_files:
  - /root/prometheus/rule.yml

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['192.168.14.11:9090']
  - job_name: linux
    static_configs:
      - targets: ['192.168.14.11:9100']
        labels:
          instance: db1

Once configured, start Prometheus and open its web UI; the node target now appears.

View the metrics exposed by node_exporter

Open the Alerts page to see the current state of the alert rule
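
The states shown on the Alerts page follow from the rule's "for: 1m" clause: an alert is pending while its condition holds, and only becomes firing once the condition has held for the full duration. The sketch below is a simplified model of that behavior, not Prometheus's actual implementation. It assumes one evaluation every 15 seconds (the evaluation_interval used in this setup); the function name and sample values are hypothetical.

```python
def alert_states(samples, threshold=2.0, for_seconds=60, interval=15):
    """Return the alert state after each rule evaluation.

    samples: expression values, one per evaluation, spaced `interval` seconds apart.
    """
    states = []
    active_for = None  # seconds the condition has held continuously, None if inactive
    for value in samples:
        if value > threshold:
            active_for = 0 if active_for is None else active_for + interval
            states.append("firing" if active_for >= for_seconds else "pending")
        else:
            # Any evaluation below the threshold resets the timer
            active_for = None
            states.append("inactive")
    return states


print(alert_states([25.0, 25.0, 25.0, 25.0, 25.0]))
# ['pending', 'pending', 'pending', 'pending', 'firing']
```

Only firing alerts are forwarded to Alertmanager, which is why the email arrives roughly a minute after the threshold is first crossed, plus the group_wait delay.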

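The metric names used in the expression can be found here (provided the corresponding exporter is installed); write alert expressions based on these names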
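
The same metric names can also be read from the plain-text output an exporter serves on its /metrics endpoint. The sketch below is a deliberately simple parser, assuming unlabeled sample lines of the form "name value" as node_exporter emits for its memory metrics; it is not a full parser for the exposition format. The function name and the sample text are hypothetical.

```python
def parse_metrics(text, prefix="node_memory_"):
    """Return {metric_name: value} for unlabeled samples whose name starts with prefix."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        name = parts[0]
        if name.startswith(prefix):
            metrics[name] = float(parts[1])
    return metrics


# Hypothetical excerpt of what an exporter might return
SAMPLE = """\
# HELP node_memory_MemTotal Memory information field MemTotal.
# TYPE node_memory_MemTotal gauge
node_memory_MemTotal 8.589934592e+09
node_memory_MemFree 2.147483648e+09
node_load1 0.5
"""

print(sorted(parse_metrics(SAMPLE)))
# ['node_memory_MemFree', 'node_memory_MemTotal']
```

In this setup the real text would come from http://192.168.14.11:9100/metrics, the node_exporter target listed in prometheus.yml.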

Check the received email



WeChat alert configuration

global:
  # The smarthost and SMTP sender used for mail notifications.
  resolve_timeout: 6m
  smtp_smarthost: '172.16.100.14:25'
  smtp_from: 'svnbuild_yf@iflytek.com'
  smtp_auth_username: 'svnbuild_yf'
  smtp_auth_password: 'tag#write@2015313'
  smtp_require_tls: false
  # The auth token for Hipchat.
  hipchat_auth_token: '1234556789'
  # Alternative host for Hipchat.
  hipchat_api_url: 'https://hipchat.foobar.org/'
  wechat_api_url: "https://qyapi.weixin.qq.com/cgi-bin/"
  wechat_api_secret: "4tQroVeB0xUcccccccc65Yfkj2Nkt90a80MH3ayI"
  wechat_api_corp_id: "wxaf5acxxxx5f8eb98"

# The directory from which notification templates are read.
templates:
  - 'templates/*.tmpl'

# The root route on which each incoming alert enters.
route:
  # The labels by which incoming alerts are grouped together. For example,
  # multiple alerts coming in for cluster=A and alertname=LatencyHigh would
  # be batched into a single group.
  group_by: ['alertname']

  # When a new group of alerts is created by an incoming alert, wait at
  # least 'group_wait' to send the initial notification.
  # This way ensures that you get multiple alerts for the same group that start
  # firing shortly after another are batched together on the first
  # notification.
  group_wait: 3s

  # When the first notification was sent, wait 'group_interval' to send a batch
  # of new alerts that started firing for that group.
  group_interval: 5m

  # If an alert has successfully been sent, wait 'repeat_interval' to
  # resend them.
  repeat_interval: 1h

  # A default receiver
  receiver: ybyang2

  routes:
    - match:
        job: "11"
        #service: "node_exporter"
      routes:
        - match:
            status: yellow
          receiver: ybyang2
        - match:
            status: orange
          receiver: berlin

# Inhibition rules allow to mute a set of alerts given that another alert is
# firing.
# We use this to mute any warning-level notifications if the same alert is
# already critical.
inhibit_rules:
  - source_match:
      service: 'up'
    target_match:
      service: 'mysql'
    # Apply inhibition if the instance is the same.
    equal: ["instance"]
  - source_match:
      service: "mysql"
    target_match:
      service: "mysql-query"
    equal: ['instance']
  - source_match:
      service: "A"
    target_match:
      service: "B"
    equal: ["instance"]
  - source_match:
      service: "B"
    target_match:
      service: "C"
    equal: ["instance"]

receivers:
  - name: 'ybyang2'
    email_configs:
      - to: 'ybyang2@iflytek.com'
        send_resolved: true
        html: '{{ template "email.default.html" . }}'
        headers: { Subject: "[mail] 测试技术部监控告警邮件" }
  - name: "berlin"
    wechat_configs:
      - send_resolved: true
        to_user: "@all"
        to_party: ""
        to_tag: ""
        agent_id: "1"
        corp_id: "wxaf5a99ccccc5f8eb98"
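
To see which receiver an alert ends up at under a nested routing tree like this one, the matching can be traced by hand. The sketch below is a simplified model of Alertmanager routing: it walks child routes in order, descends into the first one whose match labels all equal the alert's labels, and falls back to the parent's receiver when no child matches. It ignores match_re and the continue option. The ROUTE dictionary assumes the status routes are nested under the job: "11" route; the function name is hypothetical.

```python
def resolve_receiver(route, labels, inherited=None):
    """Return the receiver name chosen for an alert with the given labels."""
    # A route without its own receiver inherits the parent's
    receiver = route.get("receiver", inherited)
    for child in route.get("routes", []):
        match = child.get("match", {})
        if all(labels.get(key) == value for key, value in match.items()):
            # First matching child wins; keep descending from there
            return resolve_receiver(child, labels, receiver)
    return receiver


# The route section of the configuration above, written as a Python dict
ROUTE = {
    "receiver": "ybyang2",
    "routes": [{
        "match": {"job": "11"},
        "routes": [
            {"match": {"status": "yellow"}, "receiver": "ybyang2"},
            {"match": {"status": "orange"}, "receiver": "berlin"},
        ],
    }],
}

print(resolve_receiver(ROUTE, {"job": "11", "status": "orange"}))  # berlin
print(resolve_receiver(ROUTE, {"job": "11", "status": "yellow"}))  # ybyang2
print(resolve_receiver(ROUTE, {"job": "other"}))                   # ybyang2
```

Under this model only alerts labeled job="11" and status="orange" reach the WeChat receiver berlin; everything else goes to the email receiver ybyang2.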
