Prometheus K8S中部署Alertmanager
Prometheus K8S中部署Alertmanager

设置告警和通知的主要步骤如下:
一、部署Alertmanager
二、配置Prometheus与Alertmanager通信
三、配置告警
1. prometheus指定rules目录
2. configmap存储告警规则
3. configmap挂载到容器rules目录
一、部署Alertmanager
配置文件
已经修改好的配置文件
- # 存储主配置文件
- alertmanager-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
# 配置文件名称
name: alertmanager-config
namespace: kube-system
labels:
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: EnsureExists
data:
alertmanager.yml: |
global:
resolve_timeout: 5m
# 告警自定义邮件
smtp_smarthost: 'smtp.163.com:25'
smtp_from: 'baojingtongzhi@163.com'
smtp_auth_username: 'baojingtongzhi@163.com'
smtp_auth_password: 'liang123' receivers:
- name: default-receiver
email_configs:
- to: "zhenliang369@163.com" route:
group_interval: 1m
group_wait: 10s
receiver: default-receiver
repeat_interval: 1m配置文件
- # 部署核心组件
- alertmanager-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: alertmanager
namespace: kube-system
labels:
k8s-app: alertmanager
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
version: v0.14.0
spec:
replicas: 1
selector:
matchLabels:
k8s-app: alertmanager
version: v0.14.0
template:
metadata:
labels:
k8s-app: alertmanager
version: v0.14.0
annotations:
scheduler.alpha.kubernetes.io/critical-pod: ''
spec:
priorityClassName: system-cluster-critical
containers:
- name: prometheus-alertmanager
image: "prom/alertmanager:v0.14.0"
imagePullPolicy: "IfNotPresent"
args:
- --config.file=/etc/config/alertmanager.yml
- --storage.path=/data
- --web.external-url=/
ports:
- containerPort: 9093
readinessProbe:
httpGet:
path: /#/status
port: 9093
initialDelaySeconds: 30
timeoutSeconds: 30
volumeMounts:
- name: config-volume
mountPath: /etc/config
- name: storage-volume
mountPath: "/data"
subPath: ""
resources:
limits:
cpu: 10m
memory: 50Mi
requests:
cpu: 10m
memory: 50Mi
- name: prometheus-alertmanager-configmap-reload
image: "jimmidyson/configmap-reload:v0.1"
imagePullPolicy: "IfNotPresent"
args:
- --volume-dir=/etc/config
- --webhook-url=http://localhost:9093/-/reload
volumeMounts:
- name: config-volume
mountPath: /etc/config
readOnly: true
resources:
limits:
cpu: 10m
memory: 10Mi
requests:
cpu: 10m
memory: 10Mi
volumes:
- name: config-volume
configMap:
name: alertmanager-config
- name: storage-volume
persistentVolumeClaim:
claimName: alertmanager配置文件
- # 使用的自动PV存储
- alertmanager-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: alertmanager
namespace: kube-system
labels:
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: EnsureExists
spec:
# 使用自己的动态PV
storageClassName: managed-nfs-storage
accessModes:
- ReadWriteOnce
resources:
requests:
storage: "2Gi"配置文件
- # 暴露Prot端口
- alertmanager-service.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: alertmanager
namespace: kube-system
labels:
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: EnsureExists
spec:
# 使用自己的动态PV
storageClassName: managed-nfs-storage
accessModes:
- ReadWriteOnce
resources:
requests:
storage: "2Gi"
[root@Master1 ~]# cat alertmanager-service.yaml
apiVersion: v1
kind: Service
metadata:
name: alertmanager
namespace: kube-system
labels:
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
kubernetes.io/name: "Alertmanager"
spec:
ports:
- name: http
port: 80
protocol: TCP
targetPort: 9093
selector:
k8s-app: alertmanager
type: "ClusterIP"配置文件
部署
1、创建pvc、configmap、deployment、service
kubectl apply -f alertmanager-pvc.yaml
kubectl create -f alertmanager-configmap.yaml
kubectl apply -f alertmanager-deployment.yaml
kubectl apply -f alertmanager-service.yaml
2、查看Pod状态
kubectl get pod -n kube-system
NAME READY STATUS RESTARTS AGE
alertmanager-5bb796cb48-fwztv 2/2 Running 0 2m29s
3、查看service状态
kubectl get svc -n kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
alertmanager ClusterIP 10.0.0.126 <none> 80/TCP
二、配置Prometheus与Alertmanager通信
1、编辑 prometheus-configmap.yaml 配置文件添加绑定信息
# Prometheus configuration format https://prometheus.io/docs/prometheus/latest/configuration/configuration/
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus-config
namespace: kube-system
labels:
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: EnsureExists
data:
# 存放prometheus配置文件
prometheus.yml: |
# 配置采集目标
scrape_configs:
- job_name: prometheus
static_configs:
- targets:
# 采集自身
- localhost:9090 prometheus.yml: |
# 配置采集目标
scrape_configs:
- job_name: kubernetes-nodes
static_configs:
- targets:
# 采集自身
- 192.168.1.110:9100
- 192.168.1.111:9100 # 采集:Apiserver 生存指标
# 创建的job name 名称为 kubernetes-apiservers
- job_name: kubernetes-apiservers
# 基于k8s的服务发现
kubernetes_sd_configs:
- role: endpoints
# 使用通信标记标签
relabel_configs:
# 保留正则匹配标签
- action: keep
# 已经包含
regex: default;kubernetes;https
source_labels:
- __meta_kubernetes_namespace
- __meta_kubernetes_service_name
- __meta_kubernetes_endpoint_port_name
# 使用方法为https、默认http
scheme: https
tls_config:
# promethus访问Apiserver使用认证
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
# 跳过https认证
insecure_skip_verify: true
# promethus访问Apiserver使用认证
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token # 采集:Kubelet 生存指标
- job_name: kubernetes-nodes-kubelet
kubernetes_sd_configs:
# 发现集群中所有的Node
- role: node
relabel_configs:
# 通过regex获取关键信息
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
insecure_skip_verify: true
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token # 采集:nodes-cadvisor 信息
- job_name: kubernetes-nodes-cadvisor
kubernetes_sd_configs:
- role: node
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
# 重命名标签
- target_label: __metrics_path__
replacement: /metrics/cadvisor
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
insecure_skip_verify: true
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token # 采集:service-endpoints 信息
- job_name: kubernetes-service-endpoints
# 选定指标
kubernetes_sd_configs:
- role: endpoints
relabel_configs:
- action: keep
regex: true
# 指定源标签
source_labels:
- __meta_kubernetes_service_annotation_prometheus_io_scrape
- action: replace
regex: (https?)
source_labels:
- __meta_kubernetes_service_annotation_prometheus_io_scheme
# 重命名标签采集
target_label: __scheme__
- action: replace
regex: (.+)
source_labels:
- __meta_kubernetes_service_annotation_prometheus_io_path
target_label: __metrics_path__
- action: replace
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
source_labels:
- __address__
- __meta_kubernetes_service_annotation_prometheus_io_port
target_label: __address__
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- action: replace
source_labels:
- __meta_kubernetes_namespace
target_label: kubernetes_namespace
- action: replace
source_labels:
- __meta_kubernetes_service_name
target_label: kubernetes_name # 采集:kubernetes-services 服务指标
- job_name: kubernetes-services
kubernetes_sd_configs:
- role: service
# 黑盒探测,探测IP与端口是否可用
metrics_path: /probe
params:
module:
- http_2xx
relabel_configs:
- action: keep
regex: true
source_labels:
- __meta_kubernetes_service_annotation_prometheus_io_probe
- source_labels:
- __address__
target_label: __param_target
# 使用 blackbox进行黑盒探测
- replacement: blackbox
target_label: __address__
- source_labels:
- __param_target
target_label: instance
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- source_labels:
- __meta_kubernetes_namespace
target_label: kubernetes_namespace
- source_labels:
- __meta_kubernetes_service_name
target_label: kubernetes_name # 采集: kubernetes-pods 信息
- job_name: kubernetes-pods
kubernetes_sd_configs:
- role: pod
relabel_configs:
- action: keep
regex: true
source_labels:
# 只保留采集的信息
- __meta_kubernetes_pod_annotation_prometheus_io_scrape
- action: replace
regex: (.+)
source_labels:
- __meta_kubernetes_pod_annotation_prometheus_io_path
target_label: __metrics_path__
- action: replace
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
source_labels:
# 采集地址
- __address__
# 采集端口
- __meta_kubernetes_pod_annotation_prometheus_io_port
target_label: __address__
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
- action: replace
source_labels:
- __meta_kubernetes_namespace
target_label: kubernetes_namespace
- action: replace
source_labels:
- __meta_kubernetes_pod_name
target_label: kubernetes_pod_name
alerting:
# 告警配置文件
alertmanagers:
# 修改:使用静态绑定
- static_configs:
# 修改:targets、指定地址与端口
- targets: ["alertmanager:80"]
配置文件
2、应用加载配置文件
kubectl apply -f prometheus-configmap.yaml
3、web控制台查看配置是否生效
http://192.168.1.110:42575/config
三、配置告警
1. prometheus指定rules目录
1、编辑 prometheus-configmap.yaml 添加报警信息
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus-config
namespace: kube-system
labels:
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: EnsureExists
data:
prometheus.yml: |
# 添加:指定读取rules配置
rules_files:
- /etc/config/rules/*.rules
......
配置文件
2、生效配置文件
kubectl apply -f prometheus-configmap.yaml
2. configmap存储告警规则
1、创建yaml文件同过configmap存储告警规则
vim prometheus-rules.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus-rules
namespace: kube-system
data:
# 通用角色
general.rules: |
groups:
- name: general.rules
rules:
- alert: InstanceDown
expr: up == 0
for: 1m
labels:
severity: error
annotations:
summary: "Instance {{ $labels.instance }} 停止工作"
description: "{{ $labels.instance }} job {{ $labels.job }} 已经停止5分钟以上."
# Node对所有资源的监控
node.rules: |
groups:
- name: node.rules
rules:
- alert: NodeFilesystemUsage
expr: 100 - (node_filesystem_free_bytes{fstype=~"ext4|xfs"} / node_filesystem_size_bytes{fstype=~"ext4|xfs"} * 100) > 80
for: 1m
labels:
severity: warning
annotations:
summary: "Instance {{ $labels.instance }} : {{ $labels.mountpoint }} 分区使用率过高"
description: "{{ $labels.instance }}: {{ $labels.mountpoint }} 分区使用大于80% (当前值: {{ $value }})" - alert: NodeMemoryUsage
expr: 100 - (node_memory_MemFree_bytes+node_memory_Cached_bytes+node_memory_Buffers_bytes) / node_memory_MemTotal_bytes * 100 > 80
for: 1m
labels:
severity: warning
annotations:
summary: "Instance {{ $labels.instance }} 内存使用率过高"
description: "{{ $labels.instance }}内存使用大于80% (当前值: {{ $value }})" - alert: NodeCPUUsage
expr: 100 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance) * 100) > 60
for: 1m
labels:
severity: warning
annotations:
summary: "Instance {{ $labels.instance }} CPU使用率过高"
description: "{{ $labels.instance }}CPU使用大于60% (当前值: {{ $value }})"
配置文件
3. configmap挂载到容器rules目录
1、修改挂载点位置,使用之前部署的prometheus动态PV
vim prometheus-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: prometheus
# 部署命名空间
namespace: kube-system
labels:
k8s-app: prometheus
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
version: v2.2.1
spec:
serviceName: "prometheus"
replicas: 1
podManagementPolicy: "Parallel"
updateStrategy:
type: "RollingUpdate"
selector:
matchLabels:
k8s-app: prometheus
template:
metadata:
labels:
k8s-app: prometheus
annotations:
scheduler.alpha.kubernetes.io/critical-pod: ''
spec:
priorityClassName: system-cluster-critical
serviceAccountName: prometheus
# 初始化容器
initContainers:
- name: "init-chown-data"
image: "busybox:latest"
imagePullPolicy: "IfNotPresent"
command: ["chown", "-R", "65534:65534", "/data"]
volumeMounts:
- name: prometheus-data
mountPath: /data
subPath: ""
containers:
- name: prometheus-server-configmap-reload
image: "jimmidyson/configmap-reload:v0.1"
imagePullPolicy: "IfNotPresent"
args:
- --volume-dir=/etc/config
- --webhook-url=http://localhost:9090/-/reload
volumeMounts:
- name: config-volume
mountPath: /etc/config
readOnly: true
resources:
limits:
cpu: 10m
memory: 10Mi
requests:
cpu: 10m
memory: 10Mi - name: prometheus-server
# 主要使用镜像
image: "prom/prometheus:v2.2.1"
imagePullPolicy: "IfNotPresent"
args:
- --config.file=/etc/config/prometheus.yml
- --storage.tsdb.path=/data
- --web.console.libraries=/etc/prometheus/console_libraries
- --web.console.templates=/etc/prometheus/consoles
- --web.enable-lifecycle
ports:
- containerPort: 9090
readinessProbe:
# 健康检查
httpGet:
path: /-/ready
port: 9090
initialDelaySeconds: 30
timeoutSeconds: 30
livenessProbe:
httpGet:
path: /-/healthy
port: 9090
initialDelaySeconds: 30
timeoutSeconds: 30
# based on 10 running nodes with 30 pods each
resources:
limits:
cpu: 200m
memory: 1000Mi
requests:
cpu: 200m
memory: 1000Mi
# 数据卷
volumeMounts:
- name: config-volume
mountPath: /etc/config
- name: prometheus-data
mountPath: /data
# 添加:指定rules的configmap配置文件名称
- name: prometheus-rules
mountPath: /etc/config/rules
subPath: ""
terminationGracePeriodSeconds: 300
volumes:
- name: config-volume
configMap:
name: prometheus-config
# 添加:name rules
- name: prometheus-rules
# 添加:配置文件
configMap:
# 添加:定义文件名称
name: prometheus-rules volumeClaimTemplates:
- metadata:
name: prometheus-data
spec:
# 使用动态PV
storageClassName: managed-nfs-storage
accessModes:
- ReadWriteOnce
resources:
requests:
storage: "16Gi"
配置文件
2、创建configmap并更新PV
kubectl apply -f prometheus-rules.yaml
kubectl apply -f prometheus-statefulset.yaml
3、查看Pod
kubectl get pod -n kube-system
NAME READY STATUS RESTARTS AGE
prometheus-0 1/2 Running 0 42s
Prometheus K8S中部署Alertmanager的更多相关文章
- 不使用pvc的方式在K8S中部署apisix-gateway
不使用pvc的方式在K8S中部署apisix-gateway 简介 我的apisix使用etcd作为数据存储服务器,官方的使用pvc方式或者docker-compose的方式,对于新手不太友好,本篇是 ...
- K8S中部署apisix(非ingress)
不使用pvc的方式在K8S中部署apisix-gateway 简介 因为公司项目准备重构,现在做技术储备,之前公司项目使用的ocelot做网关,ocelot是.net平台下的一个网关,也是很不错,但是 ...
- Kubernetes之在k8s中部署Java应用
部署好了k8s以后 部署参考https://www.cnblogs.com/minseo/p/12055731.html 怎么在k8s部署应用 项目迁移到k8s平台是怎样的流程 1,制作镜像 2,控制 ...
- Docker & k8s 系列三:在k8s中部署单个服务实例
本章将会讲解: pod的概念,以及如何向k8s中部署一个单体应用实例. 在上面的篇幅中,我们了解了docker,并制作.运行了docker镜像,然后将镜像发布至中央仓库了.然后又搭建了本机的k8s环境 ...
- k8s中部署springcloud
安装和配置数据存储仓库MySQL 1.MySQL简介 2.MySQL特点 3.安装和配置MySQL 4.在MySQL数据库导入数据 5.对MySQL数据库进行授权 1.MySQL简介 MySQL 是一 ...
- 在k8s中部署前后端分离项目进行访问的两种配置方式
第一种方式 (1) nginx配置中只写前端项目的/根路径配置 前端项目使用的Dockerfile文件内容 把前端项目编译后生成的dist文件夹放在nginx的html默认目录下,浏览器访问前端项目时 ...
- 【转】K8S中部署Helm
K8S中的包管理工具 1. 客户端Helm(即Helm) 通过脚本安装:curl https://raw.githubusercontent.com/helm/helm/master/scripts ...
- k8s集群中部署prometheus server
1.概述 本文档主要介绍如何在k8s集群中部署prometheus server用来作为监控的数据采集服务器,这样做可以很方便的对k8s集群中的指标.pod的.节点的指标进行采集和监控. 2.下载镜像 ...
- Prometheus K8S部署
Prometheus K8S部署 部署方式:https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/prometheus ...
随机推荐
- 2018-8-10-win10-uwp-关联文件
原文:2018-8-10-win10-uwp-关联文件 title author date CreateTime categories win10 uwp 关联文件 lindexi 2018-08-1 ...
- python基础(34):线程(二)
1. python线程 1.1 全局解释器锁GIL Python代码的执行由Python虚拟机(也叫解释器主循环)来控制.Python在设计之初就考虑到要在主循环中,同时只有一个线程在执行.虽然 Py ...
- Java技巧——比较两个日期相差的天数
Java技巧——比较两个日期相差的天数 摘要:本文主要记录了在Java里面如何判断两个日期相差的天数. 判断两个Date类型的日期之间的天数 通过计算毫秒数判断: public static void ...
- C#WinForm解决跨线程访问控件属性报错
方式一(在程序初始化构造函数中加一行代码): public Form1() { InitializeComponent(); Control.CheckForIllegalCrossThreadCal ...
- PHP mysqli_kill MySQLi 函数
mysqli_kill - 让服务器杀掉一个 MySQL 线程 语法:mysqli_kill ( mysqli $link , int $processid ) 本函数可以用来让服务器杀掉 proce ...
- RV32FDQ/RV64RDQ指令集(2)
下面我们逐个看下每个指令的细节: fadd.s fadd.s rd, rs1, rs2 //f [rd] = f [rs1] + f [rs2]单精度浮点加(Floating-point Ad ...
- Xcode 11新建工程.--iOS 13 SceneDelegate适配
收录文章::::::::::::::: iOS 13 适配要点总结 在Xcode 11 创建的工程,运行设备选择 iOS 13.0 以下的设备,运行应用时会出现黑屏现象.原因: Xcode 11 默认 ...
- Autofac 应用于IIS托管的WEB程序,注册程序集被回收的问题
现项目开始全面接入Autofac,但上线了后发现,iis进程被回收后,在访问网页提示找不到注册在Autofac中的类型,或者实例.现在处理办法记录如下: 1. IIS托管的应用程序,在首次加载时,所有 ...
- classmethod,staticmethod,反射,魔法方法,单例模式
目录 classmethod staticmethod instance issubclass 反射 hasatter getatter setatter delatter 魔法方法 单例模式 什么是 ...
- 【简单的spfa+优先队列】
题目是给出只有x和y构成的图,相同元素走路不花费,不同元素间花费1,给出起点终点,最少花费是 #include<cstdio>#include<algorithm>#inclu ...