cAdvisor容器监控规则
其他说明参考host主机监控规则:https://www.cnblogs.com/sanduzxcvbnm/p/13589848.html
在prometheus主程序目录下的rules目录下新建docker.yml文件,添加上如下内容,然后重启prometheus。
groups:
- name: Docker containers monitoring
rules:
- alert: ContainerKilled
expr: time() - container_last_seen > 60
for: 5m
labels:
severity: warning
annotations:
summary: "Container killed (instance {{ $labels.instance }})"
description: "A container has disappeared\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: ContainerCpuUsage
expr: (sum(rate(container_cpu_usage_seconds_total[3m])) BY (instance, name) * 100) > 80
for: 5m
labels:
severity: warning
annotations:
summary: "Container CPU usage (instance {{ $labels.instance }})"
description: "Container CPU usage is above 80%\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: ContainerMemoryUsage
expr: (sum(container_memory_usage_bytes) BY (instance, name) / sum(container_spec_memory_limit_bytes) BY (instance, name) * 100) > 80
for: 5m
labels:
severity: warning
annotations:
summary: "Container Memory usage (instance {{ $labels.instance }})"
description: "Container Memory usage is above 80%\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: ContainerVolumeUsage
expr: (1 - (sum(container_fs_inodes_free) BY (instance) / sum(container_fs_inodes_total) BY (instance)) * 100) > 80
for: 5m
labels:
severity: warning
annotations:
summary: "Container Volume usage (instance {{ $labels.instance }})"
description: "Container Volume usage is above 80%\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: ContainerVolumeIoUsage
expr: (sum(container_fs_io_current) BY (instance, name) * 100) > 80
for: 5m
labels:
severity: warning
annotations:
summary: "Container Volume IO usage (instance {{ $labels.instance }})"
description: "Container Volume IO usage is above 80%\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: ContainerHighThrottleRate
expr: rate(container_cpu_cfs_throttled_seconds_total[3m]) > 1
for: 5m
labels:
severity: warning
annotations:
summary: "Container high throttle rate (instance {{ $labels.instance }})"
description: "Container is being throttled\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: PgbouncerActiveConnectinos
expr: pgbouncer_pools_server_active_connections > 200
for: 5m
labels:
severity: warning
annotations:
summary: "PGBouncer active connectinos (instance {{ $labels.instance }})"
description: "PGBouncer pools are filling up\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: PgbouncerErrors
expr: increase(pgbouncer_errors_count{errmsg!="server conn crashed?"}[5m]) > 10
for: 5m
labels:
severity: warning
annotations:
summary: "PGBouncer errors (instance {{ $labels.instance }})"
description: "PGBouncer is logging errors. This may be due to a a server restart or an admin typing commands at the pgbouncer console.\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: PgbouncerMaxConnections
expr: rate(pgbouncer_errors_count{errmsg="no more connections allowed (max_client_conn)"}[1m]) > 0
for: 5m
labels:
severity: critical
annotations:
summary: "PGBouncer max connections (instance {{ $labels.instance }})"
description: "The number of PGBouncer client connections has reached max_client_conn.\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: SidekiqQueueSize
expr: sidekiq_queue_size{} > 100
for: 5m
labels:
severity: warning
annotations:
summary: "Sidekiq queue size (instance {{ $labels.instance }})"
description: "Sidekiq queue {{ $labels.name }} is growing\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: SidekiqSchedulingLatencyTooHigh
expr: max(sidekiq_queue_latency) > 120
for: 5m
labels:
severity: critical
annotations:
summary: "Sidekiq scheduling latency too high (instance {{ $labels.instance }})"
description: "Sidekiq jobs are taking more than 2 minutes to be picked up. Users may be seeing delays in background processing.\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: ConsulServiceHealthcheckFailed
expr: consul_catalog_service_node_healthy == 0
for: 5m
labels:
severity: critical
annotations:
summary: "Consul service healthcheck failed (instance {{ $labels.instance }})"
description: "Service: `{{ $labels.service_name }}` Healthcheck: `{{ $labels.service_id }}`\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: ConsulMissingMasterNode
expr: consul_raft_peers < 3
for: 5m
labels:
severity: critical
annotations:
summary: "Consul missing master node (instance {{ $labels.instance }})"
description: "Numbers of consul raft peers should be 3, in order to preserve quorum.\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: ConsulAgentUnhealthy
expr: consul_health_node_status{status="critical"} == 1
for: 5m
labels:
severity: critical
annotations:
summary: "Consul agent unhealthy (instance {{ $labels.instance }})"
description: "A Consul agent is down\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
cAdvisor容器监控规则的更多相关文章
- 容器监控:cadvisor+influxdb+grafana
cAdvisor:Google开源的工具,用于监控Docker主机和容器系统资源,通过图形页面实时显示数据,但不存储:它通过宿主机/proc./sys./var/lib/docker等目录下文件获取宿 ...
- docker stack 部署容器监控方案(cAdvisor、Prometheus、Grafana)
=============================================== 2018/7/8_第1次修改 ccb_warlock === ...
- Docker进阶-容器监控cAdvisor+InfluxDB+Granfana
概述 前面文章介绍使用docker compose组合应用并利用scale快速对容器进行扩容. 由于docker compose启动的服务都在同一台宿主机上,对于一个宿主机上运行多个容器应用时,容器的 ...
- 你必须知道的容器监控 (2) cAdvisor
本篇已加入<.NET Core on K8S学习实践系列文章索引>,可以点击查看更多容器化技术相关系列文章.上一篇我们了解了docker自带的监控子命令以及开源监控工具Weave Scop ...
- docker容器监控:cadvisor+influxdb+grafana
cadvisor+influxdb+grafana可以实现容器信息获取.存储.显示等容器监控功能,是目前流行的docker监控开源方案. 方案介绍 cadvisor Google开源的用于监控基础设施 ...
- 容器监控:cAdvisor
CAdvisor是Google开源的一款用于展示和分析容器运行状态的可视化工具.通过在主机上运行CAdvisor用户可以轻松的获取到当前主机上容器的运行统计信息,并以图表的形式向用户展示. 在本地运行 ...
- 【容器云】十分钟快速构建 Influxdb+cadvisor+grafana 监控
本文作者:七牛云布道师@陈爱珍,DBAPlus社群联合发起人.前新炬技术专家.多年企业级系统的应用运维及分布式系统实战经验.现专注于容器.微服务及DevOps落地的研究与实践. 安装过程 三个都直接下 ...
- 你必须知道的容器监控 (3) Prometheus
本篇已加入<.NET Core on K8S学习实践系列文章索引>,可以点击查看更多容器化技术相关系列文章.上一篇介绍了Google开发的容器监控工具cAdvisor,但是其提供的操作界面 ...
- Docker系列08:容器监控
1 监控解决方案 cadvisor+influxdb+grafana cAdvisor:Google开源的工具,用于监控Docker主机和容器系统资源,通过图形页面实时显示数据,但不存储:它通过宿主机 ...
随机推荐
- DHCP原理及配置
DHCP工作原理 集中的管理.分配IP地址,使client动态的获得IP地址.Gateway地址.DNS服务器地址等信息,并能够提升地址的使用率. 简单来说,DHCP就是一个不需要账号密码登录的.自动 ...
- ACWing93.递归实现组合型枚举
题面 \93. 递归实现组合型枚举 从 1∼n 这 n 个整数中随机选出 m 个,输出所有可能的选择方案. 输入格式 两个整数 n,m ,在同一行用空格隔开. 输出格式 按照从小到大的顺序输出所有方案 ...
- python不同平台进程的启动与终止
Liunx进程的启动与终止 在使用subprocess创建进程时需要将所有进程设置为一个进程组 preexec_fn:只在 Unix 平台下有效,用于指定一个可执行对象(callable object ...
- 清北学堂 2020 国庆J2考前综合强化 Day7
目录 1. 题目 T1 魔力石 题目描述 Sol T2 和 题目描述 Sol T3 数对 题目描述 Sol T4 海豹王国 题目描述 Sol 考场策略 1. 题目 T1 魔力石 题目描述 题目描述 小 ...
- linux文件校验
最近在一次安装centos7程序中遇到了网速很卡的情况,不得已采用了百度云的离线下载功能,后来上传进入虚拟机内,结果遇到无法上传的情况,后来经过转码后才上传成功,详情http://www.cnblog ...
- Ray类定义
定义光线,书中已经有原理. 类声明: #ifndef __RAY_HEADER__ #define __RAY_HEADER__ #include "geometry.h" cla ...
- mybatis 01: 静态代理 + jdk动态代理
背景 有时目标对象不可直接访问,只能通过代理对象访问 图示: 示例1: 房东 ===> 目标对象 房屋中介 ===> 代理对象 你,我 ===> 客户端对象 示例2: 运营商(电信, ...
- Luogu3275 [SCOI2011]糖果 (差分约束)
逆序建超级源快十倍还行 #include <cstdio> #include <iostream> #include <cstring> #include < ...
- Unity3D学习笔记12——渲染纹理
目录 1. 概述 2. 详论 3. 问题 1. 概述 在文章<Unity3D学习笔记11--后处理>中论述了后处理是帧缓存(Framebuffer)技术实现之一:而另外一个帧缓存技术实现就 ...
- 关于 Excel 函数对字符、布尔、数字的运算的细节
使用函数时,通过引用单元格作为参数进行运算,不会计算字符和布尔. 假如 A1.B1.C1 这三个单元格,其中 A1 为布尔TRUE:B1 为字符"2":C1 为数字1. 求和函数= ...