cAdvisor容器监控规则
其他说明参考host主机监控规则:https://www.cnblogs.com/sanduzxcvbnm/p/13589848.html
在prometheus主程序目录下的rules目录下新建docker.yml文件,添加上如下内容,然后重启prometheus。
groups:
- name: Docker containers monitoring
rules:
- alert: ContainerKilled
expr: time() - container_last_seen > 60
for: 5m
labels:
severity: warning
annotations:
summary: "Container killed (instance {{ $labels.instance }})"
description: "A container has disappeared\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: ContainerCpuUsage
expr: (sum(rate(container_cpu_usage_seconds_total[3m])) BY (instance, name) * 100) > 80
for: 5m
labels:
severity: warning
annotations:
summary: "Container CPU usage (instance {{ $labels.instance }})"
description: "Container CPU usage is above 80%\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: ContainerMemoryUsage
expr: (sum(container_memory_usage_bytes) BY (instance, name) / sum(container_spec_memory_limit_bytes) BY (instance, name) * 100) > 80
for: 5m
labels:
severity: warning
annotations:
summary: "Container Memory usage (instance {{ $labels.instance }})"
description: "Container Memory usage is above 80%\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: ContainerVolumeUsage
expr: (1 - (sum(container_fs_inodes_free) BY (instance) / sum(container_fs_inodes_total) BY (instance)) * 100) > 80
for: 5m
labels:
severity: warning
annotations:
summary: "Container Volume usage (instance {{ $labels.instance }})"
description: "Container Volume usage is above 80%\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: ContainerVolumeIoUsage
expr: (sum(container_fs_io_current) BY (instance, name) * 100) > 80
for: 5m
labels:
severity: warning
annotations:
summary: "Container Volume IO usage (instance {{ $labels.instance }})"
description: "Container Volume IO usage is above 80%\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: ContainerHighThrottleRate
expr: rate(container_cpu_cfs_throttled_seconds_total[3m]) > 1
for: 5m
labels:
severity: warning
annotations:
summary: "Container high throttle rate (instance {{ $labels.instance }})"
description: "Container is being throttled\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: PgbouncerActiveConnectinos
expr: pgbouncer_pools_server_active_connections > 200
for: 5m
labels:
severity: warning
annotations:
summary: "PGBouncer active connectinos (instance {{ $labels.instance }})"
description: "PGBouncer pools are filling up\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: PgbouncerErrors
expr: increase(pgbouncer_errors_count{errmsg!="server conn crashed?"}[5m]) > 10
for: 5m
labels:
severity: warning
annotations:
summary: "PGBouncer errors (instance {{ $labels.instance }})"
description: "PGBouncer is logging errors. This may be due to a a server restart or an admin typing commands at the pgbouncer console.\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: PgbouncerMaxConnections
expr: rate(pgbouncer_errors_count{errmsg="no more connections allowed (max_client_conn)"}[1m]) > 0
for: 5m
labels:
severity: critical
annotations:
summary: "PGBouncer max connections (instance {{ $labels.instance }})"
description: "The number of PGBouncer client connections has reached max_client_conn.\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: SidekiqQueueSize
expr: sidekiq_queue_size{} > 100
for: 5m
labels:
severity: warning
annotations:
summary: "Sidekiq queue size (instance {{ $labels.instance }})"
description: "Sidekiq queue {{ $labels.name }} is growing\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: SidekiqSchedulingLatencyTooHigh
expr: max(sidekiq_queue_latency) > 120
for: 5m
labels:
severity: critical
annotations:
summary: "Sidekiq scheduling latency too high (instance {{ $labels.instance }})"
description: "Sidekiq jobs are taking more than 2 minutes to be picked up. Users may be seeing delays in background processing.\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: ConsulServiceHealthcheckFailed
expr: consul_catalog_service_node_healthy == 0
for: 5m
labels:
severity: critical
annotations:
summary: "Consul service healthcheck failed (instance {{ $labels.instance }})"
description: "Service: `{{ $labels.service_name }}` Healthcheck: `{{ $labels.service_id }}`\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: ConsulMissingMasterNode
expr: consul_raft_peers < 3
for: 5m
labels:
severity: critical
annotations:
summary: "Consul missing master node (instance {{ $labels.instance }})"
description: "Numbers of consul raft peers should be 3, in order to preserve quorum.\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: ConsulAgentUnhealthy
expr: consul_health_node_status{status="critical"} == 1
for: 5m
labels:
severity: critical
annotations:
summary: "Consul agent unhealthy (instance {{ $labels.instance }})"
description: "A Consul agent is down\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
cAdvisor容器监控规则的更多相关文章
- 容器监控:cadvisor+influxdb+grafana
cAdvisor:Google开源的工具,用于监控Docker主机和容器系统资源,通过图形页面实时显示数据,但不存储:它通过宿主机/proc./sys./var/lib/docker等目录下文件获取宿 ...
- docker stack 部署容器监控方案(cAdvisor、Prometheus、Grafana)
=============================================== 2018/7/8_第1次修改 ccb_warlock === ...
- Docker进阶-容器监控cAdvisor+InfluxDB+Granfana
概述 前面文章介绍使用docker compose组合应用并利用scale快速对容器进行扩容. 由于docker compose启动的服务都在同一台宿主机上,对于一个宿主机上运行多个容器应用时,容器的 ...
- 你必须知道的容器监控 (2) cAdvisor
本篇已加入<.NET Core on K8S学习实践系列文章索引>,可以点击查看更多容器化技术相关系列文章.上一篇我们了解了docker自带的监控子命令以及开源监控工具Weave Scop ...
- docker容器监控:cadvisor+influxdb+grafana
cadvisor+influxdb+grafana可以实现容器信息获取.存储.显示等容器监控功能,是目前流行的docker监控开源方案. 方案介绍 cadvisor Google开源的用于监控基础设施 ...
- 容器监控:cAdvisor
CAdvisor是Google开源的一款用于展示和分析容器运行状态的可视化工具.通过在主机上运行CAdvisor用户可以轻松的获取到当前主机上容器的运行统计信息,并以图表的形式向用户展示. 在本地运行 ...
- 【容器云】十分钟快速构建 Influxdb+cadvisor+grafana 监控
本文作者:七牛云布道师@陈爱珍,DBAPlus社群联合发起人.前新炬技术专家.多年企业级系统的应用运维及分布式系统实战经验.现专注于容器.微服务及DevOps落地的研究与实践. 安装过程 三个都直接下 ...
- 你必须知道的容器监控 (3) Prometheus
本篇已加入<.NET Core on K8S学习实践系列文章索引>,可以点击查看更多容器化技术相关系列文章.上一篇介绍了Google开发的容器监控工具cAdvisor,但是其提供的操作界面 ...
- Docker系列08:容器监控
1 监控解决方案 cadvisor+influxdb+grafana cAdvisor:Google开源的工具,用于监控Docker主机和容器系统资源,通过图形页面实时显示数据,但不存储:它通过宿主机 ...
随机推荐
- HTML及HTTP协议
web服务的过程: 浏览器发请求 --> HTTP协议 --> 服务端接收请求 --> 服务端返回响应 --> 服务端把HTML文件内容发给浏览器 --> 浏览器渲染页面 ...
- 【Unity基础知识】认识常用的生命周期函数(Awake、Start、Update...)
一.了解帧的概念 游戏的本质就是一个死循环 每一次循环都会处理游戏逻辑 并 更新一次游戏画面 之所以能看到画面在动 是因为 切换画面速度达到一定速度时 人眼就会认为画面是动态且流畅的 一帧就是执行了一 ...
- 浅谈spring-createBean
目录 找到BeanClass并且加载类 实例化前 实例化 Supplier创建对象 工厂方法创建对象 方法一 方法二 推断构造方法 BeanDefionition 的后置处理 实例化后 属性填充 sp ...
- 搭建一个完整的K8S集群-------基于CentOS 8系统
创建三个centos节点: 192.168.5.141 k8s-master 192.168.5.142 k8s-nnode1 192.168.5.143 k8s-nnode2 查看centos系统版 ...
- 使用.NET简单实现一个Redis的高性能克隆版(四、五)
译者注 该原文是Ayende Rahien大佬业余自己在使用C# 和 .NET构建一个简单.高性能兼容Redis协议的数据库的经历. 首先这个"Redis"是非常简单的实现,但是他 ...
- React重新渲染指南
前言 老早就想写一篇关于React渲染的文章,这两天看到一篇比较不错英文的文章,翻译一下(主要是谷歌翻译,手动狗头),文章底部会附上原文链接. 介绍 React 重新渲染的综合指南.该指南解释了什么是 ...
- Three---面向对象与面向过程/属性和变量/关于self/一些魔法方法的使用/继承/super方法/多态
python的面向对象 面向对象与面向过程 面向过程 面向过程思想:需要实现一个功能的时候,看重的是开发的步骤和过程,每一个步骤都需要自己亲力亲为,需要自己编写代码(自己来做) 面向对象 面向对象的三 ...
- 2步就可以压缩PPT大小,再也不怕C盘飘红了!
在座哪位小朋友的C盘已经红了,举个手让我看看! 嗯......还真不少啊! 经常做PPT的同学已经开始抱怨了:领导给的图片一张就10M起,一个PPT里面百来张图,文件大小都快1个G了. 如果是文秘岗, ...
- 【java】学习路径31-文件IO基本操作(未涉及到流)
一.初始化: File f1 = new File("//Users//Shared//JavaIOTest//Test01.txt"); File f2 = new File(& ...
- html、css实现导航栏5种常用下拉效果
实现的效果:鼠标移入按钮时按钮中的内容就会出现,分别展示不同的出现效果.效果难点:不使用JavaScript,那这个效果的难点就是在于:hover伪类的掌控,以及考验对html的结构掌握. 1. ht ...