cAdvisor容器监控规则
其他说明参考host主机监控规则:https://www.cnblogs.com/sanduzxcvbnm/p/13589848.html
在prometheus主程序目录下的rules目录下新建docker.yml文件,添加上如下内容,然后重启prometheus。
groups:
- name: Docker containers monitoring
rules:
- alert: ContainerKilled
expr: time() - container_last_seen > 60
for: 5m
labels:
severity: warning
annotations:
summary: "Container killed (instance {{ $labels.instance }})"
description: "A container has disappeared\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: ContainerCpuUsage
expr: (sum(rate(container_cpu_usage_seconds_total[3m])) BY (instance, name) * 100) > 80
for: 5m
labels:
severity: warning
annotations:
summary: "Container CPU usage (instance {{ $labels.instance }})"
description: "Container CPU usage is above 80%\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: ContainerMemoryUsage
expr: (sum(container_memory_usage_bytes) BY (instance, name) / sum(container_spec_memory_limit_bytes) BY (instance, name) * 100) > 80
for: 5m
labels:
severity: warning
annotations:
summary: "Container Memory usage (instance {{ $labels.instance }})"
description: "Container Memory usage is above 80%\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: ContainerVolumeUsage
expr: (1 - (sum(container_fs_inodes_free) BY (instance) / sum(container_fs_inodes_total) BY (instance)) * 100) > 80
for: 5m
labels:
severity: warning
annotations:
summary: "Container Volume usage (instance {{ $labels.instance }})"
description: "Container Volume usage is above 80%\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: ContainerVolumeIoUsage
expr: (sum(container_fs_io_current) BY (instance, name) * 100) > 80
for: 5m
labels:
severity: warning
annotations:
summary: "Container Volume IO usage (instance {{ $labels.instance }})"
description: "Container Volume IO usage is above 80%\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: ContainerHighThrottleRate
expr: rate(container_cpu_cfs_throttled_seconds_total[3m]) > 1
for: 5m
labels:
severity: warning
annotations:
summary: "Container high throttle rate (instance {{ $labels.instance }})"
description: "Container is being throttled\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: PgbouncerActiveConnectinos
expr: pgbouncer_pools_server_active_connections > 200
for: 5m
labels:
severity: warning
annotations:
summary: "PGBouncer active connectinos (instance {{ $labels.instance }})"
description: "PGBouncer pools are filling up\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: PgbouncerErrors
expr: increase(pgbouncer_errors_count{errmsg!="server conn crashed?"}[5m]) > 10
for: 5m
labels:
severity: warning
annotations:
summary: "PGBouncer errors (instance {{ $labels.instance }})"
description: "PGBouncer is logging errors. This may be due to a a server restart or an admin typing commands at the pgbouncer console.\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: PgbouncerMaxConnections
expr: rate(pgbouncer_errors_count{errmsg="no more connections allowed (max_client_conn)"}[1m]) > 0
for: 5m
labels:
severity: critical
annotations:
summary: "PGBouncer max connections (instance {{ $labels.instance }})"
description: "The number of PGBouncer client connections has reached max_client_conn.\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: SidekiqQueueSize
expr: sidekiq_queue_size{} > 100
for: 5m
labels:
severity: warning
annotations:
summary: "Sidekiq queue size (instance {{ $labels.instance }})"
description: "Sidekiq queue {{ $labels.name }} is growing\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: SidekiqSchedulingLatencyTooHigh
expr: max(sidekiq_queue_latency) > 120
for: 5m
labels:
severity: critical
annotations:
summary: "Sidekiq scheduling latency too high (instance {{ $labels.instance }})"
description: "Sidekiq jobs are taking more than 2 minutes to be picked up. Users may be seeing delays in background processing.\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: ConsulServiceHealthcheckFailed
expr: consul_catalog_service_node_healthy == 0
for: 5m
labels:
severity: critical
annotations:
summary: "Consul service healthcheck failed (instance {{ $labels.instance }})"
description: "Service: `{{ $labels.service_name }}` Healthcheck: `{{ $labels.service_id }}`\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: ConsulMissingMasterNode
expr: consul_raft_peers < 3
for: 5m
labels:
severity: critical
annotations:
summary: "Consul missing master node (instance {{ $labels.instance }})"
description: "Numbers of consul raft peers should be 3, in order to preserve quorum.\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: ConsulAgentUnhealthy
expr: consul_health_node_status{status="critical"} == 1
for: 5m
labels:
severity: critical
annotations:
summary: "Consul agent unhealthy (instance {{ $labels.instance }})"
description: "A Consul agent is down\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
cAdvisor容器监控规则的更多相关文章
- 容器监控:cadvisor+influxdb+grafana
cAdvisor:Google开源的工具,用于监控Docker主机和容器系统资源,通过图形页面实时显示数据,但不存储:它通过宿主机/proc./sys./var/lib/docker等目录下文件获取宿 ...
- docker stack 部署容器监控方案(cAdvisor、Prometheus、Grafana)
=============================================== 2018/7/8_第1次修改 ccb_warlock === ...
- Docker进阶-容器监控cAdvisor+InfluxDB+Granfana
概述 前面文章介绍使用docker compose组合应用并利用scale快速对容器进行扩容. 由于docker compose启动的服务都在同一台宿主机上,对于一个宿主机上运行多个容器应用时,容器的 ...
- 你必须知道的容器监控 (2) cAdvisor
本篇已加入<.NET Core on K8S学习实践系列文章索引>,可以点击查看更多容器化技术相关系列文章.上一篇我们了解了docker自带的监控子命令以及开源监控工具Weave Scop ...
- docker容器监控:cadvisor+influxdb+grafana
cadvisor+influxdb+grafana可以实现容器信息获取.存储.显示等容器监控功能,是目前流行的docker监控开源方案. 方案介绍 cadvisor Google开源的用于监控基础设施 ...
- 容器监控:cAdvisor
CAdvisor是Google开源的一款用于展示和分析容器运行状态的可视化工具.通过在主机上运行CAdvisor用户可以轻松的获取到当前主机上容器的运行统计信息,并以图表的形式向用户展示. 在本地运行 ...
- 【容器云】十分钟快速构建 Influxdb+cadvisor+grafana 监控
本文作者:七牛云布道师@陈爱珍,DBAPlus社群联合发起人.前新炬技术专家.多年企业级系统的应用运维及分布式系统实战经验.现专注于容器.微服务及DevOps落地的研究与实践. 安装过程 三个都直接下 ...
- 你必须知道的容器监控 (3) Prometheus
本篇已加入<.NET Core on K8S学习实践系列文章索引>,可以点击查看更多容器化技术相关系列文章.上一篇介绍了Google开发的容器监控工具cAdvisor,但是其提供的操作界面 ...
- Docker系列08:容器监控
1 监控解决方案 cadvisor+influxdb+grafana cAdvisor:Google开源的工具,用于监控Docker主机和容器系统资源,通过图形页面实时显示数据,但不存储:它通过宿主机 ...
随机推荐
- Linux(Centos7) 实例搭建 FTP 服务
本文以 CentOS 7.2 64位系统为例,使用 vsftpd 作为 FTP 服务端,FileZilla 作为客户端.指导您如何在 Linux 云服务器上搭建 FTP 服务. 操作步骤 安装 vsf ...
- 丽泽普及2022交流赛day16 社论
这场比较平凡吧 . 省流: http://zhengruioi.com/contest/1087 目录 目录 A. Gene 题面 题解 算法一(正解) 算法二 B. Fight 题面 题解 算法一( ...
- java实现wordCount的map
打开IDEA,File--new --Project,新建一个项目 我们已经安装好了maven,不用白不用 这里不要选用骨架,Next.在写上Groupid,Next. 写上项目名称,finish.o ...
- Kafka与Spark案例实践
1.概述 Kafka系统的灵活多变,让它拥有丰富的拓展性,可以与第三方套件很方便的对接.例如,实时计算引擎Spark.接下来通过一个完整案例,运用Kafka和Spark来合理完成. 2.内容 2.1 ...
- google nexus5x 刷机抓包逆向环境配置(三)
本文仅供学习交流使用,如侵立删! google nexus5x 刷机抓包逆向环境配置(三) 安装抓包证书(Fiddler.Charles) 操作环境 nexus5x kaliLinux win10 准 ...
- 如何构建 Apache DolphinScheduler 的 Docker 镜像
继昨日发布第一个 [官方 Docker 镜像] 后,有几位小伙伴私信想自己进行编译,这里也将 Docker 的主要贡献者文禾同学整理的文档进行分享.以下是全文内容: 您能够在类 Unix 系统和 Wi ...
- 一个注解搞定SpringBoot接口定制属性加解密
前言 上个月公司另一个团队做的新项目上线后大体上运行稳定,但包括研发负责人在内的两个人在项目上线后立马就跳槽了,然后又交接给了我这个「垃圾回收人员」. 本周甲方另一个厂家的监控平台扫描到我们这个项目某 ...
- C# 创建标签PDF文件
Q1:关于"标签PDF文件(Tagged PDF)" 标签PDF文件包含描述文档结构和各种文档元素顺序的元数据,是一种包含后端提供的可访问标记,管理阅读顺序和文档内容表示的逻辑结构 ...
- MySQL数据库授权的两种方式
方法一:grant命令创建用户并授权(针对只修改权限) grant命令简单语法如下: grant all privileges on dbname.* to username@localhost id ...
- 跟我学Python图像处理丨基于灰度三维图的图像顶帽运算和黑帽运算
摘要:本篇文章结合灰度三维图像讲解图像顶帽运算和图像黑猫运算,通过Python调用OpenCV函数实现. 本文分享自华为云社区<[Python图像处理] 十三.基于灰度三维图的图像顶帽运算和黑帽 ...