其他说明参考host主机监控规则:https://www.cnblogs.com/sanduzxcvbnm/p/13589848.html

在prometheus主程序目录下的rules目录下新建docker.yml文件,添加上如下内容,然后重启prometheus。

groups:
- name: Docker containers monitoring
rules:
- alert: ContainerKilled
expr: time() - container_last_seen > 60
for: 5m
labels:
severity: warning
annotations:
summary: "Container killed (instance {{ $labels.instance }})"
description: "A container has disappeared\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: ContainerCpuUsage
expr: (sum(rate(container_cpu_usage_seconds_total[3m])) BY (instance, name) * 100) > 80
for: 5m
labels:
severity: warning
annotations:
summary: "Container CPU usage (instance {{ $labels.instance }})"
description: "Container CPU usage is above 80%\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: ContainerMemoryUsage
expr: (sum(container_memory_usage_bytes) BY (instance, name) / sum(container_spec_memory_limit_bytes) BY (instance, name) * 100) > 80
for: 5m
labels:
severity: warning
annotations:
summary: "Container Memory usage (instance {{ $labels.instance }})"
description: "Container Memory usage is above 80%\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: ContainerVolumeUsage
expr: (1 - (sum(container_fs_inodes_free) BY (instance) / sum(container_fs_inodes_total) BY (instance)) * 100) > 80
for: 5m
labels:
severity: warning
annotations:
summary: "Container Volume usage (instance {{ $labels.instance }})"
description: "Container Volume usage is above 80%\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: ContainerVolumeIoUsage
expr: (sum(container_fs_io_current) BY (instance, name) * 100) > 80
for: 5m
labels:
severity: warning
annotations:
summary: "Container Volume IO usage (instance {{ $labels.instance }})"
description: "Container Volume IO usage is above 80%\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: ContainerHighThrottleRate
expr: rate(container_cpu_cfs_throttled_seconds_total[3m]) > 1
for: 5m
labels:
severity: warning
annotations:
summary: "Container high throttle rate (instance {{ $labels.instance }})"
description: "Container is being throttled\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: PgbouncerActiveConnectinos
expr: pgbouncer_pools_server_active_connections > 200
for: 5m
labels:
severity: warning
annotations:
summary: "PGBouncer active connectinos (instance {{ $labels.instance }})"
description: "PGBouncer pools are filling up\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: PgbouncerErrors
expr: increase(pgbouncer_errors_count{errmsg!="server conn crashed?"}[5m]) > 10
for: 5m
labels:
severity: warning
annotations:
summary: "PGBouncer errors (instance {{ $labels.instance }})"
description: "PGBouncer is logging errors. This may be due to a a server restart or an admin typing commands at the pgbouncer console.\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: PgbouncerMaxConnections
expr: rate(pgbouncer_errors_count{errmsg="no more connections allowed (max_client_conn)"}[1m]) > 0
for: 5m
labels:
severity: critical
annotations:
summary: "PGBouncer max connections (instance {{ $labels.instance }})"
description: "The number of PGBouncer client connections has reached max_client_conn.\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: SidekiqQueueSize
expr: sidekiq_queue_size{} > 100
for: 5m
labels:
severity: warning
annotations:
summary: "Sidekiq queue size (instance {{ $labels.instance }})"
description: "Sidekiq queue {{ $labels.name }} is growing\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: SidekiqSchedulingLatencyTooHigh
expr: max(sidekiq_queue_latency) > 120
for: 5m
labels:
severity: critical
annotations:
summary: "Sidekiq scheduling latency too high (instance {{ $labels.instance }})"
description: "Sidekiq jobs are taking more than 2 minutes to be picked up. Users may be seeing delays in background processing.\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: ConsulServiceHealthcheckFailed
expr: consul_catalog_service_node_healthy == 0
for: 5m
labels:
severity: critical
annotations:
summary: "Consul service healthcheck failed (instance {{ $labels.instance }})"
description: "Service: `{{ $labels.service_name }}` Healthcheck: `{{ $labels.service_id }}`\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: ConsulMissingMasterNode
expr: consul_raft_peers < 3
for: 5m
labels:
severity: critical
annotations:
summary: "Consul missing master node (instance {{ $labels.instance }})"
description: "Numbers of consul raft peers should be 3, in order to preserve quorum.\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: ConsulAgentUnhealthy
expr: consul_health_node_status{status="critical"} == 1
for: 5m
labels:
severity: critical
annotations:
summary: "Consul agent unhealthy (instance {{ $labels.instance }})"
description: "A Consul agent is down\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"

cAdvisor容器监控规则的更多相关文章

  1. 容器监控:cadvisor+influxdb+grafana

    cAdvisor:Google开源的工具,用于监控Docker主机和容器系统资源,通过图形页面实时显示数据,但不存储:它通过宿主机/proc./sys./var/lib/docker等目录下文件获取宿 ...

  2. docker stack 部署容器监控方案(cAdvisor、Prometheus、Grafana)

    =============================================== 2018/7/8_第1次修改                       ccb_warlock === ...

  3. Docker进阶-容器监控cAdvisor+InfluxDB+Granfana

    概述 前面文章介绍使用docker compose组合应用并利用scale快速对容器进行扩容. 由于docker compose启动的服务都在同一台宿主机上,对于一个宿主机上运行多个容器应用时,容器的 ...

  4. 你必须知道的容器监控 (2) cAdvisor

    本篇已加入<.NET Core on K8S学习实践系列文章索引>,可以点击查看更多容器化技术相关系列文章.上一篇我们了解了docker自带的监控子命令以及开源监控工具Weave Scop ...

  5. docker容器监控:cadvisor+influxdb+grafana

    cadvisor+influxdb+grafana可以实现容器信息获取.存储.显示等容器监控功能,是目前流行的docker监控开源方案. 方案介绍 cadvisor Google开源的用于监控基础设施 ...

  6. 容器监控:cAdvisor

    CAdvisor是Google开源的一款用于展示和分析容器运行状态的可视化工具.通过在主机上运行CAdvisor用户可以轻松的获取到当前主机上容器的运行统计信息,并以图表的形式向用户展示. 在本地运行 ...

  7. 【容器云】十分钟快速构建 Influxdb+cadvisor+grafana 监控

    本文作者:七牛云布道师@陈爱珍,DBAPlus社群联合发起人.前新炬技术专家.多年企业级系统的应用运维及分布式系统实战经验.现专注于容器.微服务及DevOps落地的研究与实践. 安装过程 三个都直接下 ...

  8. 你必须知道的容器监控 (3) Prometheus

    本篇已加入<.NET Core on K8S学习实践系列文章索引>,可以点击查看更多容器化技术相关系列文章.上一篇介绍了Google开发的容器监控工具cAdvisor,但是其提供的操作界面 ...

  9. Docker系列08:容器监控

    1 监控解决方案 cadvisor+influxdb+grafana cAdvisor:Google开源的工具,用于监控Docker主机和容器系统资源,通过图形页面实时显示数据,但不存储:它通过宿主机 ...

随机推荐

  1. ReentrantLock源码详解

    前言 以前只知道ReentrantLock底层基于AQS实现,相对于(旧版本的)synchronized: 更轻量(基于CAS而不是管程),由JDK实现 可以实现公平/非公平 可中断等待 可绑定多个条 ...

  2. Nginx第三方模块Ngx-dyups安装过程

    Ngx-dyups是什么,能干什么 它是一个Nginx第三方动态Upstream配置模块,可以实现在不重启Nginx情况下动态更新反向代理Upstream表.该模块由淘宝开发团队维护,淘宝自家的Ten ...

  3. Docker非root用户使用

    Docker 用户管理 安装Docker后docker相关命令都需要加上sudo才能执行,这里为特定用户添加下权限 Docker群组 不过一般安好docker后该群组已创建 sudo groupadd ...

  4. sql语句实现每日资格设置

    CREATE TABLE 'iDayAuth'( 'openid' VARCHAR(16) NOT NULL , 'iStamp' INT(10) NOT NULL, 'iDayAuth' SMALL ...

  5. 使用Python3.7+Tornado5.1集成新浪微博三方登录(无需企业资质)

    原文转载自「刘悦的技术博客」https://v3u.cn/a_id_137 新浪微博:山寨版的twitter,各种粉丝的集散地,天朝人民的最爱,基本上网民都人手一个微博账号,所以使用新浪微博账号进行三 ...

  6. GreatSQL特性介绍及未来展望--叶金荣|万里数据库

    「3306π」是由业内知名MySQL专家叶金荣.吴炳锡首发倡议成立,围绕MySQL及云数据库.大数据等周边相关技术的技术爱好者的社区.致力于把互联网技术带到传统行业里,推动开源技术在传统行业中应用.本 ...

  7. 多环境配置 - SpringBoot 2.7.2 实战基础

    优雅哥 SpringBoot 2.7.2 实战基础 - 06 -多环境配置 在一个项目的开发过程中,通常伴随着多套环境:本地环境 local.开发环境 dev.集成测试环境 test.用户接受测试环境 ...

  8. DS二叉树——二叉树之数组存储

    题目描述 二叉树可以采用数组的方法进行存储,把数组中的数据依次自上而下,自左至右存储到二叉树结点中,一般二叉树与完全二叉树对比,比完全二叉树缺少的结点就在数组中用0来表示.,如下图所示 从上图可以看出 ...

  9. java学习第二天面向对象.day07

    变量的生命周期 成员变量:存储在堆内存中,随着对象的销毁而销毁 局部变量:存储在栈内存中,随着所定义方法的调用结束而销毁 局部变量存储在方法中,每次调用方法都会在栈空间开辟一块内存空间--栈帧,方法调 ...

  10. React报错之React hook 'useState' is called conditionally

    正文从这开始~ 总览 当我们有条件地使用useState钩子时,或者在一个可能有返回值的条件之后,会产生"React hook 'useState' is called conditiona ...