其他说明参考host主机监控规则:https://www.cnblogs.com/sanduzxcvbnm/p/13589848.html

在prometheus主程序目录下的rules目录下新建docker.yml文件,添加上如下内容,然后重启prometheus。

groups:
- name: Docker containers monitoring
rules:
- alert: ContainerKilled
expr: time() - container_last_seen > 60
for: 5m
labels:
severity: warning
annotations:
summary: "Container killed (instance {{ $labels.instance }})"
description: "A container has disappeared\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: ContainerCpuUsage
expr: (sum(rate(container_cpu_usage_seconds_total[3m])) BY (instance, name) * 100) > 80
for: 5m
labels:
severity: warning
annotations:
summary: "Container CPU usage (instance {{ $labels.instance }})"
description: "Container CPU usage is above 80%\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: ContainerMemoryUsage
expr: (sum(container_memory_usage_bytes) BY (instance, name) / sum(container_spec_memory_limit_bytes) BY (instance, name) * 100) > 80
for: 5m
labels:
severity: warning
annotations:
summary: "Container Memory usage (instance {{ $labels.instance }})"
description: "Container Memory usage is above 80%\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: ContainerVolumeUsage
expr: (1 - (sum(container_fs_inodes_free) BY (instance) / sum(container_fs_inodes_total) BY (instance)) * 100) > 80
for: 5m
labels:
severity: warning
annotations:
summary: "Container Volume usage (instance {{ $labels.instance }})"
description: "Container Volume usage is above 80%\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: ContainerVolumeIoUsage
expr: (sum(container_fs_io_current) BY (instance, name) * 100) > 80
for: 5m
labels:
severity: warning
annotations:
summary: "Container Volume IO usage (instance {{ $labels.instance }})"
description: "Container Volume IO usage is above 80%\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: ContainerHighThrottleRate
expr: rate(container_cpu_cfs_throttled_seconds_total[3m]) > 1
for: 5m
labels:
severity: warning
annotations:
summary: "Container high throttle rate (instance {{ $labels.instance }})"
description: "Container is being throttled\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: PgbouncerActiveConnectinos
expr: pgbouncer_pools_server_active_connections > 200
for: 5m
labels:
severity: warning
annotations:
summary: "PGBouncer active connectinos (instance {{ $labels.instance }})"
description: "PGBouncer pools are filling up\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: PgbouncerErrors
expr: increase(pgbouncer_errors_count{errmsg!="server conn crashed?"}[5m]) > 10
for: 5m
labels:
severity: warning
annotations:
summary: "PGBouncer errors (instance {{ $labels.instance }})"
description: "PGBouncer is logging errors. This may be due to a a server restart or an admin typing commands at the pgbouncer console.\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: PgbouncerMaxConnections
expr: rate(pgbouncer_errors_count{errmsg="no more connections allowed (max_client_conn)"}[1m]) > 0
for: 5m
labels:
severity: critical
annotations:
summary: "PGBouncer max connections (instance {{ $labels.instance }})"
description: "The number of PGBouncer client connections has reached max_client_conn.\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: SidekiqQueueSize
expr: sidekiq_queue_size{} > 100
for: 5m
labels:
severity: warning
annotations:
summary: "Sidekiq queue size (instance {{ $labels.instance }})"
description: "Sidekiq queue {{ $labels.name }} is growing\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: SidekiqSchedulingLatencyTooHigh
expr: max(sidekiq_queue_latency) > 120
for: 5m
labels:
severity: critical
annotations:
summary: "Sidekiq scheduling latency too high (instance {{ $labels.instance }})"
description: "Sidekiq jobs are taking more than 2 minutes to be picked up. Users may be seeing delays in background processing.\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: ConsulServiceHealthcheckFailed
expr: consul_catalog_service_node_healthy == 0
for: 5m
labels:
severity: critical
annotations:
summary: "Consul service healthcheck failed (instance {{ $labels.instance }})"
description: "Service: `{{ $labels.service_name }}` Healthcheck: `{{ $labels.service_id }}`\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: ConsulMissingMasterNode
expr: consul_raft_peers < 3
for: 5m
labels:
severity: critical
annotations:
summary: "Consul missing master node (instance {{ $labels.instance }})"
description: "Numbers of consul raft peers should be 3, in order to preserve quorum.\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: ConsulAgentUnhealthy
expr: consul_health_node_status{status="critical"} == 1
for: 5m
labels:
severity: critical
annotations:
summary: "Consul agent unhealthy (instance {{ $labels.instance }})"
description: "A Consul agent is down\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"

cAdvisor容器监控规则的更多相关文章

  1. 容器监控:cadvisor+influxdb+grafana

    cAdvisor:Google开源的工具,用于监控Docker主机和容器系统资源,通过图形页面实时显示数据,但不存储:它通过宿主机/proc./sys./var/lib/docker等目录下文件获取宿 ...

  2. docker stack 部署容器监控方案(cAdvisor、Prometheus、Grafana)

    =============================================== 2018/7/8_第1次修改                       ccb_warlock === ...

  3. Docker进阶-容器监控cAdvisor+InfluxDB+Granfana

    概述 前面文章介绍使用docker compose组合应用并利用scale快速对容器进行扩容. 由于docker compose启动的服务都在同一台宿主机上,对于一个宿主机上运行多个容器应用时,容器的 ...

  4. 你必须知道的容器监控 (2) cAdvisor

    本篇已加入<.NET Core on K8S学习实践系列文章索引>,可以点击查看更多容器化技术相关系列文章.上一篇我们了解了docker自带的监控子命令以及开源监控工具Weave Scop ...

  5. docker容器监控:cadvisor+influxdb+grafana

    cadvisor+influxdb+grafana可以实现容器信息获取.存储.显示等容器监控功能,是目前流行的docker监控开源方案. 方案介绍 cadvisor Google开源的用于监控基础设施 ...

  6. 容器监控:cAdvisor

    CAdvisor是Google开源的一款用于展示和分析容器运行状态的可视化工具.通过在主机上运行CAdvisor用户可以轻松的获取到当前主机上容器的运行统计信息,并以图表的形式向用户展示. 在本地运行 ...

  7. 【容器云】十分钟快速构建 Influxdb+cadvisor+grafana 监控

    本文作者:七牛云布道师@陈爱珍,DBAPlus社群联合发起人.前新炬技术专家.多年企业级系统的应用运维及分布式系统实战经验.现专注于容器.微服务及DevOps落地的研究与实践. 安装过程 三个都直接下 ...

  8. 你必须知道的容器监控 (3) Prometheus

    本篇已加入<.NET Core on K8S学习实践系列文章索引>,可以点击查看更多容器化技术相关系列文章.上一篇介绍了Google开发的容器监控工具cAdvisor,但是其提供的操作界面 ...

  9. Docker系列08:容器监控

    1 监控解决方案 cadvisor+influxdb+grafana cAdvisor:Google开源的工具,用于监控Docker主机和容器系统资源,通过图形页面实时显示数据,但不存储:它通过宿主机 ...

随机推荐

  1. C++数据类型的引入

    1.存储位数 计算机管理存储器(内存和外存)的最小单位是字节,每个字节存储一个8为二进制数.一个字节的存储范围就在(00000000 ~ 11111111),十进制表示就是0~255这个范围.为了方便 ...

  2. Nginx工作模式

    Master-Worker模式 1.Nginx 在启动后,会有一个 master 进程和多个相互独立的 worker 进程.2.接收来自外界的信号,向各worker进程发送信号,每个进程都有可能来处理 ...

  3. P2512 【一本通提高篇贪心】「一本通 1.1 练习 6」[HAOI2008]糖果传递

    [HAOI2008]糖果传递 题目描述 有 n n n 个小朋友坐成一圈,每人有 a i a_i ai​ 个糖果.每人只能给左右两人传递糖果.每人每次传递一个糖果代价为 1 1 1. 输入格式 小朋友 ...

  4. DP问题大合集

    引入 动态规划(Dynamic Programming,DP,动规),是求解决策过程最优化的过程.20世纪50年代初,美国数学家贝尔曼(R.Bellman)等人在研究多阶段决策过程的优化问题时,提出了 ...

  5. 《ABP Framework 极速开发》教程首发

    写在发布之前 有没有小伙伴跟我刚开始接触 ABP Framework 的感觉一样"一看文档深似海",看完文档之后,想要上手却找不着头绪. 本套教程写作的目的之一是为初学者提供一条相 ...

  6. RestTemplate上传文件

    1.上传的文件是File类型 如果文件保存在本地,即可以通过File file = new File(path) 或者 文件路径地址获取到指定文件 public String uploadFile(F ...

  7. MySQL数据库的创建和基本的查询语句

    数据库的定义 数据库是按照数据结构来组织.存储和管理数据的建立在计算机存储设备上的仓库 分类 非结构化数据: 数据相对来说没有固定的特点 半结构化数据: 数据之间有着相同的存储结构 属性 值 每一条数 ...

  8. linux centos 系统盘文件系统损坏-已解决

    当我们使用的Linux虚拟机(云服务器/vps)磁盘出现xfs文件系统损坏时,该如何进行修复? xfs格式文件系统损坏,是运维常见的一个场景,经常发生在强制重启.异常关机.软件冲突.误删文件等事件后, ...

  9. Vue3 发生错误:setup function returned a promise

    当你组件中有 Promise 对象时,即 Axios.Ajax 这类的请求,然后把数据渲染到模板中就会报如下图的错误: 在这个异步组件外包裹一个 <Suspense> 组件.比如 App. ...

  10. RabbitMQ 入门系列:3、基础含义:持久化、排它性、自动删除、强制性、路由键。

    系列目录 RabbitMQ 入门系列:1.MQ的应用场景的选择与RabbitMQ安装. RabbitMQ 入门系列:2.基础含义:链接.通道.队列.交换机. RabbitMQ 入门系列:3.基础含义: ...