cAdvisor container monitoring rules

For other details, see the host monitoring rules: https://www.cnblogs.com/sanduzxcvbnm/p/13589848.html

Create a docker.yml file in the rules directory under the Prometheus main program directory, add the content below, and then restart Prometheus.
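
Prometheus only evaluates rule files that are listed under rule_files in its main configuration. A minimal sketch of that entry, assuming prometheus.yml sits in the Prometheus main directory next to the rules directory (the glob pattern is an assumption; listing rules/docker.yml explicitly works as well):

```yaml
# prometheus.yml (fragment); the paths are assumptions based on the layout described above
rule_files:
  - "rules/*.yml"   # matches rules/docker.yml along with any other rule files in that directory
```

Before restarting, the file can be validated with `promtool check rules rules/docker.yml`, which ships with Prometheus. The content of docker.yml itself is: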

groups:

- name:  Docker containers monitoring

  rules:

  - alert: ContainerKilled

    expr: time() - container_last_seen > 60

    for: 5m

    labels:

      severity: warning

    annotations:

      summary: "Container killed (instance {{ $labels.instance }})"

      description: "A container has disappeared\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

  - alert: ContainerCpuUsage

    expr: (sum(rate(container_cpu_usage_seconds_total[3m])) BY (instance, name) * 100) > 80

    for: 5m

    labels:

      severity: warning

    annotations:

      summary: "Container CPU usage (instance {{ $labels.instance }})"

      description: "Container CPU usage is above 80%\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

  - alert: ContainerMemoryUsage

    expr: (sum(container_memory_usage_bytes) BY (instance, name) / sum(container_spec_memory_limit_bytes) BY (instance, name) * 100) > 80

    for: 5m

    labels:

      severity: warning

    annotations:

      summary: "Container Memory usage (instance {{ $labels.instance }})"

      description: "Container Memory usage is above 80%\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

  - alert: ContainerVolumeUsage

    expr: (1 - (sum(container_fs_inodes_free) BY (instance) / sum(container_fs_inodes_total) BY (instance))) * 100 > 80

    for: 5m

    labels:

      severity: warning

    annotations:

      summary: "Container Volume usage (instance {{ $labels.instance }})"

      description: "Container Volume inode usage is above 80%\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

  - alert: ContainerVolumeIoUsage

    expr: (sum(container_fs_io_current) BY (instance, name) * 100) > 80

    for: 5m

    labels:

      severity: warning

    annotations:

      summary: "Container Volume IO usage (instance {{ $labels.instance }})"

      description: "Container Volume IO usage is above 80%\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

  - alert: ContainerHighThrottleRate

    expr: rate(container_cpu_cfs_throttled_seconds_total[3m]) > 1

    for: 5m

    labels:

      severity: warning

    annotations:

      summary: "Container high throttle rate (instance {{ $labels.instance }})"

      description: "Container is being throttled\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

  - alert: PgbouncerActiveConnections

    expr: pgbouncer_pools_server_active_connections > 200

    for: 5m

    labels:

      severity: warning

    annotations:

      summary: "PGBouncer active connections (instance {{ $labels.instance }})"

      description: "PGBouncer pools are filling up\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

  - alert: PgbouncerErrors

    expr: increase(pgbouncer_errors_count{errmsg!="server conn crashed?"}[5m]) > 10

    for: 5m

    labels:

      severity: warning

    annotations:

      summary: "PGBouncer errors (instance {{ $labels.instance }})"

      description: "PGBouncer is logging errors. This may be due to a server restart or an admin typing commands at the pgbouncer console.\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

  - alert: PgbouncerMaxConnections

    expr: rate(pgbouncer_errors_count{errmsg="no more connections allowed (max_client_conn)"}[1m]) > 0

    for: 5m

    labels:

      severity: critical

    annotations:

      summary: "PGBouncer max connections (instance {{ $labels.instance }})"

      description: "The number of PGBouncer client connections has reached max_client_conn.\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

  - alert: SidekiqQueueSize

    expr: sidekiq_queue_size{} > 100

    for: 5m

    labels:

      severity: warning

    annotations:

      summary: "Sidekiq queue size (instance {{ $labels.instance }})"

      description: "Sidekiq queue {{ $labels.name }} is growing\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

  - alert: SidekiqSchedulingLatencyTooHigh

    expr: max by (instance) (sidekiq_queue_latency) > 120

    for: 5m

    labels:

      severity: critical

    annotations:

      summary: "Sidekiq scheduling latency too high (instance {{ $labels.instance }})"

      description: "Sidekiq jobs are taking more than 2 minutes to be picked up. Users may be seeing delays in background processing.\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

  - alert: ConsulServiceHealthcheckFailed

    expr: consul_catalog_service_node_healthy == 0

    for: 5m

    labels:

      severity: critical

    annotations:

      summary: "Consul service healthcheck failed (instance {{ $labels.instance }})"

      description: "Service: `{{ $labels.service_name }}` Healthcheck: `{{ $labels.service_id }}`\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

  - alert: ConsulMissingMasterNode

    expr: consul_raft_peers < 3

    for: 5m

    labels:

      severity: critical

    annotations:

      summary: "Consul missing master node (instance {{ $labels.instance }})"

      description: "The number of Consul raft peers should be 3 in order to preserve quorum.\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

  - alert: ConsulAgentUnhealthy

    expr: consul_health_node_status{status="critical"} == 1

    for: 5m

    labels:

      severity: critical

    annotations:

      summary: "Consul agent unhealthy (instance {{ $labels.instance }})"

      description: "A Consul agent is down\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
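
One caveat with the ContainerMemoryUsage rule above: for containers started without a memory limit, cAdvisor typically reports container_spec_memory_limit_bytes as 0, so the division yields +Inf and the alert can fire spuriously. The following is a hedged variant, a sketch rather than a drop-in replacement, that keeps only series with a non-zero limit. The group name, the alert name ContainerMemoryUsageWithLimit, and the name!="" filter are assumptions added for illustration and are not part of the original rules.

```yaml
groups:
- name: Docker containers monitoring (memory variant)
  rules:
  # Hypothetical alert name; not part of the original rule set.
  - alert: ContainerMemoryUsageWithLimit
    # "> 0" drops limit series that are 0 (no limit set), so those containers
    # have no denominator and fall out of the division instead of producing +Inf.
    # name!="" skips cgroup series that carry no container name.
    expr: (sum(container_memory_usage_bytes{name!=""}) BY (instance, name) / sum(container_spec_memory_limit_bytes{name!=""} > 0) BY (instance, name) * 100) > 80
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Container Memory usage (instance {{ $labels.instance }})"
      description: "Container Memory usage is above 80% of its limit\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
```

Containers without a limit are simply not covered by this variant, so host-level memory alerts remain the safety net for them.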
