I. prometheus-webhook-dingtalk

GitHub: [Releases · timonwong/prometheus-webhook-dingtalk · GitHub](https://github.com/timonwong/prometheus-webhook-dingtalk/releases)
Download: [prometheus-webhook-dingtalk-0.3.0.linux-amd64.tar.gz](https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v0.3.0/prometheus-webhook-dingtalk-0.3.0.linux-amd64.tar.gz)

Download the version you need from GitHub, then extract it:

wget https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v0.3.0/prometheus-webhook-dingtalk-0.3.0.linux-amd64.tar.gz
tar xf prometheus-webhook-dingtalk-0.3.0.linux-amd64.tar.gz -C /data; cd /data
mv prometheus-webhook-dingtalk-0.3.0.linux-amd64 prometheus-webhook-dingtalk
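All three components in this post are installed the same way: download `<repo>-<version>.linux-amd64.tar.gz` from the project's GitHub releases page, extract it into /data and rename the directory. A minimal sketch of that pattern as reusable functions; `release_url` and `install_release` are hypothetical helper names, and the asset naming scheme is an assumption that holds for the three releases used here.

```bash
#!/bin/bash
# Hypothetical helpers. Assumes release assets are named
# <repo>-<version>.linux-amd64.tar.gz, as for the releases used in this post.

# Print the download URL for owner/repo at a version given without the "v".
release_url() {
  local owner_repo="$1" version="$2"
  local repo="${owner_repo#*/}"   # strip "owner/" to get the repo name
  printf 'https://github.com/%s/releases/download/v%s/%s-%s.linux-amd64.tar.gz\n' \
    "$owner_repo" "$version" "$repo" "$version"
}

# Download, extract under $dest (default /data) and rename to $dest/<repo>.
install_release() {
  local owner_repo="$1" version="$2" dest="${3:-/data}"
  local repo="${owner_repo#*/}"
  local dir="${repo}-${version}.linux-amd64"
  wget -q -O "/tmp/${dir}.tar.gz" "$(release_url "$owner_repo" "$version")" || return 1
  tar xf "/tmp/${dir}.tar.gz" -C "$dest" || return 1
  mv "${dest}/${dir}" "${dest}/${repo}"
}
```

For example, `install_release timonwong/prometheus-webhook-dingtalk 0.3.0` reproduces the three commands above; `install_release prometheus/alertmanager 0.15.1` and `install_release prometheus/prometheus 2.3.2` cover the other two components.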

Edit the template file:
# cat default.tmpl

{{ define "__subject" }}[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .GroupLabels.SortedPairs.Values | join " " }} {{ if gt (len .CommonLabels) (len .GroupLabels) }}({{ with .CommonLabels.Remove .GroupLabels.Names }}{{ .Values | join " " }}{{ end }}){{ end }}{{ end }}
{{ define "__alertmanagerURL" }}{{ .ExternalURL }}/#/alerts?receiver={{ .Receiver }}{{ end }}
{{ define "__text_alert_list" }}{{ range . }}
**Labels**
{{ range .Labels.SortedPairs }}> - {{ .Name }}: {{ .Value | markdown | html }}
{{ end }}
**Annotations**
{{ range .Annotations.SortedPairs }}> - {{ .Name }}: {{ .Value | markdown | html }}
{{ end }}
**Source:** [{{ .GeneratorURL }}]({{ .GeneratorURL }}) {{ end }}{{ end }}
{{ define "ding.link.title" }}{{ template "__subject" . }}{{ end }}
{{ define "ding.link.content" }}#### \[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}\] **[{{ index .GroupLabels "alertname" }}]({{ template "__alertmanagerURL" . }})**
{{ template "__text_alert_list" .Alerts.Firing }}
{{ end }}
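The template reads fields of the JSON body that Alertmanager's webhook receiver posts: `.Status`, `.Alerts`, `.GroupLabels`, `.CommonLabels`, `.ExternalURL` and `.Receiver`, plus each alert's `.Labels`, `.Annotations` and `.GeneratorURL`. To see a rendered message without waiting for a real alert, you can post a hand-written body of that shape to the bridge. The sketch below prints one; the function name `am_webhook_sample` and every value in it (alert name, labels, URLs, timestamp) are made-up sample data.

```bash
#!/bin/bash
# Print a hand-written body shaped like Alertmanager's webhook payload.
# Hypothetical helper: all values below are sample data.
am_webhook_sample() {
  local alertname="$1" status="${2:-firing}"
  cat <<EOF
{
  "version": "4",
  "status": "${status}",
  "receiver": "test",
  "groupLabels": {"alertname": "${alertname}"},
  "commonLabels": {"alertname": "${alertname}", "severity": "critical"},
  "commonAnnotations": {"summary": "manual test"},
  "externalURL": "http://127.0.0.1:9093",
  "alerts": [
    {
      "status": "${status}",
      "labels": {"alertname": "${alertname}", "severity": "critical"},
      "annotations": {"summary": "manual test"},
      "startsAt": "2018-08-01T00:00:00Z",
      "generatorURL": "http://127.0.0.1:9090/graph"
    }
  ]
}
EOF
}
```

With the bridge running, `am_webhook_sample test_alert | curl -sS -H 'Content-Type: application/json' -d @- http://127.0.0.1:8060/dingtalk/test/send` posts it to the `test` profile. Passing `resolved` as the second argument shows the resolved variant of the message.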

Start the service:
# cat prometheus-webhook-dingtalk.sh

#!/bin/bash
nohup /data/prometheus-webhook-dingtalk/prometheus-webhook-dingtalk --web.listen-address="0.0.0.0:8060" --ding.profile="test=https://oapi.dingtalk.com/robot/send?access_token=YOUR_ACCESS_TOKEN" >>/data/prometheus-webhook-dingtalk/nohup.out 2>&1 &

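The value of `--ding.profile` has the form `<name>=<robot webhook URL>`. The URL comes from a DingTalk custom robot: create your own robot and use its access_token. The name (`test` here) becomes part of the bridge's endpoint, `/dingtalk/test/send`, which is the URL Alertmanager posts to.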
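Before wiring Alertmanager in, it can help to confirm the robot token works by posting a plain text message straight to the robot webhook. This sketch assumes DingTalk's custom-robot `text` message body, `{"msgtype":"text","text":{"content":"..."}}`; `ding_text_payload` and `ding_send` are hypothetical helper names, and a robot created with a keyword or signature security setting may reject messages that do not satisfy it.

```bash
#!/bin/bash
# Build a DingTalk custom-robot "text" message body (assumed format).
# Backslashes and double quotes are escaped for JSON; the content is
# assumed to be a single line (newlines are not escaped).
ding_text_payload() {
  local content
  content=$(printf '%s' "$1" | sed 's/[\\"]/\\&/g')
  printf '{"msgtype":"text","text":{"content":"%s"}}' "$content"
}

# POST the message to a robot webhook URL that includes ?access_token=...
ding_send() {
  local url="$1" message="$2"
  curl -sS -H 'Content-Type: application/json' \
    -d "$(ding_text_payload "$message")" "$url"
}
```

Usage: `ding_send "https://oapi.dingtalk.com/robot/send?access_token=YOUR_ACCESS_TOKEN" "prometheus alert test"`. A successful call should come back with a JSON body whose `errcode` is 0; anything else usually points at the token or the robot's security settings rather than at Prometheus.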

II. Alertmanager
GitHub: [Releases · prometheus/alertmanager · GitHub](https://github.com/prometheus/alertmanager/releases)

Download: [Releases · prometheus/alertmanager · GitHub](https://github.com/prometheus/alertmanager/releases)

Download the version you need from GitHub, then extract it:

wget https://github.com/prometheus/alertmanager/releases/download/v0.15.1/alertmanager-0.15.1.linux-amd64.tar.gz
tar xf alertmanager-0.15.1.linux-amd64.tar.gz -C /data; cd /data
mv alertmanager-0.15.1.linux-amd64 alertmanager

Edit the configuration file. I use DingTalk for alerting, so this article uses DingTalk as the receiver:
# cat alertmanager.yml

global:
  resolve_timeout: 5m
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'test'
receivers:
- name: 'test'
  webhook_configs:
  - url: "http://127.0.0.1:8060/dingtalk/test/send"
    send_resolved: true

The `url` here is the address of prometheus-webhook-dingtalk, which converts Alertmanager's alert notifications into a message format that DingTalk accepts.
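Indentation mistakes in alertmanager.yml are easy to make when copying configs around. The Alertmanager release tarball ships an `amtool` binary whose `check-config` subcommand validates the file. A small wrapper, assuming the /data/alertmanager layout used in this post (`check_alertmanager` is a hypothetical name):

```bash
#!/bin/bash
# Validate alertmanager.yml with amtool before (re)starting Alertmanager.
# Assumes the /data/alertmanager layout used in this post.
check_alertmanager() {
  local dir="${1:-/data/alertmanager}"
  local amtool="${dir}/amtool"
  # Fall back to an amtool on PATH if the tarball copy is not there.
  command -v "$amtool" >/dev/null 2>&1 || amtool=amtool
  "$amtool" check-config "${dir}/alertmanager.yml"
}
```

Running `check_alertmanager && /data/alertmanager/alertmanager.sh` only starts the service when the file parses, instead of leaving the error in nohup.out.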

Start Alertmanager:
# cat alertmanager.sh

#!/bin/bash
nohup /data/alertmanager/alertmanager --config.file="/data/alertmanager/alertmanager.yml" --storage.path="/data/alertmanager/data" --web.listen-address="0.0.0.0:9093" >>/data/alertmanager/nohup.out 2>&1 &

Alertmanager web UI:
http://ip:9093
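To test the whole Alertmanager → prometheus-webhook-dingtalk → DingTalk path without waiting for a real rule to fire, you can post an alert directly to Alertmanager. This sketch assumes the v1 alerts API (`POST /api/v1/alerts` taking a JSON array of alerts with `labels` and `annotations`) served by Alertmanager 0.15. Recent releases have removed v1 in favour of `/api/v2/alerts`. Function names and label values are hypothetical.

```bash
#!/bin/bash
# Build a one-alert body for Alertmanager's POST /api/v1/alerts.
# Sample labels only; values must not contain double quotes.
am_alert_payload() {
  local alertname="$1" summary="$2"
  printf '[{"labels":{"alertname":"%s","severity":"critical"},"annotations":{"summary":"%s"}}]' \
    "$alertname" "$summary"
}

# Post the test alert to Alertmanager (default: local instance on 9093).
am_fire_test_alert() {
  local am="${1:-http://127.0.0.1:9093}"
  curl -sS -H 'Content-Type: application/json' \
    -d "$(am_alert_payload manual_test 'fired by hand with curl')" \
    "${am}/api/v1/alerts"
}
```

With `group_wait: 10s` from the configuration above, the DingTalk message should arrive roughly ten seconds after `am_fire_test_alert`. Because the alert carries no `endsAt`, Alertmanager treats it as resolved after `resolve_timeout` (5m), and `send_resolved: true` then produces a second, resolved message.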

III. Prometheus

GitHub: [Releases · prometheus/prometheus · GitHub](https://github.com/prometheus/prometheus/releases)

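1. Prometheus components
1) prometheus: the main server. It collects and stores the data, and exposes PromQL for querying and aggregating monitoring data;
2) *_exporter: exposes a data-collection endpoint to the Prometheus server; Prometheus polls these exporters, scrapes their metrics and stores them;
3) alertmanager: handles alert notifications, for example via email or DingTalk;
4) pushgateway: for short-lived processes such as batch jobs, Prometheus provides the Pushgateway. These clients push their data to the Pushgateway, which then exposes it through a pull endpoint for the Prometheus server.

5) Prometheus mainly collects data by pulling: each target only has to serve its current metrics over HTTP when it is scraped, which keeps the load and resource usage on the monitored side low.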
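For the Pushgateway case in 4), a batch job pushes its samples in the Prometheus text exposition format, and Prometheus then scrapes the gateway. A sketch, assuming a Pushgateway on its default port 9091 (not installed in this post) and hypothetical helper, job and metric names:

```bash
#!/bin/bash
# Format one gauge sample in the Prometheus text exposition format.
gauge_sample() {
  local name="$1" value="$2"
  printf '# TYPE %s gauge\n%s %s\n' "$name" "$name" "$value"
}

# Push it to a Pushgateway under /metrics/job/<job>.
push_gauge() {
  local gateway="$1" job="$2" name="$3" value="$4"
  gauge_sample "$name" "$value" |
    curl -sS --data-binary @- "${gateway}/metrics/job/${job}"
}
```

A nightly job could finish with `push_gauge http://127.0.0.1:9091 nightly_backup backup_last_success_timestamp_seconds "$(date +%s)"`; Prometheus would then need a scrape job pointing at the gateway.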

2. Installation
Download: [Releases · prometheus/prometheus · GitHub](https://github.com/prometheus/prometheus/releases)
Download the version you need from GitHub, then extract it:

wget https://github.com/prometheus/prometheus/releases/download/v2.3.2/prometheus-2.3.2.linux-amd64.tar.gz
tar xf prometheus-2.3.2.linux-amd64.tar.gz -C /data; cd /data
mv prometheus-2.3.2.linux-amd64 prometheus

Then edit the configuration file and define a scrape job for each target to monitor:
# cat prometheus.yml

# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
#remote_write:
#  - url: "http://10.2.79.208:9201/write"
#remote_read:
#  - url: "http://10.2.79.208:9201/read"
# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 127.0.0.1:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
  - "/data/prometheus/mongodb-rules.yml"
  - "/data/prometheus/consul-rules.yml"
  - "/data/prometheus/redis-rules.yml"
  - "/data/prometheus/nginx-rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['localhost:9090']
  - job_name: 'mongodb1'
    static_configs:
    - targets: ['10.10.8.70:9218']
  - job_name: 'mongodb1-system'
    static_configs:
    - targets: ['10.10.8.70:9100']
  - job_name: 'mongodb2'
    static_configs:
    - targets: ['10.10.5.108:9218']

rule_files: the paths of the alerting rule files, where you define your own alerting rules.
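A YAML or PromQL mistake in any of these files stops Prometheus from starting (or makes a configuration reload fail), so it is worth validating them with `promtool`, which ships in the Prometheus tarball. `promtool check config` also checks the files listed under `rule_files`. The sketch below additionally checks every `*-rules.yml` on its own so that a failure points at one file. The function name `check_prometheus` is hypothetical, and the /data/prometheus layout is taken from this post.

```bash
#!/bin/bash
# Validate prometheus.yml and each *-rules.yml under $dir with promtool.
# Runs every check, then returns non-zero if any of them failed.
check_prometheus() {
  local dir="${1:-/data/prometheus}"
  local promtool="${dir}/promtool" rc=0 f
  # Fall back to a promtool on PATH if the tarball copy is not there.
  command -v "$promtool" >/dev/null 2>&1 || promtool=promtool
  "$promtool" check config "${dir}/prometheus.yml" || rc=1
  for f in "${dir}"/*-rules.yml; do
    [ -e "$f" ] || continue   # the glob matched nothing
    "$promtool" check rules "$f" || rc=1
  done
  return "$rc"
}
```

Running `check_prometheus` after every edit to prometheus.yml or a rules file catches problems before the restart rather than in nohup.out.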

# cat consul-rules.yml

---
groups:
- name: consul
  rules:
  - alert: consul_catalog_service_node_healthy
    expr: consul_catalog_service_node_healthy <
    for: 60s
    labels:
      severity: critical
    annotations:
      description: '{{ $labels.node }} {{ $labels.service_id }} is unhealthy'
      summary: 'some service is unhealthy, check it in Consul'
  - alert: consul_node_health
    expr: consul_exporter_build_info <
    for: 60s
    labels:
      severity: critical
    annotations:
      description: '{{ $labels.instance }} consul server is down'
      summary: 'consul server is down'
  - alert: consul_health_service_status
    expr: consul_health_service_status <
    for: 60s
    labels:
      severity: critical
    annotations:
      description: '{{ $labels.node }} {{ $labels.service_id }} is unhealthy'
      summary: 'some service is unhealthy, check it in Consul'

# cat mongodb-rules.yml

---
groups:
- name: mongodb
  rules:
  - alert: mongodb_mongod_connections
    expr: mongodb_mongod_connections{state='current'} and mongodb_mongod_connections <
    for: 10s
    labels:
      severity: critical
    annotations:
      description: '{{ $labels.instance }} of {{ $labels.job }} connections is low 11'
      summary: 'connections is too low, MongoDB may be down!'
  - alert: mongodb_mongod_connections
    expr: mongodb_mongod_connections{state='current'} and mongodb_mongod_connections >
    for: 10s
    labels:
      severity: warning
    annotations:
      description: '{{ $labels.instance }} of {{ $labels.job }} connections is high 570'
      summary: 'too many connections'
  - alert: mongodb_mongod_memory
    expr: mongodb_mongod_memory{type='virtual'} and mongodb_mongod_memory <
    for: 5s
    labels:
      severity: critical
    annotations:
      description: '{{ $labels.instance }} of {{ $labels.job }} {{ $labels.type }} is too low'
      summary: 'MongoDB may be down'
  - alert: mongodb_mongod_replset_member_health
    expr: mongodb_mongod_replset_member_health !=
    for: 5s
    labels:
      severity: critical
    annotations:
      description: '{{ $labels.name }} {{ $labels.state }} is down'
      summary: 'one of the replica set nodes is down'
  - alert: mongodb_mongod_replset_my_state
    expr: mongodb_mongod_replset_my_state{job='mongodb3'} and mongodb_mongod_replset_my_state !=
    for: 5s
    labels:
      severity: critical
    annotations:
      description: 'replica set master has changed, {{ $labels.job }} is not master'
      summary: 'mongodb3 master is down, check the status'

# cat redis-rules.yml

---
groups:
- name: redis
  rules:
  - alert: redis_instantaneous_ops_per_sec
    expr: redis_instantaneous_ops_per_sec <
    for: 120s
    labels:
      severity: critical
    annotations:
      description: '{{ $labels.job }} is unhealthy'
      summary: 'redis-prod ops/sec is too low, redis may be congested; check it with "redis-cli slowlog get"'

# cat nginx-rules.yml

---
groups:
- name: nginx-exporter
  rules:
  - alert: status_code_499
    expr: status_code_499 >
    for: 60s
    labels:
      severity: critical
    annotations:
      description: 'status_code_499: {{ $value }}'
      summary: 'too many nginx 499 status codes, check the load balancer log /var/log/nginx/share.log'
  - alert: status_code_400
    expr: status_code_400 >
    for: 60s
    labels:
      severity: critical
    annotations:
      description: 'status_code_400: {{ $value }}'
      summary: 'too many nginx 400 status codes, check the load balancer log /var/log/nginx/share.log'
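Each rule above compares a metric with a threshold. Before settling on a value, it helps to see what the expression currently returns, for example through Prometheus's instant-query API `GET /api/v1/query`. A minimal sketch; `prom_query` is a hypothetical helper name.

```bash
#!/bin/bash
# Evaluate a PromQL expression through Prometheus's instant-query API.
# curl -G sends the URL-encoded --data-urlencode pair as a query string.
prom_query() {
  local prom="$1" expr="$2"
  curl -sS -G --data-urlencode "query=${expr}" "${prom}/api/v1/query"
}
```

For example, `prom_query http://127.0.0.1:9090 'redis_instantaneous_ops_per_sec'` returns JSON with `"status":"success"` and the current value of each matching series under `data.result`, which shows the normal range a threshold should sit outside of.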

The nginx exporter used here is one I wrote myself: https://github.com/cuishuaigit/nginx_exporter

Start Prometheus:
# cat prometheus.sh

#!/bin/bash
nohup /data/prometheus/prometheus --config.file="/data/prometheus/prometheus.yml" --web.listen-address="0.0.0.0:9090" --storage.tsdb.path="/data/prometheus/data" --web.console.libraries="/data/prometheus/console_libraries" --web.console.templates="/data/prometheus/consoles" --web.enable-admin-api --log.level=info >>/data/prometheus/nohup.out 2>&1 &

Prometheus web UI:
http://ip:9090
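Because `nohup ... &` returns immediately, a deployment step that runs these start scripts cannot tell whether Prometheus actually came up. Prometheus 2.x serves a readiness endpoint, `/-/ready`, that answers with HTTP 200 once it can serve traffic. A sketch of a polling helper; the name `wait_ready` and the default of 30 one-second attempts are arbitrary choices.

```bash
#!/bin/bash
# Poll a URL until curl succeeds (-f makes HTTP 4xx/5xx count as failures)
# or the attempts run out. Returns 0 when ready, 1 on timeout.
wait_ready() {
  local url="$1" tries="${2:-30}" i=0
  while [ "$i" -lt "$tries" ]; do
    if curl -fsS -o /dev/null "$url"; then
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  return 1
}
```

For example: `/data/prometheus/prometheus.sh && wait_ready http://127.0.0.1:9090/-/ready || tail /data/prometheus/nohup.out` shows the log only when startup did not succeed.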

IV. Exporters

1. https://github.com/prometheus/node_exporter

2. https://github.com/prometheus/influxdb_exporter

3. https://github.com/prometheus/mysqld_exporter

4. https://github.com/prometheus/jmx_exporter

5. https://github.com/prometheus/consul_exporter

6. https://github.com/prometheus/haproxy_exporter
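Each exporter above serves its metrics over HTTP (node_exporter listens on port 9100 by default), and monitoring it means adding a `scrape_configs` entry to prometheus.yml. A sketch that prints such an entry; `scrape_job` is a hypothetical helper, and appending its output to prometheus.yml only works because `scrape_configs` is the last section of the file shown earlier.

```bash
#!/bin/bash
# Print a scrape_configs entry indented like the jobs in prometheus.yml.
scrape_job() {
  local name="$1" target="$2"
  cat <<EOF
  - job_name: '${name}'
    static_configs:
    - targets: ['${target}']
EOF
}
```

Usage, with a placeholder host: `scrape_job node 'HOST:9100' >> /data/prometheus/prometheus.yml`, then validate with `/data/prometheus/promtool check config /data/prometheus/prometheus.yml` and reload with `kill -HUP "$(pidof prometheus)"`, since Prometheus re-reads its configuration on SIGHUP.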
