Monitoring with Prometheus
I. prometheus-webhook-dingtalk
GitHub: [Releases · timonwong/prometheus-webhook-dingtalk · GitHub](https://github.com/timonwong/prometheus-webhook-dingtalk/releases)
Download: [prometheus-webhook-dingtalk-0.3.0.linux-amd64.tar.gz](https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v0.3.0/prometheus-webhook-dingtalk-0.3.0.linux-amd64.tar.gz)
Download the version you need from GitHub, then extract it:
wget https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v0.3.0/prometheus-webhook-dingtalk-0.3.0.linux-amd64.tar.gz
tar xf prometheus-webhook-dingtalk-0.3.0.linux-amd64.tar.gz -C /data; cd /data
mv prometheus-webhook-dingtalk-0.3.0.linux-amd64 prometheus-webhook-dingtalk
Edit the configuration file (the message template):
# cat default.tmpl
{{ define "__subject" }}[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .GroupLabels.SortedPairs.Values | join " " }} {{ if gt (len .CommonLabels) (len .GroupLabels) }}({{ with .CommonLabels.Remove .GroupLabels.Names }}{{ .Values | join " " }}{{ end }}){{ end }}{{ end }}
{{ define "__alertmanagerURL" }}{{ .ExternalURL }}/#/alerts?receiver={{ .Receiver }}{{ end }} {{ define "__text_alert_list" }}{{ range . }}
**Labels**
{{ range .Labels.SortedPairs }}> - {{ .Name }}: {{ .Value | markdown | html }}
{{ end }}
**Annotations**
{{ range .Annotations.SortedPairs }}> - {{ .Name }}: {{ .Value | markdown | html }}
{{ end }}
**Source:** [{{ .GeneratorURL }}]({{ .GeneratorURL }}) {{ end }}{{ end }} {{ define "ding.link.title" }}{{ template "__subject" . }}{{ end }}
{{ define "ding.link.content" }}#### \[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}\] **[{{ index .GroupLabels "alertname" }}]({{ template "__alertmanagerURL" . }})**
{{ template "__text_alert_list" .Alerts.Firing }}
{{ end }}
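Start the service:
# cat prometheus-webhook-dingtalk.sh
#!/bin/bash
nohup prometheus-webhook-dingtalk --web.listen-address="0.0.0.0:8060" --ding.profile="test=https://oapi.dingtalk.com/robot/send?access_token=89f3cedfb3c3cdb031bdf10f8fc52bf1add575e9b5fb6f462a8cca6859af4" >>/data/prometheus-webhook-dingtalk/nohub.out 2>&1 &
The value of --ding.profile has the form name=URL. The URL is generated by a DingTalk robot, so create your own robot and use its URL. The profile name (test here) becomes part of the path that Alertmanager posts to.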
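Before wiring in Alertmanager, it can help to post a hand-made alert to the webhook and see whether a message reaches the DingTalk group. The sketch below assumes the service is listening on 127.0.0.1:8060 with the profile name test, as started above. The function names and the alert name WebhookSmokeTest are hypothetical, and the payload is only a minimal imitation of what Alertmanager sends.

```shell
#!/bin/bash
# Hypothetical helper: print a minimal Alertmanager-style webhook payload.
build_test_alert() {
  cat <<'EOF'
{
  "version": "4",
  "status": "firing",
  "receiver": "test",
  "groupLabels": {"alertname": "WebhookSmokeTest"},
  "commonLabels": {"alertname": "WebhookSmokeTest", "severity": "warning"},
  "commonAnnotations": {"summary": "manual test alert"},
  "externalURL": "http://127.0.0.1:9093",
  "alerts": [
    {
      "status": "firing",
      "labels": {"alertname": "WebhookSmokeTest", "severity": "warning"},
      "annotations": {"summary": "manual test alert"},
      "startsAt": "2018-08-01T00:00:00Z",
      "endsAt": "0001-01-01T00:00:00Z",
      "generatorURL": "http://127.0.0.1:9090/graph"
    }
  ]
}
EOF
}

# Hypothetical helper: POST the payload to the webhook (URL is an assumption).
send_test_alert() {
  local url="${1:-http://127.0.0.1:8060/dingtalk/test/send}"
  build_test_alert | curl -sS -H 'Content-Type: application/json' --data-binary @- "$url"
}
```

Calling send_test_alert with no argument posts to the default URL. If nothing arrives in DingTalk, the nohub.out log of the webhook service is the first place to look.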
II. Alertmanager
GitHub: [Releases · prometheus/alertmanager · GitHub](https://github.com/prometheus/alertmanager/releases)
Download: [Releases · prometheus/alertmanager · GitHub](https://github.com/prometheus/alertmanager/releases)
Download the version you need from GitHub, then extract it:
wget https://github.com/prometheus/alertmanager/releases/download/v0.15.1/alertmanager-0.15.1.linux-amd64.tar.gz
tar xf alertmanager-0.15.1.linux-amd64.tar.gz -C /data; cd /data
mv alertmanager-0.15.1.linux-amd64 alertmanager
Edit the configuration file. I use DingTalk for alerting, so this article configures a DingTalk receiver:
# cat alertmanager.yml
global:
  resolve_timeout: 5m
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'test'
receivers:
- name: 'test'
  webhook_configs:
  - url: "http://127.0.0.1:8060/dingtalk/test/send"
    send_resolved: true
The url here is the address of prometheus-webhook-dingtalk, which converts alerts into a message format that DingTalk accepts.
Start Alertmanager:
# cat alertmanager.sh
#!/bin/bash
nohup alertmanager --config.file="/data/alertmanager/alertmanager.yml" --storage.path="/data/alertmanager/data" --web.listen-address="0.0.0.0:9093" >>/data/alertmanager/nohub.out 2>&1 &
Alertmanager address:
http://ip:9093
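A YAML indentation mistake in alertmanager.yml usually only shows up when the process fails to start, so checking the file first is worthwhile. The sketch below assumes the amtool binary from the release tarball sits next to alertmanager in /data/alertmanager and that this version supports the check-config subcommand. The function names are hypothetical.

```shell
#!/bin/bash
# Install directory used in this article; override by exporting AM_DIR.
AM_DIR="${AM_DIR:-/data/alertmanager}"

# Hypothetical helper: print the path of the config file to check.
am_config_path() {
  printf '%s/alertmanager.yml\n' "$AM_DIR"
}

# Hypothetical helper: validate the config with amtool (assumed to be in AM_DIR).
check_am_config() {
  "$AM_DIR/amtool" check-config "$(am_config_path)"
}
```

Run check_am_config before alertmanager.sh. A non-zero exit status means the file should be fixed before starting the service.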
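III. Prometheus
GitHub: [Releases · prometheus/prometheus · GitHub](https://github.com/prometheus/prometheus/releases)
1. Prometheus components
1) prometheus: the main server. It scrapes and stores metrics, and provides PromQL for querying and aggregating the monitoring data.
2) *_exporter: exposes a metrics endpoint to the Prometheus server. Prometheus polls these exporters and stores the data it collects.
3) Alertmanager: handles alerting and delivers notifications, for example by email or DingTalk.
4) pushgateway: for short-lived processes such as batch jobs, Prometheus provides the Pushgateway. These clients push their data to the Pushgateway, which then exposes it through a pull interface to the Prometheus server.
5) Prometheus collects data mainly by pulling, which keeps the load and resource usage on the monitored hosts low.
2. Installation
Download: [Releases · prometheus/prometheus · GitHub](https://github.com/prometheus/prometheus/releases)
Download the version you need from GitHub, then extract it: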
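The push path through the Pushgateway described in item 4 can be sketched as follows. This article does not install a Pushgateway, so the address 127.0.0.1:9091, the job name, and the metric name are all assumptions, and both function names are hypothetical.

```shell
#!/bin/bash
# Render one sample in the Prometheus text exposition format: "<name> <value>".
format_metric() {
  printf '%s %s\n' "$1" "$2"
}

# Hypothetical helper: push one sample to a Pushgateway under the given job.
push_metric() {
  local gateway="$1" job="$2" name="$3" value="$4"
  format_metric "$name" "$value" |
    curl -sS --data-binary @- "http://${gateway}/metrics/job/${job}"
}

# Example (assumed address and names):
#   push_metric 127.0.0.1:9091 nightly_backup backup_duration_seconds 42
```

A batch job would call push_metric once just before it exits. Prometheus then scrapes the Pushgateway like any other target.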
wget https://github.com/prometheus/prometheus/releases/download/v2.3.2/prometheus-2.3.2.linux-amd64.tar.gz
tar xf prometheus-2.3.2.linux-amd64.tar.gz -C /data; cd /data
mv prometheus-2.3.2.linux-amd64 prometheus
Then edit the configuration file and define a job for each monitoring target:
# cat prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
#remote_write:
#  - url: "http://10.2.79.208:9201/write"
#remote_read:
#  - url: "http://10.2.79.208:9201/read"
# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 127.0.0.1:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
  - "/data/prometheus/mongodb-rules.yml"
  - "/data/prometheus/consul-rules.yml"
  - "/data/prometheus/redis-rules.yml"
  - "/data/prometheus/nginx-rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['localhost:9090']
  - job_name: 'mongodb1'
    static_configs:
    - targets: ['10.10.8.70:9218']
  - job_name: 'mongodb1-system'
    static_configs:
    - targets: ['10.10.8.70:9100']
  - job_name: 'mongodb2'
    static_configs:
    - targets: ['10.10.5.108:9218']
rule_files: the paths of the alerting rule files. You can define your own alerting rules there.
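The configuration and the rule files it references can be validated with promtool, which ships in the same tarball as prometheus. The sketch below assumes the /data/prometheus layout used in this article and rule files named *-rules.yml. The function name is hypothetical.

```shell
#!/bin/bash
# Install directory used in this article; override by exporting PROM_DIR.
PROM_DIR="${PROM_DIR:-/data/prometheus}"

# Hypothetical helper: check prometheus.yml, then every *-rules.yml next to it.
check_prometheus() {
  "$PROM_DIR/promtool" check config "$PROM_DIR/prometheus.yml" &&
    "$PROM_DIR/promtool" check rules "$PROM_DIR"/*-rules.yml
}
```

If check_prometheus succeeds after an edit, sending SIGHUP to the running prometheus process makes it reload the configuration without a restart.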
# cat consul-rules.yml
---
groups:
- name: consul
  rules:
  - alert: consul_catalog_service_node_healthy
    expr: consul_catalog_service_node_healthy <
    for: 60s
    labels:
      severity: critical
    annotations:
      description: '{{ $labels.node }} {{ $labels.service_id }} is unhealthy'
      summary: 'a service is unhealthy, check it in consul'
  - alert: consul_node_health
    expr: consul_exporter_build_info <
    for: 60s
    labels:
      severity: critical
    annotations:
      description: '{{ $labels.instance }} consul server is down'
      summary: 'consul server is down'
  - alert: consul_health_service_status
    expr: consul_health_service_status <
    for: 60s
    labels:
      severity: critical
    annotations:
      description: '{{ $labels.node }} {{ $labels.service_id }} is unhealthy'
      summary: 'a service is unhealthy, check it in consul'
# cat mongodb-rules.yml
---
groups:
- name: mongodb
  rules:
  - alert: mongodb_mongod_connections
    expr: mongodb_mongod_connections{state='current'} and mongodb_mongod_connections <
    for: 10s
    labels:
      severity: critical
    annotations:
      description: '{{ $labels.instance }} of {{ $labels.job }} connections is low 11'
      summary: 'connections is too low, MongoDB may be down!'
  - alert: mongodb_mongod_connections
    expr: mongodb_mongod_connections{state='current'} and mongodb_mongod_connections >
    for: 10s
    labels:
      severity: warning
    annotations:
      description: '{{ $labels.instance }} of {{ $labels.job }} connections is high 570'
      summary: 'too many connections'
  - alert: mongodb_mongod_memory
    expr: mongodb_mongod_memory{type='virtual'} and mongodb_mongod_memory <
    for: 5s
    labels:
      severity: critical
    annotations:
      description: '{{ $labels.instance }} of {{ $labels.job }} {{ $labels.type }} is too low'
      summary: 'MongoDB may be down'
  - alert: mongodb_mongod_replset_member_health
    expr: mongodb_mongod_replset_member_health !=
    for: 5s
    labels:
      severity: critical
    annotations:
      description: '{{ $labels.name }} {{ $labels.state }} is down'
      summary: 'one of the replica set nodes is down'
  - alert: mongodb_mongod_replset_my_state
    expr: mongodb_mongod_replset_my_state{job='mongodb3'} and mongodb_mongod_replset_my_state !=
    for: 5s
    labels:
      severity: critical
    annotations:
      description: 'replica set master has changed, {{ $labels.job }} is not master'
      summary: 'mongodb3 master is down, check the status'
# cat redis-rules.yml
---
groups:
- name: redis
  rules:
  - alert: redis_instantaneous_ops_per_sec
    expr: redis_instantaneous_ops_per_sec <
    for: 120s
    labels:
      severity: critical
    annotations:
      description: '{{ $labels.job }} is unhealthy'
      summary: 'redis-prod ops/sec is too low, Redis may be congested, check it with "redis-cli slowlog get"'
# cat nginx-rules.yml
---
groups:
- name: nginx-exporter
  rules:
  - alert: status_code_499
    expr: status_code_499 >
    for: 60s
    labels:
      severity: critical
    annotations:
      description: 'status_code_499: {{ $value }}'
      summary: 'too many nginx 499 responses, check the load balancer log /var/log/nginx/share.log'
  - alert: status_code_400
    expr: status_code_400 >
    for: 60s
    labels:
      severity: critical
    annotations:
      description: 'status_code_400: {{ $value }}'
      summary: 'too many nginx 400 responses, check the load balancer log /var/log/nginx/share.log'
The nginx metrics come from an exporter I wrote myself: https://github.com/cuishuaigit/nginx_exporter
Start Prometheus:
# cat prometheus.sh
#!/bin/bash
nohup prometheus --config.file="/data/prometheus/prometheus.yml" --web.listen-address="0.0.0.0:9090" --storage.tsdb.path="/data/prometheus/data" --web.console.libraries="/data/prometheus/console_libraries" --web.console.templates="/data/prometheus/consoles" --web.enable-admin-api --log.level=info >>/data/prometheus/nohub.out 2>&1 &
Prometheus UI:
http://ip:9090
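Besides the web UI, the same data is available over the HTTP API, which is convenient for checking from a terminal whether the configured targets are being scraped. The sketch below assumes Prometheus is reachable at 127.0.0.1:9090. Both function names are hypothetical.

```shell
#!/bin/bash
# Base URL of the Prometheus server; override by exporting PROM_URL.
PROM_URL="${PROM_URL:-http://127.0.0.1:9090}"

# Build a PromQL expression that matches targets of a job whose last scrape failed.
down_targets_expr() {
  printf 'up{job="%s"} == 0' "$1"
}

# Hypothetical helper: run an instant query and print the raw JSON response.
prom_query() {
  curl -sS -G --data-urlencode "query=$1" "$PROM_URL/api/v1/query"
}

# Example: prom_query "$(down_targets_expr mongodb1)"
```

An empty result list for the example query means every target of the mongodb1 job answered its last scrape.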
IV. Exporters
1. https://github.com/prometheus/node_exporter
2. https://github.com/prometheus/influxdb_exporter
3. https://github.com/prometheus/mysqld_exporter
4. https://github.com/prometheus/jmx_exporter
5. https://github.com/prometheus/consul_exporter
6. https://github.com/prometheus/haproxy_exporter
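Each exporter serves its metrics as plain text over HTTP, so it can be inspected with curl before a scrape job is added for it. The sketch below assumes a node_exporter on its default port 9100, as in the mongodb1-system job earlier. The function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read exposition-format text on stdin and print the
# distinct metric names, skipping comment lines and blank lines.
metric_names() {
  grep -v -e '^#' -e '^$' | sed 's/[{ ].*$//' | sort -u
}

# Example (assumed address):
#   curl -sS http://10.10.8.70:9100/metrics | metric_names
```

The printed names are the ones that can be used in PromQL expressions and in alerting rules.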