思路一

统一区域的监控目标,prometheus server两台监控相同的目标群体。

改变后

上面这个变化对于监控目标端,会多出一倍的查询请求,但在一台prometheus server宕机的情况下,可以不影响监控。

思路二

这是一个金字塔式的层次结构,而不是分布式层次结构。Prometheus 的抓取请求也会加载到prometheus work节点上,这是需要考虑的。

上面这种模式,准备3台prometheus server进行搭建,这种方式work节点一台宕机后,其它wokr节点不会去接手故障work节点的机器。

1、环境准备

192.168.31.151(primary)

192.168.31.144 (worker)

192.168.31.82(worker)

2、部署prometheus

cd /usr/loacl
tar -xvf prometheus-2.8.0.linux-amd64.tar.gz
ln -s /usr/local/prometheus-2.8.0.linux-amd64 /usr/local/prometheus
cd /usr/local/prometheus;mkdir bin conf data
mv ./promtool bin
mv ./prometheus bin
mv ./prometheus.yml conf

3、worker节点配置(192.168.31.144)

prometheus.yml

# my global config
global:
scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
external_labels:
worker: 0 # Alertmanager configuration
alerting:
alertmanagers:
- static_configs:
- targets:
# - alertmanager:9093 # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
- "rules/*_rules.yml" # A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
# The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
- job_name: 'prometheus'
static_configs:
- targets:
- 192.168.31.151:9090
- 192.168.31.144:9090
- 192.168.31.82:9090
relabel_configs:
- source_labels: [__address__]
modulus: 2
target_label: __tmp_hash
action: hashmod
- source_labels: [__tmp_hash]
regex: ^0$
action: keep
- job_name: 'node_exporter'
file_sd_configs:
- files:
- targets/nodes/*.json
refresh_interval: 1m
relabel_configs:
- source_labels: [__address__]
modulus: 2
target_label: __tmp_hash
action: hashmod
- source_labels: [__tmp_hash]
regex: ^0$
action: keep
- job_name: 'docker'
file_sd_configs:
- files:
- targets/docker/*.json
refresh_interval: 1m
relabel_configs:
- source_labels: [__address__]
modulus: 2
target_label: __tmp_hash
action: hashmod
- source_labels: [__tmp_hash]
regex: ^0$
action: keep
- job_name: 'alertmanager'
static_configs:
- targets:
- 192.168.31.151:9093
- 192.168.31.144:9093
relabel_configs:
- source_labels: [__address__]
modulus: 2
target_label: __tmp_hash
action: hashmod
- source_labels: [__tmp_hash]
regex: ^0$
action: keep

worker节点配置(192.168.31.82)

prometheus.yml

# my global config
global:
scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
external_labels:
worker: 1 # Alertmanager configuration
alerting:
alertmanagers:
- static_configs:
- targets:
# - alertmanager:9093 # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
- "rules/*_rules.yml" # A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
# The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
- job_name: 'prometheus'
static_configs:
- targets:
- 192.168.31.151:9090
- 192.168.31.144:9090
- 192.168.31.82:9090
relabel_configs:
- source_labels: [__address__]
modulus: 2
target_label: __tmp_hash
action: hashmod
- source_labels: [__tmp_hash]
regex: ^1$
action: keep
- job_name: 'node_exporter'
file_sd_configs:
- files:
- targets/nodes/*.json
refresh_interval: 1m
relabel_configs:
- source_labels: [__address__]
modulus: 2
target_label: __tmp_hash
action: hashmod
- source_labels: [__tmp_hash]
regex: ^1$
action: keep
- job_name: 'docker'
file_sd_configs:
- files:
- targets/docker/*.json
refresh_interval: 1m
relabel_configs:
- source_labels: [__address__]
modulus: 2
target_label: __tmp_hash
action: hashmod
- source_labels: [__tmp_hash]
regex: ^1$
action: keep
- job_name: 'alertmanager'
static_configs:
- targets:
- 192.168.31.151:9093
- 192.168.31.144:9093
relabel_configs:
- source_labels: [__address__]
modulus: 2
target_label: __tmp_hash
action: hashmod
- source_labels: [__tmp_hash]
regex: ^1$
action: keep

primary节点配置(192.168.31.151)

prometheus.yml

# my global config
global:
scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
# scrape_timeout is set to the global default (10s). # Alertmanager configuration
alerting:
alertmanagers:
- static_configs:
- targets:
- 192.168.31.151:9093
- 192.168.31.144:9093 # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
- "rules/*_alerts.yml" # A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
- job_name: 'node_workers'
file_sd_configs:
- files:
- 'targets/workers/*.json'
refresh_interval: 5m
honor_labels: true
metrics_path: /federate
params:
'match[]':
- '{__name__=~"^instance:.*"}'

cat ./targets/workers/workers.json

[{
"targets": [
"192.168.31.144:9090",
"192.168.31.82:9090"
]
}]

prometheus 集群的更多相关文章

  1. Prometheus集群介绍-1

    Prometheus监控介绍 公司做教育的,要迁移上云,所以需要我这边从零开始调研加后期维护Prometheus:近期看过二本方面的prometheus书籍,一本是深入浅出一般是实战方向的:官方文档主 ...

  2. Thanos prometheus 集群以及多租户解决方案docker-compose 试用(一)

    prometheus 是一个非常不多的metrics 监控解决方案,但是对于ha 以及多租户的处理并不是很好,当前有好多解决方案 cortex Thanos prometheus+ influxdb ...

  3. 部署prometheus监控kubernetes集群并存储到ceph

    简介 Prometheus 最初是 SoundCloud 构建的开源系统监控和报警工具,是一个独立的开源项目,于2016年加入了 CNCF 基金会,作为继 Kubernetes 之后的第二个托管项目. ...

  4. 如何扩展单个Prometheus实现近万Kubernetes集群监控?

    引言 TKE团队负责公有云,私有云场景下近万个集群,数百万核节点的运维管理工作.为了监控规模如此庞大的集群联邦,TKE团队在原生Prometheus的基础上进行了大量探索与改进,研发出一套可扩展,高可 ...

  5. 如何用Prometheus监控十万container的Kubernetes集群

    概述 不久前,我们在文章<如何扩展单个Prometheus实现近万Kubernetes集群监控?>中详细介绍了TKE团队大规模Kubernetes联邦监控系统Kvass的演进过程,其中介绍 ...

  6. vivo 容器集群监控系统架构与实践

    vivo 互联网服务器团队-YuanPeng 一.概述 从容器技术的推广以及 Kubernetes成为容器调度管理领域的事实标准开始,云原生的理念和技术架构体系逐渐在生产环境中得到了越来越广泛的应用实 ...

  7. 如何使用helm优雅安装prometheus-operator,并监控k8s集群微服务

    前言:随着云原生概念盛行,对于容器.服务.节点以及集群的监控变得越来越重要.Prometheus 作为 Kubernetes 监控的事实标准,有着强大的功能和良好的生态.但是它不支持分布式,不支持数据 ...

  8. Kubernetes集群部署史上最详细(二)Prometheus监控Kubernetes集群

    使用Prometheus监控Kubernetes集群 监控方面Grafana采用YUM安装通过服务形式运行,部署在Master上,而Prometheus则通过POD运行,Grafana通过使用Prom ...

  9. Prometheus监控elasticsearch集群(以elasticsearch-6.4.2版本为例)

    部署elasticsearch集群,配置文件可"浓缩"为以下: cluster.name: es_cluster node.name: node1 path.data: /app/ ...

随机推荐

  1. (六) Keras 模型保存和RNN简单应用

    视频学习来源 https://www.bilibili.com/video/av40787141?from=search&seid=17003307842787199553 笔记 RNN用于图 ...

  2. Android开发支付集成——支付宝集成

    微信支付传送门:https://www.cnblogs.com/dingxiansen/p/9209159.html 一.支付宝支付 1. 支付宝支付流程图 2. 集成前准备 去蚂蚁金服注册应用获取a ...

  3. ElasticSearch-6.3.2 linux 安装

    在linux 系统安装ElasticSearch-6.3.2最新版本,也适合6.x 系列版本做参考 前提先在linux 安装好jdk1.8 创建用户 从5.0开始,ElasticSearch 安全级别 ...

  4. deepin linux学习笔记

    目录 deepin linux学习笔记 前言 linux常用命令 ls 显示文件夹内容 cd 切换当前目录 pwd 查看当前工作目录 mkdir 新建文件夹 rm 删除文件或文件夹 mv 移动文件 c ...

  5. python使用rabbitMQ介绍四(路由模式)

    一.模式介绍 路由模式,与发布-订阅模式一样,消息发送到exchange中,消费者把队列绑定到exchange上. 这种模式在exchange上添加添加了一个路由键(routing-key),生产者发 ...

  6. 基于 libevent 开源框架实现的 web 服务器

    /* 原创文章 转载请附上原链接: https://www.cnblogs.com/jiujue/p/10707153.html   */ 自己实现的如有缺漏欢迎提出 直接代码 一切皆在代码中 首先是 ...

  7. iOS多线程GCD的使用

    1. GCD 简介 Grand Central Dispatch(GCD)是异步执行任务的技术之一.一般将应用程序中记述的线程管理用的代码在系统级中实现.开发者只需要定义想执行的任务并追加到适当的Di ...

  8. MongoDB语法与现有关系型数据库SQL语法比较

    MongoDB语法            MySql语法 db.test.find({'name':'foobar'})             <==>          select ...

  9. 【html】使用img标签和背景图片之间的区别

    1.加载问题 背景图片会等到html结构加载完成才开始加载 img标签是网页结构的一部分,会在html结构加载的时候加载 在网页加载的过程中,背景图片会等到结构加载完成(网页的内容全部显示以后)才开始 ...

  10. JProfiler的详细使用介绍

    转:https://blog.csdn.net/huangjin0507/article/details/52452946 下载软件 官网地址:http://www.ej-technologies.c ...