prometheus 集群
思路一
统一区域的监控目标,prometheus server两台监控相同的目标群体。

改变后

上面这个变化对于监控目标端,会多出一倍的查询请求,但在一台prometheus server宕机的情况下,可以不影响监控。
思路二
这是一个金字塔式的层次结构,而不是分布式层次结构。Prometheus 的抓取请求也会加载到prometheus work节点上,这是需要考虑的。

上面这种模式,准备3台prometheus server进行搭建,这种方式work节点一台宕机后,其它wokr节点不会去接手故障work节点的机器。
1、环境准备
192.168.31.151(primary)
192.168.31.144 (worker)
192.168.31.82(worker)
2、部署prometheus
cd /usr/loacl
tar -xvf prometheus-2.8.0.linux-amd64.tar.gz
ln -s /usr/local/prometheus-2.8.0.linux-amd64 /usr/local/prometheus
cd /usr/local/prometheus;mkdir bin conf data
mv ./promtool bin
mv ./prometheus bin
mv ./prometheus.yml conf
3、worker节点配置(192.168.31.144)
prometheus.yml
# my global config
global:
scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
external_labels:
worker: 0 # Alertmanager configuration
alerting:
alertmanagers:
- static_configs:
- targets:
# - alertmanager:9093 # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
- "rules/*_rules.yml" # A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
# The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
- job_name: 'prometheus'
static_configs:
- targets:
- 192.168.31.151:9090
- 192.168.31.144:9090
- 192.168.31.82:9090
relabel_configs:
- source_labels: [__address__]
modulus: 2
target_label: __tmp_hash
action: hashmod
- source_labels: [__tmp_hash]
regex: ^0$
action: keep
- job_name: 'node_exporter'
file_sd_configs:
- files:
- targets/nodes/*.json
refresh_interval: 1m
relabel_configs:
- source_labels: [__address__]
modulus: 2
target_label: __tmp_hash
action: hashmod
- source_labels: [__tmp_hash]
regex: ^0$
action: keep
- job_name: 'docker'
file_sd_configs:
- files:
- targets/docker/*.json
refresh_interval: 1m
relabel_configs:
- source_labels: [__address__]
modulus: 2
target_label: __tmp_hash
action: hashmod
- source_labels: [__tmp_hash]
regex: ^0$
action: keep
- job_name: 'alertmanager'
static_configs:
- targets:
- 192.168.31.151:9093
- 192.168.31.144:9093
relabel_configs:
- source_labels: [__address__]
modulus: 2
target_label: __tmp_hash
action: hashmod
- source_labels: [__tmp_hash]
regex: ^0$
action: keep

worker节点配置(192.168.31.82)
prometheus.yml
# my global config
global:
scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
external_labels:
worker: 1 # Alertmanager configuration
alerting:
alertmanagers:
- static_configs:
- targets:
# - alertmanager:9093 # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
- "rules/*_rules.yml" # A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
# The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
- job_name: 'prometheus'
static_configs:
- targets:
- 192.168.31.151:9090
- 192.168.31.144:9090
- 192.168.31.82:9090
relabel_configs:
- source_labels: [__address__]
modulus: 2
target_label: __tmp_hash
action: hashmod
- source_labels: [__tmp_hash]
regex: ^1$
action: keep
- job_name: 'node_exporter'
file_sd_configs:
- files:
- targets/nodes/*.json
refresh_interval: 1m
relabel_configs:
- source_labels: [__address__]
modulus: 2
target_label: __tmp_hash
action: hashmod
- source_labels: [__tmp_hash]
regex: ^1$
action: keep
- job_name: 'docker'
file_sd_configs:
- files:
- targets/docker/*.json
refresh_interval: 1m
relabel_configs:
- source_labels: [__address__]
modulus: 2
target_label: __tmp_hash
action: hashmod
- source_labels: [__tmp_hash]
regex: ^1$
action: keep
- job_name: 'alertmanager'
static_configs:
- targets:
- 192.168.31.151:9093
- 192.168.31.144:9093
relabel_configs:
- source_labels: [__address__]
modulus: 2
target_label: __tmp_hash
action: hashmod
- source_labels: [__tmp_hash]
regex: ^1$
action: keep

primary节点配置(192.168.31.151)
prometheus.yml
# my global config
global:
scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
# scrape_timeout is set to the global default (10s). # Alertmanager configuration
alerting:
alertmanagers:
- static_configs:
- targets:
- 192.168.31.151:9093
- 192.168.31.144:9093 # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
- "rules/*_alerts.yml" # A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
- job_name: 'node_workers'
file_sd_configs:
- files:
- 'targets/workers/*.json'
refresh_interval: 5m
honor_labels: true
metrics_path: /federate
params:
'match[]':
- '{__name__=~"^instance:.*"}'
cat ./targets/workers/workers.json
[{
"targets": [
"192.168.31.144:9090",
"192.168.31.82:9090"
]
}]

prometheus 集群的更多相关文章
- Prometheus集群介绍-1
Prometheus监控介绍 公司做教育的,要迁移上云,所以需要我这边从零开始调研加后期维护Prometheus:近期看过二本方面的prometheus书籍,一本是深入浅出一般是实战方向的:官方文档主 ...
- Thanos prometheus 集群以及多租户解决方案docker-compose 试用(一)
prometheus 是一个非常不多的metrics 监控解决方案,但是对于ha 以及多租户的处理并不是很好,当前有好多解决方案 cortex Thanos prometheus+ influxdb ...
- 部署prometheus监控kubernetes集群并存储到ceph
简介 Prometheus 最初是 SoundCloud 构建的开源系统监控和报警工具,是一个独立的开源项目,于2016年加入了 CNCF 基金会,作为继 Kubernetes 之后的第二个托管项目. ...
- 如何扩展单个Prometheus实现近万Kubernetes集群监控?
引言 TKE团队负责公有云,私有云场景下近万个集群,数百万核节点的运维管理工作.为了监控规模如此庞大的集群联邦,TKE团队在原生Prometheus的基础上进行了大量探索与改进,研发出一套可扩展,高可 ...
- 如何用Prometheus监控十万container的Kubernetes集群
概述 不久前,我们在文章<如何扩展单个Prometheus实现近万Kubernetes集群监控?>中详细介绍了TKE团队大规模Kubernetes联邦监控系统Kvass的演进过程,其中介绍 ...
- vivo 容器集群监控系统架构与实践
vivo 互联网服务器团队-YuanPeng 一.概述 从容器技术的推广以及 Kubernetes成为容器调度管理领域的事实标准开始,云原生的理念和技术架构体系逐渐在生产环境中得到了越来越广泛的应用实 ...
- 如何使用helm优雅安装prometheus-operator,并监控k8s集群微服务
前言:随着云原生概念盛行,对于容器.服务.节点以及集群的监控变得越来越重要.Prometheus 作为 Kubernetes 监控的事实标准,有着强大的功能和良好的生态.但是它不支持分布式,不支持数据 ...
- Kubernetes集群部署史上最详细(二)Prometheus监控Kubernetes集群
使用Prometheus监控Kubernetes集群 监控方面Grafana采用YUM安装通过服务形式运行,部署在Master上,而Prometheus则通过POD运行,Grafana通过使用Prom ...
- Prometheus监控elasticsearch集群(以elasticsearch-6.4.2版本为例)
部署elasticsearch集群,配置文件可"浓缩"为以下: cluster.name: es_cluster node.name: node1 path.data: /app/ ...
随机推荐
- C# 通用单例窗体类
/// <summary> /// 通用的单例制作器 /// </summary> /// <typeparam name="T"></t ...
- 简单易懂的单元测试框架-gtest(二)
简介 事件机制用于在案例运行前后添加一些操作(相当于挂钩函数).目前,gtest提供了三种等级的事件,分别: 全局级,所有案例执行的前后 TestSuite级,某一个案例集的前后 TestCa ...
- DataGuard 单实例到RAC搭建
背景简介: 本文为针对一次windows平台RAC数据库迁移至Linux平台RAC的笔记,基本步骤为: 1.搭建windows RAC到Linux 单实例数据库的DataGuard 2.做switch ...
- emacs 文本替换
文本替换方法: M-% (query-replace) 输入 响应 SPC 或者 y 替换当前匹配并前进到下一个匹配处 DEL 或者 n 忽略此次匹配并前进到下一个匹配处 . 替换当前匹配并退出 , ...
- kali权限提升之配置不当提权与WCE
kali权限提升之配置不当提权与WCE 1.利用配置不当提权 2.WCE 3.其他提权 一.利用配置不当提权 与漏洞提权相比更常用的方法 在大部分企业环境下,会有相应的补丁更新策略,因此难以通过相应漏 ...
- 3星|《陈志武金融投资课》:金融改善社会,A股投资策略
从历史上的金融说起,介绍金融的基本知识.理念.大事.重要人物.也有一些A股投资策略和A股政策点评. 引用了不少学术研究成果做证据.讲历史的部分,功力比专业历史学者稍逊,毕竟这不是作者的专业. 我读后认 ...
- 查询本地电脑IP地址
使用Windows+R键打开"运行"窗口,然后输入CMD进入命令提示窗口 进入命令窗口之后,输入:ipconfig/all 回车即可看到整个电脑的详细的IP配置信息
- oracle 关于对时间操作的汇总
-- 对时间的操作 对当前日期增加一个小时: SQL> select sysdate, sysdate+numtodsinterval(1,’hour’) from dual ; 对当前日期增加 ...
- 03SpringBoot用JdbcTemplates访问Mysql
03SpringBoot用JdbcTemplates访问Mysql 文章指导 学习笔记 学习代码 初始化mysql -- create table `account` DROP TABLE `acco ...
- 用redis实现分布式锁,秒杀案例(转)
分布式锁的简单实现代码: 需要的jar包: jedis-2.9.0.jar. commons-pool2-2.4.2.jar import java.util.List; import java.ut ...