prometheus operator 部署自定义记录

环境:

k8s 1.11集群版本,kubeadm部署

docker 17.3.2版本

Centos 7系统

阿里云服务器

operator 源码下载

仓库下载prometheus operator

$ git clone https://github.com/coreos/kube-prometheus.git
$ cd kube-prometheus/manifests

进入到 manifests 目录下面,这个目录下面包含我们所有的资源清单文件,我们需要对其中的文件 prometheus-serviceMonitorKubelet.yaml 进行简单的修改,因为默认情况下,这个 ServiceMonitor 是关联的 kubelet 的10250端口去采集的节点数据,而我们前面说过为了安全,这个 metrics 数据已经迁移到10255这个只读端口上面去了,我们只需要将文件中的https-metrics更改成http-metrics即可,这个在 Prometheus-Operator 对节点端点同步的代码中有相关定义,感兴趣的可以点此查看完整代码

Subsets: []v1.EndpointSubset{
{
Ports: []v1.EndpointPort{
{
Name: "https-metrics",
Port: 10250,
},
{
Name: "http-metrics",
Port: 10255,
},
{
Name: "cadvisor",
Port: 4194,
},
},
},
},

需要注意将insecureSkipVerify参数配置为false,http才生效 insecureSkipVerify: false

endpoints:
- bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
honorLabels: true
interval: 30s
port: http-metrics
scheme: http
tlsConfig:
insecureSkipVerify: false
- bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
honorLabels: true
interval: 30s
metricRelabelings:
- action: drop
regex: container_(network_tcp_usage_total|network_udp_usage_total|tasks_state|cpu_load_average_10s)
sourceLabels:
- __name__
path: /metrics/cadvisor
port: http-metrics
scheme: http
tlsConfig:
insecureSkipVerify: false

配置alertmanager告警路由

配置钉钉路由文件,并创建为secret对象,挂载到prometheus-prometheus,yaml文件中。这里需要将prometheus数据就行持久化存储,还需要定义一个storageClass或者pvc挂载进去。

alertmanager-main.yaml

global:
resolve_timeout: 5m route:
group_by: ['alertname']
group_wait: 30s
group_interval: 5m
repeat_interval: 2h
receiver: 'web.hook'
receivers:
- name: 'web.hook'
webhook_configs:
- url: 'http://prometheus-webhook-dingtalk.monitors.svc.cluster.local:8060/dingtalk/ops_dingding/send'

创建alertmanager 配置文件secret对象

kubectl -n monitoring create se^C
[k8s@master ~]$ kubectl -n monitoring create secret generic altermanager-main --from-file=altermanager-main.yaml

创建storageClass对象,为prometheus提供持久化存储,这里使用阿里云提供的云盘或NAS服务,创建自定义storageClass对象

这里选择云盘创建的alicloud-disk-ssd 存储对象

prometheus配置kubernetes_sd_configs

为prometheus配置服务自动发现功能,将prometheus-additional.yaml文件创建为secret对象

prometheus-additional.yaml

- job_name: 'kubernetes-cadvisor'
kubernetes_sd_configs:
- role: node
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- target_label: __address__
replacement: kubernetes.default.svc:443
- source_labels: [__meta_kubernetes_node_name]
regex: (.+)
target_label: __metrics_path__
replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor - job_name: 'kubernetes-service-endpoints'
kubernetes_sd_configs:
- role: endpoints
relabel_configs:
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
action: replace
target_label: __scheme__
regex: (https?)
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
action: replace
target_label: __address__
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_service_name]
action: replace
target_label: kubernetes_name - job_name: 'kubernetes-services'
kubernetes_sd_configs:
- role: service
metrics_path: /probe
params:
module: [http_2xx]
relabel_configs:
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
action: keep
regex: true
- source_labels: [__address__]
target_label: __param_target
- target_label: __address__
replacement: blackbox-exporter.example.com:9115
- source_labels: [__param_target]
target_label: instance
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_service_name]
target_label: kubernetes_name - job_name: 'kubernetes-ingresses'
kubernetes_sd_configs:
- role: ingress
relabel_configs:
- source_labels: [__meta_kubernetes_ingress_annotation_prometheus_io_probe]
action: keep
regex: true
- source_labels: [__meta_kubernetes_ingress_scheme,__address__,__meta_kubernetes_ingress_path]
regex: (.+);(.+);(.+)
replacement: ${1}://${2}${3}
target_label: __param_target
- target_label: __address__
replacement: blackbox-exporter.example.com:9115
- source_labels: [__param_target]
target_label: instance
- action: labelmap
regex: __meta_kubernetes_ingress_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_ingress_name]
target_label: kubernetes_name - job_name: 'kubernetes-pods'
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
action: replace
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
target_label: __address__
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_pod_name]
action: replace
target_label: kubernetes_pod_name

创建secret对象additional-config

kubectl -n monitoring create secret generics additional-config --from-file=prometheus-additional.yaml

修改prometheus-prometheus.yaml文件

将前面自定的配置文件和存储类写进prometheus中,实现prometheus监控的自定义化

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
labels:
prometheus: k8s
name: k8s
namespace: monitoring
spec:
alerting:
alertmanagers:
- name: alertmanager-main
namespace: monitoring
port: web
storage: #配置持久化存储
volumeClaimTemplate:
spec:
storageClassName: alicloud-disk-ssd #使用alicloud-disk-ssd存储类
resources:
requests:
storage: 50Gi
baseImage: quay.io/prometheus/prometheus
nodeSelector:
kubernetes.io/os: linux
podMonitorSelector: {}
replicas: 2
secrets: #etcd 证书secret配置文件
- etcd-certs
resources:
requests:
memory: 400Mi
ruleSelector:
matchLabels:
prometheus: k8s
role: alert-rules
securityContext:
fsGroup: 2000
runAsNonRoot: true
runAsUser: 1000
additionalScrapeConfigs: #配置服务发现功能
name: additional-configs #secret 资源对象名称
key: prometheus-additional.yaml #secret 对象中的key
serviceAccountName: prometheus-k8s
serviceMonitorNamespaceSelector: {}
serviceMonitorSelector: {}
version: v2.11.0

etcd 使用的证书都对应在节点的 /etc/kubernetes/pki/etcd 这个路径下面,所以首先我们将需要使用到的证书通过 secret 对象保存到集群中去:(在 etcd 运行的节点)

kubectl create secret generics etcd---from-file=/etc/kubernetes/pki/etcd/ca.pemcerts --from-file=/etc/kubernetes/pki/etcd/etcd-client.pem --from-file=/etc/kubernetes/pki/etcd/etcd-client-key.pem -n monitoring

创建 ServiceMonitor

现在 Prometheus 访问 etcd 集群的证书已经准备好了,接下来创建 ServiceMonitor 对象即可(prometheus-serviceMonitorEtcd.yaml)

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: etcd-k8s
namespace: monitoring
labels:
k8s-app: etcd-k8s
spec:
jobLabel: k8s-app
endpoints:
- port: port
interval: 30s
scheme: https
tlsConfig:
caFile: /etc/prometheus/secrets/etcd-certs/ca.pem
certFile: /etc/prometheus/secrets/etcd-certs/etcd-client.pem
keyFile: /etc/prometheus/secrets/etcd-certs/etcd-client-key.pem
insecureSkipVerify: true
selector:
matchLabels:
k8s-app: etcd
namespaceSelector:
matchNames:
- kube-system

创建service

ServiceMonitor 创建完成了,但是现在还没有关联的对应的 Service 对象,所以需要我们去手动创建一个 Service 对象(prometheus-etcdService.yaml):

apiVersion: v1
kind: Service
metadata:
name: etcd-k8s
namespace: kube-system
labels:
k8s-app: etcd
spec:
type: ClusterIP
clusterIP: None
ports:
- name: port
port: 2379
protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
name: etcd-k8s
namespace: kube-system
labels:
k8s-app: etcd
subsets:
- addresses:
- ip: 172.16.23.231
nodeName: etcd-master
ports:
- name: port
port: 2379
protocol: TCP

我们这里创建的 Service 没有采用前面通过 label 标签的形式去匹配 Pod 的做法,因为前面我们说过很多时候我们创建的 etcd 集群是独立于集群之外的,这种情况下面我们就需要自定义一个 Endpoints,要注意 metadata 区域的内容要和 Service 保持一致,Service 的 clusterIP 设置为 None,对改知识点不太熟悉的,可以去查看我们前面关于 Service 部分的讲解。

Endpoints 的 subsets 中填写 etcd 集群的地址即可,我们这里是单节点的,填写一个即可,如果etcd配置文件中配置地址为127.0.0.1则有可能监控失败,需要修改为0.0.0.0

创建scheduler,controller-manager 组件的Service对象

kube-scheduler、kube-controller-manager组件绑定地址都为127.0.0.1,需要进入配置文件进行修改为0.0.0.0才能访问端口,进行监控

apiVersion: v1
kind: Service
metadata:
name: kube-scheduler
namespace: kube-system
labels:
k8s-app: kube-scheduler
spec:
selector:
component: kube-scheduler
clusterIP: None
ports:
- name: http-metrics
targetPort: 10251
port: 10251
protocol: TCP ---
apiVersion: v1
kind: Service
metadata:
name: kube-controller-manager
namespace: kube-system
labels:
k8s-app: kube-controller-manager
spec:
selector:
component: kube-controller-manager
clusterIP: None
ports:
- name: http-metrics
targetPort: 10252
port: 10252 ##kubelet-service.yaml 文件省略

修改prometheus sa用户权限

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: prometheus-k8s
rules:
- apiGroups:
- ""
resources:
- nodes
- services
- endpoints
- pods
- nodes/proxy
verbs:
- get
- list
- watch
- apiGroups:
- ""
resources:
- configmaps
- nodes/metrics
verbs:
- get
- nonResourceURLs:
- /metrics
verbs:
- get

创建operator

kubectl create -f .

到此prometheus-operator 生产环境中就已经部署完毕,grafana的图表配置和alertmanager的告警优化模板通知功能还需补全。

1、创建storageClass对象,为prometheus提供持久化存储写入prometheus-prometheus.yaml文件中
2、修改alermanager-secret.yaml secret对象中的配置数据,改成自定义的钉钉报警路由或者
邮箱账号配置
3、创建etcd、scheduler、controller Service对象
4、配置服务告警规则prometheus-etcdRules.yaml 文件或在源文件中添加
5、创建prometheus服务自动发现secret配置文件,并写入prometheus-prometheus.yaml文件中
6、创建etcd证书secret 、serviceMonitorEtcd对象文件
7、修改promethus-clusterRule.yaml 权限
8、执行部署

prometheus operator 部署的更多相关文章

  1. Prometheus Operator 教程:根据服务维度对 Prometheus 分片

    原文链接:https://fuckcloudnative.io/posts/aggregate-metrics-user-prometheus-operator/ Promtheus 本身只支持单机部 ...

  2. 使用Prometheus Operator 监控Kubernetes(15)

    一.Prometheus概述: Prometheus是一个开源系统监测和警报工具箱. Prometheus Operator 是 CoreOS 开发的基于 Prometheus 的 Kubernete ...

  3. Prometheus Operator 对接 Thanos

    文章转载自:https://jishuin.proginn.com/p/763bfbd56ae4 使用 Prometheus Operator 来进行监控,在 Prometheus 高可用的章节中也手 ...

  4. Kubernetes 监控:Prometheus Operator + Thanos ---实践篇

    具体参考网址:https://www.cnblogs.com/sanduzxcvbnm/p/16291296.html 本章用到的yaml文件地址:https://files.cnblogs.com/ ...

  5. kubernetes之监控Operator部署Prometheus(三)

    第一章和第二章中我们配置Prometheus的成本非常高,而且也非常麻烦.但是我们要考虑Prometheus.AlertManager 这些组件服务本身的高可用的话,成本就更高了,当然我们也完全可以用 ...

  6. 部署 Prometheus Operator - 每天5分钟玩转 Docker 容器技术(179)

    本节在实践时使用的是 Prometheus Operator 版本 v0.14.0.由于项目开发迭代速度很快,部署方法可能会更新,必要时请参考官方文档. 下载最新源码 git clone https: ...

  7. 部署 Prometheus Operator【转】

    本节在实践时使用的是 Prometheus Operator 版本 v0.14.0.由于项目开发迭代速度很快,部署方法可能会更新,必要时请参考官方文档. 下载最新源码 git clone https: ...

  8. Prometheus Operator 架构 - 每天5分钟玩转 Docker 容器技术(178)

    本节讨论 Prometheus Operator 的架构.因为 Prometheus Operator 是基于 Prometheus 的,我们需要先了解一下 Prometheus. Prometheu ...

  9. Prometheus Operator 监控Kubernetes

    Prometheus Operator 监控Kubernetes 1. Prometheus的基本架构 ​ Prometheus是一个开源的完整监控解决方案,涵盖数据采集.查询.告警.展示整个监控流程 ...

随机推荐

  1. 【转】redis报错“max number of clients reached"

    查看redis监控的时候看到redis的graph出现不正常的情况,截图如下: 如上面截图所展示的样子,可以看到redis 的客户端连接数很突兀的上升到10K,又突然下降到0.排除了监控本身的原因,很 ...

  2. 为CentOS安装python3

    摘自:https://www.jianshu.com/p/7c2b62c37223 1. 安装依赖 不要复制往下看 yum install openssl-devel bzip2-devel expa ...

  3. [Design Patterns] 02. Structural Patterns - Facade Pattern

    前言 参考资源 史上最全设计模式导学目录(完整版) 只把常用的五星的掌握即可. 外观模式-Facade Pattern[学习难度:★☆☆☆☆,使用频率:★★★★★] 深入浅出外观模式(一):外观模式概 ...

  4. GCC版本中没有GLIBCXX_3.4.15错误

    解决错误呈现该错误的原因是当前的GCC版本中,没有GLIBCXX_3.4.15,须要安装更高版本.我们可以输入:strings /usr/lib64/libstdc++.so.6 | grep GLI ...

  5. [LeetCode] 117. Populating Next Right Pointers in Each Node II 每个节点的右向指针 II

    Follow up for problem "Populating Next Right Pointers in Each Node". What if the given tre ...

  6. sqlyog 社区版

    https://github.com/webyog/sqlyog-community/wiki/Downloads

  7. [转]综述论文翻译:A Review on Deep Learning Techniques Applied to Semantic Segmentation

    近期主要在学习语义分割相关方法,计划将arXiv上的这篇综述好好翻译下,目前已完成了一部分,但仅仅是尊重原文的直译,后续将继续完成剩余的部分,并对文中提及的多个方法给出自己的理解. _论文地址:htt ...

  8. [转帖]微软宣布加入 OpenJDK 项目

    微软宣布加入 OpenJDK 项目 https://news.cnblogs.com/n/646003/ 近日,微软的 Bruno Borges 在 OpenJDK 邮件列表中发布了一条消息,内容包含 ...

  9. Fiddler抓包工具如何可以抓取HTTPS

  10. 3.02定义常量之const

    [注:本程序验证是使用vs2013版] #include <stdio.h> #include <stdlib.h> #include <string.h> #pr ...