Troubleshooting why a Kubernetes HPA cannot get the http_requests metric through Prometheus

With kube-prometheus and k8s-prometheus-adapter deployed (see the earlier post "k8s 安装 prometheus 过程记录"), deploying an HPA (Horizontal Pod Autoscaling) with the configuration file below still failed.

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: blog-web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: blog-web
  minReplicas: 2
  maxReplicas: 12
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests
        target:
          type: AverageValue
          averageValue: 100

The error message was:

unable to get metric http_requests: unable to fetch metrics from custom metrics API: the server could not find the metric http_requests for pods

The following command lists the http_requests (requests per second, QPS) metrics that the custom.metrics.k8s.io API supports:

$ kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/ | jq . | grep -E 'pods/.*http_requests'

      "name": "pods/alertmanager_http_requests_in_flight",

      "name": "pods/prometheus_http_requests"

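The only matches were alertmanager_http_requests_in_flight and prometheus_http_requests; the metric we needed, one whose name starts with http_requests, was missing.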
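The same check can be scripted instead of grepped. Below is a minimal Python sketch, assuming the discovery response is an APIResourceList with a `resources` array of `{"name": ...}` entries; the `SAMPLE` payload is made up from the two names shown above, and `pod_metrics` is a hypothetical helper, not part of any Kubernetes client.

```python
import json

# Illustrative, trimmed discovery payload built from the two names listed above.
SAMPLE = json.dumps({
    "kind": "APIResourceList",
    "groupVersion": "custom.metrics.k8s.io/v1beta1",
    "resources": [
        {"name": "pods/alertmanager_http_requests_in_flight"},
        {"name": "pods/prometheus_http_requests"},
    ],
})

def pod_metrics(raw, prefix=""):
    """Return pod-level metric names from a discovery response, filtered by prefix."""
    names = []
    for resource in json.loads(raw).get("resources", []):
        kind, _, metric = resource["name"].partition("/")
        if kind == "pods" and metric.startswith(prefix):
            names.append(metric)
    return sorted(names)

print(pod_metrics(SAMPLE))                   # every pod metric the adapter exposes
print(pod_metrics(SAMPLE, "http_requests"))  # → [] (nothing the HPA could use)
```

In a real cluster the `raw` argument would be the output of the `kubectl get --raw` command above.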

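In the Prometheus console, the /service-discovery page did not list blog-web, the application we wanted to monitor. After some searching online, we learned that a ServiceMonitor has to be deployed so that Prometheus can discover the Service to monitor.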

We added the following ServiceMonitor configuration file:

kind: ServiceMonitor
apiVersion: monitoring.coreos.com/v1
metadata:
  name: blog-web-monitor
  labels:
    app: blog-web-monitor
spec:
  selector:
    matchLabels:
      app: blog-web
  endpoints:
  - port: http

After deploying it, Prometheus still did not discover the service, and the Prometheus log contained this error:

Failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list resource \"pods\" in API group \"\" at the cluster scope

The fix came from the cnblogs post "PrometheusOperator服务自动发现-监控redis样例": change prometheus-clusterRole.yaml to the configuration below:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-k8s
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - services
  - endpoints
  - pods
  - nodes/proxy
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - configmaps
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get

Then redeploy:

kubectl apply -f prometheus-clusterRole.yaml
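The log error above says the prometheus-k8s service account may not list pods at the cluster scope. As a rough illustration of why the new rules resolve that, here is a simplified Python sketch of RBAC rule matching. The `allows` helper is hypothetical, it only looks at resource rules (not nonResourceURLs or resourceNames), and the `earlier` role is a guess, since the post does not show the original file.

```python
def allows(rules, verb, resource, api_group=""):
    """True if any rule grants `verb` on `resource` in `api_group` ("*" acts as a wildcard)."""
    def hit(values, wanted):
        return wanted in values or "*" in values
    return any(
        hit(rule.get("apiGroups", []), api_group)
        and hit(rule.get("resources", []), resource)
        and hit(rule.get("verbs", []), verb)
        for rule in rules
    )

# The resource rules from prometheus-clusterRole.yaml above.
updated = [
    {"apiGroups": [""],
     "resources": ["nodes", "services", "endpoints", "pods", "nodes/proxy"],
     "verbs": ["get", "list", "watch"]},
    {"apiGroups": [""], "resources": ["configmaps", "nodes/metrics"], "verbs": ["get"]},
]
# Hypothetical earlier role without pod access (an assumption, not from the post).
earlier = [{"apiGroups": [""], "resources": ["nodes/metrics"], "verbs": ["get"]}]

print(allows(updated, "list", "pods"))  # → True
print(allows(earlier, "list", "pods"))  # → False
```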

Note 1: If the service is still not discovered after this change, force Prometheus to reload its configuration; see "部署 ServiceMonitor 之后如何让 Prometheus 立即发现".

Note 2: Prometheus can also be configured to discover services and pods automatically; see the cnblogs posts "prometheus配置pod和svc的自动发现和监控" and "PrometheusOperator服务自动发现-监控redis样例".

A problem remained, though: Prometheus had discovered the service, but none of the pods behind it.

production/blog-web-monitor/0 (0/19 active targets)

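The cause turned out to be a mismatch between the ServiceMonitor and the Service: the Service had no label corresponding to matchLabels in the ServiceMonitor, and the port in the ServiceMonitor did not correspond to any entry under ports in the Service. The corrected configuration files: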

service-blog-web.yaml

apiVersion: v1
kind: Service
metadata:
  name: blog-web
  labels:
    app: blog-web
spec:
  type: NodePort
  selector:
    app: blog-web
  ports:
  - name: http-blog-web
    nodePort: 30080
    port: 80
    targetPort: 80

servicemonitor-blog-web.yaml

kind: ServiceMonitor
apiVersion: monitoring.coreos.com/v1
metadata:
  name: blog-web-monitor
  labels:
    app: blog-web
spec:
  selector:
    matchLabels:
      app: blog-web
  endpoints:
  - port: http-blog-web
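The two conditions that had to line up (the Service carrying the labels named in matchLabels, and the endpoint port naming a Service port) can be sketched as a small check. This is a minimal Python illustration with plain dicts mirroring the YAML above; `monitor_mismatches` is a hypothetical helper, namespace selection is ignored, and the `broken` Service is an assumed shape because the post does not show the pre-fix file.

```python
def monitor_mismatches(service, monitor):
    """List the reasons a ServiceMonitor would not pick up a Service (empty if none)."""
    problems = []
    labels = service.get("metadata", {}).get("labels", {})
    wanted = monitor["spec"]["selector"].get("matchLabels", {})
    for key, value in wanted.items():
        if labels.get(key) != value:
            problems.append(f"service lacks label {key}={value}")
    port_names = {port.get("name") for port in service["spec"].get("ports", [])}
    for endpoint in monitor["spec"].get("endpoints", []):
        if endpoint.get("port") not in port_names:
            problems.append(f"no service port named {endpoint.get('port')}")
    return problems

# Mirrors servicemonitor-blog-web.yaml above.
monitor = {"spec": {"selector": {"matchLabels": {"app": "blog-web"}},
                    "endpoints": [{"port": "http-blog-web"}]}}
# Mirrors service-blog-web.yaml above.
fixed = {"metadata": {"labels": {"app": "blog-web"}},
         "spec": {"ports": [{"name": "http-blog-web", "port": 80}]}}
# Assumed pre-fix Service: no labels and an unnamed port.
broken = {"metadata": {}, "spec": {"ports": [{"port": 80}]}}

print(monitor_mismatches(fixed, monitor))   # → []
print(monitor_mismatches(broken, monitor))  # two mismatches: label and port name
```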

With the corrected configuration deployed, the pods were finally discovered:

production/blog-web-monitor/0 (0/5 up)

But all of these pods were in the down state.

Endpoint                            State   Scrape Duration   Error
http://192.168.107.233:80/metrics   DOWN                      server returned HTTP status 400 Bad Request

The cnblogs post "使用Kubernetes演示金丝雀发布" made it clear that the application itself has to expose metrics data for Prometheus to scrape.

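The applications bundled with standard Tomcat have no /metrics path, so Prometheus cannot get data in a format it recognizes, and /metrics is exactly where the metrics data comes from. Standard Tomcat therefore does not work, and even an application that does have a /metrics path does not work if the response format does not follow the Prometheus specification.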

Our application is built with ASP.NET Core, so we chose prometheus-net to expose the metrics data for Prometheus to scrape.

Install the NuGet package:

dotnet add package prometheus-net.AspNetCore

Add the HttpMetrics middleware:

app.UseRouting();

app.UseHttpMetrics();

Add the MapMetrics route:

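// Requires `using Prometheus;` at the top of the file.
app.UseEndpoints(endpoints =>
{
    // Expose the collected metrics at /metrics for Prometheus to scrape.
    endpoints.MapMetrics();
});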

The following command confirms that monitoring data can be fetched from the /metrics path:

$ docker exec -t $(docker ps -f name=blog-web_blog-web -q | head -1) curl 127.0.0.1/metrics | grep http_request_duration_seconds_sum

http_request_duration_seconds_sum{code="200",method="GET",controller="AggSite",action="SiteHome"} 0.44973779999999997

http_request_duration_seconds_sum{code="200",method="GET",controller="",action=""} 0.0631272

Once /metrics returns data, the /targets page of the Prometheus console shows all blog-web pods in the up state.

production/blog-web-monitor/0 (5/5 up)
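For reference, each sample line returned by /metrics above follows the Prometheus text exposition format: a metric name, optional labels in braces, then a value. The Python sketch below parses one such line; it is a simplified illustration (it does not handle `# HELP` / `# TYPE` comment lines or unescape label values), and `parse_sample` is a hypothetical helper.

```python
import re

# name, optional {labels}, value, optional integer timestamp
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+-?\d+)?$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

def parse_sample(line):
    """Split one sample line into (metric name, labels dict, float value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a Prometheus sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))

line = ('http_request_duration_seconds_sum'
        '{code="200",method="GET",controller="AggSite",action="SiteHome"} '
        '0.44973779999999997')
name, labels, value = parse_sample(line)
print(name, labels["controller"], value)
```

A response body that is not in this format, such as an HTML error page, makes `parse_sample` raise `ValueError`, which is roughly the situation Prometheus was in while the targets were down.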

At this point the custom metrics API returns several http_requests metrics:

$ kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/ | jq . | grep -E 'pods/http_requests'

      "name": "pods/http_requests_in_progress",

      "name": "pods/http_requests_received"

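Here http_requests_received is the QPS (requests per second) metric; presumably prometheus-adapter derives it as a per-second rate from the http_requests_received_total counter that prometheus-net exposes. The following command fetches its data from the custom metrics API: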

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/production/pods/*/http_requests_received" | jq .

The http_requests_received data for one of the pods:

{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/custom.metrics.k8s.io/v1beta1/namespaces/production/pods/%2A/http_requests_received"
  },
  "items": [
    {
      "describedObject": {
        "kind": "Pod",
        "namespace": "production",
        "name": "blog-web-65f7bdc996-8qp5c",
        "apiVersion": "/v1"
      },
      "metricName": "http_requests_received",
      "timestamp": "2020-01-18T14:35:34Z",
      "value": "133m",
      "selector": null
    }
  ]
}

The value 133m means 0.133; the m suffix stands for milli (thousandths).
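To make the notation concrete, here is a minimal Python sketch that converts such a quantity string into a number. It is a simplification that assumes only the plain form and a few decimal suffixes (m, k, M, G); binary suffixes such as Ki or Mi are not handled, and `parse_quantity` is a hypothetical helper rather than a Kubernetes client function.

```python
from decimal import Decimal

# Decimal suffixes handled by this sketch; the real quantity grammar has more.
_SUFFIXES = {
    "m": Decimal("0.001"),
    "k": Decimal(10) ** 3,
    "M": Decimal(10) ** 6,
    "G": Decimal(10) ** 9,
}

def parse_quantity(text):
    """Convert a quantity string such as "133m" into a Decimal."""
    for suffix, factor in _SUFFIXES.items():
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * factor
    return Decimal(text)

print(parse_quantity("133m"))  # → 0.133
print(parse_quantity("100"))   # → 100
```

Using `Decimal` keeps 133m exactly equal to 0.133 instead of a binary floating-point approximation.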

The HPA configuration file can then scale on this metric:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: blog-web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: blog-web
  minReplicas: 5
  maxReplicas: 12
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_received
      target:
        type: AverageValue
        averageValue: 100

Finally done!

$ kubectl get hpa

NAME       REFERENCE             TARGETS    MINPODS   MAXPODS   REPLICAS   AGE

blog-web   Deployment/blog-web   133m/100   5         12        5          4d
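The TARGETS column reads 133m/100: an average of 0.133 requests per second per pod against a target of 100. A rough Python sketch of the replica calculation documented for the HPA, desiredReplicas = ceil(currentReplicas × currentValue / targetValue) clamped to the min/max bounds, shows why the replica count stays at 5. This is a simplification under stated assumptions: it uses the default 10% tolerance and ignores stabilization windows, unready pods, and missing metrics; `desired_replicas` is a hypothetical helper.

```python
import math

def desired_replicas(current, avg_value, target, min_replicas, max_replicas, tolerance=0.1):
    """Approximate the HPA replica count for a Pods metric with an AverageValue target."""
    ratio = avg_value / target
    if abs(ratio - 1.0) <= tolerance:
        wanted = current  # close enough to the target: no scaling
    else:
        wanted = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, wanted))

# Values from the `kubectl get hpa` output above: 0.133 req/s per pod, target 100.
print(desired_replicas(5, 0.133, 100, 5, 12))  # → 5 (clamped up to minReplicas)
# Hypothetical load of 250 req/s per pod: ceil(5 * 2.5) = 13, clamped to maxReplicas.
print(desired_replicas(5, 250, 100, 5, 12))    # → 12
```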
