Grafana Series (14): Installing Loki with Helm

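Preface

After writing or translating so many Loki articles, I realized I had never covered how to install it.

This article shows how to install Loki with Helm.

Prerequisites

Helm is installed, and the official Grafana chart repository has been added:

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

Warning:

Network access to the repository may be restricted, so make sure the connection works before you start.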

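Deployment

Architecture

Promtail (collection) + Loki (storage and processing) + Grafana (visualization)

Promtail

- Enable the Prometheus Operator ServiceMonitor for monitoring.
- Add a cluster entry to external_labels to identify which K8S cluster the logs come from.
- Change pipeline_stages to cri so that CRI-format logs are parsed (my cluster uses a CRI container runtime rather than Docker, while the Loki Helm chart defaults to docker).
- Collect systemd-journal logs as well: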

promtail:
  config:
    snippets:
      pipelineStages:
        - cri: {}
  extraArgs:
    - -client.external-labels=cluster=ctyun
  # Extra configuration for systemd-journal:
  # Add additional scrape config
  extraScrapeConfigs:
    - job_name: journal
      journal:
        path: /var/log/journal
        max_age: 12h
        labels:
          job: systemd-journal
      relabel_configs:
        - source_labels: ['__journal__systemd_unit']
          target_label: 'unit'
        - source_labels: ['__journal__hostname']
          target_label: 'hostname'
  # Mount journal directory into Promtail pods
  extraVolumes:
    - name: journal
      hostPath:
        path: /var/log/journal
  extraVolumeMounts:
    - name: journal
      mountPath: /var/log/journal
      readOnly: true
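To make the cri stage less of a black box, here is a minimal Python sketch of the line layout it deals with. It assumes the usual CRI log format of "timestamp stream flags content"; the regular expression, the parse_cri_line helper, and the sample line are illustrative and are not taken from Promtail's source.

```python
import re
from typing import NamedTuple, Optional

# Assumed CRI log line layout: "<timestamp> <stream> <flags> <content>"
CRI_LINE = re.compile(
    r"^(?P<time>\S+) (?P<stream>stdout|stderr) (?P<flags>\S+) (?P<content>.*)$"
)


class CriRecord(NamedTuple):
    time: str     # RFC3339 timestamp written by the runtime
    stream: str   # "stdout" or "stderr"
    flags: str    # "F" for a full line, "P" for a partial line
    content: str  # the application's own log message


def parse_cri_line(line: str) -> Optional[CriRecord]:
    """Split one CRI-format line into its parts; return None if it does not match."""
    match = CRI_LINE.match(line)
    if match is None:
        return None
    return CriRecord(**match.groupdict())


# Hypothetical sample line, made up for illustration.
sample = '2023-01-15T08:30:00.123456789Z stdout F level=info msg="started"'
record = parse_cri_line(sample)
print(record)
```

A Docker json-file line such as {"log":"x"} does not match this layout, which is why the pipeline stage has to be chosen to fit the runtime.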

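Loki

- Enable persistent storage.
- Enable the Prometheus Operator ServiceMonitor for monitoring.
  - Also configure Loki-related PrometheusRule alerts.
- Because my personal cluster produces few logs, raise the ingester settings (chunk_idle_period and max_chunk_age) so that chunks have more time to fill before they are flushed.

Grafana

- Enable persistent storage.
- Enable the Prometheus Operator ServiceMonitor for monitoring.
- Enable all the sidecars so that dashboards, datasources, plugins, and notifiers can be updated dynamically.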

Helm installation

Install with the following command:

helm upgrade --install loki --namespace=loki --create-namespace grafana/loki-stack -f values.yaml

The custom values.yaml is as follows:

loki:
  enabled: true
  persistence:
    enabled: true
    storageClassName: local-path
    size: 20Gi
  serviceScheme: https
  user: admin
  password: changit!
  config:
    ingester:
      chunk_idle_period: 1h
      max_chunk_age: 4h
    compactor:
      retention_enabled: true
  serviceMonitor:
    enabled: true
    prometheusRule:
      enabled: true
      rules:
        #  Some examples from https://awesome-prometheus-alerts.grep.to/rules.html#loki
        - alert: LokiProcessTooManyRestarts
          expr: changes(process_start_time_seconds{job=~"loki"}[15m]) > 2
          for: 0m
          labels:
            severity: warning
          annotations:
            summary: Loki process too many restarts (instance {{ $labels.instance }})
            description: "A loki process had too many restarts (target {{ $labels.instance }})\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
        - alert: LokiRequestErrors
          expr: 100 * sum(rate(loki_request_duration_seconds_count{status_code=~"5.."}[1m])) by (namespace, job, route) / sum(rate(loki_request_duration_seconds_count[1m])) by (namespace, job, route) > 10
          for: 15m
          labels:
            severity: critical
          annotations:
            summary: Loki request errors (instance {{ $labels.instance }})
            description: "The {{ $labels.job }} and {{ $labels.route }} are experiencing errors\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
        - alert: LokiRequestPanic
          expr: sum(increase(loki_panic_total[10m])) by (namespace, job) > 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: Loki request panic (instance {{ $labels.instance }})
            description: "The {{ $labels.job }} is experiencing {{ printf \"%.2f\" $value }}% increase of panics\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
        - alert: LokiRequestLatency
          expr: (histogram_quantile(0.99, sum(rate(loki_request_duration_seconds_bucket{route!~"(?i).*tail.*"}[5m])) by (le)))  > 1
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: Loki request latency (instance {{ $labels.instance }})
            description: "The {{ $labels.job }} {{ $labels.route }} is experiencing {{ printf \"%.2f\" $value }}s 99th percentile latency\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

promtail:
  enabled: true
  config:
    snippets:
      pipelineStages:
        - cri: {}
  extraArgs:
    - -client.external-labels=cluster=ctyun
  serviceMonitor:
    # -- If enabled, ServiceMonitor resources for Prometheus Operator are created
    enabled: true
  # Extra configuration for systemd-journal:
  # Add additional scrape config
  extraScrapeConfigs:
    - job_name: journal
      journal:
        path: /var/log/journal
        max_age: 12h
        labels:
          job: systemd-journal
      relabel_configs:
        - source_labels: ['__journal__systemd_unit']
          target_label: 'unit'
        - source_labels: ['__journal__hostname']
          target_label: 'hostname'
  # Mount journal directory into Promtail pods
  extraVolumes:
    - name: journal
      hostPath:
        path: /var/log/journal
  extraVolumeMounts:
    - name: journal
      mountPath: /var/log/journal
      readOnly: true

fluent-bit:
  enabled: false
grafana:
  enabled: true
  adminUser: caseycui
  adminPassword: changit!
  ## Sidecars that collect the ConfigMaps with the specified label and store the included files in the respective folders
  ## Requires at least Grafana 5 to work and can't be used together with parameters dashboardProviders, datasources and dashboards
  sidecar:
    image:
      repository: quay.io/kiwigrid/k8s-sidecar
      tag: 1.15.6
      sha: ''
    dashboards:
      enabled: true
      SCProvider: true
      label: grafana_dashboard
    datasources:
      enabled: true
      # label that the configmaps with datasources are marked with
      label: grafana_datasource
    plugins:
      enabled: true
      # label that the configmaps with plugins are marked with
      label: grafana_plugin
    notifiers:
      enabled: true
      # label that the configmaps with notifiers are marked with
      label: grafana_notifier
  image:
    tag: 8.3.5
  persistence:
    enabled: true
    size: 2Gi
    storageClassName: local-path
  serviceMonitor:
    enabled: true
  imageRenderer:
    # Boolean flag; the original "disable" is not a valid boolean value
    enabled: false
filebeat:
  enabled: false
logstash:
  enabled: false
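The LokiRequestErrors rule in the values above compares the share of 5xx responses against a 10% threshold. As a rough sketch of that arithmetic only (the function names and sample rates are made up, and the "for: 15m" hold period is not modeled), the comparison for a single namespace/job/route group looks like this in Python:

```python
ERROR_THRESHOLD_PERCENT = 10.0  # matches the "> 10" in the rule expression


def error_percentage(rate_5xx: float, rate_total: float) -> float:
    """Share of 5xx requests in percent; 0.0 when there is no traffic at all."""
    if rate_total == 0:
        return 0.0
    return 100.0 * rate_5xx / rate_total


def exceeds_threshold(rate_5xx: float, rate_total: float) -> bool:
    """True when the error share is above the threshold used by the rule."""
    return error_percentage(rate_5xx, rate_total) > ERROR_THRESHOLD_PERCENT


# Hypothetical per-second request rates for one (namespace, job, route) group.
print(exceeds_threshold(2.0, 10.0))  # 20% of requests fail
print(exceeds_threshold(0.5, 10.0))  # 5% of requests fail
```

In Prometheus the condition must also stay true for the whole 15 minutes before the alert fires, which this sketch leaves out.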

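[Figure: resource topology after installation]

Day 2 configuration (as needed)

Adding Grafana dashboards

Create a ConfigMap like the following in the same namespace (any ConfigMap carrying the grafana_dashboard label is imported automatically by Grafana's sidecar):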

apiVersion: v1
kind: ConfigMap
metadata:
  name: sample-grafana-dashboard
  labels:
    grafana_dashboard: "1"
data:
  k8s-dashboard.json: |-
    [...]
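Writing such a ConfigMap by hand gets tedious once the dashboard JSON is large. The following Python sketch builds the same manifest shape from a dashboard object and prints it as JSON, which kubectl also accepts. The dashboard_configmap helper, the placeholder dashboard, and the loki namespace (taken from the helm command earlier) are assumptions for illustration, not part of the chart.

```python
import json


def dashboard_configmap(name: str, namespace: str, filename: str, dashboard: dict) -> dict:
    """Build a ConfigMap manifest that the Grafana dashboard sidecar can pick up."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            "name": name,
            "namespace": namespace,
            # The sidecar selects ConfigMaps by this label (see sidecar.dashboards.label).
            "labels": {"grafana_dashboard": "1"},
        },
        # ConfigMap values must be strings, so the dashboard is serialized to JSON text.
        "data": {filename: json.dumps(dashboard, indent=2)},
    }


# Placeholder dashboard; a real one would be exported from Grafana.
placeholder = {"title": "Sample dashboard", "panels": []}
manifest = dashboard_configmap(
    "sample-grafana-dashboard", "loki", "k8s-dashboard.json", placeholder
)
print(json.dumps(manifest, indent=2))
```

If the script is saved under a name of your choice, its output can be piped into kubectl apply -f - to create the ConfigMap.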

Adding a Grafana datasource

Create a ConfigMap like the following in the same namespace (any ConfigMap carrying the grafana_datasource label is imported automatically by Grafana's sidecar):

apiVersion: v1
kind: ConfigMap
metadata:
  name: loki-loki-stack
  labels:
    grafana_datasource: '1'
data:
  loki-stack-datasource.yaml: |-
    apiVersion: 1
    datasources:
    - name: Loki
      type: loki
      access: proxy
      url: http://loki:3100
      version: 1

Configuring a Traefik IngressRoute for Grafana

Because I use Traefik 2, Ingress is configured through the IngressRoute CRD, as follows:

apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: grafana
spec:
  entryPoints:
    - web
    - websecure
  routes:
    - kind: Rule
      match: Host(`grafana.ewhisper.cn`)
      middlewares:
        - name: hsts-header
          namespace: kube-system
        - name: redirectshttps
          namespace: kube-system
      services:
        - name: loki-grafana
          # Must be the namespace the chart was installed into (loki in the helm command above)
          namespace: loki
          port: 80
  tls: {}

Final result

[Figure: final result]

References

helm-charts/charts at main · grafana/helm-charts (github.com)

When three walk together, one of them can be my teacher; knowledge shared belongs to everyone. This article was written by the 东风微鸣 tech blog, EWhisper.cn.
