Sync Tools: Vector

A lightweight, ultra-fast tool for building observability pipelines.

[Installation]

curl --proto '=https' --tlsv1.2 -sSf https://sh.vector.dev | bash

source ~/.profile

Test a configuration file by running Vector against it:

vector --config /root/.vector/config/vector.toml

[Usage]

[Managing Vector with systemd]

vim /etc/systemd/system/vector.service

[Unit]

Description=Vector

Documentation=https://vector.dev

After=network-online.target

Requires=network-online.target

[Service]

User=vector

Group=vector

ExecStart=/usr/bin/vector -c /etc/vector/datacenter/*.yaml

ExecReload=/bin/kill -HUP $MAINPID

Restart=no

EnvironmentFile=-/etc/default/vector

[Install]

WantedBy=multi-user.target

Example:

YAML configuration format

---

sources:

  kafka_app_events:

    type: "kafka"

    bootstrap_servers: "kafka1:9092,kafka2:9092,kafka3:9092"

    group_id: vector-sink-beta

    topics:

      - login_test

      - button_click_test

    auto_offset_reset: earliest

transforms:

  remap_public_fields:

    type: remap

    drop_on_error: true

    inputs:

      - kafka_app_events

    source: |-

      msg = parse_json!(.message)

      msg.kafka_offset = .offset

      msg.kafka_partition = .partition

      msg.kafka_topic = .topic

      msg.app_id = to_int!(msg.app_id)

      msg.number_id = to_int!(msg.number_id)

      msg.player_id = to_string!(msg.player_id)

      msg.player_type = to_int!(msg.player_type)

      msg.platform = to_int!(msg.platform)

      msg.params = to_string!(msg.params)

      msg.client_version = to_string!(msg.client_version)

      msg.reg_channel = to_int!(msg.reg_channel)

      msg.channel = to_int(msg.channel) ?? 0

      msg.main_channel = msg.channel

      if msg.channel > 10000000 {

        msg.main_channel = to_int(msg.channel / 10000 ?? 0)

      }

      . = msg    

  route_events:

    type: "route"

    inputs:

      - remap_public_fields

    route:

      login: .kafka_topic == "login_test"

      button_click: .kafka_topic == "button_click_test"

  remap_button_click_test:

    type: remap

    drop_on_error: true

    inputs:

    - route_events.button_click

    source: |-

      .button_id = to_int!(.button_id)

  remap_login_test:

    type: remap

    drop_on_error: true

    inputs:

    - route_events.login

    source: |-

      .is_new = to_int!(.is_new)

      .longitude = to_float!(.longitude)

      .latitude = to_float!(.latitude)

sinks:

  clickhouse_button_click_test:

    type: clickhouse

    auth:

      user: vector_beta

      password: xxx

      strategy: basic

    inputs:

    - remap_button_click_test

    compression: gzip

    database: events_beta

    endpoint: http://xxx.com:8123

    table: button_click_all

    encoding:

      only_fields:

      - kafka_partition

      - kafka_offset    

      - data_time

      - app_id

      - tags

      - player_id

      - number_id

      - player_type

      - params

      - platform

      - reg_channel

      - channel

      - main_channel

      - client_version

      - button_id

    healthcheck:

      enabled: true

  clickhouse_login_test:

    type: clickhouse

    auth:

      user: vector_beta

      password: xxx

      strategy: basic

    inputs:

    - remap_login_test

    compression: gzip

    database: events_beta

    endpoint: http://xxx.com:8123

    table: login_all

    encoding:

      only_fields:

      - kafka_partition

      - kafka_offset   

      - data_time

      - app_id

      - tags

      - player_id

      - number_id

      - player_type

      - params

      - platform

      - reg_channel

      - channel

      - main_channel

      - client_version

      - is_new

      - ip

      - device_id

      - device_os

      - device_brand

      - device_model

      - ppi

      - longitude

      - latitude

    healthcheck:

      enabled: true
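
The channel handling in `remap_public_fields` and the topic matching in `route_events` can be read as the following Python sketch. It only illustrates the logic and is not how Vector executes VRL. The names `to_int_or_zero`, `derive_main_channel`, and `route_for_topic` are hypothetical, and the sketch assumes that VRL's `to_int` truncates a float to its integer part.

```python
def to_int_or_zero(value):
    """Mimic `to_int(msg.channel) ?? 0`: fall back to 0 when conversion fails."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return 0


def derive_main_channel(channel):
    """Channels above 10,000,000 are reduced by dropping the last four digits."""
    channel = to_int_or_zero(channel)
    if channel > 10_000_000:
        # Integer division equals truncation here because channel is positive.
        return channel // 10_000
    return channel


# Topic name -> route name, as declared under `route_events.route`.
ROUTES = {"login_test": "login", "button_click_test": "button_click"}


def route_for_topic(topic):
    """Return the route an event would take, or None if no condition matches."""
    return ROUTES.get(topic)


if __name__ == "__main__":
    print(derive_main_channel("123456789"))
    print(derive_main_channel(42))
    print(route_for_topic("login_test"))
```

Events whose topic matches neither condition take no named route, so neither of the two downstream remap transforms receives them.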

Hands-on:

Use Vector and ClickHouse to collect nginx logs, then visualize them with Grafana.

1) Define the nginx access log format

log_format track '$remote_addr - $time_iso8601 "$request_uri" '

                 '$status $body_bytes_sent "$http_user_agent"';

2) Assume the log file path is /var/log/track.log

Define the pipeline that parses the log:

[sources.home]

type = "file"

include = ["/var/log/track.log"]

read_from = "end"

[transforms.process]

type = "remap"

inputs = ["home"]

source = '''

. |= parse_regex!(.message, r'^(?P<ip>\d+\.\d+\.\d+\.\d+) \- (?P<date>\d+\-\d+\-\d+)T(?P<time>\d+:\d+:\d+).+?"(?P<url>.+?)" (?P<status>\d+) (?P<size>\d+) "(?P<agent>.+?)"$')

.status = to_int!(.status)

.size = to_int!(.size)

.time = .date + " " + .time

'''

[sinks.print]

type = "console"

inputs = ["process"]

encoding.codec = "json"

[sinks.clickhouse]

type = "clickhouse"

inputs = ["process"]

endpoint = "http://xx.xx.xx.xx:8123"

database = "nginx_db"

table = "log"

compression = "gzip"

auth.strategy = "basic"

auth.user = "username"

auth.password = "password"

skip_unknown_fields = true

request.concurrency = "adaptive"

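A configuration generally defines three parts:

[sources.***] defines the data sources

[transforms.***] defines how the data is parsed and restructured

[sinks.***] defines where the data is delivered and stored

The "***" is a name of your choosing; other components refer to it in their inputs.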
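
Before pointing Vector at real traffic, it can help to check the regular expression from the remap transform against a sample line. The Python sketch below reuses the same pattern and applies the same conversions as the VRL source. The function name `parse_track_line` and the sample log line are made up for illustration, and Python's `re` module is assumed to treat this particular pattern the same way Vector's regex engine does.

```python
import re

# Same pattern as in the `parse_regex!` call, split across two lines for readability.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\d+\.\d+\.\d+\.\d+) \- (?P<date>\d+\-\d+\-\d+)T(?P<time>\d+:\d+:\d+)'
    r'.+?"(?P<url>.+?)" (?P<status>\d+) (?P<size>\d+) "(?P<agent>.+?)"$'
)


def parse_track_line(line):
    """Parse one line written with the `track` log_format into a dict."""
    match = LOG_PATTERN.match(line)
    if match is None:
        raise ValueError(f"line does not match the track format: {line!r}")
    event = match.groupdict()
    event["status"] = int(event["status"])  # .status = to_int!(.status)
    event["size"] = int(event["size"])      # .size = to_int!(.size)
    event["time"] = event["date"] + " " + event["time"]  # .time = .date + " " + .time
    return event


# Hypothetical sample line in the `track` format.
SAMPLE = '203.0.113.7 - 2024-01-15T10:30:45+08:00 "/track?id=1" 200 512 "Mozilla/5.0"'

if __name__ == "__main__":
    print(parse_track_line(SAMPLE))
```

Note that the pattern discards the timezone offset that follows the seconds, so the resulting `time` value is the server's local wall-clock time.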

3) Create the ClickHouse database and table

CREATE DATABASE IF NOT EXISTS nginx_db;

CREATE TABLE nginx_db.log

(

    `ip` String,

    `time` DateTime,

    `url` String,

    `status` UInt16,

    `size` UInt32,

    `agent` String

)

ENGINE = MergeTree

ORDER BY date(time)
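
When choosing integer column types, it is worth checking that the values produced by the pipeline fit their range. HTTP status codes such as 404 or 503 exceed the UInt8 maximum of 255, so the `status` column needs at least UInt16. The Python sketch below makes that check explicit; the helper name `fits` is hypothetical.

```python
# Bit widths of ClickHouse's unsigned integer types.
UNSIGNED_BITS = {"UInt8": 8, "UInt16": 16, "UInt32": 32, "UInt64": 64}


def fits(column_type, value):
    """Return True if `value` is within the range of the unsigned column type."""
    max_value = 2 ** UNSIGNED_BITS[column_type] - 1
    return 0 <= value <= max_value


if __name__ == "__main__":
    for column_type in ("UInt8", "UInt16"):
        print(column_type, fits(column_type, 404))
```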

References: https://vector.dev/docs/

https://medium.com/datadenys/using-vector-to-feed-nginx-logs-to-clickhouse-in-real-time-197745d9e88b
