Reposted from: https://elasticstack.blog.csdn.net/article/details/111321105

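We usually clean data with an Elasticsearch ingest node or with Logstash: removing, adding, enriching, and transforming fields, and so on. Each Beat, however, also ships with its own set of processors that can do this work. The full list of processors for Filebeat is available on the official Elastic website. In other words, when we configure a Beat we can configure processors alongside it, and each processor can modify the events that pass through it.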

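If you want to see how an ingest pipeline cleans these events, read my earlier article "Elastic可观测性 - 运用 pipeline 使数据结构化" (Elastic Observability: using pipelines to structure data). In another earlier article, "深入理解 Dissect ingest processor" (Understanding the Dissect ingest processor in depth), I covered the dissect ingest processor. In today's article I use the Beats processor of the same name to show how to structure data.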

Processing data with Filebeat

In today's experiment we use the following example. Create a file named sample.log with this content:

sample.log

"321 - App01 - WebServer is starting"

"321 - App01 - WebServer is up and running"

"321 - App01 - WebServer is scaling 2 pods"

"789 - App02 - Database is will be restarted in 5 minutes"

"789 - App02 - Database is up and running"

"789 - App02 - Database is refreshing tables"

Filebeat uses the newline character to delimit each line, so I also end the last line of the file with a newline to make sure that line gets ingested.

Next, create a Filebeat configuration file named filebeat_processors.yml:

filebeat_processors.yml

Its content is as follows:

filebeat.inputs:

- type: log

  enabled: true

  paths:

    - /Users/liuxg/data/beatsprocessors/sample.log

processors:

 - drop_fields:

     fields: ["ecs", "agent", "log", "input", "host"]

 - dissect:

     tokenizer: '"%{pid|integer} - %{service.name} - %{service.status}"'

     field: "message"

     target_prefix: ""     

setup.template.enabled: false

setup.ilm.enabled: false

output.elasticsearch:

  hosts: ["localhost:9200"]

  index: "sample"

  bulk_max_size: 1000

Note that you need to change the path under paths to match the location of your own sample.log.

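Above, we used two processors: drop_fields and dissect. drop_fields removes the listed metadata fields from every event, and dissect splits the message field according to the tokenizer. Because target_prefix is set to an empty string, the extracted keys are written at the root of the event instead of under the default dissect prefix. Run Filebeat with the following command: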

./filebeat -e -c ~/data/beatsprocessors/filebeat_processors.yml

Likewise, adjust the path above to wherever your configuration file lives.

After running the command, we can query the content of the sample index in Kibana:

GET sample/_search

{

  "took" : 0,

  "timed_out" : false,

  "_shards" : {

    "total" : 1,

    "successful" : 1,

    "skipped" : 0,

    "failed" : 0

  },

  "hits" : {

    "total" : {

      "value" : 6,

      "relation" : "eq"

    },

    "max_score" : 1.0,

    "hits" : [

      {

        "_index" : "sample",

        "_type" : "_doc",

        "_id" : "qrBscHYBpymojx8hDWuV",

        "_score" : 1.0,

        "_source" : {

          "@timestamp" : "2020-12-17T11:18:16.540Z",

          "message" : "\"321 - App01 - WebServer is starting\"",

          "service" : {

            "name" : "App01",

            "status" : "WebServer is starting"

          },

          "pid" : 321

        }

      },

      {

        "_index" : "sample",

        "_type" : "_doc",

        "_id" : "q7BscHYBpymojx8hDWuV",

        "_score" : 1.0,

        "_source" : {

          "@timestamp" : "2020-12-17T11:18:16.541Z",

          "pid" : 321,

          "message" : "\"321 - App01 - WebServer is up and running\"",

          "service" : {

            "name" : "App01",

            "status" : "WebServer is up and running"

          }

        }

      },

      {

        "_index" : "sample",

        "_type" : "_doc",

        "_id" : "rLBscHYBpymojx8hDWuV",

        "_score" : 1.0,

        "_source" : {

          "@timestamp" : "2020-12-17T11:18:16.541Z",

          "message" : "\"321 - App01 - WebServer is scaling 2 pods\"",

          "service" : {

            "name" : "App01",

            "status" : "WebServer is scaling 2 pods"

          },

          "pid" : 321

        }

      },

      {

        "_index" : "sample",

        "_type" : "_doc",

        "_id" : "rbBscHYBpymojx8hDWuV",

        "_score" : 1.0,

        "_source" : {

          "@timestamp" : "2020-12-17T11:18:16.541Z",

          "message" : "\"789 - App02 - Database is will be restarted in 5 minutes\"",

          "pid" : 789,

          "service" : {

            "name" : "App02",

            "status" : "Database is will be restarted in 5 minutes"

          }

        }

      },

      {

        "_index" : "sample",

        "_type" : "_doc",

        "_id" : "rrBscHYBpymojx8hDWuV",

        "_score" : 1.0,

        "_source" : {

          "@timestamp" : "2020-12-17T11:18:16.541Z",

          "service" : {

            "name" : "App02",

            "status" : "Database is up and running"

          },

          "pid" : 789,

          "message" : "\"789 - App02 - Database is up and running\""

        }

      },

      {

        "_index" : "sample",

        "_type" : "_doc",

        "_id" : "r7BscHYBpymojx8hDWuV",

        "_score" : 1.0,

        "_source" : {

          "@timestamp" : "2020-12-17T11:18:16.541Z",

          "service" : {

            "status" : "Database is refreshing tables",

            "name" : "App02"

          },

          "message" : "\"789 - App02 - Database is refreshing tables\"",

          "pid" : 789

        }

      }

    ]

  }

}

Clearly, we now have a structured index. Note that pid was also converted from a string to an integer.
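To make the effect of the tokenizer more concrete, here is a minimal JavaScript sketch that mimics what the dissect pattern does to a single line. It is only an illustration: dissectLine is a hypothetical helper of mine, and the regular expression is an assumption that reproduces the result shown above. It is not how Filebeat implements dissect.

```javascript
// Illustrative only: mimics the effect of the tokenizer
// '"%{pid|integer} - %{service.name} - %{service.status}"' on one line.
// dissectLine is a hypothetical helper, not part of Filebeat.
function dissectLine(message) {
  // Literal quotes and " - " delimiters surround the three captured keys.
  const match = /^"(.*?) - (.*?) - (.*)"$/.exec(message);
  if (match === null) {
    return null; // the line does not fit the pattern
  }
  const pid = Number.parseInt(match[1], 10);
  if (Number.isNaN(pid)) {
    return null; // stands in for a failed |integer conversion
  }
  return {
    pid: pid,
    service: { name: match[2], status: match[3] }
  };
}

const parsed = dissectLine('"321 - App01 - WebServer is starting"');
console.log(JSON.stringify(parsed));
```

The dotted keys service.name and service.status end up as a nested service object, which matches the shape of _source in the search result.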

We can even rename a field, for example:

filebeat_processors.yml

filebeat.inputs:

- type: log

  enabled: true

  paths:

    - /Users/liuxg/data/beatsprocessors/sample.log

processors:

 - drop_fields:

     fields: ["ecs", "agent", "log", "input", "host"]

 - dissect:

     tokenizer: '"%{pid|integer} - %{service.name} - %{service.status}"'

     field: "message"

     target_prefix: ""

 - rename:

     fields:

        - from: "pid"

          to: "PID"

     ignore_missing: false

     fail_on_error: true    

setup.template.enabled: false

setup.ilm.enabled: false

output.elasticsearch:

  hosts: ["localhost:9200"]

  index: "sample"

  bulk_max_size: 1000

After running Filebeat again with this configuration, we see:

{

  "took" : 0,

  "timed_out" : false,

  "_shards" : {

    "total" : 1,

    "successful" : 1,

    "skipped" : 0,

    "failed" : 0

  },

  "hits" : {

    "total" : {

      "value" : 6,

      "relation" : "eq"

    },

    "max_score" : 1.0,

    "hits" : [

      {

        "_index" : "sample",

        "_type" : "_doc",

        "_id" : "UrB5cHYBpymojx8h7oCK",

        "_score" : 1.0,

        "_source" : {

          "@timestamp" : "2020-12-17T11:33:26.114Z",

          "service" : {

            "status" : "WebServer is starting",

            "name" : "App01"

          },

          "message" : "\"321 - App01 - WebServer is starting\"",

          "PID" : 321

        }

      },

   ...

The former pid field is now named PID.

We can also process events with a script, for example:

filebeat_processors.yml

filebeat.inputs:

- type: log

  enabled: true

  paths:

    - /Users/liuxg/data/beatsprocessors/sample.log

processors:

 - drop_fields:

     fields: ["ecs", "agent", "log", "input", "host"]

 - dissect:

     tokenizer: '"%{pid|integer} - %{service.name} - %{service.status}"'

     field: "message"

     target_prefix: ""

 - rename:

     fields:

        - from: "pid"

          to: "PID"

     ignore_missing: false

     fail_on_error: true

 - script:

     lang: javascript

     id: my_filter

     params:

        pid: 789

     source: >

       var params = {pid: 0};

       function register(scriptParams) {

          params = scriptParams;

       }

       function process(event) {

          if (event.Get("PID") == params.pid) {

              event.Cancel();

          }

       }        

setup.template.enabled: false

setup.ilm.enabled: false

output.elasticsearch:

  hosts: ["localhost:9200"]

  index: "sample"

  bulk_max_size: 1000

Above, an event is dropped when its PID equals 789. Run Filebeat again:

{

  "took" : 0,

  "timed_out" : false,

  "_shards" : {

    "total" : 1,

    "successful" : 1,

    "skipped" : 0,

    "failed" : 0

  },

  "hits" : {

    "total" : {

      "value" : 3,

      "relation" : "eq"

    },

    "max_score" : 1.0,

    "hits" : [

      {

        "_index" : "sample",

        "_type" : "_doc",

        "_id" : "5bCBcHYBpymojx8hrIup",

        "_score" : 1.0,

        "_source" : {

          "@timestamp" : "2020-12-17T11:41:53.478Z",

          "PID" : 321,

          "service" : {

            "status" : "WebServer is starting",

            "name" : "App01"

          },

          "message" : "\"321 - App01 - WebServer is starting\""

        }

      },

      {

        "_index" : "sample",

        "_type" : "_doc",

        "_id" : "5rCBcHYBpymojx8hrIup",

        "_score" : 1.0,

        "_source" : {

          "@timestamp" : "2020-12-17T11:41:53.479Z",

          "message" : "\"321 - App01 - WebServer is up and running\"",

          "service" : {

            "status" : "WebServer is up and running",

            "name" : "App01"

          },

          "PID" : 321

        }

      },

      {

        "_index" : "sample",

        "_type" : "_doc",

        "_id" : "57CBcHYBpymojx8hrIup",

        "_score" : 1.0,

        "_source" : {

          "@timestamp" : "2020-12-17T11:41:53.479Z",

          "service" : {

            "status" : "WebServer is scaling 2 pods",

            "name" : "App01"

          },

          "message" : "\"321 - App01 - WebServer is scaling 2 pods\"",

          "PID" : 321

        }

      }

    ]

  }

}

All events with a PID of 789 have been filtered out.
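Before putting such a script into the configuration, it can be handy to exercise the logic outside Filebeat. The sketch below does that under some assumptions: makeEvent is a hypothetical stand-in for the event object (it mimics only the Get and Cancel methods used above), and loadScript is just a wrapper I added so the script's process function does not clash with the Node.js global of the same name.

```javascript
// Hypothetical stand-in for the event object that Filebeat passes to process().
// It mimics only Get and Cancel, and Get handles top-level keys only.
function makeEvent(fields) {
  var cancelled = false;
  return {
    Get: function (key) { return fields[key]; },
    Cancel: function () { cancelled = true; },
    isCancelled: function () { return cancelled; } // test-only helper
  };
}

// The script body is wrapped in a function so that its process() does not
// shadow the Node.js global of the same name.
function loadScript() {
  var params = { pid: 0 };
  function register(scriptParams) {
    params = scriptParams;
  }
  function process(event) {
    if (event.Get("PID") == params.pid) {
      event.Cancel();
    }
  }
  return { register: register, process: process };
}

var script = loadScript();
script.register({ pid: 789 }); // same value as params.pid in the YAML

var results = [321, 789].map(function (pid) {
  var event = makeEvent({ PID: pid });
  script.process(event);
  return { pid: pid, cancelled: event.isCancelled() };
});
console.log(JSON.stringify(results));
```

Only the event with PID 789 is marked as cancelled, which is consistent with the three remaining hits in the search result.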

We can even use the script to add a tag to an event. And since the script is written in JavaScript, we can add different tags depending on conditions.

filebeat_processors.yml

filebeat.inputs:

- type: log

  enabled: true

  paths:

    - /Users/liuxg/data/beatsprocessors/sample.log

processors:

 - drop_fields:

     fields: ["ecs", "agent", "log", "input", "host"]

 - dissect:

     tokenizer: '"%{pid|integer} - %{service.name} - %{service.status}"'

     field: "message"

     target_prefix: ""

 - rename:

     fields:

        - from: "pid"

          to: "PID"

     ignore_missing: false

     fail_on_error: true

 - script:

     lang: javascript

     id: my_filter

     params:

        pid: 789

     source: >

       var params = {pid: 0};

       function register(scriptParams) {

          params = scriptParams;

       }

       function process(event) {

          if (event.Get("PID") == params.pid) {

              event.Cancel();

          }

          event.Tag("myevent");

       }

setup.template.enabled: false

setup.ilm.enabled: false

output.elasticsearch:

  hosts: ["localhost:9200"]

  index: "sample"

  bulk_max_size: 1000

Above, we added event.Tag("myevent"). Running Filebeat again, we see:

    "hits" : [

      {

        "_index" : "sample",

        "_type" : "_doc",

        "_id" : "C7CScHYBpymojx8hkKVy",

        "_score" : 1.0,

        "_source" : {

          "@timestamp" : "2020-12-17T12:00:20.365Z",

          "message" : "\"321 - App01 - WebServer is starting\"",

          "PID" : 321,

          "service" : {

            "name" : "App01",

            "status" : "WebServer is starting"

          },

          "tags" : [

            "myevent"

          ]

        }

      },

Above, we can see that the tags field contains the value myevent.
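As mentioned, the tag can also depend on a condition. Here is a minimal sketch of what such a process function might look like, again run against a hypothetical mock event. The tag names webserver, database, and unknown are my own assumptions for illustration, and the mock's Tag only imitates appending a tag to the tags field when it is not already present.

```javascript
// Hypothetical stand-in for the Filebeat event object: only Get and Tag.
function makeEvent(fields) {
  return {
    Get: function (key) { return fields[key]; },
    Tag: function (tag) {
      // Append the tag to "tags" unless it is already there.
      if (!Array.isArray(fields.tags)) { fields.tags = []; }
      if (fields.tags.indexOf(tag) === -1) { fields.tags.push(tag); }
    },
    fields: fields // exposed for inspection in this sketch only
  };
}

// Wrapped so that process() does not shadow the Node.js global.
function loadScript() {
  function process(event) {
    var pid = event.Get("PID");
    if (pid == 321) {
      event.Tag("webserver"); // assumed tag name
    } else if (pid == 789) {
      event.Tag("database"); // assumed tag name
    } else {
      event.Tag("unknown"); // assumed tag name
    }
  }
  return { process: process };
}

var tagged = [321, 789, 1].map(function (pid) {
  var event = makeEvent({ PID: pid });
  loadScript().process(event);
  return event.fields.tags;
});
console.log(JSON.stringify(tagged));
```

In a real configuration, only the process function would go into the source of the script processor.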

Today's introduction is only meant as a starting point. For more on the processors available in Filebeat, see https://www.elastic.co/guide/en/beats/filebeat/current/defining-processors.html#processors

In this article we covered one way of processing data: it can be done inside Beats rather than in an Elasticsearch ingest node. In practice, choose the approach that fits your own architecture.
