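Handling multiline logs in Fluentd with the multiline parser
Multiline logs have always been a headache for log collection. Developers are often unwilling to emit logs as JSON, so the only option left is to give the logs structure again at collection time.
Log collectors differ in implementation and conventions, so how multiline logs are handled varies from one collector to another. Here we use Fluentd as the collector, which means we can handle multiline logs with its multiline parser.
The multiline parser parses logs using the formatN and format_firstline parameters. format_firstline detects the first line of a multiline log entry. formatN, where N ranges from 1 to 20, is the list of Regexp formats for the multiline log.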
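To make the relationship between the two parameters concrete, here is a minimal Ruby sketch. Ruby is used because Fluentd's expressions are Ruby regexps. It assumes two things: an event has already been assembled using format_firstline, and format1..formatN are joined in order into a single regexp. The two formats and the sample event are made up for illustration and do not come from the test data.

```ruby
# Sketch (assumption): format1..formatN are concatenated in order into one
# regexp, which is matched against an event already assembled by
# format_firstline. Formats and sample event are hypothetical.
FORMATS = [
  /^(?<date>\d{4}-\d{2}-\d{2}) (?<level>\S+) (?<msg>[^\n]*)\n/, # format1: first line, incl. its newline
  /(?<detail>[\s\S]*)/                                          # format2: everything after it
].freeze

# Join the sources of all pieces into a single pattern.
COMBINED = Regexp.new(FORMATS.map(&:source).join)

# A made-up two-line event.
event = "2022-06-20 ERROR boom\n  at Foo.bar(Foo.java:1)"

m = COMBINED.match(event)
# Named captures from every piece end up in one record.
record = m.names.to_h { |name| [name, m[name]] }
p record
```

Splitting the pattern this way lets format1 describe the first line (ending in a newline) and format2 the lines after it, while still producing a single record.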
Test data
Suppose we have the following multiline log data:
2022-06-20 19:32:07.264 DEBUG 7 [TID:bb0e9b6d1d704755a93ea1529265bb99.68.16557246000000125] --- [ scheduling-4] o.s.d.redis.core.RedisConnectionUtils : Closing Redis Connection.
2022-06-20 19:32:07.264 DEBUG 7 [TID:bb0e9b6d1d704755a93ea1529265bb99.68.16557246000000125] --- [ scheduling-4] io.lettuce.core.RedisChannelHandler : dispatching command AsyncCommand [type=DEL, output=IntegerOutput [output=null, error='null'], commandType=io.lettuce.core.protocol.Command]
2022-06-20 17:28:27.871 DEBUG 6 [TID:N/A] --- [ main] o.h.l.p.build.spi.LoadPlanTreePrinter : LoadPlan(entity=com.xxxx.entity.ScheduledLeadsInvalid)
- Returns
- EntityReturnImpl(entity=com.xxxx.entity.ScheduledLeadsInvalid, querySpaceUid=<gen:0>, path=com.xxxx.entity.ScheduledLeadsInvalid)
- QuerySpaces
- EntityQuerySpaceImpl(uid=<gen:0>, entity=com.xxxx.entity.ScheduledLeadsInvalid)
- SQL table alias mapping - scheduledl0_
- alias suffix - 0_
- suffixed key columns - {id1_51_0_}
2022-06-20 19:32:47.062 DEBUG 7 [TID:N/A] --- [nection-cleaner] h.i.c.PoolingHttpClientConnectionManager : Closing connections idle longer than 60000 MILLISECONDS
First create a fluentd directory. Inside it, create an etc directory for the Fluentd configuration files and a logs directory for the logs, then save the test log above as logs/test.log:
$ mkdir fluentd
$ cd fluentd
# Create the etc directory for the Fluentd config files and the logs directory for the logs
$ mkdir -p etc logs
# Then save the sample log above as logs/test.log
Basic parsing
Next, create a Fluentd configuration file for parsing the logs, etc/fluentd_basic.conf, with the following content:
<source>
  @type tail
  path /fluentd/logs/*.log
  pos_file /fluentd/logs/test.log.pos
  tag test.logs
  read_from_head true
  <parse>
    @type regexp
    expression /^(?<timestamp>[^ ]* [^ ]*) (?<level>[^\s]+) (?<pid>[^s+]+) \[TID:(?<tid>[,a-z0-9A-Z./]+)\] --- \[(?<thread>.*)\] (?<message>[\s\S]*)/
  </parse>
</source>
<match **>
  @type stdout
</match>
Then start Fluentd from its Docker image to parse the logs:
$ docker run --rm -v $(pwd)/etc:/fluentd/etc -v $(pwd)/logs:/fluentd/logs fluent/fluentd:v1.14-1 -c /fluentd/etc/fluentd_basic.conf -v
fluentd -c /fluentd/etc/fluentd_basic.conf -v
2022-06-20 12:31:17 +0000 [info]: fluent/log.rb:330:info: parsing config file is succeeded path="/fluentd/etc/fluentd_basic.conf"
2022-06-20 12:31:17 +0000 [info]: fluent/log.rb:330:info: gem 'fluentd' version '1.14.3'
2022-06-20 12:31:17 +0000 [warn]: fluent/log.rb:351:warn: define <match fluent.**> to capture fluentd logs in top level is deprecated. Use <label @FLUENT_LOG> instead
2022-06-20 12:31:17 +0000 [info]: fluent/log.rb:330:info: using configuration file: <ROOT>
<source>
@type tail
path "/fluentd/logs/*.log"
pos_file "/fluentd/logs/test.log.pos"
tag "test.logs"
read_from_head true
<parse>
@type "regexp"
expression /^(?<timestamp>[^ ]* [^ ]*) (?<level>[^\s]+) (?<pid>[^s+]+) \[TID:(?<tid>[,a-z0-9A-Z./]+)\] --- \[(?<thread>.*)\] (?<message>[\s\S]*)/
unmatched_lines
</parse>
</source>
<match **>
@type stdout
</match>
</ROOT>
2022-06-20 12:36:21 +0000 [info]: fluent/log.rb:330:info: starting fluentd-1.14.3 pid=10 ruby="2.7.5"
2022-06-20 12:36:21 +0000 [info]: fluent/log.rb:330:info: spawn command to main: cmdline=["/usr/bin/ruby", "-Eascii-8bit:ascii-8bit", "/usr/bin/fluentd", "-c", "/fluentd/etc/fluentd_basic.conf", "-v", "--plugin", "/fluentd/plugins", "--under-supervisor"]
2022-06-20 12:36:22 +0000 [info]: fluent/log.rb:330:info: adding match pattern="**" type="stdout"
2022-06-20 12:36:22 +0000 [info]: fluent/log.rb:330:info: adding source type="tail"
2022-06-20 12:36:22 +0000 [warn]: #0 fluent/log.rb:351:warn: define <match fluent.**> to capture fluentd logs in top level is deprecated. Use <label @FLUENT_LOG> instead
2022-06-20 12:36:22 +0000 [info]: #0 fluent/log.rb:330:info: starting fluentd worker pid=19 ppid=10 worker=0
2022-06-20 12:36:22 +0000 [debug]: #0 fluent/log.rb:309:debug: tailing paths: target = /fluentd/logs/test.log | existing =
2022-06-20 12:36:22 +0000 [info]: #0 fluent/log.rb:330:info: following tail of /fluentd/logs/test.log
2022-06-20 12:36:22 +0000 [warn]: #0 fluent/log.rb:351:warn: pattern not matched: " - Returns"
2022-06-20 12:36:22 +0000 [warn]: #0 fluent/log.rb:351:warn: pattern not matched: " - EntityReturnImpl(entity=com.xxxx.entity.ScheduledLeadsInvalid, querySpaceUid=<gen:0>, path=com.xxxx.entity.ScheduledLeadsInvalid)"
2022-06-20 12:36:22 +0000 [warn]: #0 fluent/log.rb:351:warn: pattern not matched: " - QuerySpaces"
2022-06-20 12:36:22 +0000 [warn]: #0 fluent/log.rb:351:warn: pattern not matched: " - EntityQuerySpaceImpl(uid=<gen:0>, entity=com.xxxx.entity.ScheduledLeadsInvalid)"
2022-06-20 12:36:22 +0000 [warn]: #0 fluent/log.rb:351:warn: pattern not matched: " - SQL table alias mapping - scheduledl0_"
2022-06-20 12:36:22 +0000 [warn]: #0 fluent/log.rb:351:warn: pattern not matched: " - alias suffix - 0_"
2022-06-20 12:36:22 +0000 [warn]: #0 fluent/log.rb:351:warn: pattern not matched: " - suffixed key columns - {id1_51_0_}"
2022-06-20 12:36:22 +0000 [warn]: #0 fluent/log.rb:351:warn: pattern not matched: ""
2022-06-20 12:36:22.308970489 +0000 test.logs: {"timestamp":"2022-06-20 19:32:07.264","level":"DEBUG","pid":"7","tid":"bb0e9b6d1d704755a93ea1529265bb99.68.16557246000000125","thread":" scheduling-4","message":"o.s.d.redis.core.RedisConnectionUtils : Closing Redis Connection."}
2022-06-20 12:36:22.309013403 +0000 test.logs: {"timestamp":"2022-06-20 19:32:07.264","level":"DEBUG","pid":"7","tid":"bb0e9b6d1d704755a93ea1529265bb99.68.16557246000000125","thread":" scheduling-4","message":"io.lettuce.core.RedisChannelHandler : dispatching command AsyncCommand [type=DEL, output=IntegerOutput [output=null, error='null'], commandType=io.lettuce.core.protocol.Command]"}
2022-06-20 12:36:22.309025559 +0000 test.logs: {"timestamp":"2022-06-20 17:28:27.871","level":"DEBUG","pid":"6","tid":"N/A","thread":" main","message":"o.h.l.p.build.spi.LoadPlanTreePrinter : LoadPlan(entity=com.xxxx.entity.ScheduledLeadsInvalid)"}
2022-06-20 12:36:22.309715537 +0000 test.logs: {"timestamp":"2022-06-20 19:32:47.062","level":"DEBUG","pid":"7","tid":"N/A","thread":"nection-cleaner","message":"h.i.c.PoolingHttpClientConnectionManager : Closing connections idle longer than 60000 MILLISECONDS"}
2022-06-20 12:36:22 +0000 [info]: #0 fluent/log.rb:330:info: fluentd worker is now running worker=0
2022-06-20 12:36:22.305753588 +0000 fluent.info: {"pid":19,"ppid":10,"worker":0,"message":"starting fluentd worker pid=19 ppid=10 worker=0"}
2022-06-20 12:36:22.308522121 +0000 fluent.debug: {"message":"tailing paths: target = /fluentd/logs/test.log | existing = "}
2022-06-20 12:36:22.308751095 +0000 fluent.info: {"message":"following tail of /fluentd/logs/test.log"}
2022-06-20 12:36:22.309047520 +0000 fluent.warn: {"message":"pattern not matched: \" - Returns\""}
2022-06-20 12:36:22.309180634 +0000 fluent.warn: {"message":"pattern not matched: \" - EntityReturnImpl(entity=com.xxxx.entity.ScheduledLeadsInvalid, querySpaceUid=<gen:0>, path=com.xxxx.entity.ScheduledLeadsInvalid)\""}
2022-06-20 12:36:22.309258667 +0000 fluent.warn: {"message":"pattern not matched: \" - QuerySpaces\""}
2022-06-20 12:36:22.309328608 +0000 fluent.warn: {"message":"pattern not matched: \" - EntityQuerySpaceImpl(uid=<gen:0>, entity=com.xxxx.entity.ScheduledLeadsInvalid)\""}
2022-06-20 12:36:22.309401309 +0000 fluent.warn: {"message":"pattern not matched: \" - SQL table alias mapping - scheduledl0_\""}
2022-06-20 12:36:22.309468557 +0000 fluent.warn: {"message":"pattern not matched: \" - alias suffix - 0_\""}
2022-06-20 12:36:22.309563730 +0000 fluent.warn: {"message":"pattern not matched: \" - suffixed key columns - {id1_51_0_}\""}
2022-06-20 12:36:22.309723704 +0000 fluent.warn: {"message":"pattern not matched: \"\""}
2022-06-20 12:36:22.310086626 +0000 fluent.info: {"worker":0,"message":"fluentd worker is now running worker=0"}
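The output shows that some lines match the regular expression and are parsed normally, while others do not. For example, the following record is the result of parsing the first log line: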
{"timestamp":"2022-06-20 19:32:07.264","level":"DEBUG","pid":"7","tid":"bb0e9b6d1d704755a93ea1529265bb99.68.16557246000000125","thread":" scheduling-4","message":"o.s.d.redis.core.RedisConnectionUtils : Closing Redis Connection."}
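The lines that fail to match belong to the multiline entry. Fluentd treats every line as an independent record, so the entry's first line is parsed on its own and each continuation line becomes a separate record that does not match. This is clearly not what we want.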
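The failure can be reproduced outside Fluentd. The Ruby sketch below applies the regexp from fluentd_basic.conf to a few lines of test.log, one line at a time. The regexp is copied verbatim and written as %r{...} so the slash in the tid class needs no escaping.

Incidentally, "(?<pid>[^s+]+)" is a character class that excludes only "s" and "+", so it also matches spaces. It yields the right pid here only because the engine backtracks to " [TID:". A pattern such as \S+ was probably intended.

```ruby
# Sketch (not Fluentd's code): apply the single-line regexp from
# fluentd_basic.conf to each line separately, as happens without multiline.
LINE_RE = %r{^(?<timestamp>[^ ]* [^ ]*) (?<level>[^\s]+) (?<pid>[^s+]+) \[TID:(?<tid>[,a-z0-9A-Z./]+)\] --- \[(?<thread>.*)\] (?<message>[\s\S]*)}

# An excerpt of logs/test.log (only two of the continuation lines).
lines = [
  "2022-06-20 17:28:27.871 DEBUG 6 [TID:N/A] --- [ main] o.h.l.p.build.spi.LoadPlanTreePrinter : LoadPlan(entity=com.xxxx.entity.ScheduledLeadsInvalid)",
  " - Returns",
  " - QuerySpaces",
  "2022-06-20 19:32:47.062 DEBUG 7 [TID:N/A] --- [nection-cleaner] h.i.c.PoolingHttpClientConnectionManager : Closing connections idle longer than 60000 MILLISECONDS"
]

matched, unmatched = lines.partition { |line| LINE_RE.match?(line) }

matched.each do |line|
  m = LINE_RE.match(line)
  puts "parsed: level=#{m[:level]} pid=#{m[:pid]} thread=#{m[:thread].inspect}"
end

# Continuation lines lack the timestamp/level/TID prefix, so they cannot match.
unmatched.each { |line| puts "pattern not matched: #{line.inspect}" }
```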
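The multiline parser
What we want is for a multiline entry to be handled as a single record, and that is what the multiline parser is for. Create a new configuration file for multiline logs, etc/fluentd_multline.conf, with the following content: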
<source>
  @type tail
  path /fluentd/logs/*.log
  pos_file /fluentd/logs/test.log.pos
  tag test.logs
  read_from_head true
  <parse>
    @type multiline
    format_firstline /\d{4}-\d{1,2}-\d{1,2}/
    format1 /^(?<timestamp>[^ ]* [^ ]*) (?<level>[^\s]+) (?<pid>[^s+]+) \[TID:(?<tid>[,a-z0-9A-Z./]+)\] --- \[(?<thread>.*)\] (?<message>[\s\S]*)/
  </parse>
</source>
<match **>
  @type stdout
</match>
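In fluentd_multline.conf, format_firstline /\d{4}-\d{1,2}-\d{1,2}/ marks the start of each log entry. A line containing a date begins a new event, and the lines after it that contain no date are appended to that event. The pattern is not anchored with ^, so a continuation line that happens to contain a date would also start a new event.

format1 is then matched against the whole assembled event, not only its first line. If you need more patterns, you can add format2, format3 and so on; they are concatenated after format1 in order. Here format1 alone is enough, because (?<message>[\s\S]*) also matches the newlines of the continuation lines.

Delete the logs/test.log.pos file left over from the previous run. Otherwise in_tail resumes from the recorded position and reads nothing. Then restart Fluentd with this configuration: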
$ docker run --rm -v $(pwd)/etc:/fluentd/etc -v $(pwd)/logs:/fluentd/logs fluent/fluentd:v1.14-1 -c /fluentd/etc/fluentd_multline.conf -v
fluentd -c /fluentd/etc/fluentd_multline.conf -v
2022-06-20 12:41:58 +0000 [info]: fluent/log.rb:330:info: parsing config file is succeeded path="/fluentd/etc/fluentd_multline.conf"
2022-06-20 12:41:58 +0000 [info]: fluent/log.rb:330:info: gem 'fluentd' version '1.14.3'
2022-06-20 12:41:58 +0000 [warn]: fluent/log.rb:351:warn: define <match fluent.**> to capture fluentd logs in top level is deprecated. Use <label @FLUENT_LOG> instead
2022-06-20 12:41:58 +0000 [info]: fluent/log.rb:330:info: using configuration file: <ROOT>
<source>
@type tail
path "/fluentd/logs/*.log"
pos_file "/fluentd/logs/test.log.pos"
tag "test.logs"
read_from_head true
<parse>
@type "multiline"
format_firstline "/\\d{4}-\\d{1,2}-\\d{1,2}/"
format1 /^(?<timestamp>[^ ]* [^ ]*) (?<level>[^\s]+) (?<pid>[^s+]+) \[TID:(?<tid>[,a-z0-9A-Z./]+)\] --- \[(?<thread>.*)\] (?<message>[\s\S]*)/
unmatched_lines
</parse>
</source>
<match **>
@type stdout
</match>
</ROOT>
2022-06-20 12:41:58 +0000 [info]: fluent/log.rb:330:info: starting fluentd-1.14.3 pid=9 ruby="2.7.5"
2022-06-20 12:41:58 +0000 [info]: fluent/log.rb:330:info: spawn command to main: cmdline=["/usr/bin/ruby", "-Eascii-8bit:ascii-8bit", "/usr/bin/fluentd", "-c", "/fluentd/etc/fluentd_multline.conf", "-v", "--plugin", "/fluentd/plugins", "--under-supervisor"]
2022-06-20 12:41:59 +0000 [info]: fluent/log.rb:330:info: adding match pattern="**" type="stdout"
2022-06-20 12:41:59 +0000 [info]: fluent/log.rb:330:info: adding source type="tail"
2022-06-20 12:41:59 +0000 [warn]: #0 fluent/log.rb:351:warn: define <match fluent.**> to capture fluentd logs in top level is deprecated. Use <label @FLUENT_LOG> instead
2022-06-20 12:41:59 +0000 [info]: #0 fluent/log.rb:330:info: starting fluentd worker pid=18 ppid=9 worker=0
2022-06-20 12:41:59 +0000 [debug]: #0 fluent/log.rb:309:debug: tailing paths: target = /fluentd/logs/test.log | existing =
2022-06-20 12:41:59 +0000 [info]: #0 fluent/log.rb:330:info: following tail of /fluentd/logs/test.log
2022-06-20 12:41:59.201105512 +0000 test.logs: {"timestamp":"2022-06-20 19:32:07.264","level":"DEBUG","pid":"7","tid":"bb0e9b6d1d704755a93ea1529265bb99.68.16557246000000125","thread":" scheduling-4","message":"o.s.d.redis.core.RedisConnectionUtils : Closing Redis Connection."}
2022-06-20 12:41:59.201140475 +0000 test.logs: {"timestamp":"2022-06-20 19:32:07.264","level":"DEBUG","pid":"7","tid":"bb0e9b6d1d704755a93ea1529265bb99.68.16557246000000125","thread":" scheduling-4","message":"io.lettuce.core.RedisChannelHandler : dispatching command AsyncCommand [type=DEL, output=IntegerOutput [output=null, error='null'], commandType=io.lettuce.core.protocol.Command]"}
2022-06-20 12:41:59.201213082 +0000 test.logs: {"timestamp":"2022-06-20 17:28:27.871","level":"DEBUG","pid":"6","tid":"N/A","thread":" main","message":"o.h.l.p.build.spi.LoadPlanTreePrinter : LoadPlan(entity=com.xxxx.entity.ScheduledLeadsInvalid)\n - Returns\n - EntityReturnImpl(entity=com.xxxx.entity.ScheduledLeadsInvalid, querySpaceUid=<gen:0>, path=com.xxxx.entity.ScheduledLeadsInvalid)\n - QuerySpaces\n - EntityQuerySpaceImpl(uid=<gen:0>, entity=com.xxxx.entity.ScheduledLeadsInvalid)\n - SQL table alias mapping - scheduledl0_\n - alias suffix - 0_\n - suffixed key columns - {id1_51_0_}"}
2022-06-20 12:41:59 +0000 [info]: #0 fluent/log.rb:330:info: fluentd worker is now running worker=0
2022-06-20 12:41:59.199950788 +0000 fluent.info: {"pid":18,"ppid":9,"worker":0,"message":"starting fluentd worker pid=18 ppid=9 worker=0"}
2022-06-20 12:41:59.200662918 +0000 fluent.debug: {"message":"tailing paths: target = /fluentd/logs/test.log | existing = "}
2022-06-20 12:41:59.200844577 +0000 fluent.info: {"message":"following tail of /fluentd/logs/test.log"}
2022-06-20 12:41:59.201480874 +0000 fluent.info: {"worker":0,"message":"fluentd worker is now running worker=0"}
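The records now come out as expected, and the multiline entry has been parsed into a single record:
{"timestamp":"2022-06-20 17:28:27.871","level":"DEBUG","pid":"6","tid":"N/A","thread":" main","message":"o.h.l.p.build.spi.LoadPlanTreePrinter : LoadPlan(entity=com.xxxx.entity.ScheduledLeadsInvalid)\n - Returns\n - EntityReturnImpl(entity=com.xxxx.entity.ScheduledLeadsInvalid, querySpaceUid=<gen:0>, path=com.xxxx.entity.ScheduledLeadsInvalid)\n - QuerySpaces\n - EntityQuerySpaceImpl(uid=<gen:0>, entity=com.xxxx.entity.ScheduledLeadsInvalid)\n - SQL table alias mapping - scheduledl0_\n - alias suffix - 0_\n - suffixed key columns - {id1_51_0_}"}
Note that the 19:32:47.062 record, the last entry in test.log, does not appear in this output. in_tail only emits a multiline event once the next first line arrives, so the final event stays in the buffer. It is released when more lines are written or, if multiline_flush_interval is set in the <source> section, when that interval elapses.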
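While tailing, an event can only be considered complete once the next line matching format_firstline arrives, so the final event in a file stays buffered. The Ruby class below is a hypothetical, simplified model of that buffering, not in_tail's actual code. Its feed method returns the previous event when a new first line appears. Its flush method stands in for the flush that multiline_flush_interval triggers after a timeout. The sample lines are abbreviated.

```ruby
# Hypothetical, simplified model of multiline buffering while tailing
# (an assumption for illustration, not in_tail's implementation).
class MultilineBuffer
  def initialize(firstline)
    @firstline = firstline
    @pending = nil
  end

  # Feed one line. Returns the finished previous event when this line
  # starts a new one, otherwise nil.
  def feed(line)
    if @firstline.match?(line)
      finished = @pending
      @pending = line.dup
      finished
    elsif @pending
      @pending << "\n" << line # continuation line joins the open event
      nil
    end
  end

  # Return and clear whatever is still buffered (what a flush timeout does).
  def flush
    finished = @pending
    @pending = nil
    finished
  end
end

buf = MultilineBuffer.new(/\d{4}-\d{1,2}-\d{1,2}/)
lines = [
  "2022-06-20 17:28:27.871 DEBUG 6 [TID:N/A] --- [ main] ... LoadPlan(...)",
  " - Returns",
  "2022-06-20 19:32:47.062 DEBUG 7 [TID:N/A] --- [nection-cleaner] ... MILLISECONDS"
]

emitted = lines.filter_map { |line| buf.feed(line) }
puts "emitted while tailing: #{emitted.size}" # only the LoadPlan event
pending = buf.flush
puts "emitted on flush: #{pending.inspect}"
```

The last line is held until flush is called, which mirrors why the 19:32:47.062 entry is absent from the output above.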
None of this is complicated. The only tedious part is writing regular expressions that match your logs, which is probably what trips up most people.
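Since the regular expressions are the hard part, it helps to try them offline against a sample of the log before restarting Fluentd. The Ruby sketch below is a simplified model of the multiline parser, not Fluentd's implementation. It works in two steps:

1. A hypothetical helper, group_events, builds the events. A line matching format_firstline opens a new event, and any other line is appended to it with a newline.
2. Each event is then matched against format1.

Both expressions are copied verbatim from fluentd_multline.conf; format1 is written as %r{...} so the slash in the tid class needs no escaping. The sample is an excerpt of test.log.

```ruby
# Simplified offline check of the multiline config (not Fluentd's code).
FIRSTLINE = /\d{4}-\d{1,2}-\d{1,2}/
FORMAT1 = %r{^(?<timestamp>[^ ]* [^ ]*) (?<level>[^\s]+) (?<pid>[^s+]+) \[TID:(?<tid>[,a-z0-9A-Z./]+)\] --- \[(?<thread>.*)\] (?<message>[\s\S]*)}

# Hypothetical helper: group raw lines into events using the firstline regexp.
def group_events(lines, firstline)
  events = []
  lines.each do |line|
    if firstline.match?(line)
      events << line.dup          # a first line starts a new event
    elsif !events.empty?
      events[-1] << "\n" << line  # continuation line: join with a newline
    end                           # lines before any first line are dropped here
  end
  events
end

lines = [
  "2022-06-20 19:32:07.264 DEBUG 7 [TID:bb0e9b6d1d704755a93ea1529265bb99.68.16557246000000125] --- [ scheduling-4] o.s.d.redis.core.RedisConnectionUtils : Closing Redis Connection.",
  "2022-06-20 17:28:27.871 DEBUG 6 [TID:N/A] --- [ main] o.h.l.p.build.spi.LoadPlanTreePrinter : LoadPlan(entity=com.xxxx.entity.ScheduledLeadsInvalid)",
  " - Returns",
  " - QuerySpaces",
  "2022-06-20 19:32:47.062 DEBUG 7 [TID:N/A] --- [nection-cleaner] h.i.c.PoolingHttpClientConnectionManager : Closing connections idle longer than 60000 MILLISECONDS"
]

# Parse each event with format1; events that do not match are skipped.
records = group_events(lines, FIRSTLINE).filter_map do |event|
  m = FORMAT1.match(event)
  m && m.names.to_h { |name| [name, m[name]] }
end

records.each { |r| p r }
```

Because this batch version sees the whole input at once, it also returns the final event, which a tailing parser would still be holding in its buffer.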