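Importing custom-format nginx logs into Elasticsearch with Heka
Resetting Heka's progress
Heka keeps its progress (checkpoint) files in the directory set by the base_dir option. Deleting the contents of that directory completely resets Heka's progress.
By default base_dir is '/var/cache/hekad' or 'c:\var\cache\hekad'.
Reference: http://hekad.readthedocs.org/en/latest/getting_started.html#global-configuration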
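The reset described above can be scripted. The following is a minimal sketch in Python, assuming hekad has been stopped first and that base_dir is at its default location; the helper name clear_directory is made up for this illustration and is not part of Heka.

```python
import shutil
from pathlib import Path


def clear_directory(base_dir):
    """Delete everything inside base_dir but keep the directory itself.

    Returns the sorted names of the entries that were removed.
    """
    removed = []
    for entry in Path(base_dir).iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)   # sub-directory: remove recursively
        else:
            entry.unlink()         # regular file or symlink
        removed.append(entry.name)
    return sorted(removed)


# Example call (assumed default path; stop hekad before running it):
#   clear_directory("/var/cache/hekad")
```

Keeping the directory itself and removing only its contents matches the advice above and avoids having to recreate it with the right ownership.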
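Deleting Elasticsearch data
After the import strategy is changed, the data has to be imported again, so the previously indexed data must be cleared first. Several commonly used ES plugins offer a delete function and are easy to use.
One such tool is elasticsearch-head:
https://mobz.github.io/elasticsearch-head/ By default it is served at: http://ip:9200/_plugin/head/
Another recommended one is ElasticHQ: http://www.elastichq.org/ Its git repository is at: https://github.com/royrusso/elasticsearch-HQ By default it is served at: http://ip:9200/_plugin/hq/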
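Besides the plugins, Elasticsearch's REST API deletes an index with an HTTP DELETE on the index name. The sketch below only builds such a request, so nothing is deleted unless it is explicitly sent; the function name and the example index are assumptions for illustration.

```python
import urllib.request


def build_delete_index_request(es_url, index):
    """Build (but do not send) the HTTP DELETE request that removes one index."""
    url = "{}/{}".format(es_url.rstrip("/"), index)
    return urllib.request.Request(url, method="DELETE")


# To actually send it (this is irreversible):
#   req = build_delete_index_request("http://ip:9200", "nginx-2016.01.06")
#   urllib.request.urlopen(req)
```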
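Parsing and reading the nginx logs
Because our nginx logs use a custom format, we use the most flexible decoder, PayloadRegexDecoder, and define a regular expression to extract the data.
Reference: http://hekad.readthedocs.org/en/latest/config/decoders/payload_regex.html
Heka is written in Go, so its regular expressions follow the syntax of Go's regexp/syntax package (RE2). For simple patterns, an online Go regex tester such as https://regoio.herokuapp.com/ is enough;
for complex ones, RegexBuddy (http://www.regexbuddy.com/download.html) can be used.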
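A pattern can also be tried out quickly in a script before it goes into match_regex. The sketch below uses Python's re module, which shares the (?P<name>...) named-group syntax with Go; the pattern itself is a guess reconstructed from the field names in the sample output later in this post, not the match_regex actually used, and the two engines differ in other features, so the final pattern should still be checked against Go.

```python
import re

# Hypothetical pattern for the custom log format; group names follow the
# fields that appear in the sample Elasticsearch documents.
NGINX_LINE = re.compile(
    r'(?P<remote_addr>\S+) (?P<hostname>\S+) (?P<remote_user>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)" '
    r'(?P<http_x_forwarded_for>\S+) (?P<request_time>\S+)'
)


def parse_line(line):
    """Return a dict of captured fields, or None if the line does not match."""
    match = NGINX_LINE.match(line)
    return match.groupdict() if match else None


sample = ('10.159.191.213 - - [06/Jan/2016:09:31:51 +0800] '
          '"POST /simcard/uploadSimcardStatus HTTP/1.0" 200 61 "-" '
          '"Apache-HttpClient/4.5 (Java/1.7.0_67)" 122.97.213.5 0.166')
fields = parse_line(sample)
```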
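Timestamp
By default the message Timestamp is the current time. To take it from the log line instead, the named capture group in the regular expression must also be called Timestamp.
In addition, two parameters control how the timestamp is extracted.
timestamp_layout
Defines the string format of the time to extract. Note that this uses Go's time layout notation.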
A formatting string instructing hekad how to turn a time string into the actual time representation used internally. Example timestamp layouts can be seen in Go’s time documentation. In addition to the Go time formatting, special timestamp_layout values of “Epoch”, “EpochMilli”, “EpochMicro”, and “EpochNano” are supported for Unix style timestamps represented in seconds, milliseconds, microseconds, and nanoseconds since the Epoch, respectively.
The predefined layout constants in Go's time package are:
ANSIC = "Mon Jan _2 15:04:05 2006"
UnixDate = "Mon Jan _2 15:04:05 MST 2006"
RubyDate = "Mon Jan 02 15:04:05 -0700 2006"
RFC822 = "02 Jan 06 15:04 MST"
RFC822Z = "02 Jan 06 15:04 -0700" // RFC822 with numeric zone
RFC850 = "Monday, 02-Jan-06 15:04:05 MST"
RFC1123 = "Mon, 02 Jan 2006 15:04:05 MST"
RFC1123Z = "Mon, 02 Jan 2006 15:04:05 -0700" // RFC1123 with numeric zone
RFC3339 = "2006-01-02T15:04:05Z07:00"
RFC3339Nano = "2006-01-02T15:04:05.999999999Z07:00"
Kitchen = "3:04PM"
// Handy time stamps.
Stamp = "Jan _2 15:04:05"
StampMilli = "Jan _2 15:04:05.000"
StampMicro = "Jan _2 15:04:05.000000"
StampNano = "Jan _2 15:04:05.000000000"
Reference: https://golang.org/pkg/time/#pkg-constants
timestamp_location
The time zone setting. It only takes effect when timestamp_layout carries no time zone information.
Time zone in which the timestamps in the text are presumed to be in. Should be a location name corresponding to a file in the IANA Time Zone database (e.g. “America/Los_Angeles”), as parsed by Go’s time.LoadLocation() function (see http://golang.org/pkg/time/#LoadLocation). Defaults to “UTC”. Not required if valid time zone info is embedded in every parsed timestamp, since those can be parsed as specified in the timestamp_layout. This setting will have no impact if one of the supported “Epoch*” values is used as the timestamp_layout setting.
An example configuration:
[SphinxRequestDecoder]
type = "PayloadRegexDecoder"
match_regex = '.+ (?P<Hostname>\S+) sphinx: (?P<Timestamp>.+) \[(?P<Uuid>.+)\] REQUEST: path=(?P<Path>\S+) remoteaddr=(?P<Remoteaddr>\S+) (?P<Headers>.+)'
timestamp_layout = "2006/01/02 15:04:05"
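For the nginx logs in this post, whose timestamps look like 06/Jan/2016:09:31:51 +0800, the matching Go layout would presumably be "02/Jan/2006:15:04:05 -0700". Because the offset is embedded in the string, timestamp_location is not needed. The Python sketch below only illustrates the conversion to UTC that results; it uses strptime rather than Heka's own parser, and the function name is made up.

```python
from datetime import datetime, timezone

# strptime equivalent of the Go layout "02/Jan/2006:15:04:05 -0700"
NGINX_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def to_utc_iso(raw):
    """Parse an nginx-style timestamp with offset and render it in UTC."""
    local = datetime.strptime(raw, NGINX_TIME_FORMAT)
    return local.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")


utc_value = to_utc_iso("06/Jan/2016:09:31:51 +0800")
```

This agrees with the sample documents further down, where a log line stamped 09:31:51 +0800 is stored with a Timestamp of 2016-01-06T01:31:51.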
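Importing data into Elasticsearch
To export data to Elasticsearch we use ElasticSearchOutput. This output only defines the connection properties for Elasticsearch; how messages are mapped on export is defined by one of three encoders: ElasticSearch JSON Encoder, ElasticSearch Logstash V0 Encoder, or ElasticSearch Payload Encoder.
Differences between the three encoders
The differences are summarized in the following table: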
| ElasticSearch JSON Encoder | ElasticSearch Logstash V0 Encoder | ElasticSearch Payload Encoder |
|---|---|---|
| Plugin Name: ESJsonEncoder | Plugin Name: ESLogstashV0Encoder | Plugin Name: SandboxEncoder |
| This encoder serializes a Heka message into a clean JSON format, preceded by a separate JSON structure containing information required for ElasticSearch BulkAPI indexing. | This encoder serializes a Heka message into a JSON format, preceded by a separate JSON structure containing information required for ElasticSearch BulkAPI indexing. The message JSON structure uses the original (i.e. “v0”) schema popularized by Logstash. Using this schema can aid integration with existing Logstash deployments. This schema also plays nicely with the default Logstash dashboard provided by Kibana. | Prepends ElasticSearch BulkAPI index JSON to a message payload. |
| The JSON serialization is done by hand, without the use of Go’s stdlib JSON marshalling. This is so serialization can succeed even if the message contains invalid UTF-8 characters, which will be encoded as U+FFFD. | The JSON serialization is done by hand, without using Go’s stdlib JSON marshalling. This is so serialization can succeed even if the message contains invalid UTF-8 characters, which will be encoded as U+FFFD. | |
| | Closely mimics Logstash | Lua plugin |
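Taking ESJsonEncoder as an example: to have the index name use the timestamp extracted from the log rather than the current time, the following option must be set to true.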
es_index_from_timestamp (bool):
When generating the index name use the timestamp from the message instead of the current time. Defaults to false.
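Note: I have not yet found where else this setting takes effect. I had assumed it determined the time stored with the documents imported into ES, but it does not; per the description above, it only concerns the index name.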
Some ElasticSearchOutput settings
ElasticSearchOutput has the following two parameters, which determine how often requests are sent to the server.
flush_interval (int):
Interval at which accumulated messages should be bulk indexed into ElasticSearch, in milliseconds. Defaults to 1000 (i.e. one second).
flush_count (int):
Number of messages that, if processed, will trigger them to be bulk indexed into ElasticSearch. Defaults to 10.
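Both parameters are in effect at the same time: once flush_count messages have accumulated in the queue, or more than flush_interval milliseconds have elapsed, any pending messages are sent to ElasticSearch.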
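The interplay of the two triggers can be modelled in a few lines. This is a simplified sketch of the behaviour described above, not Heka's actual Go implementation; the class name, the tick method, and the injectable clock are assumptions made so the logic can be exercised without real timers.

```python
import time


class BulkBuffer:
    """Collects messages and hands them to send() in batches."""

    def __init__(self, send, flush_count=10, flush_interval_ms=1000,
                 clock=time.monotonic):
        self.send = send
        self.flush_count = flush_count
        self.flush_interval = flush_interval_ms / 1000.0  # seconds
        self.clock = clock
        self.pending = []
        self.last_flush = clock()

    def add(self, message):
        """Queue a message; flush at once when flush_count is reached."""
        self.pending.append(message)
        if len(self.pending) >= self.flush_count:
            self.flush()

    def tick(self):
        """Call periodically; flushes only if the interval has elapsed
        and there is something to send."""
        if self.pending and self.clock() - self.last_flush >= self.flush_interval:
            self.flush()

    def flush(self):
        batch, self.pending = self.pending, []
        self.last_flush = self.clock()
        self.send(batch)
```

With the defaults, a busy log produces one bulk request per 10 messages, while a quiet one still delivers whatever is pending about once a second.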
The requests are sent to http://10.30.0.32:9200/_bulk . A randomly captured request looks like this:
POST http://10.30.0.32:9200/_bulk HTTP/1.1
Host: 10.30.0.32:9200
User-Agent: Go 1.1 package http
Content-Length: 9374
Accept: application/json
Accept-Encoding: gzip
{"index":{"_index":"nginx-2016.01.06","_type":"nginx"}}
{"Uuid":"12b6e9b3-d593-4cf4-b473-761ae7e982b0","Timestamp":"2016-01-06T01:31:51","Type":"nginx","Logger":"nginx-access","Severity":7,"Payload":"10.159.191.213 - - [06/Jan/2016:09:31:51 +0800] \u0022POST /simcard/uploadSimcardStatus HTTP/1.0\u0022 200 61 \u0022-\u0022 \u0022Apache-HttpClient/4.5 (Java/1.7.0_67)\u0022 122.97.213.5 0.166\u000a","EnvVersion":"","Pid":0,"Hostname":"localhost.localdomain","responseCode":"<responseCode>","status":"200","http_referer":"-","request_time":"0.166","http_user_agent":"Apache-HttpClient/4.5 (Java/1.7.0_67)","upstream_response_time":"","remote_addr":"10.159.191.213","request":"POST /simcard/uploadSimcardStatus HTTP/1.0","hostname":"-","timestamp":"06/Jan/2016:09:31:51 +0800","http_x_forwarded_for":"122.97.213.5","remote_user":"-","body_bytes_sent":"61"}
{"index":{"_index":"nginx-2016.01.05","_type":"nginx"}}
{"Uuid":"6ff51dd8-ba9c-4440-b567-3de391cdac2b","Timestamp":"2016-01-05T07:36:45","Type":"nginx","Logger":"nginx-access","Severity":7,"Payload":"10.159.191.90 - - [05/Jan/2016:15:36:45 +0800] \u0022POST /soa/mfderchant/list HTTP/1.0\u0022 200 926 \u0022-\u0022 \u0022Java/1.7.0_71\u0022 123.56.134.28 0.012\u000a","EnvVersion":"","Pid":0,"Hostname":"localhost.localdomain","http_user_agent":"Java/1.7.0_71","timestamp":"05/Jan/2016:15:36:45 +0800","remote_addr":"10.159.191.90","request":"POST /soa/merttchant/list HTTP/1.0","upstream_response_time":"","remote_user":"-","body_bytes_sent":"926","responseCode":"<responseCode>","http_referer":"-","http_x_forwarded_for":"123.56.134.28","hostname":"-","status":"200","request_time":"0.012"}
{"index":{"_index":"nginx-2015.12.17","_type":"nginx"}}
{"Uuid":"58eb317c-2729-4037-a82e-d475e68324fd","Timestamp":"2015-12-17T14:03:26","Type":"nginx","Logger":"nginx-access","Severity":7,"Payload":"10.171.20.136 - - [17/Dec/2015:22:03:26 +0800] \u0022GET /creepers/creepers/pubddlic/images/cardCoupon/cardCoupon1.png HTTP/1.0\u0022 404 296 \u0022http://ewr.wangpos.com/creepersplatfofrm/index.xhtml\u0022 \u0022Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36\u0022 61.51.252.82 0.004\u000a","EnvVersion":"","Pid":0,"Hostname":"localhost.localdomain","request":"GET /creepders/crefepers/public/images/cardCoupon/cardCoupon1.png HTTP/1.0","responseCode":"<responseCode>","http_referer":"http://rre.wangpos.com/creepersplatform/index.xhtml","upstream_response_time":"","http_x_forwarded_for":"61.51.252.82","timestamp":"17/Dec/2015:22:03:26 +0800","body_bytes_sent":"296","remote_user":"-","http_user_agent":"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36","status":"404","request_time":"0.004","hostname":"-","remote_addr":"10.171.20.136"}
{"index":{"_index":"nginx-2015.12.14","_type":"nginx"}}
{"Uuid":"969f2737-0a21-4c27-908a-29a22f1a1475","Timestamp":"2015-12-14T10:01:02","Type":"nginx","Logger":"nginx-access","Severity":7,"Payload":"10.171.20.136 - - [14/Dec/2015:18:01:02 +0800] \u0022POST /wxcaddrddeal/cashAccess/sendCard HTTP/1.0\u0022 200 48 \u0022-\u0022 \u0022Java/1.7.0_71\u0022 123.56.134.28 0.016\u000a","EnvVersion":"","Pid":0,"Hostname":"localhost.localdomain","http_user_agent":"Java/1.7.0_71","hostname":"-","status":"200","body_bytes_sent":"48","http_x_forwarded_for":"123.56.134.28","upstream_response_time":"","request":"POST /wxcarddeal/cashAccess/sendCard HTTP/1.0","remote_addr":"10.171.20.136","remote_user":"-","http_referer":"-","responseCode":"<responseCode>","timestamp":"14/Dec/2015:18:01:02 +0800","request_time":"0.016"}
{"index":{"_index":"nginx-2016.01.08","_type":"nginx"}}
{"Uuid":"80ff4701-85ad-4ecc-816c-833dbaded8df","Timestamp":"2016-01-08T07:27:11","Type":"nginx","Logger":"nginx-access","Severity":7,"Payload":"10.171.20.136 - - [08/Jan/2016:15:27:11 +0800] \u0022GET /uploadify/jquery.uploadify-3.1.min.js HTTP/1.0\u0022 304 0 \u0022http://www.wadngpos.com/batchCheck2Code?posMerId=1823cf1eba79411a9d32a3cb8dd3b821\u0022 \u0022Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.155 Safari/537.36\u0022 61.51.252.82 0.004\u000a","EnvVersion":"","Pid":0,"Hostname":"localhost.localdomain","http_x_forwarded_for":"61.51.252.82","remote_user":"-","upstream_response_time":"","timestamp":"08/Jan/2016:15:27:11 +0800","status":"304","hostname":"-","responseCode":"<responseCode>","http_referer":"http://65.wangpos.com/batchCheckCode?posMerId=1823cf1eba79411a9d32a3cb8dd3b821","request":"GET /uplfoadify/jquery.uploadify-3.1.min.js HTTP/1.0","http_user_agent":"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.155 Safari/537.36","body_bytes_sent":"0","remote_addr":"10.171.20.136","request_time":"0.004"}
{"index":{"_index":"nginx-2015.12.10","_type":"nginx"}}
{"Uuid":"9c09fb0a-3fee-475c-bfad-04efd3a2f44e","Timestamp":"2015-12-10T11:32:26","Type":"nginx","Logger":"nginx-access","Severity":7,"Payload":"10.171.20.136 - - [10/Dec/2015:19:32:26 +0800] \u0022POST /usfer/getSpuerUserByQulificationId HTTP/1.0\u0022 200 182 \u0022-\u0022 \u0022Java/1.7.0_71\u0022 123.56.134.28 0.022\u000a","EnvVersion":"","Pid":0,"Hostname":"localhost.localdomain","remote_addr":"10.171.20.136","timestamp":"10/Dec/2015:19:32:26 +0800","responseCode":"<responseCode>","http_referer":"-","upstream_response_time":"","request":"POST /user/getSpuerUserByQulificationId HTTP/1.0","http_user_agent":"Java/1.7.0_71","body_bytes_sent":"182","status":"200","hostname":"-","http_x_forwarded_for":"123.56.134.28","request_time":"0.022","remote_user":"-"}
{"index":{"_index":"nginx-2015.12.17","_type":"nginx"}}
{"Uuid":"d2c08886-cdd1-4dbb-b508-7bdec4d27460","Timestamp":"2015-12-17T07:20:29","Type":"nginx","Logger":"nginx-access","Severity":7,"Payload":"10.171.20.136 - - [17/Dec/2015:15:20:29 +0800] \u0022GET /weipossoa/ HTTP/1.0\u0022 200 3460 \u0022-\u0022 \u0022curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.16.2.3 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2\u0022 10.173.16.251 0.003\u000a","EnvVersion":"","Pid":0,"Hostname":"localhost.localdomain","remote_addr":"10.171.20.136","responseCode":"<responseCode>","status":"200","remote_user":"-","timestamp":"17/Dec/2015:15:20:29 +0800","http_referer":"-","request_time":"0.003","http_x_forwarded_for":"10.173.16.251","hostname":"-","http_user_agent":"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.16.2.3 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2","upstream_response_time":"","request":"GET /weipossoa/ HTTP/1.0","body_bytes_sent":"3460"}
{"index":{"_index":"nginx-2015.12.28","_type":"nginx"}}
{"Uuid":"344bec04-268c-455d-94af-e44f72e50104","Timestamp":"2015-12-28T09:00:34","Type":"nginx","Logger":"nginx-access","Severity":7,"Payload":"10.159.191.68 - - [28/Dec/2015:17:00:34 +0800] \u0022GET /weipossoa/accessToken/check?providerAppCode=100028&accessToken=5680c5d301070742efba15ba HTTP/1.0\u0022 200 60 \u0022-\u0022 \u0022Java/1.8.0_65\u0022 61.51.252.82 0.003\u000a","EnvVersion":"","Pid":0,"Hostname":"localhost.localdomain","remote_user":"-","responseCode":"<responseCode>","hostname":"-","status":"200","http_referer":"-","timestamp":"28/Dec/2015:17:00:34 +0800","request_time":"0.003","http_user_agent":"Java/1.8.0_65","request":"GET /weipossoa/accessToken/check?providerAppCode=100028&accessToken=5680c5d301070742efba15ba HTTP/1.0","upstream_response_time":"","remote_addr":"10.159.191.68","body_bytes_sent":"60","http_x_forwarded_for":"61.51.252.82"}
{"index":{"_index":"nginx-2016.01.08","_type":"nginx"}}
{"Uuid":"0034bae1-6d16-486c-94fa-113d3cc15c42","Timestamp":"2016-01-08T22:20:25","Type":"nginx","Logger":"nginx-access","Severity":7,"Payload":"10.171.20.136 - - [09/Jan/2016:06:20:25 +0800] \u0022GET /wxcard/jsp/common.jsp HTTP/1.0\u0022 200 1407 \u0022-\u0022 \u0022curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.16.2.3 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2\u0022 123.57.53.143 0.005\u000a","EnvVersion":"","Pid":0,"Hostname":"localhost.localdomain","request":"GET /wxcard/jsp/common.jsp HTTP/1.0","upstream_response_time":"","timestamp":"09/Jan/2016:06:20:25 +0800","responseCode":"<responseCode>","status":"200","http_referer":"-","request_time":"0.005","remote_addr":"10.171.20.136","http_x_forwarded_for":"123.57.53.143","http_user_agent":"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.16.2.3 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2","hostname":"-","remote_user":"-","body_bytes_sent":"1407"}
{"index":{"_index":"nginx-2016.01.02","_type":"nginx"}}
{"Uuid":"7775c7fd-d7bb-4a80-89fa-03fda682ca62","Timestamp":"2016-01-02T09:19:55","Type":"nginx","Logger":"nginx-access","Severity":7,"Payload":"10.159.191.97 - - [02/Jan/2016:17:19:55 +0800] \u0022POST /PosBusiness/pos/biz/service HTTP/1.0\u0022 200 117 \u0022-\u0022 \u0022Apache-HttpClient/4.1.3 (java 1.5)\u0022 10.173.53.128 0.017\u000a","EnvVersion":"","Pid":0,"Hostname":"localhost.localdomain","body_bytes_sent":"117","status":"200","request":"POST /PosBusiness/pos/biz/service HTTP/1.0","timestamp":"02/Jan/2016:17:19:55 +0800","http_referer":"-","remote_user":"-","responseCode":"<responseCode>","upstream_response_time":"","http_x_forwarded_for":"10.173.53.128","hostname":"-","http_user_agent":"Apache-HttpClient/4.1.3 (java 1.5)","remote_addr":"10.159.191.97","request_time":"0.017"}
Here 10 messages had accumulated, so a single request was sent.
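The structure of the captured request body can be reproduced with a short sketch: each document is preceded by an action line naming the index, and the index name is derived from the date of the document's Timestamp. The function name, the nginx index prefix, and the type name are taken from or modelled on the sample above and are otherwise assumptions; this is not the encoder's real code.

```python
import json
from datetime import datetime


def bulk_body(docs, index_prefix="nginx", doc_type="nginx"):
    """Build a newline-delimited _bulk body with one action line per document."""
    lines = []
    for doc in docs:
        day = datetime.strptime(doc["Timestamp"], "%Y-%m-%dT%H:%M:%S")
        index_name = "{}-{}".format(index_prefix, day.strftime("%Y.%m.%d"))
        action = {"index": {"_index": index_name, "_type": doc_type}}
        lines.append(json.dumps(action, separators=(",", ":")))
        lines.append(json.dumps(doc, separators=(",", ":")))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


body = bulk_body([{"Timestamp": "2016-01-08T22:20:25", "status": "200"}])
```

Note that the date in the index name follows the UTC Timestamp: the request logged at 09/Jan/2016:06:20:25 +0800 in the sample lands in nginx-2016.01.08. The _type field mirrors the captured request; newer Elasticsearch versions no longer use it.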