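Resetting Heka's progress

Heka keeps its progress files in the directory set by the base_dir configuration option. Deleting the contents of that directory completely resets Heka's progress.

base_dir defaults to '/var/cache/hekad' (or 'c:\var\cache\hekad' on Windows).

Reference: http://hekad.readthedocs.org/en/latest/getting_started.html#global-configuration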

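Deleting Elasticsearch data

After we change the import strategy, the data has to be recomputed, so the previously imported data must be cleared first. Several commonly used ES plugins can delete data and are simple to use.

[Screenshot: the elasticsearch-head plugin]

The tool in the screenshot is elasticsearch-head:

https://mobz.github.io/elasticsearch-head/    By default it is served at: http://ip:9200/_plugin/head/

ElasticHQ is also recommended: http://www.elastichq.org/     Its git repository is https://github.com/royrusso/elasticsearch-HQ and by default it is served at: http://ip:9200/_plugin/hq/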

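Parsing and reading the nginx log

Our nginx log uses a custom format, so we use the most flexible decoder, PayloadRegexDecoder, and define a regular expression to extract the data.

Reference: http://hekad.readthedocs.org/en/latest/config/decoders/payload_regex.html

Because Heka is written in Go, its regular expressions follow the syntax of Go's regexp/syntax package. For simple expressions, an online Go regex tester is available at https://regoio.herokuapp.com/

For complex ones, RegexBuddy (http://www.regexbuddy.com/download.html) can be used.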
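Before putting an expression into match_regex, it can be tried directly with Go's regexp package. The sketch below uses a hypothetical pattern written for the log line that appears later in this post; it is not the author's actual match_regex, and the mapping of the two "-" columns to hostname and remote_user is an assumption.

```go
package main

import (
	"fmt"
	"regexp"
)

// accessLine is a hypothetical pattern for the custom nginx format shown
// in this post. Named groups use Go's (?P<name>...) syntax.
var accessLine = regexp.MustCompile(
	`^(?P<remote_addr>\S+) (?P<hostname>\S+) (?P<remote_user>\S+) ` +
		`\[(?P<timestamp>[^\]]+)\] "(?P<request>[^"]*)" ` +
		`(?P<status>\d{3}) (?P<body_bytes_sent>\d+) ` +
		`"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)" ` +
		`(?P<http_x_forwarded_for>\S+) (?P<request_time>\S+)$`)

// parseLine returns the named captures of line, or nil if it does not match.
func parseLine(line string) map[string]string {
	m := accessLine.FindStringSubmatch(line)
	if m == nil {
		return nil
	}
	fields := make(map[string]string)
	for i, name := range accessLine.SubexpNames() {
		if name != "" {
			fields[name] = m[i]
		}
	}
	return fields
}

func main() {
	line := `10.159.191.213 - - [06/Jan/2016:09:31:51 +0800] ` +
		`"POST /simcard/uploadSimcardStatus HTTP/1.0" 200 61 "-" ` +
		`"Apache-HttpClient/4.5 (Java/1.7.0_67)" 122.97.213.5 0.166`
	f := parseLine(line)
	if f == nil {
		fmt.Println("no match")
		return
	}
	fmt.Println("timestamp:", f["timestamp"])
	fmt.Println("status:", f["status"])
	fmt.Println("request_time:", f["request_time"])
}
```

Each named group becomes a field of the decoded message, which is why the group names chosen here match the field names seen in the JSON sent to Elasticsearch.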

Timestamp

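By default, Timestamp is the current time. For the time to be extracted from the log line, the capture group in the regular expression must itself be named Timestamp.

Two further parameters define how the captured time is parsed.

timestamp_layout

Defines the string representation of the time being extracted. Note that this uses Go's time layout format.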

A formatting string instructing hekad how to turn a time string into the actual time representation used internally. Example timestamp layouts can be seen in Go’s time documentation. In addition to the Go time formatting, special timestamp_layout values of “Epoch”, “EpochMilli”, “EpochMicro”, and “EpochNano” are supported for Unix style timestamps represented in seconds, milliseconds, microseconds, and nanoseconds since the Epoch, respectively.

Go's predefined layout constants are:

ANSIC = "Mon Jan _2 15:04:05 2006"
UnixDate = "Mon Jan _2 15:04:05 MST 2006"
RubyDate = "Mon Jan 02 15:04:05 -0700 2006"
RFC822 = "02 Jan 06 15:04 MST"
RFC822Z = "02 Jan 06 15:04 -0700" // RFC822 with numeric zone
RFC850 = "Monday, 02-Jan-06 15:04:05 MST"
RFC1123 = "Mon, 02 Jan 2006 15:04:05 MST"
RFC1123Z = "Mon, 02 Jan 2006 15:04:05 -0700" // RFC1123 with numeric zone
RFC3339 = "2006-01-02T15:04:05Z07:00"
RFC3339Nano = "2006-01-02T15:04:05.999999999Z07:00"
Kitchen = "3:04PM"
// Handy time stamps.
Stamp = "Jan _2 15:04:05"
StampMilli = "Jan _2 15:04:05.000"
StampMicro = "Jan _2 15:04:05.000000"
StampNano = "Jan _2 15:04:05.000000000"
Reference: https://golang.org/pkg/time/#pkg-constants

timestamp_location

The time zone. This setting only takes effect when the timestamp_layout carries no time zone information.

Time zone in which the timestamps in the text are presumed to be in. Should be a location name corresponding to a file in the IANA Time Zone database (e.g. “America/Los_Angeles”), as parsed by Go’s time.LoadLocation() function (see http://golang.org/pkg/time/#LoadLocation). Defaults to “UTC”. Not required if valid time zone info is embedded in every parsed timestamp, since those can be parsed as specified in the timestamp_layout. This setting will have no impact if one of the supported “Epoch*” values is used as the timestamp_layout setting.

An example configuration:

[SphinxRequestDecoder]
type = "PayloadRegexDecoder"
match_regex = '.+ (?P<Hostname>\S+) sphinx: (?P<Timestamp>.+) \[(?P<Uuid>.+)\] REQUEST: path=(?P<Path>\S+) remoteaddr=(?P<Remoteaddr>\S+) (?P<Headers>.+)'
timestamp_layout = "2006/01/02 15:04:05"

Reference: https://github.com/mozilla-services/heka/wiki/How-to-convert-a-PayloadRegex-MultiDecoder-to-a-SandboxDecoder-using-an-LPeg-Grammar

 

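Importing data into Elasticsearch

To export data to Elasticsearch we use ElasticSearchOutput. This output only defines the properties of the Elasticsearch connection. How messages are mapped on export is defined by one of the following three encoders: ElasticSearch JSON Encoder, ElasticSearch Logstash V0 Encoder, or ElasticSearch Payload Encoder.

Differences between the three encoders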

The three encoders compare as follows:

ElasticSearch JSON Encoder
  Plugin Name: ESJsonEncoder
  This encoder serializes a Heka message into a clean JSON format, preceded by a separate JSON structure containing information required for ElasticSearch BulkAPI indexing.
  The JSON serialization is done by hand, without the use of Go’s stdlib JSON marshalling. This is so serialization can succeed even if the message contains invalid UTF-8 characters, which will be encoded as U+FFFD.

ElasticSearch Logstash V0 Encoder
  Plugin Name: ESLogstashV0Encoder
  This encoder serializes a Heka message into a JSON format, preceded by a separate JSON structure containing information required for ElasticSearch BulkAPI indexing. The message JSON structure uses the original (i.e. “v0”) schema popularized by Logstash. Using this schema can aid integration with existing Logstash deployments. This schema also plays nicely with the default Logstash dashboard provided by Kibana.
  The JSON serialization is done by hand, without using Go’s stdlib JSON marshalling. This is so serialization can succeed even if the message contains invalid UTF-8 characters, which will be encoded as U+FFFD.
  Note: closely mimics Logstash.

ElasticSearch Payload Encoder
  Plugin Name: SandboxEncoder
  File Name: lua_encoders/es_payload.lua
  Prepends ElasticSearch BulkAPI index JSON to a message payload.
  Note: a Lua plugin.


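Taking ESJsonEncoder as an example: we want the timestamp to be the time we configured to extract rather than the time the message was generated, so the following option needs to be set to true.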

es_index_from_timestamp (bool):

When generating the index name use the timestamp from the message instead of the current time. Defaults to false.

 

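Note that I have not yet found where the timestamp setting here is actually used. I had assumed it determined the time of the data imported into ES, but it does not.

Some ElasticSearchOutput settings

ElasticSearchOutput has the following two parameters, which determine how often requests are sent to the server.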

flush_interval (int):

Interval at which accumulated messages should be bulk indexed into ElasticSearch, in milliseconds. Defaults to 1000 (i.e. one second).

flush_count (int):

Number of messages that, if processed, will trigger them to be bulk indexed into ElasticSearch. Defaults to 10.

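Both parameters are in effect at the same time: once flush_count messages have accumulated in the queue, or more than flush_interval milliseconds have elapsed, any pending messages are sent to ElasticSearch.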

Requests are sent to http://10.30.0.32:9200/_bulk . A randomly sampled request, with its JSON body, looks like this:

 

POST http://10.30.0.32:9200/_bulk HTTP/1.1

Host: 10.30.0.32:9200

User-Agent: Go 1.1 package http

Content-Length: 9374

Accept: application/json

Accept-Encoding: gzip

{"index":{"_index":"nginx-2016.01.06","_type":"nginx"}}

{"Uuid":"12b6e9b3-d593-4cf4-b473-761ae7e982b0","Timestamp":"2016-01-06T01:31:51","Type":"nginx","Logger":"nginx-access","Severity":7,"Payload":"10.159.191.213 - - [06/Jan/2016:09:31:51 +0800] \u0022POST /simcard/uploadSimcardStatus HTTP/1.0\u0022 200 61 \u0022-\u0022 \u0022Apache-HttpClient/4.5 (Java/1.7.0_67)\u0022 122.97.213.5 0.166\u000a","EnvVersion":"","Pid":0,"Hostname":"localhost.localdomain","responseCode":"<responseCode>","status":"200","http_referer":"-","request_time":"0.166","http_user_agent":"Apache-HttpClient/4.5 (Java/1.7.0_67)","upstream_response_time":"","remote_addr":"10.159.191.213","request":"POST /simcard/uploadSimcardStatus HTTP/1.0","hostname":"-","timestamp":"06/Jan/2016:09:31:51 +0800","http_x_forwarded_for":"122.97.213.5","remote_user":"-","body_bytes_sent":"61"}

{"index":{"_index":"nginx-2016.01.05","_type":"nginx"}}

{"Uuid":"6ff51dd8-ba9c-4440-b567-3de391cdac2b","Timestamp":"2016-01-05T07:36:45","Type":"nginx","Logger":"nginx-access","Severity":7,"Payload":"10.159.191.90 - - [05/Jan/2016:15:36:45 +0800] \u0022POST /soa/mfderchant/list HTTP/1.0\u0022 200 926 \u0022-\u0022 \u0022Java/1.7.0_71\u0022 123.56.134.28 0.012\u000a","EnvVersion":"","Pid":0,"Hostname":"localhost.localdomain","http_user_agent":"Java/1.7.0_71","timestamp":"05/Jan/2016:15:36:45 +0800","remote_addr":"10.159.191.90","request":"POST /soa/merttchant/list HTTP/1.0","upstream_response_time":"","remote_user":"-","body_bytes_sent":"926","responseCode":"<responseCode>","http_referer":"-","http_x_forwarded_for":"123.56.134.28","hostname":"-","status":"200","request_time":"0.012"}

{"index":{"_index":"nginx-2015.12.17","_type":"nginx"}}

{"Uuid":"58eb317c-2729-4037-a82e-d475e68324fd","Timestamp":"2015-12-17T14:03:26","Type":"nginx","Logger":"nginx-access","Severity":7,"Payload":"10.171.20.136 - - [17/Dec/2015:22:03:26 +0800] \u0022GET /creepers/creepers/pubddlic/images/cardCoupon/cardCoupon1.png HTTP/1.0\u0022 404 296 \u0022http://ewr.wangpos.com/creepersplatfofrm/index.xhtml\u0022 \u0022Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36\u0022 61.51.252.82 0.004\u000a","EnvVersion":"","Pid":0,"Hostname":"localhost.localdomain","request":"GET /creepders/crefepers/public/images/cardCoupon/cardCoupon1.png HTTP/1.0","responseCode":"<responseCode>","http_referer":"http://rre.wangpos.com/creepersplatform/index.xhtml","upstream_response_time":"","http_x_forwarded_for":"61.51.252.82","timestamp":"17/Dec/2015:22:03:26 +0800","body_bytes_sent":"296","remote_user":"-","http_user_agent":"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36","status":"404","request_time":"0.004","hostname":"-","remote_addr":"10.171.20.136"}

{"index":{"_index":"nginx-2015.12.14","_type":"nginx"}}

{"Uuid":"969f2737-0a21-4c27-908a-29a22f1a1475","Timestamp":"2015-12-14T10:01:02","Type":"nginx","Logger":"nginx-access","Severity":7,"Payload":"10.171.20.136 - - [14/Dec/2015:18:01:02 +0800] \u0022POST /wxcaddrddeal/cashAccess/sendCard HTTP/1.0\u0022 200 48 \u0022-\u0022 \u0022Java/1.7.0_71\u0022 123.56.134.28 0.016\u000a","EnvVersion":"","Pid":0,"Hostname":"localhost.localdomain","http_user_agent":"Java/1.7.0_71","hostname":"-","status":"200","body_bytes_sent":"48","http_x_forwarded_for":"123.56.134.28","upstream_response_time":"","request":"POST /wxcarddeal/cashAccess/sendCard HTTP/1.0","remote_addr":"10.171.20.136","remote_user":"-","http_referer":"-","responseCode":"<responseCode>","timestamp":"14/Dec/2015:18:01:02 +0800","request_time":"0.016"}

{"index":{"_index":"nginx-2016.01.08","_type":"nginx"}}

{"Uuid":"80ff4701-85ad-4ecc-816c-833dbaded8df","Timestamp":"2016-01-08T07:27:11","Type":"nginx","Logger":"nginx-access","Severity":7,"Payload":"10.171.20.136 - - [08/Jan/2016:15:27:11 +0800] \u0022GET /uploadify/jquery.uploadify-3.1.min.js HTTP/1.0\u0022 304 0 \u0022http://www.wadngpos.com/batchCheck2Code?posMerId=1823cf1eba79411a9d32a3cb8dd3b821\u0022 \u0022Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.155 Safari/537.36\u0022 61.51.252.82 0.004\u000a","EnvVersion":"","Pid":0,"Hostname":"localhost.localdomain","http_x_forwarded_for":"61.51.252.82","remote_user":"-","upstream_response_time":"","timestamp":"08/Jan/2016:15:27:11 +0800","status":"304","hostname":"-","responseCode":"<responseCode>","http_referer":"http://65.wangpos.com/batchCheckCode?posMerId=1823cf1eba79411a9d32a3cb8dd3b821","request":"GET /uplfoadify/jquery.uploadify-3.1.min.js HTTP/1.0","http_user_agent":"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.155 Safari/537.36","body_bytes_sent":"0","remote_addr":"10.171.20.136","request_time":"0.004"}

{"index":{"_index":"nginx-2015.12.10","_type":"nginx"}}

{"Uuid":"9c09fb0a-3fee-475c-bfad-04efd3a2f44e","Timestamp":"2015-12-10T11:32:26","Type":"nginx","Logger":"nginx-access","Severity":7,"Payload":"10.171.20.136 - - [10/Dec/2015:19:32:26 +0800] \u0022POST /usfer/getSpuerUserByQulificationId HTTP/1.0\u0022 200 182 \u0022-\u0022 \u0022Java/1.7.0_71\u0022 123.56.134.28 0.022\u000a","EnvVersion":"","Pid":0,"Hostname":"localhost.localdomain","remote_addr":"10.171.20.136","timestamp":"10/Dec/2015:19:32:26 +0800","responseCode":"<responseCode>","http_referer":"-","upstream_response_time":"","request":"POST /user/getSpuerUserByQulificationId HTTP/1.0","http_user_agent":"Java/1.7.0_71","body_bytes_sent":"182","status":"200","hostname":"-","http_x_forwarded_for":"123.56.134.28","request_time":"0.022","remote_user":"-"}

{"index":{"_index":"nginx-2015.12.17","_type":"nginx"}}

{"Uuid":"d2c08886-cdd1-4dbb-b508-7bdec4d27460","Timestamp":"2015-12-17T07:20:29","Type":"nginx","Logger":"nginx-access","Severity":7,"Payload":"10.171.20.136 - - [17/Dec/2015:15:20:29 +0800] \u0022GET /weipossoa/ HTTP/1.0\u0022 200 3460 \u0022-\u0022 \u0022curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.16.2.3 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2\u0022 10.173.16.251 0.003\u000a","EnvVersion":"","Pid":0,"Hostname":"localhost.localdomain","remote_addr":"10.171.20.136","responseCode":"<responseCode>","status":"200","remote_user":"-","timestamp":"17/Dec/2015:15:20:29 +0800","http_referer":"-","request_time":"0.003","http_x_forwarded_for":"10.173.16.251","hostname":"-","http_user_agent":"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.16.2.3 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2","upstream_response_time":"","request":"GET /weipossoa/ HTTP/1.0","body_bytes_sent":"3460"}

{"index":{"_index":"nginx-2015.12.28","_type":"nginx"}}

{"Uuid":"344bec04-268c-455d-94af-e44f72e50104","Timestamp":"2015-12-28T09:00:34","Type":"nginx","Logger":"nginx-access","Severity":7,"Payload":"10.159.191.68 - - [28/Dec/2015:17:00:34 +0800] \u0022GET /weipossoa/accessToken/check?providerAppCode=100028&accessToken=5680c5d301070742efba15ba HTTP/1.0\u0022 200 60 \u0022-\u0022 \u0022Java/1.8.0_65\u0022 61.51.252.82 0.003\u000a","EnvVersion":"","Pid":0,"Hostname":"localhost.localdomain","remote_user":"-","responseCode":"<responseCode>","hostname":"-","status":"200","http_referer":"-","timestamp":"28/Dec/2015:17:00:34 +0800","request_time":"0.003","http_user_agent":"Java/1.8.0_65","request":"GET /weipossoa/accessToken/check?providerAppCode=100028&accessToken=5680c5d301070742efba15ba HTTP/1.0","upstream_response_time":"","remote_addr":"10.159.191.68","body_bytes_sent":"60","http_x_forwarded_for":"61.51.252.82"}

{"index":{"_index":"nginx-2016.01.08","_type":"nginx"}}

{"Uuid":"0034bae1-6d16-486c-94fa-113d3cc15c42","Timestamp":"2016-01-08T22:20:25","Type":"nginx","Logger":"nginx-access","Severity":7,"Payload":"10.171.20.136 - - [09/Jan/2016:06:20:25 +0800] \u0022GET /wxcard/jsp/common.jsp HTTP/1.0\u0022 200 1407 \u0022-\u0022 \u0022curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.16.2.3 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2\u0022 123.57.53.143 0.005\u000a","EnvVersion":"","Pid":0,"Hostname":"localhost.localdomain","request":"GET /wxcard/jsp/common.jsp HTTP/1.0","upstream_response_time":"","timestamp":"09/Jan/2016:06:20:25 +0800","responseCode":"<responseCode>","status":"200","http_referer":"-","request_time":"0.005","remote_addr":"10.171.20.136","http_x_forwarded_for":"123.57.53.143","http_user_agent":"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.16.2.3 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2","hostname":"-","remote_user":"-","body_bytes_sent":"1407"}

{"index":{"_index":"nginx-2016.01.02","_type":"nginx"}}

{"Uuid":"7775c7fd-d7bb-4a80-89fa-03fda682ca62","Timestamp":"2016-01-02T09:19:55","Type":"nginx","Logger":"nginx-access","Severity":7,"Payload":"10.159.191.97 - - [02/Jan/2016:17:19:55 +0800] \u0022POST /PosBusiness/pos/biz/service HTTP/1.0\u0022 200 117 \u0022-\u0022 \u0022Apache-HttpClient/4.1.3 (java 1.5)\u0022 10.173.53.128 0.017\u000a","EnvVersion":"","Pid":0,"Hostname":"localhost.localdomain","body_bytes_sent":"117","status":"200","request":"POST /PosBusiness/pos/biz/service HTTP/1.0","timestamp":"02/Jan/2016:17:19:55 +0800","http_referer":"-","remote_user":"-","responseCode":"<responseCode>","upstream_response_time":"","http_x_forwarded_for":"10.173.53.128","hostname":"-","http_user_agent":"Apache-HttpClient/4.1.3 (java 1.5)","remote_addr":"10.159.191.97","request_time":"0.017"}

Here 10 messages had accumulated, so they were sent in a single request.
