Source: http://caiguangguang.blog.51cto.com/1652935/1384187

A Flume BucketPath bug

The test configuration file:

agent-server1.sources= testtail
agent-server1.sinks = hdfs-sink
agent-server1.channels= hdfs-channel
agent-server1.sources.testtail.type = netcat
agent-server1.sources.testtail.bind = localhost
agent-server1.sources.testtail.port = 9999
agent-server1.sinks.hdfs-sink.hdfs.kerberosPrincipal = hdfs/_HOST@KERBEROS_HADOOP
agent-server1.sinks.hdfs-sink.hdfs.kerberosKeytab = /home/vipshop/conf/hdfs.keytab
agent-server1.channels.hdfs-channel.type = memory
agent-server1.channels.hdfs-channel.capacity = 200000000
agent-server1.channels.hdfs-channel.transactionCapacity = 10000
agent-server1.sinks.hdfs-sink.type = hdfs
agent-server1.sinks.hdfs-sink.hdfs.path = hdfs://bipcluster/tmp/flume/%Y%m%d
agent-server1.sinks.hdfs-sink.hdfs.rollInterval = 60
agent-server1.sinks.hdfs-sink.hdfs.rollSize = 0
agent-server1.sinks.hdfs-sink.hdfs.rollCount = 0
agent-server1.sinks.hdfs-sink.hdfs.threadsPoolSize = 10
agent-server1.sinks.hdfs-sink.hdfs.round = false
agent-server1.sinks.hdfs-sink.hdfs.roundValue = 30
agent-server1.sinks.hdfs-sink.hdfs.roundUnit = minute
agent-server1.sinks.hdfs-sink.hdfs.batchSize = 100
agent-server1.sinks.hdfs-sink.hdfs.fileType = DataStream
agent-server1.sinks.hdfs-sink.hdfs.writeFormat = Text
agent-server1.sinks.hdfs-sink.hdfs.callTimeout = 60000
agent-server1.sinks.hdfs-sink.hdfs.idleTimeout = 100
agent-server1.sinks.hdfs-sink.hdfs.filePrefix = ip
agent-server1.sinks.hdfs-sink.channel = hdfs-channel
agent-server1.sources.testtail.channels = hdfs-channel

After starting the service and testing it with telnet, the following error appeared:

14/03/24 18:03:07 ERROR hdfs.HDFSEventSink: process failed
java.lang.RuntimeException: Flume wasn't able to parse timestamp header in the event to resolve time based bucketing.
 Please check that you're correctly populating timestamp header (for example using TimestampInterceptor source interceptor).
        at org.apache.flume.formatter.output.BucketPath.replaceShorthand(BucketPath.java:160)
        at org.apache.flume.formatter.output.BucketPath.escapeString(BucketPath.java:343)
        at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:392)
        at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
        at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.NumberFormatException: null
        at java.lang.Long.parseLong(Long.java:375)
        at java.lang.Long.valueOf(Long.java:525)
        at org.apache.flume.formatter.output.BucketPath.replaceShorthand(BucketPath.java:158)
        ... 5 more
14/03/24 18:03:07 ERROR flume.SinkRunner: Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: java.lang.RuntimeException: Flume wasn't able to parse timestamp header in the event to
resolve time based bucketing. Please check that you're correctly populating timestamp header (for example using TimestampInterceptor source interceptor).
        at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:461)
        at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
        at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.RuntimeException: Flume wasn't able to parse timestamp header in the event to resolve time based bucketing. Please check that you're correctly populating timestamp header (for example using TimestampInterceptor source interceptor).
        at org.apache.flume.formatter.output.BucketPath.replaceShorthand(BucketPath.java:160)
        at org.apache.flume.formatter.output.BucketPath.escapeString(BucketPath.java:343)
        at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:392)
        ... 3 more
Caused by: java.lang.NumberFormatException: null
        at java.lang.Long.parseLong(Long.java:375)
        at java.lang.Long.valueOf(Long.java:525)
        at org.apache.flume.formatter.output.BucketPath.replaceShorthand(BucketPath.java:158)
        ... 5 more

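The stack trace shows that the error occurs in the replaceShorthand method of the org.apache.flume.formatter.output.BucketPath class.
In org.apache.flume.sink.hdfs.HDFSEventSink, the process method builds the HDFS URL. It does this mainly by calling BucketPath.escapeString to expand the escape sequences in the path, which in turn calls replaceShorthand.
The relevant part of replaceShorthand is: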

  public static String replaceShorthand(char c, Map<String, String> headers,
      TimeZone timeZone, boolean needRounding, int unit, int roundDown) {
    String timestampHeader = headers.get("timestamp");
    long ts;
    try {
      ts = Long.valueOf(timestampHeader);
    } catch (NumberFormatException e) {
      throw new RuntimeException("Flume wasn't able to parse timestamp header"
        + " in the event to resolve time based bucketing. Please check that"
        + " you're correctly populating timestamp header (for example using"
        + " TimestampInterceptor source interceptor).", e);
    }
    if(needRounding){
      ts = roundDown(roundDown, unit, ts);
    }
........
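
To illustrate why the sink needs a timestamp at all for a path such as hdfs://bipcluster/tmp/flume/%Y%m%d, here is a minimal sketch of the substitution. It is not Flume's implementation: DayBucketSketch and expandDayPath are hypothetical names, only the %Y, %m and %d escapes are handled, and the timestamp is assumed to be epoch milliseconds.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper for illustration only; not part of Flume.
final class DayBucketSketch {

    // Replaces %Y, %m and %d in the template with the year, month and day
    // of the given epoch-millisecond timestamp in the given time zone.
    static String expandDayPath(String pathTemplate, long timestampMillis, TimeZone timeZone) {
        return pathTemplate
                .replace("%Y", format("yyyy", timestampMillis, timeZone))
                .replace("%m", format("MM", timestampMillis, timeZone))
                .replace("%d", format("dd", timestampMillis, timeZone));
    }

    private static String format(String pattern, long timestampMillis, TimeZone timeZone) {
        SimpleDateFormat formatter = new SimpleDateFormat(pattern, Locale.ROOT);
        formatter.setTimeZone(timeZone);
        return formatter.format(new Date(timestampMillis));
    }

    public static void main(String[] args) {
        // 2014-03-24 18:03:07 UTC, chosen to match the time shown in the log above.
        long timestampMillis = 1395684187000L;
        String path = expandDayPath("hdfs://bipcluster/tmp/flume/%Y%m%d",
                timestampMillis, TimeZone.getTimeZone("UTC"));
        System.out.println(path); // prints hdfs://bipcluster/tmp/flume/20140324
    }
}
```

Without a timestamp there is nothing to substitute for the date escapes, which is why an event with no timestamp header cannot be bucketed by day.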

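As the replaceShorthand code shows, if the event has no timestamp header, timestampHeader is null and the assignment to ts fails: Long.valueOf(null) throws the NumberFormatException seen in the stack trace.
This is in fact a known Flume bug, tracked as: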
https://issues.apache.org/jira/browse/FLUME-1419
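
The failure can be reproduced outside Flume with a few lines of plain Java. This is only a sketch: NullTimestampDemo and failsToParse are hypothetical names, and the header map stands in for the headers of an event produced by the netcat source without any interceptor.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class for illustration only; not part of Flume.
final class NullTimestampDemo {

    // Returns true if parsing the "timestamp" header throws NumberFormatException,
    // which is what happens when the header is missing (null) or not a number.
    static boolean failsToParse(Map<String, String> headers) {
        String timestampHeader = headers.get("timestamp"); // null if the header is absent
        try {
            Long.valueOf(timestampHeader);
            return false;
        } catch (NumberFormatException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, String> noHeaders = new HashMap<>();
        System.out.println("missing header fails: " + failsToParse(noHeaders));

        Map<String, String> withTimestamp = new HashMap<>();
        withTimestamp.put("timestamp", "1395684187000");
        System.out.println("valid header fails: " + failsToParse(withTimestamp));
    }
}
```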
There are three ways to resolve it:
1. Change the configuration so that the HDFS path no longer contains time escape sequences:

agent-server1.sinks.hdfs-sink.hdfs.path = hdfs://bipcluster/tmp/flume

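However, logs can then no longer be stored in per-day directories.
2. Patch the relevant code
(patch: https://issues.apache.org/jira/secure/attachment/12538891/FLUME-1419.patch)
If no timestamp value can be found in the headers, fall back to the current timestamp.
The relevant code: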

    String timestampHeader = headers.get("timestamp");
    long ts;
    try {
      if (timestampHeader == null) {
        // No timestamp header on the event: fall back to the current time.
        ts = System.currentTimeMillis();
      } else {
        ts = Long.valueOf(timestampHeader);
      }
    } catch (NumberFormatException e) {
      throw new RuntimeException("Flume wasn't able to parse timestamp header"
        + " in the event to resolve time based bucketing. Please check that"
        + " you're correctly populating timestamp header (for example using"
        + " TimestampInterceptor source interceptor).", e);
    }
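
The fallback logic can be tried in isolation with a small standalone sketch. The names TimestampFallbackSketch and resolveTimestamp are hypothetical, and unlike the patch, which calls System.currentTimeMillis() directly, this version takes the current time as a parameter so that the behavior is easy to check.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical standalone version of the fallback idea; not the Flume patch itself.
final class TimestampFallbackSketch {

    // Uses the "timestamp" header when present, otherwise the supplied current time.
    // A present but non-numeric header still throws NumberFormatException.
    static long resolveTimestamp(Map<String, String> headers, long nowMillis) {
        String timestampHeader = headers.get("timestamp");
        if (timestampHeader == null) {
            return nowMillis;
        }
        return Long.parseLong(timestampHeader);
    }

    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        // No header yet, so the current time is used.
        System.out.println(resolveTimestamp(headers, System.currentTimeMillis()));

        headers.put("timestamp", "1395684187000");
        // The header wins over the current time.
        System.out.println(resolveTimestamp(headers, System.currentTimeMillis())); // prints 1395684187000
    }
}
```

One consequence of this approach is that events without a timestamp are bucketed by the time the sink processes them, not by the time they were received.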

3. Define a timestamp interceptor on the source
Adding two lines to the configuration is enough:

agent-server1.sources.testtail.interceptors = i1
agent-server1.sources.testtail.interceptors.i1.type = org.apache.flume.interceptor.TimestampInterceptor$Builder
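
Conceptually, the interceptor stamps each event with a timestamp header before it reaches the channel, so the sink always finds one. The sketch below shows that idea on a plain header map. It is an assumption-laden illustration, not Flume's TimestampInterceptor: TimestampStampingSketch and withTimestamp are hypothetical names, and the choice to overwrite any existing value is made here only for simplicity.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of stamping headers; not Flume's interceptor code.
final class TimestampStampingSketch {

    // Returns a copy of the headers with "timestamp" set to the given
    // epoch-millisecond value, overwriting any existing value.
    static Map<String, String> withTimestamp(Map<String, String> headers, long nowMillis) {
        Map<String, String> stamped = new HashMap<>(headers);
        stamped.put("timestamp", Long.toString(nowMillis));
        return stamped;
    }

    public static void main(String[] args) {
        Map<String, String> original = new HashMap<>();
        Map<String, String> stamped = withTimestamp(original, System.currentTimeMillis());
        // The stamped copy now carries a parseable timestamp header.
        System.out.println(Long.parseLong(stamped.get("timestamp")));
    }
}
```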

A tip:
When debugging Flume problems, you can set a startup parameter so that debug logs are also printed to the console.

-Dflume.root.logger=DEBUG,console,LOGFILE
