Flume: a custom HBase sink serializer class

References (with thanks to the original authors)

http://ydt619.blog.51cto.com/316163/1230586
https://blogs.apache.org/flume/entry/streaming_data_into_apache_hbase

Sample configuration file for Flume 1.5

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/scut/Downloads/testFlume

# Describe the sink
# Comments are kept on their own lines: in a properties file, a trailing
# "# ..." after a value is read as part of the value.
a1.sinks.k1.type = org.apache.flume.sink.hbase.AsyncHBaseSink
# HBase table name
a1.sinks.k1.table = Router
# HBase column family
a1.sinks.k1.columnFamily = log
# HBase column names, comma-separated
a1.sinks.k1.serializer.payloadColumn = serviceTime,browerOS,clientTime,screenHeight,screenWidth,url,userAgent,mobileDevice,gwId,mac
# Serializer class
a1.sinks.k1.serializer = org.apache.flume.sink.hbase.BaimiAsyncHbaseEventSerializer

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
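
A note on comments in this file: as far as I know, Flume reads the agent file as a standard Java properties file, and java.util.Properties only treats "#" as a comment marker when it is the first non-blank character on a line. The sketch below (the class name InlineCommentDemo is made up for this illustration) shows what that means for a value followed by a trailing comment.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class InlineCommentDemo {
  /** Loads the given properties text and returns the value stored under key. */
  static String valueOf(String text, String key) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(text));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return props.getProperty(key);
  }

  public static void main(String[] args) {
    String key = "a1.sinks.k1.table";
    // Trailing "# ..." on the same line is kept as part of the value.
    System.out.println("[" + valueOf(key + " = Router # table name", key) + "]");
    // With the comment on its own line, the value is just the table name.
    System.out.println("[" + valueOf("# table name\n" + key + " = Router", key) + "]");
  }
}
```

The first call prints [Router # table name] and the second prints [Router], so a comment placed after the table name would end up inside the table name.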

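A few properties deserve a closer look

a1.sinks.k1.serializer.payloadColumn lists all of the column names.
a1.sinks.k1.serializer sets the Flume serializer class. BaimiAsyncHbaseEventSerializer reads the value of payloadColumn and splits it on commas to obtain the full list of column names.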
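
The column parsing described above can be sketched in isolation, without any Flume or HBase dependency. This is only an illustration that mirrors the steps in the serializer's configure method (strip spaces, split on commas, lowercase); the class name PayloadColumnDemo and the method parseColumns are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class PayloadColumnDemo {
  /** Turns a comma-separated payloadColumn value into lowercase column names as bytes. */
  static byte[][] parseColumns(String payloadColumn) {
    // Remove spaces so "a, b" and "a,b" behave the same, then split on commas.
    String[] names = payloadColumn.replace(" ", "").split(",");
    byte[][] columns = new byte[names.length][];
    for (int i = 0; i < names.length; i++) {
      // Column names are lowercased before being stored.
      columns[i] = names[i].toLowerCase().getBytes(StandardCharsets.UTF_8);
    }
    return columns;
  }

  public static void main(String[] args) {
    byte[][] columns = parseColumns("serviceTime, browerOS,mac");
    for (byte[] column : columns) {
      System.out.println(new String(column, StandardCharsets.UTF_8));
    }
  }
}
```

Note that the lowercasing means the HBase qualifiers will be servicetime, broweros and so on, not the mixed-case names written in the configuration file.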

The BaimiAsyncHbaseEventSerializer class

/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied.  See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */

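package org.apache.flume.sink.hbase;

import java.util.ArrayList;
import java.util.List;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.FlumeException;
import org.apache.flume.conf.ComponentConfiguration;
import org.apache.flume.sink.hbase.SimpleHbaseEventSerializer.KeyType;
import org.hbase.async.AtomicIncrementRequest;
import org.hbase.async.PutRequest;

import com.google.common.base.Charsets;

/**
 * Async HBase serializer that splits each event body into fields and
 * writes one column per field.
 */
public class BaimiAsyncHbaseEventSerializer implements AsyncHbaseEventSerializer {
  private byte[] table;
  private byte[] cf;
  // Field values of the current event, parallel to payloadColumn.
  private byte[][] payload;
  // Column names read from the payloadColumn property.
  private byte[][] payloadColumn;
  // Regex matching the two literal characters '^' and 'A' that separate fields
  // (not the Ctrl-A control character).
  private final String payloadColumnSplit = "\\^A";
  private byte[] incrementColumn;
  // Value of the rowSuffixCol field in the current event. Despite the name,
  // it is passed to the key generator as the row key prefix.
  private String rowSuffix;
  // Name of the column that supplies the row key prefix ("rowPrefixCol" property).
  private String rowSuffixCol;
  private byte[] incrementRow;
  private KeyType keyType;

  @Override
  public void initialize(byte[] table, byte[] cf) {
    this.table = table;
    this.cf = cf;
  }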

  @Override
  public List<PutRequest> getActions() {
    List<PutRequest> actions = new ArrayList<PutRequest>();
    // Nothing to write if no columns are configured or no event has been set.
    if (payloadColumn != null && payload != null) {
      byte[] rowKey;
      try {
        // rowSuffix is used as the key prefix; keyType selects what is appended.
        switch (keyType) {
          case TS:
            rowKey = SimpleRowKeyGenerator.getTimestampKey(rowSuffix);
            break;
          case TSNANO:
            rowKey = SimpleRowKeyGenerator.getNanoTimestampKey(rowSuffix);
            break;
          case RANDOM:
            rowKey = SimpleRowKeyGenerator.getRandomKey(rowSuffix);
            break;
          default:
            rowKey = SimpleRowKeyGenerator.getUUIDKey(rowSuffix);
            break;
        }
        // Submit one put request per column and its corresponding value.
        for (int i = 0; i < this.payload.length; i++) {
          PutRequest putRequest =
              new PutRequest(table, rowKey, cf, payloadColumn[i], payload[i]);
          actions.add(putRequest);
        }
      } catch (Exception e) {
        throw new FlumeException("Could not get row key!", e);
      }
    }
    return actions;
  }

  @Override
  public List<AtomicIncrementRequest> getIncrements() {
    List<AtomicIncrementRequest> actions = new ArrayList<AtomicIncrementRequest>();
    // Count each event against the configured increment row and column.
    if (incrementColumn != null) {
      AtomicIncrementRequest inc =
          new AtomicIncrementRequest(table, incrementRow, cf, incrementColumn);
      actions.add(inc);
    }
    return actions;
  }

  @Override
  public void cleanUp() {
    // Nothing to release.
  }

  @Override
  public void configure(Context context) {
    String pCol = context.getString("payloadColumn", "pCol");
    String iCol = context.getString("incrementColumn", "iCol");
    // Column whose value becomes the row key prefix. Lowercased because the
    // payload column names are lowercased below before being compared to it.
    rowSuffixCol = context.getString("rowPrefixCol", "mac").toLowerCase();
    String suffix = context.getString("suffix", "uuid");
    if (pCol != null && !pCol.isEmpty()) {
      // The "suffix" property selects how the rest of the row key is generated.
      if (suffix.equals("timestamp")) {
        keyType = KeyType.TS;
      } else if (suffix.equals("random")) {
        keyType = KeyType.RANDOM;
      } else if (suffix.equals("nano")) {
        keyType = KeyType.TSNANO;
      } else {
        keyType = KeyType.UUID;
      }
      // Read the column names from the configuration file.
      String[] pCols = pCol.replace(" ", "").split(",");
      payloadColumn = new byte[pCols.length][];
      for (int i = 0; i < pCols.length; i++) {
        // Column names are stored in lowercase.
        payloadColumn[i] = pCols[i].toLowerCase().getBytes(Charsets.UTF_8);
      }
    }
    if (iCol != null && !iCol.isEmpty()) {
      incrementColumn = iCol.getBytes(Charsets.UTF_8);
    }
    incrementRow =
        context.getString("incrementRow", "incRow").getBytes(Charsets.UTF_8);
  }

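  @Override
  public void setEvent(Event event) {
    // Decode explicitly as UTF-8 instead of relying on the platform default.
    String strBody = new String(event.getBody(), Charsets.UTF_8);
    String[] subBody = strBody.split(this.payloadColumnSplit);
    // Start from an empty payload so that an event with the wrong number of
    // fields produces no puts instead of reusing the previous event's values.
    this.payload = new byte[0][];
    if (this.payloadColumn != null && subBody.length == this.payloadColumn.length) {
      this.payload = new byte[subBody.length][];
      for (int i = 0; i < subBody.length; i++) {
        this.payload[i] = subBody[i].getBytes(Charsets.UTF_8);
        if (new String(this.payloadColumn[i], Charsets.UTF_8).equals(this.rowSuffixCol)) {
          // The row key prefix is the value of one column, the MAC address by default.
          this.rowSuffix = subBody[i];
        }
      }
    }
  }

  @Override
  public void configure(ComponentConfiguration conf) {
    // Not used by this serializer.
  }
}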

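The functions to focus on are setEvent, configure, and getActions.

configure: reads the settings from the Flume configuration file, including the column names, the column that supplies the row key prefix, and the row key suffix type.
setEvent: takes the body of the Flume event, splits it into fields, and stores them in the payload array.
getActions: creates the PutRequest instances, each carrying the table, row key, column family, column, and value.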
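
To make the flow through setEvent and getActions concrete, here is a small stand-alone sketch that uses plain Java collections in place of Flume events and HBase put requests. The class EventSplitDemo, the method toCells, and the sample values are all invented for this illustration; the delimiter regex is the one from the class above, which matches the two literal characters "^A". The row key line is only an approximation of the default UUID key type (prefix followed by a random UUID), stated here as an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

public class EventSplitDemo {
  // Same regex as payloadColumnSplit: a literal '^' followed by 'A'.
  static final String DELIMITER_REGEX = "\\^A";

  /** Pairs each configured column with the matching field of the event body. */
  static Map<String, String> toCells(String[] columns, String body) {
    Map<String, String> cells = new LinkedHashMap<>();
    String[] values = body.split(DELIMITER_REGEX);
    // Events with the wrong number of fields are skipped (empty result).
    if (values.length != columns.length) {
      return cells;
    }
    for (int i = 0; i < columns.length; i++) {
      cells.put(columns[i], values[i]);
    }
    return cells;
  }

  public static void main(String[] args) {
    String[] columns = {"url", "gwid", "mac"};
    String body = "http://example.com/^Agw01^A00:11:22:33:44:55";
    Map<String, String> cells = toCells(columns, body);

    // The value of the "mac" column serves as the row key prefix.
    String rowKey = cells.get("mac") + UUID.randomUUID();
    for (Map.Entry<String, String> cell : cells.entrySet()) {
      // One put per column: (row key, column family "log", column, value).
      System.out.println(rowKey + " log:" + cell.getKey() + " = " + cell.getValue());
    }
  }
}
```

Because every field of an event shares the same row key, the puts for one event land in a single HBase row, one cell per column.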

Building and running the source

Once the custom BaimiAsyncHbaseEventSerializer class is written, the source has to be compiled to produce a flume-ng-hbase-sink.*.jar package, which then replaces the original flume-ng-hbase-sink.*.jar in Flume.

Download the Flume 1.5 source, unpack it, and go to the directory flume-1.5.0-src/flume-ng-sinks/flume-ng-hbase-sink/src/main/java/org/apache/flume/sink/hbase/
Copy the BaimiAsyncHbaseEventSerializer class above into that directory.
Go to flume-1.5.0-src/flume-ng-sinks/flume-ng-hbase-sink/ and run the Maven build command: mvn install -Dmaven.test.skip=true
The build produces flume-ng-hbase-sink-1.5.0.jar under flume-1.5.0-src/flume-ng-sinks/flume-ng-hbase-sink/target. Use it to replace the jar of the same name under $FLUME_HOME/lib.
Start the Flume agent: flume-ng agent -c . -f conf/spoolDir.conf -n a1 -Dflume.root.logger=INFO,console
