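Reading Kafka data with Flume

This post shows one way to read data from Kafka with Flume: a custom Flume source built on Kafka's high-level consumer.

Code: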

/*******************************************************************************
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*******************************************************************************/
package org.apache.flume.source.kafka;

import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.message.MessageAndMetadata;

import org.apache.flume.*;
import org.apache.flume.conf.Configurable;
import org.apache.flume.conf.ConfigurationException;
import org.apache.flume.event.SimpleEvent;
import org.apache.flume.source.AbstractSource;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * A Flume source that reads messages from Kafka. I use this in our company's production
 * environment and its performance is good: over 100k messages per second can be read
 * from Kafka by a single source.<p>
 * <tt>zookeeper.connect: </tt> the ZooKeeper address(es) that Kafka uses.<p>
 * <tt>topic: </tt> the topic to read from Kafka.<p>
 * <tt>group.id: </tt> the id of the consumer group.<p>
 */
public class KafkaSource extends AbstractSource implements Configurable, PollableSource {
    private static final Logger log = LoggerFactory.getLogger(KafkaSource.class);
    private ConsumerConnector consumer;
    private ConsumerIterator<byte[], byte[]> it;
    private String topic;

    /** Reads at most one message from Kafka per call and hands it to the channel. */
    public Status process() throws EventDeliveryException {
        List<Event> eventList = new ArrayList<Event>();
        try {
            // hasNext() blocks until a message arrives unless consumer.timeout.ms is set,
            // in which case it throws on timeout and the source backs off.
            if (it.hasNext()) {
                MessageAndMetadata<byte[], byte[]> message = it.next();
                String now = String.valueOf(System.currentTimeMillis());

                Map<String, String> headers = new HashMap<String, String>();
                headers.put("timestamp", now);

                // Prepend the receive time to the original payload: "<millis>|<payload>".
                // Assumes UTF-8 payloads; the original code used the platform default charset.
                String strMessage = now + "|" + new String(message.message(), "UTF-8");
                log.debug("Message: {}", strMessage);

                Event event = new SimpleEvent();
                event.setBody(strMessage.getBytes("UTF-8"));
                // To forward the payload unchanged instead: event.setBody(message.message());
                event.setHeaders(headers);
                eventList.add(event);
            }
            getChannelProcessor().processEventBatch(eventList);
            return Status.READY;
        } catch (Exception e) {
            // Pass the exception itself so that the stack trace is logged.
            log.error("KafkaSource EXCEPTION", e);
            return Status.BACKOFF;
        }
    }

    public void configure(Context context) {
        topic = context.getString("topic");
        if (topic == null) {
            throw new ConfigurationException("Kafka topic must be specified.");
        }
        try {
            this.consumer = KafkaSourceUtil.getConsumer(context);
        } catch (IOException e) {
            // Fail fast: continuing with a null consumer would only cause a NullPointerException below.
            throw new ConfigurationException("Failed to create Kafka consumer", e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new ConfigurationException("Interrupted while creating Kafka consumer", e);
        }
        // Request a single stream (one consumer thread) for the topic.
        Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
        topicCountMap.put(topic, Integer.valueOf(1));
        Map<String, List<KafkaStream<byte[], byte[]>>> consumerMap =
                consumer.createMessageStreams(topicCountMap);
        if (consumerMap == null) {
            throw new ConfigurationException("consumerMap is null");
        }
        List<KafkaStream<byte[], byte[]>> topicList = consumerMap.get(topic);
        if (topicList == null || topicList.isEmpty()) {
            throw new ConfigurationException("topicList is null or empty");
        }
        KafkaStream<byte[], byte[]> stream = topicList.get(0);
        it = stream.iterator();
    }

    @Override
    public synchronized void stop() {
        if (consumer != null) {
            consumer.shutdown();
        }
        super.stop();
    }

}

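// KafkaSourceUtil.java: a separate file in the same package as KafkaSource
package org.apache.flume.source.kafka;

import java.io.IOException;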
import java.util.Map;
import java.util.Properties;

import com.google.common.collect.ImmutableMap;
import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.javaapi.consumer.ConsumerConnector;

import org.apache.flume.Context;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class KafkaSourceUtil {
    private static final Logger log = LoggerFactory.getLogger(KafkaSourceUtil.class);

    /** Copies the source's settings into Kafka consumer properties, skipping Flume-only keys. */
    public static Properties getKafkaConfigProperties(Context context) {
        log.info("context={}", context.toString());
        Properties props = new Properties();
        ImmutableMap<String, String> contextMap = context.getParameters();
        for (Map.Entry<String, String> entry : contextMap.entrySet()) {
            String key = entry.getKey();
            // "type" and "channel" configure the Flume source itself, not the Kafka consumer.
            // "channels" is skipped as well in case it is present in the context.
            if (!key.equals("type") && !key.equals("channel") && !key.equals("channels")) {
                props.setProperty(key, entry.getValue());
                log.info("key={},value={}", key, entry.getValue());
            }
        }
        return props;
    }

    /** Creates a high-level Kafka consumer from the source's configuration. */
    public static ConsumerConnector getConsumer(Context context) throws IOException, InterruptedException {
        ConsumerConfig consumerConfig = new ConsumerConfig(getKafkaConfigProperties(context));
        return Consumer.createJavaConsumerConnector(consumerConfig);
    }
}
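
The key filtering done by getKafkaConfigProperties can be tried without Flume or Kafka on the classpath. The following is only a minimal sketch: a plain Map stands in for Flume's Context parameters, and the class name ConsumerPropsDemo, the method toConsumerProperties, and the choice to also drop "channels" are assumptions made for illustration, not part of the original source.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;
import java.util.Set;

// Hypothetical stand-alone demo; a plain Map replaces Flume's Context.
class ConsumerPropsDemo {
    // Keys assumed to configure the Flume source itself rather than the Kafka consumer.
    private static final Set<String> FLUME_ONLY_KEYS =
            new HashSet<String>(Arrays.asList("type", "channel", "channels"));

    // Copies every entry except the Flume-only keys into a Properties object.
    static Properties toConsumerProperties(Map<String, String> sourceParams) {
        Properties props = new Properties();
        for (Map.Entry<String, String> entry : sourceParams.entrySet()) {
            if (!FLUME_ONLY_KEYS.contains(entry.getKey())) {
                props.setProperty(entry.getKey(), entry.getValue());
            }
        }
        return props;
    }

    public static void main(String[] args) {
        // Same keys as the kafka0 source in the configuration file of this post.
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("type", "org.apache.flume.source.kafka.KafkaSource");
        params.put("channels", "ch0");
        params.put("zookeeper.connect", "node3:2181,node4:2181,node5:2181");
        params.put("topic", "kkt-test-topic");
        params.put("group.id", "test");

        Properties props = toConsumerProperties(params);
        System.out.println(props.getProperty("group.id")); // prints "test"
        System.out.println(props.containsKey("type"));     // prints "false"
    }
}
```

Because every remaining key is handed to the consumer as-is, any consumer setting placed under the source's prefix in the Flume configuration file reaches Kafka without code changes.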

Configuration file (/etc/flume/conf/flume-kafka-file.properties):

agent_log.sources = kafka0
agent_log.channels = ch0
agent_log.sinks = sink0

agent_log.sources.kafka0.channels = ch0
agent_log.sinks.sink0.channel = ch0

agent_log.sources.kafka0.type = org.apache.flume.source.kafka.KafkaSource
agent_log.sources.kafka0.zookeeper.connect = node3:2181,node4:2181,node5:2181
agent_log.sources.kafka0.topic = kkt-test-topic
agent_log.sources.kafka0.group.id = test

agent_log.channels.ch0.type = memory
agent_log.channels.ch0.capacity = 2048
agent_log.channels.ch0.transactionCapacity = 1000

agent_log.sinks.sink0.type = file_roll
agent_log.sinks.sink0.sink.directory = /data/flumeng/data/test
agent_log.sinks.sink0.sink.rollInterval = 300

Startup script:

sudo su -l -s /bin/bash flume -c '/usr/lib/flume/bin/flume-ng agent --conf /etc/flume/conf --conf-file /etc/flume/conf/flume-kafka-file.properties --name agent_log -Dflume.root.logger=INFO,console'

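Note: the lines shown in red in the original post (the ones that build strMessage and set it as the event body) prepend a timestamp to the original data.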
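
Since the source writes each event body as "<millis>|<payload>", whatever reads the sink output has to split the two parts again. Here is a minimal sketch of that round trip, assuming UTF-8 text payloads; the class TimestampedBodyDemo and its methods are hypothetical helpers, not part of the original code.

```java
import java.nio.charset.Charset;

// Hypothetical helper illustrating the "<millis>|<payload>" body layout.
class TimestampedBodyDemo {
    private static final Charset UTF8 = Charset.forName("UTF-8");

    // Builds a body the same way the source does: timestamp, '|', then the payload.
    static byte[] prefix(long millis, byte[] payload) {
        String body = millis + "|" + new String(payload, UTF8);
        return body.getBytes(UTF8);
    }

    // Returns the index of the first '|', which always ends the timestamp
    // because the timestamp itself contains only digits.
    private static int separatorIndex(String body) {
        int sep = body.indexOf('|');
        if (sep < 0) {
            throw new IllegalArgumentException("no '|' separator in body");
        }
        return sep;
    }

    static long timestampOf(byte[] body) {
        String s = new String(body, UTF8);
        return Long.parseLong(s.substring(0, separatorIndex(s)));
    }

    static String payloadOf(byte[] body) {
        String s = new String(body, UTF8);
        return s.substring(separatorIndex(s) + 1);
    }

    public static void main(String[] args) {
        byte[] body = prefix(1400000000000L, "hello|world".getBytes(UTF8));
        System.out.println(timestampOf(body)); // prints "1400000000000"
        System.out.println(payloadOf(body));   // prints "hello|world"
    }
}
```

Splitting on the first '|' only keeps payloads that themselves contain '|' intact.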

Versions: flume-1.4.0.2.1.1.0 + kafka_2.8.0-0.8.0

Reference: https://github.com/baniuyao/flume-kafka

Libraries used for compilation:

flume-ng-configuration-1.4.0.2.1.1.0-385

flume-ng-core-1.4.0.2.1.1.0-385

flume-ng-sdk-1.4.0.2.1.1.0-385

flume-tools-1.4.0.2.1.1.0-385

guava-11.0.2

kafka_2.8.0-0.8.0

log4j-1.2.15

scala-compiler

scala-library

slf4j-api-1.6.1

slf4j-log4j12-1.6.1

zkclient-0.3

zookeeper-3.3.4
