http://flume.apache.org/FlumeUserGuide.html#custom-channel-selector
The official documentation describes two types of channel selectors:
Replicating Channel Selector (default)
Multiplexing Channel Selector
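The difference between the two selectors: Replicating sends every event from the source to all channels, while Multiplexing can choose which channels each event goes to. In the earlier example, using Replicating would send the logs of both demo and demo2 to channel1 and channel2 alike, which clearly does not meet the requirement. The requirement is that demo's logs go only to channel1 and demo2's logs go only to channel2.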
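To make the replicating behaviour concrete, here is a minimal Java sketch. It is only a model: `ReplicatingSketch` and `replicate` are hypothetical names invented for illustration, not Flume classes, and channels are represented as plain in-memory lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a replicating selector; not Flume's actual class.
class ReplicatingSketch {

    // Copies every event into every configured channel.
    static Map<String, List<String>> replicate(List<String> channels, List<String> events) {
        Map<String, List<String>> queues = new LinkedHashMap<>();
        for (String channel : channels) {
            queues.put(channel, new ArrayList<>());
        }
        for (String event : events) {
            for (List<String> queue : queues.values()) {
                queue.add(event); // each channel receives its own copy
            }
        }
        return queues;
    }

    public static void main(String[] args) {
        Map<String, List<String>> queues = replicate(
                Arrays.asList("channel1", "channel2"),
                Arrays.asList("demo log", "demo2 log"));
        System.out.println(queues);
    }
}
```

In this model both channel1 and channel2 end up holding both log lines, which is exactly why replicating does not fit the demo/demo2 requirement.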
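Verifying replicating: the idea is to set up two Kafka channels. When Flume collects data, the events pass through Kafka, and a Kafka consumer program shows whether each event was delivered to both Kafka channels.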
# Test the channel selector
# Test method: switch the channels to Kafka and use two consumers to verify the dispatch strategy
#
a1.sources = r1
a1.sinks = k1
a1.channels = c1 c2 c3
a1.sources.r1.selector.type = replicating
a1.sources.r1.channels = c1 c2
#a1.sources.r1.selector.optional = c3

# For each one of the sources, the type is defined
#agent.sources.seqGenSrc.type = seq
#a1.sources.r1.type = netcat
#a1.sources.r1.bind=mini1
#a1.sources.r1.port=
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /home/hadoop/flume/test/logs/flume2.dat

# The channel can be defined as follows.
#agent.sources.seqGenSrc.channels = memoryChannel
#a1.channels.c1.type=memory
#a1.channels.c1.capacity=
#a1.channels.c1.transactionCapacity =
a1.channels.c1.type = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.c1.kafka.bootstrap.servers = mini1:,mini2:,mini3:
#channel selector replicating
a1.channels.c1.kafka.topic = csr1
a1.channels.c1.kafka.consumer.group.id = csr01

a1.channels.c2.type = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.c2.kafka.bootstrap.servers = mini1:,mini2:,mini3:
#channel selector replicating
a1.channels.c2.kafka.topic = csr2
a1.channels.c2.kafka.consumer.group.id = csr02

# Each sink's type must be defined
#agent.sinks.loggerSink.type = logger
a1.sinks.k1.type = logger
#Specify the channel the sink should use
#agent.sinks.loggerSink.channel = memoryChannel
a1.sources.r1.channels = c1 c2
a1.sinks.k1.channel = c1

# Each channel's type is defined.
#agent.channels.memoryChannel.type = memory
# Other config values specific to each type of channel(sink or source)
# can be defined as well
# In this case, it specifies the capacity of the memory channel
#agent.channels.memoryChannel.capacity =

Kafka consumer program

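import java.io.IOException;
import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class TestConsumer {
    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        // Load the consumer settings from the classpath
        props.load(TestConsumer.class.getResourceAsStream("/kfkConsumer.properties"));
        KafkaConsumer<Integer, String> consumer = new KafkaConsumer<>(props);
        // Subscribe to the topics behind both Kafka channels
        consumer.subscribe(Arrays.asList("csr2", "csr1"));
        while (true) {
            // The timeout is illustrative (the original value was lost); clients older than 2.0 use poll(long)
            ConsumerRecords<Integer, String> records = consumer.poll(Duration.ofMillis(100));
            for (ConsumerRecord<Integer, String> record : records) {
                System.out.print("Thread : " + Thread.currentThread().getName());
                System.out.printf("topic = %s, offset = %d, key = %s, value = %s, partition = %d %n", record.topic(), record.offset(), record.key(), record.value(), record.partition());
            }
            consumer.commitSync();
        }
    }
}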

Consumer output

Thread : maintopic = csr1,  offset = , key = null, value =  from haishang, partition =
Thread : maintopic = csr2, offset = , key = null, value = from haishang, partition =

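Conclusion: with the replicating strategy, the Flume channel selector sends each event to every configured channel that is available.

Second verification method: this one requires starting three nodes. Pay attention to the source and sink names in each configuration.

On the first Flume agent: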

#channelSelector_replicationg_avro.conf
# Name the components on this agent
a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2

# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.port =
#a1.sources.r1.host = 192.168.233.128
a1.sources.r1.host = 192.168.10.201
a1.sources.r1.selector.type = replicating
a1.sources.r1.channels = c1 c2

# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
#a1.sinks.k1.hostname = 192.168.233.129
a1.sinks.k1.hostname = 192.168.10.202
a1.sinks.k1.port =

a1.sinks.k2.type = avro
a1.sinks.k2.channel = c2
#a1.sinks.k2.hostname = 192.168.233.130
a1.sinks.k2.hostname = 192.168.10.203
a1.sinks.k2.port =

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity =
a1.channels.c1.transactionCapacity =

a1.channels.c2.type = memory
a1.channels.c2.capacity =
a1.channels.c2.transactionCapacity =

sink1


#channelSelector_replicating_sink.conf 
# Name the components on this agent
a2.sources = r1
a2.sinks = k1
a2.channels = c1

# Describe/configure the source
a2.sources.r1.type = avro
a2.sources.r1.channels = c1
#a2.sources.r1.bind = 192.168.233.129
a2.sources.r1.bind = 192.168.10.202
a2.sources.r1.port = 50000

# Describe the sink
a2.sinks.k1.type = logger
a2.sinks.k1.channel = c1

# Use a channel which buffers events in memory
a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100

 

sink2


#channelSelector_replicating_sink.conf 
# Name the components on this agent
a3.sources = r1
a3.sinks = k1
a3.channels = c1


# Describe/configure the source
a3.sources.r1.type = avro
a3.sources.r1.channels = c1
#a3.sources.r1.bind = 192.168.233.130
a3.sources.r1.bind = 192.168.10.203
a3.sources.r1.port = 50000


# Describe the sink
a3.sinks.k1.type = logger
a3.sinks.k1.channel = c1


# Use a channel which buffers events in memory
a3.channels.c1.type = memory
a3.channels.c1.capacity = 1000
a3.channels.c1.transactionCapacity = 100

 

Startup commands

Start the sinks

bin/flume-ng agent -c conf -f conf/channelSelector_replicating_sink.conf -n a3 -Dflume.root.logger=INFO,console

flume-ng agent -c conf -f conf/channelSelector_replicating_sink.conf -n a2 -Dflume.root.logger=INFO,console

Start the source

flume-ng agent -c conf -f conf/channelSelector_replicationg_avro.conf -n a1 -Dflume.root.logger=INFO,console

Send a message: echo "you are the best " | nc 192.168.10.201 50000

Verifying multiplexing

source

# Configuration file
a1.sources= r1
a1.sinks= k1 k2
a1.channels= c1 c2

#Describe/configure the source
a1.sources.r1.type=http
a1.sources.r1.port=
#a1.sources.r1.host= 192.168.233.128
a1.sources.r1.host=mini1
a1.sources.r1.selector.type= multiplexing
a1.sources.r1.channels= c1 c2
a1.sources.r1.selector.header= state
a1.sources.r1.selector.mapping.CZ= c1
a1.sources.r1.selector.mapping.US= c2
a1.sources.r1.selector.default= c1

#Describe the sink
a1.sinks.k1.type= avro
a1.sinks.k1.channel= c1
#a1.sinks.k1.hostname= 192.168.233.129
a1.sinks.k1.hostname=mini2
a1.sinks.k1.port=

a1.sinks.k2.type= avro
a1.sinks.k2.channel= c2
#a1.sinks.k2.hostname= 192.168.233.130
a1.sinks.k2.hostname=mini3
a1.sinks.k2.port=

# Use a channel which buffers events in memory
a1.channels.c1.type= memory
a1.channels.c1.capacity=
a1.channels.c1.transactionCapacity=

a1.channels.c2.type= memory
a1.channels.c2.capacity=
a1.channels.c2.transactionCapacity=

sink1

# Name the components on this agent
a2.sources = r1
a2.sinks = k1
a2.channels = c1

# Describe/configure the source
a2.sources.r1.type = avro
a2.sources.r1.channels = c1
#a2.sources.r1.bind = 192.168.233.129
a2.sources.r1.bind = mini2
a2.sources.r1.port =

# Describe the sink
a2.sinks.k1.type = logger
a2.sinks.k1.channel = c1

# Use a channel which buffers events in memory
a2.channels.c1.type = memory
a2.channels.c1.capacity =
a2.channels.c1.transactionCapacity =

sink2

# Name the components on this agent
a3.sources = r1
a3.sinks = k1
a3.channels = c1

# Describe/configure the source
a3.sources.r1.type = avro
a3.sources.r1.channels = c1
#a3.sources.r1.bind = 192.168.233.129
a3.sources.r1.bind = mini3
a3.sources.r1.port =

# Describe the sink
a3.sinks.k1.type = logger
a3.sinks.k1.channel = c1

# Use a channel which buffers events in memory
a3.channels.c1.type = memory
a3.channels.c1.capacity =
a3.channels.c1.transactionCapacity =

Start the sinks first, then the source

bin/flume-ng agent -c conf -f conf/channelSelector_mul_sink.conf -n a3 -Dflume.root.logger=INFO,console

bin/flume-ng agent -c conf -f conf/channelSelector_mul_sink.conf -n a2 -Dflume.root.logger=INFO,console

bin/flume-ng agent -c conf -f conf/channelSelector_multi.conf -n a1 -Dflume.root.logger=INFO,console

The configuration file names can be inferred from the commands above.

Run the following commands

curl -X POST -d '[{"headers" :{"state" : "CZ"},"body" :"CZ"}]' http://mini1:50000

curl -X POST -d '[{"headers" :{"state" : "US"},"body" :"US"}]' http://mini1:50000

curl -X POST -d '[{"headers" :{"state" : "NO"},"body" :"no"}]' http://mini1:50000

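Results

Events with state CZ are delivered to the sink1 node.

Events with state US are delivered to the sink2 node.

Events with state NO are delivered to the sink1 node.

CZ and US are mapped in the source configuration above; NO is not.

So why do NO events always end up on sink1? Because the source sets a1.sources.r1.selector.default = c1: any header value without a mapping falls back to channel c1, which sink k1 drains to mini2 (sink1).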
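The routing of the three curl requests can be modelled with a small Java sketch. It mirrors the `selector.mapping.*` and `selector.default` keys from the source configuration, but `MultiplexingSketch`, `map`, and `route` are hypothetical names for illustration only and are not Flume's own implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of header-based routing; not Flume's actual class.
class MultiplexingSketch {
    private final Map<String, String> mapping = new HashMap<>();
    private final String defaultChannel;

    MultiplexingSketch(String defaultChannel) {
        this.defaultChannel = defaultChannel;
    }

    // Registers one header value -> channel mapping (like selector.mapping.CZ = c1).
    MultiplexingSketch map(String headerValue, String channel) {
        mapping.put(headerValue, channel);
        return this;
    }

    // Returns the mapped channel, or the default when the value has no mapping.
    String route(String headerValue) {
        return mapping.getOrDefault(headerValue, defaultChannel);
    }

    public static void main(String[] args) {
        MultiplexingSketch selector = new MultiplexingSketch("c1")
                .map("CZ", "c1")
                .map("US", "c2");
        for (String state : new String[] {"CZ", "US", "NO"}) {
            System.out.println(state + " -> " + selector.route(state));
        }
    }
}
```

In this model CZ and NO both resolve to c1 and US resolves to c2, matching what the sink1 and sink2 nodes logged.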

The source configurations above use two new source types: syslogtcp (listens on a TCP port as its data source) and http (accepts events via HTTP POST).
