Flume-ng高可用集群负载安装与配置
1. 写在前面
flume-ng高可用长在大数据处理环节第一个出现,对于处理日志文件有很好的作用,本篇博客将详细介绍flume-ng的高可用负载均衡搭建
2. flume-ng高可用负载均衡描述
在一般情况下,Flume-ng高可用采用server和client模式,client主要负责数据源source及数据流向端的sink指向配置,server主要负责数据流向sink详细配置,client需要将server的信息统一管理,server和sink之间数据连接通过channels
3. 配置server,这里配置三个server
flume-server1.properties
#set Agent name
agent.sources = r1
agent.channels = c1
agent.sinks = k1
#set channel
agent.channels.c1.type = memory
agent.channels.c1.capacity = 1024000
agent.channels.c1.transactionCapacity = 10000
agent.channels.c1.byteCapacity=134217728
agent.channels.c1.byteCapacityBufferPercentage=80
# other node,nna to nns
agent.sources.r1.type = avro
agent.sources.r1.bind = ynjz003
agent.sources.r1.port = 52020
agent.sources.r1.interceptors = i1
agent.sources.r1.interceptors.i1.type = static
agent.sources.r1.interceptors.i1.key = Collector
agent.sources.r1.interceptors.i1.value = ynjz003
agent.sources.r1.channels = c1
#set sink to hdfs
agent.sinks.k1.channel = c1
agent.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
agent.sinks.k1.brokerList = ynjz003:9092,ynjz004:9092,ynjz005:9092,ynjz006:9092,ynjz007:9092,ynjz008:9092,ynjz009:9092
agent.sinks.k1.topic = flume-kafka-meijs33
agent.sinks.k1.serializer.class = kafka.serializer.StringEncoder
flume-server2.properties
#set Agent name
agent.sources = r1
agent.channels = c1
agent.sinks = k1
#set channel
agent.channels.c1.type = memory
agent.channels.c1.capacity = 1024000
agent.channels.c1.transactionCapacity = 10000
agent.channels.c1.byteCapacity=134217728
agent.channels.c1.byteCapacityBufferPercentage=80
# other node,nna to nns
agent.sources.r1.type = avro
agent.sources.r1.bind = ynjz004
agent.sources.r1.port = 52020
agent.sources.r1.interceptors = i1
agent.sources.r1.interceptors.i1.type = static
agent.sources.r1.interceptors.i1.key = Collector
agent.sources.r1.interceptors.i1.value = ynjz004
agent.sources.r1.channels = c1
#set sink to hdfs
agent.sinks.k1.channel = c1
agent.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
agent.sinks.k1.brokerList = ynjz003:9092,ynjz004:9092,ynjz005:9092,ynjz006:9092,ynjz007:9092,ynjz008:9092,ynjz009:9092
agent.sinks.k1.topic = flume-kafka-meijs33
agent.sinks.k1.serializer.class = kafka.serializer.StringEncoder
flume-server3.properties
#set Agent name
agent.sources = r1
agent.channels = c1
agent.sinks = k1
#set channel
agent.channels.c1.type = memory
agent.channels.c1.capacity = 1024000
agent.channels.c1.transactionCapacity = 10000
agent.channels.c1.byteCapacity=134217728
agent.channels.c1.byteCapacityBufferPercentage=80
# other node,nna to nns
agent.sources.r1.type = avro
agent.sources.r1.bind = ynjz005
agent.sources.r1.port = 52020
agent.sources.r1.interceptors = i1
agent.sources.r1.interceptors.i1.type = static
agent.sources.r1.interceptors.i1.key = Collector
agent.sources.r1.interceptors.i1.value = ynjz005
agent.sources.r1.channels = c1
#set sink to hdfs
agent.sinks.k1.channel = c1
agent.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
agent.sinks.k1.brokerList = ynjz003:9092,ynjz004:9092,ynjz005:9092,ynjz006:9092,ynjz007:9092,ynjz008:9092,ynjz009:9092
agent.sinks.k1.topic = flume-kafka-meijs33
agent.sinks.k1.serializer.class = kafka.serializer.StringEncoder
可以看出多个server配置的规律
3. 配置client,这里也配置一个client示例
flume-client.properties
#agent1 name
agent.channels = c1
agent.sources = r1
agent.sinks = k1 k2 k3 k4 k5 k6 k7
#set gruop
agent.sinkgroups = g1
#set channel
agent.channels.c1.type = memory
agent.channels.c1.capacity = 102400
agent.channels.c1.transactionCapacity = 1000
agent.channels.c1.byteCapacity=134217728
agent.channels.c1.byteCapacityBufferPercentage=80
agent.sources.r1.type = com.cbo.flume.source.zip.SpoolDirectorySource
agent.sources.r1.channels = c1
agent.sources.r1.spoolDir = /data/ynjz/workspace/zip
agent.sources.r1.fileHeader = true
agent.sources.r1.channels = c1
agent.sources.r1.flumeBatchSize=1000
agent.sources.r1.useFlumeEventFormat=false
agent.sources.r1.restart=true
agent.sources.r1.batchSize=1000
agent.sources.r1.batchTimeout=3000
agent.sources.r1.channels=c1
# set sink1
agent.sinks.k1.channel = c1
agent.sinks.k1.type = avro
agent.sinks.k1.hostname = ynjz003
agent.sinks.k1.port = 52020
# set sink2
agent.sinks.k2.channel = c1
agent.sinks.k2.type = avro
agent.sinks.k2.hostname = ynjz004
agent.sinks.k2.port = 52020
# set sink3
agent.sinks.k3.channel = c1
agent.sinks.k3.type = avro
agent.sinks.k3.hostname = ynjz005
agent.sinks.k3.port = 52020
# set sink4
agent.sinks.k1.channel = c1
agent.sinks.k1.type = avro
agent.sinks.k1.hostname = ynjz006
agent.sinks.k1.port = 52020
# set sink5
agent.sinks.k2.channel = c1
agent.sinks.k2.type = avro
agent.sinks.k2.hostname = ynjz007
agent.sinks.k2.port = 52020
# set sink6
agent.sinks.k3.channel = c1
agent.sinks.k3.type = avro
agent.sinks.k3.hostname = ynjz008
agent.sinks.k3.port = 52020
# set sink7
agent.sinks.k3.channel = c1
agent.sinks.k3.type = avro
agent.sinks.k3.hostname = ynjz009
agent.sinks.k3.port = 52020
#set sink group
agent.sinkgroups.g1.sinks = k1 k2 k3 k4 k5 k6 k7
#set failover
agent.sinkgroups.g1.processor.type = failover
agent.sinkgroups.g1.processor.priority.k1 = 10
agent.sinkgroups.g1.processor.priority.k2 = 10
agent.sinkgroups.g1.processor.priority.k3 = 10
agent.sinkgroups.g1.processor.priority.k4 = 10
agent.sinkgroups.g1.processor.priority.k5 = 10
agent.sinkgroups.g1.processor.priority.k6 = 10
agent.sinkgroups.g1.processor.priority.k7 = 10
agent.sinkgroups.g1.processor.maxpenalty = 10000
这里需要注意sinkgroups配置,flume sinkgroups在常用的应用中有两种方式failover和load_balance,failover可以理解为容错机制,在上面的配置中sink只会往一个kafka写入数据,但一个kafka挂了,failover机制会立马选举一个出来,所以这里的容错机制很完善,但是应对大数据量会影响数据写入的能力,所以建议在大数据量的时候采用load_balance配置,下面时配置示例
#agent1 name
agent.channels = c1
agent.sources = r1
agent.sinks = k1 k2 k3 k4 k5 k6 k7
#set gruop
agent.sinkgroups = g1
#set channel
agent.channels.c1.type = memory
agent.channels.c1.capacity = 102400
agent.channels.c1.transactionCapacity = 24000
agent.channels.c1.byteCapacity=134217728
agent.channels.c1.byteCapacityBufferPercentage=80
agent.sources.r1.type = com.cbo.flume.source.zip.SpoolDirectorySource
agent.sources.r1.channels = c1
agent.sources.r1.spoolDir = /data/4G
agent.sources.r1.includePattern = ([^ ]*\.zip$)
agent.sources.r1.fileHeader = true
agent.sources.r1.channels = c1
agent.sources.r1.flumeBatchSize=10000
agent.sources.r1.useFlumeEventFormat=false
agent.sources.r1.restart=true
agent.sources.r1.batchSize=10000
agent.sources.r1.batchTimeout=3000
agent.sources.r1.channels=c1
# set sink1
agent.sinks.k1.channel = c1
agent.sinks.k1.type = avro
agent.sinks.k1.hostname = ynjz003
agent.sinks.k1.port = 52020
# set sink2
agent.sinks.k2.channel = c1
agent.sinks.k2.type = avro
agent.sinks.k2.hostname = ynjz004
agent.sinks.k2.port = 52020
# set sink3
agent.sinks.k3.channel = c1
agent.sinks.k3.type = avro
agent.sinks.k3.hostname = ynjz005
agent.sinks.k3.port = 52020
# set sink4
agent.sinks.k4.channel = c1
agent.sinks.k4.type = avro
agent.sinks.k4.hostname = ynjz006
agent.sinks.k4.port = 52020
# set sink5
agent.sinks.k5.channel = c1
agent.sinks.k5.type = avro
agent.sinks.k5.hostname = ynjz007
agent.sinks.k5.port = 52020
# set sink6
agent.sinks.k6.channel = c1
agent.sinks.k6.type = avro
agent.sinks.k6.hostname = ynjz008
agent.sinks.k6.port = 52020
# set sink7
agent.sinks.k7.channel = c1
agent.sinks.k7.type = avro
agent.sinks.k7.hostname = ynjz009
agent.sinks.k7.port = 52020
#set sink group
agent.sinkgroups.g1.sinks = k1 k2 k3 k4 k5 k6 k7
#set load_balance
agent.sinkgroups.g1.processor.type=load_balance
agent.sinkgroups.g1.processor.backoff=true
agent.sinkgroups.g1.processor.selector=random
在实际应用中多个client基本上一直,只有监控文件目录的配置不同即可agent.sources.r1.spoolDir = /data/4G
4. 启动flume-ng高可用集群
首先启动每个server,每个server只是配置文件flume-server-data.properties不同:
./bin/flume-ng agent --name agent --conf conf --conf-file conf/flume-server-data.properties -Dflume.root.logger=INFO,console > /data/ynjz/workspace/flume-server-data.log 2>&1 &
启动每个client,,每个server只是配置文件flume-client-data.properties不同:
./bin/flume-ng agent --name agent --conf conf --conf-file conf/flume-client-data.properties -Dflume.root.logger=INFO,console > /data/ynjz/workspace/flume-client-data.log 2>&1 &
在平时应用中,可以随时停止client,但停止了server没起而启动client会导致报错
Flume-ng高可用集群负载安装与配置的更多相关文章
- Flume 学习笔记之 Flume NG高可用集群搭建
Flume NG高可用集群搭建: 架构总图: 架构分配: 角色 Host 端口 agent1 hadoop3 52020 collector1 hadoop1 52020 collector2 had ...
- 大数据高可用集群环境安装与配置(09)——安装Spark高可用集群
1. 获取spark下载链接 登录官网:http://spark.apache.org/downloads.html 选择要下载的版本 2. 执行命令下载并安装 cd /usr/local/src/ ...
- Flume NG高可用集群搭建详解
.Flume NG简述 Flume NG是一个分布式,高可用,可靠的系统,它能将不同的海量数据收集,移动并存储到一个数据存储系统中.轻量,配置简单,适用于各种日志收集,并支持 Failover和负载均 ...
- 大数据高可用集群环境安装与配置(06)——安装Hadoop高可用集群
下载Hadoop安装包 登录 https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/ 镜像站,找到我们要安装的版本,点击进去复制下载链接 ...
- 大数据高可用集群环境安装与配置(07)——安装HBase高可用集群
1. 下载安装包 登录官网获取HBase安装包下载地址 https://hbase.apache.org/downloads.html 2. 执行命令下载并安装 cd /usr/local/src/ ...
- 大数据高可用集群环境安装与配置(03)——设置SSH免密登录
Hadoop的NameNode需要启动集群中所有机器的Hadoop守护进程,这个过程需要通过SSH登录来实现 Hadoop并没有提供SSH输入密码登录的形式,因此,为了能够顺利登录每台机器,需要将所有 ...
- 大数据高可用集群环境安装与配置(08)——安装Ganglia监控集群
1. 安装依赖包和软件 在所有服务器上输入命令进行安装操作 yum install epel-release -y yum install ganglia-web ganglia-gmetad gan ...
- 大数据高可用集群环境安装与配置(02)——配置ntp服务
NTP服务概述 NTP服务器[Network Time Protocol(NTP)]是用来使计算机时间同步化的一种协议,它可以使计算机对其服务器或时钟源(如石英钟,GPS等等)做同步化,它可以提供高精 ...
- 大数据高可用集群环境安装与配置(10)——安装Kafka高可用集群
1. 获取安装包下载链接 访问https://kafka.apache.org/downloads 找到kafka对应版本 需要与服务器安装的scala版本一致(运行spark-shell可以看到当前 ...
随机推荐
- [hosts]在hosts中屏蔽一级域名和二级域名的写法
一级域名,如baidu: 0.0.0.0 baidu.com 二级域名 如有道公开课 0.0.0.0 ke.youdao.com 不带协议名,不带www. 用127.0.0.1也可以.
- 算法-动态规划 Dynamic Programming--从菜鸟到老鸟
算法-动态规划 Dynamic Programming--从菜鸟到老鸟 版权声明:本文为博主原创文章,转载请标明出处. https://blog.csdn.net/u013309870/ar ...
- js+jquery创建元素
例:创建如下标签: <a id="baidu" class="link" name="baidu">这是一个链接</a&g ...
- Apache 解析.htaccess
解决.htaccess不解析 输入a2enmod rewrite 修改/etc/apache.conf 此处改为ALL
- Neo4j安装
一.Windows版本 1)下载java8,并配置环境变量 java下载请点击,提取码:f6ci 2)Neo4j下载 选windows版本 新建系统环境变量: 并配置Path环境变量,添加bin所在目 ...
- Linux中errno的含义
/****************************获取错误代码描述**************/ #include <string.h>#include <errno.h&g ...
- 编译VisualVM源码解决乱码问题
编译VisualVM源码解决乱码问题 起因 今天在使用VisualVM对测试服务器进行JVM监控的时候,发现所有统计图的横纵坐标都是显示乱码(小方块),即使我的Ubuntu系统使用的是英文语言环境.奇 ...
- 图片下载、渲染操作 小例子 看多FutureTask
并发执行下载图片操作 import java.util.List; import java.util.concurrent.Callable; import java.util.concurrent. ...
- 递归处理vue菜单数据
结构不多说,bean的封装很简单,直接上核心代码吧,自己根据需要把不要的属性自己过滤掉: public List<MenuBo> getMenuByUserId(Long user_id, ...
- Java中ArrayList类的用法
1.什么是ArrayList ArrayList就是传说中的动态数组,用MSDN中的说法,就是Array的复杂版本,它提供了如下一些好处: 动态的增加和减少元素 实现了ICollection和ILis ...