Flume-NG High Availability Cluster with Load Balancing: Installation and Configuration
1. Preface
In a big data pipeline, a highly available Flume-NG tier is usually the first component in the chain, and it works very well for collecting log files. This post walks through setting up Flume-NG with high availability and load balancing in detail.
2. Overview of Flume-NG high availability and load balancing
In a typical deployment, Flume-NG high availability follows a server/client model. The client agents are configured with the data source (source) and with avro sinks that point at the servers; the server agents hold the detailed configuration of the downstream sink. Every client lists all of the servers in its configuration, and inside each agent the source and sinks are wired together through channels.
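To keep the roles straight, the end-to-end flow described by the configurations below is roughly:
spool directory -> client agent (source r1 -> channel c1 -> avro sinks k1..k7) -> collector servers ynjz003..ynjz009 (avro source on port 52020 -> channel c1 -> Kafka sink k1) -> Kafka topic flume-kafka-meijs33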
3. Configure the servers (three server examples shown here)
flume-server1.properties
#set Agent name
agent.sources = r1
agent.channels = c1
agent.sinks = k1
#set channel
agent.channels.c1.type = memory
agent.channels.c1.capacity = 1024000
agent.channels.c1.transactionCapacity = 10000
agent.channels.c1.byteCapacity=134217728
agent.channels.c1.byteCapacityBufferPercentage=80
# avro source: receive events from the client agents
agent.sources.r1.type = avro
agent.sources.r1.bind = ynjz003
agent.sources.r1.port = 52020
agent.sources.r1.interceptors = i1
agent.sources.r1.interceptors.i1.type = static
agent.sources.r1.interceptors.i1.key = Collector
agent.sources.r1.interceptors.i1.value = ynjz003
agent.sources.r1.channels = c1
# set sink to kafka
agent.sinks.k1.channel = c1
agent.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
agent.sinks.k1.brokerList = ynjz003:9092,ynjz004:9092,ynjz005:9092,ynjz006:9092,ynjz007:9092,ynjz008:9092,ynjz009:9092
agent.sinks.k1.topic = flume-kafka-meijs33
agent.sinks.k1.serializer.class = kafka.serializer.StringEncoder
flume-server2.properties
#set Agent name
agent.sources = r1
agent.channels = c1
agent.sinks = k1
#set channel
agent.channels.c1.type = memory
agent.channels.c1.capacity = 1024000
agent.channels.c1.transactionCapacity = 10000
agent.channels.c1.byteCapacity=134217728
agent.channels.c1.byteCapacityBufferPercentage=80
# avro source: receive events from the client agents
agent.sources.r1.type = avro
agent.sources.r1.bind = ynjz004
agent.sources.r1.port = 52020
agent.sources.r1.interceptors = i1
agent.sources.r1.interceptors.i1.type = static
agent.sources.r1.interceptors.i1.key = Collector
agent.sources.r1.interceptors.i1.value = ynjz004
agent.sources.r1.channels = c1
# set sink to kafka
agent.sinks.k1.channel = c1
agent.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
agent.sinks.k1.brokerList = ynjz003:9092,ynjz004:9092,ynjz005:9092,ynjz006:9092,ynjz007:9092,ynjz008:9092,ynjz009:9092
agent.sinks.k1.topic = flume-kafka-meijs33
agent.sinks.k1.serializer.class = kafka.serializer.StringEncoder
flume-server3.properties
#set Agent name
agent.sources = r1
agent.channels = c1
agent.sinks = k1
#set channel
agent.channels.c1.type = memory
agent.channels.c1.capacity = 1024000
agent.channels.c1.transactionCapacity = 10000
agent.channels.c1.byteCapacity=134217728
agent.channels.c1.byteCapacityBufferPercentage=80
# avro source: receive events from the client agents
agent.sources.r1.type = avro
agent.sources.r1.bind = ynjz005
agent.sources.r1.port = 52020
agent.sources.r1.interceptors = i1
agent.sources.r1.interceptors.i1.type = static
agent.sources.r1.interceptors.i1.key = Collector
agent.sources.r1.interceptors.i1.value = ynjz005
agent.sources.r1.channels = c1
# set sink to kafka
agent.sinks.k1.channel = c1
agent.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
agent.sinks.k1.brokerList = ynjz003:9092,ynjz004:9092,ynjz005:9092,ynjz006:9092,ynjz007:9092,ynjz008:9092,ynjz009:9092
agent.sinks.k1.topic = flume-kafka-meijs33
agent.sinks.k1.serializer.class = kafka.serializer.StringEncoder
These three files show the pattern for adding more servers: each additional server only changes the avro bind host and the static interceptor value to its own hostname.
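All collectors write to the Kafka topic flume-kafka-meijs33, so the topic should exist before data starts flowing. A minimal sketch of creating it with the standard Kafka CLI, assuming Kafka is installed under /usr/local/kafka and ZooKeeper runs on ynjz003:2181 (both the path and the address are assumptions; on newer Kafka versions replace --zookeeper ynjz003:2181 with --bootstrap-server ynjz003:9092):
# assumption: Kafka install path and ZooKeeper address, adjust to your environment
/usr/local/kafka/bin/kafka-topics.sh --create --zookeeper ynjz003:2181 --replication-factor 3 --partitions 7 --topic flume-kafka-meijs33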
4. Configure the clients (one client example shown here)
flume-client.properties
#agent1 name
agent.channels = c1
agent.sources = r1
agent.sinks = k1 k2 k3 k4 k5 k6 k7
# set group
agent.sinkgroups = g1
#set channel
agent.channels.c1.type = memory
agent.channels.c1.capacity = 102400
agent.channels.c1.transactionCapacity = 1000
agent.channels.c1.byteCapacity=134217728
agent.channels.c1.byteCapacityBufferPercentage=80
agent.sources.r1.type = com.cbo.flume.source.zip.SpoolDirectorySource
agent.sources.r1.channels = c1
agent.sources.r1.spoolDir = /data/ynjz/workspace/zip
agent.sources.r1.fileHeader = true
agent.sources.r1.flumeBatchSize=1000
agent.sources.r1.useFlumeEventFormat=false
agent.sources.r1.restart=true
agent.sources.r1.batchSize=1000
agent.sources.r1.batchTimeout=3000
# set sink1
agent.sinks.k1.channel = c1
agent.sinks.k1.type = avro
agent.sinks.k1.hostname = ynjz003
agent.sinks.k1.port = 52020
# set sink2
agent.sinks.k2.channel = c1
agent.sinks.k2.type = avro
agent.sinks.k2.hostname = ynjz004
agent.sinks.k2.port = 52020
# set sink3
agent.sinks.k3.channel = c1
agent.sinks.k3.type = avro
agent.sinks.k3.hostname = ynjz005
agent.sinks.k3.port = 52020
# set sink4
agent.sinks.k4.channel = c1
agent.sinks.k4.type = avro
agent.sinks.k4.hostname = ynjz006
agent.sinks.k4.port = 52020
# set sink5
agent.sinks.k5.channel = c1
agent.sinks.k5.type = avro
agent.sinks.k5.hostname = ynjz007
agent.sinks.k5.port = 52020
# set sink6
agent.sinks.k6.channel = c1
agent.sinks.k6.type = avro
agent.sinks.k6.hostname = ynjz008
agent.sinks.k6.port = 52020
# set sink7
agent.sinks.k7.channel = c1
agent.sinks.k7.type = avro
agent.sinks.k7.hostname = ynjz009
agent.sinks.k7.port = 52020
#set sink group
agent.sinkgroups.g1.sinks = k1 k2 k3 k4 k5 k6 k7
#set failover
agent.sinkgroups.g1.processor.type = failover
agent.sinkgroups.g1.processor.priority.k1 = 10
agent.sinkgroups.g1.processor.priority.k2 = 10
agent.sinkgroups.g1.processor.priority.k3 = 10
agent.sinkgroups.g1.processor.priority.k4 = 10
agent.sinkgroups.g1.processor.priority.k5 = 10
agent.sinkgroups.g1.processor.priority.k6 = 10
agent.sinkgroups.g1.processor.priority.k7 = 10
agent.sinkgroups.g1.processor.maxpenalty = 10000
Pay attention to the sinkgroups configuration. In practice Flume sink groups are used in two modes: failover and load_balance. failover is a fault-tolerance mechanism: with the configuration above, events are forwarded through only one collector (and on to Kafka) at a time, and if that collector goes down the failover processor immediately switches to another sink in the group. The fault tolerance is therefore solid, but because only one sink is active at a time it limits write throughput, so load_balance is recommended for large data volumes. A sample configuration follows:
#agent1 name
agent.channels = c1
agent.sources = r1
agent.sinks = k1 k2 k3 k4 k5 k6 k7
# set group
agent.sinkgroups = g1
#set channel
agent.channels.c1.type = memory
agent.channels.c1.capacity = 102400
agent.channels.c1.transactionCapacity = 24000
agent.channels.c1.byteCapacity=134217728
agent.channels.c1.byteCapacityBufferPercentage=80
agent.sources.r1.type = com.cbo.flume.source.zip.SpoolDirectorySource
agent.sources.r1.channels = c1
agent.sources.r1.spoolDir = /data/4G
agent.sources.r1.includePattern = ([^ ]*\.zip$)
agent.sources.r1.fileHeader = true
agent.sources.r1.flumeBatchSize=10000
agent.sources.r1.useFlumeEventFormat=false
agent.sources.r1.restart=true
agent.sources.r1.batchSize=10000
agent.sources.r1.batchTimeout=3000
# set sink1
agent.sinks.k1.channel = c1
agent.sinks.k1.type = avro
agent.sinks.k1.hostname = ynjz003
agent.sinks.k1.port = 52020
# set sink2
agent.sinks.k2.channel = c1
agent.sinks.k2.type = avro
agent.sinks.k2.hostname = ynjz004
agent.sinks.k2.port = 52020
# set sink3
agent.sinks.k3.channel = c1
agent.sinks.k3.type = avro
agent.sinks.k3.hostname = ynjz005
agent.sinks.k3.port = 52020
# set sink4
agent.sinks.k4.channel = c1
agent.sinks.k4.type = avro
agent.sinks.k4.hostname = ynjz006
agent.sinks.k4.port = 52020
# set sink5
agent.sinks.k5.channel = c1
agent.sinks.k5.type = avro
agent.sinks.k5.hostname = ynjz007
agent.sinks.k5.port = 52020
# set sink6
agent.sinks.k6.channel = c1
agent.sinks.k6.type = avro
agent.sinks.k6.hostname = ynjz008
agent.sinks.k6.port = 52020
# set sink7
agent.sinks.k7.channel = c1
agent.sinks.k7.type = avro
agent.sinks.k7.hostname = ynjz009
agent.sinks.k7.port = 52020
#set sink group
agent.sinkgroups.g1.sinks = k1 k2 k3 k4 k5 k6 k7
#set load_balance
agent.sinkgroups.g1.processor.type=load_balance
agent.sinkgroups.g1.processor.backoff=true
agent.sinkgroups.g1.processor.selector=random
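The random selector picks a sink at random for each batch; the standard load-balancing sink processor also accepts a round_robin selector plus a backoff cap, for example (the timeout value here is only illustrative):
agent.sinkgroups.g1.processor.selector=round_robin
agent.sinkgroups.g1.processor.selector.maxTimeOut=30000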
In practice the client configurations are almost identical; only the monitored directory changes, e.g. agent.sources.r1.spoolDir = /data/4G.
5. Start the Flume-NG high availability cluster
Start every server first; the servers differ only in their flume-server-data.properties configuration file:
./bin/flume-ng agent --name agent --conf conf --conf-file conf/flume-server-data.properties -Dflume.root.logger=INFO,console > /data/ynjz/workspace/flume-server-data.log 2>&1 &
Then start every client; likewise, the clients differ only in their flume-client-data.properties configuration file:
./bin/flume-ng agent --name agent --conf conf --conf-file conf/flume-client-data.properties -Dflume.root.logger=INFO,console > /data/ynjz/workspace/flume-client-data.log 2>&1 &
In day-to-day operation a client can be stopped at any time, but starting a client while the servers are not running will cause errors.
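Once both sides are up, it is worth confirming that the agents stay alive and that events actually reach Kafka. A rough check, reusing the log path from the start command and the assumed Kafka install path from above (older Kafka console consumers take --zookeeper ynjz003:2181 instead of --bootstrap-server):
# rough liveness check for the flume JVMs
ps -ef | grep flume | grep -v grep
# tail the agent log written by the start command above
tail -f /data/ynjz/workspace/flume-client-data.log
# consume a few events from the target topic (install path and flag are assumptions)
/usr/local/kafka/bin/kafka-console-consumer.sh --bootstrap-server ynjz003:9092 --topic flume-kafka-meijs33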