Flume-ng高可用集群负载安装与配置
1. 写在前面
flume-ng高可用长在大数据处理环节第一个出现,对于处理日志文件有很好的作用,本篇博客将详细介绍flume-ng的高可用负载均衡搭建
2. flume-ng高可用负载均衡描述
在一般情况下,Flume-ng高可用采用server和client模式,client主要负责数据源source及数据流向端的sink指向配置,server主要负责数据流向sink详细配置,client需要将server的信息统一管理,server和sink之间数据连接通过channels
3. 配置server,这里配置三个server
flume-server1.properties
#set Agent name
agent.sources = r1
agent.channels = c1
agent.sinks = k1
#set channel
agent.channels.c1.type = memory
agent.channels.c1.capacity = 1024000
agent.channels.c1.transactionCapacity = 10000
agent.channels.c1.byteCapacity=134217728
agent.channels.c1.byteCapacityBufferPercentage=80
# other node,nna to nns
agent.sources.r1.type = avro
agent.sources.r1.bind = ynjz003
agent.sources.r1.port = 52020
agent.sources.r1.interceptors = i1
agent.sources.r1.interceptors.i1.type = static
agent.sources.r1.interceptors.i1.key = Collector
agent.sources.r1.interceptors.i1.value = ynjz003
agent.sources.r1.channels = c1
#set sink to hdfs
agent.sinks.k1.channel = c1
agent.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
agent.sinks.k1.brokerList = ynjz003:9092,ynjz004:9092,ynjz005:9092,ynjz006:9092,ynjz007:9092,ynjz008:9092,ynjz009:9092
agent.sinks.k1.topic = flume-kafka-meijs33
agent.sinks.k1.serializer.class = kafka.serializer.StringEncoder
flume-server2.properties
#set Agent name
agent.sources = r1
agent.channels = c1
agent.sinks = k1
#set channel
agent.channels.c1.type = memory
agent.channels.c1.capacity = 1024000
agent.channels.c1.transactionCapacity = 10000
agent.channels.c1.byteCapacity=134217728
agent.channels.c1.byteCapacityBufferPercentage=80
# other node,nna to nns
agent.sources.r1.type = avro
agent.sources.r1.bind = ynjz004
agent.sources.r1.port = 52020
agent.sources.r1.interceptors = i1
agent.sources.r1.interceptors.i1.type = static
agent.sources.r1.interceptors.i1.key = Collector
agent.sources.r1.interceptors.i1.value = ynjz004
agent.sources.r1.channels = c1
#set sink to hdfs
agent.sinks.k1.channel = c1
agent.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
agent.sinks.k1.brokerList = ynjz003:9092,ynjz004:9092,ynjz005:9092,ynjz006:9092,ynjz007:9092,ynjz008:9092,ynjz009:9092
agent.sinks.k1.topic = flume-kafka-meijs33
agent.sinks.k1.serializer.class = kafka.serializer.StringEncoder
flume-server3.properties
#set Agent name
agent.sources = r1
agent.channels = c1
agent.sinks = k1
#set channel
agent.channels.c1.type = memory
agent.channels.c1.capacity = 1024000
agent.channels.c1.transactionCapacity = 10000
agent.channels.c1.byteCapacity=134217728
agent.channels.c1.byteCapacityBufferPercentage=80
# other node,nna to nns
agent.sources.r1.type = avro
agent.sources.r1.bind = ynjz005
agent.sources.r1.port = 52020
agent.sources.r1.interceptors = i1
agent.sources.r1.interceptors.i1.type = static
agent.sources.r1.interceptors.i1.key = Collector
agent.sources.r1.interceptors.i1.value = ynjz005
agent.sources.r1.channels = c1
#set sink to hdfs
agent.sinks.k1.channel = c1
agent.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
agent.sinks.k1.brokerList = ynjz003:9092,ynjz004:9092,ynjz005:9092,ynjz006:9092,ynjz007:9092,ynjz008:9092,ynjz009:9092
agent.sinks.k1.topic = flume-kafka-meijs33
agent.sinks.k1.serializer.class = kafka.serializer.StringEncoder
可以看出多个server配置的规律
3. 配置client,这里也配置一个client示例
flume-client.properties
#agent1 name
agent.channels = c1
agent.sources = r1
agent.sinks = k1 k2 k3 k4 k5 k6 k7
#set gruop
agent.sinkgroups = g1
#set channel
agent.channels.c1.type = memory
agent.channels.c1.capacity = 102400
agent.channels.c1.transactionCapacity = 1000
agent.channels.c1.byteCapacity=134217728
agent.channels.c1.byteCapacityBufferPercentage=80
agent.sources.r1.type = com.cbo.flume.source.zip.SpoolDirectorySource
agent.sources.r1.channels = c1
agent.sources.r1.spoolDir = /data/ynjz/workspace/zip
agent.sources.r1.fileHeader = true
agent.sources.r1.channels = c1
agent.sources.r1.flumeBatchSize=1000
agent.sources.r1.useFlumeEventFormat=false
agent.sources.r1.restart=true
agent.sources.r1.batchSize=1000
agent.sources.r1.batchTimeout=3000
agent.sources.r1.channels=c1
# set sink1
agent.sinks.k1.channel = c1
agent.sinks.k1.type = avro
agent.sinks.k1.hostname = ynjz003
agent.sinks.k1.port = 52020
# set sink2
agent.sinks.k2.channel = c1
agent.sinks.k2.type = avro
agent.sinks.k2.hostname = ynjz004
agent.sinks.k2.port = 52020
# set sink3
agent.sinks.k3.channel = c1
agent.sinks.k3.type = avro
agent.sinks.k3.hostname = ynjz005
agent.sinks.k3.port = 52020
# set sink4
agent.sinks.k1.channel = c1
agent.sinks.k1.type = avro
agent.sinks.k1.hostname = ynjz006
agent.sinks.k1.port = 52020
# set sink5
agent.sinks.k2.channel = c1
agent.sinks.k2.type = avro
agent.sinks.k2.hostname = ynjz007
agent.sinks.k2.port = 52020
# set sink6
agent.sinks.k3.channel = c1
agent.sinks.k3.type = avro
agent.sinks.k3.hostname = ynjz008
agent.sinks.k3.port = 52020
# set sink7
agent.sinks.k3.channel = c1
agent.sinks.k3.type = avro
agent.sinks.k3.hostname = ynjz009
agent.sinks.k3.port = 52020
#set sink group
agent.sinkgroups.g1.sinks = k1 k2 k3 k4 k5 k6 k7
#set failover
agent.sinkgroups.g1.processor.type = failover
agent.sinkgroups.g1.processor.priority.k1 = 10
agent.sinkgroups.g1.processor.priority.k2 = 10
agent.sinkgroups.g1.processor.priority.k3 = 10
agent.sinkgroups.g1.processor.priority.k4 = 10
agent.sinkgroups.g1.processor.priority.k5 = 10
agent.sinkgroups.g1.processor.priority.k6 = 10
agent.sinkgroups.g1.processor.priority.k7 = 10
agent.sinkgroups.g1.processor.maxpenalty = 10000
这里需要注意sinkgroups配置,flume sinkgroups在常用的应用中有两种方式failover和load_balance,failover可以理解为容错机制,在上面的配置中sink只会往一个kafka写入数据,但一个kafka挂了,failover机制会立马选举一个出来,所以这里的容错机制很完善,但是应对大数据量会影响数据写入的能力,所以建议在大数据量的时候采用load_balance配置,下面时配置示例
#agent1 name
agent.channels = c1
agent.sources = r1
agent.sinks = k1 k2 k3 k4 k5 k6 k7
#set gruop
agent.sinkgroups = g1
#set channel
agent.channels.c1.type = memory
agent.channels.c1.capacity = 102400
agent.channels.c1.transactionCapacity = 24000
agent.channels.c1.byteCapacity=134217728
agent.channels.c1.byteCapacityBufferPercentage=80
agent.sources.r1.type = com.cbo.flume.source.zip.SpoolDirectorySource
agent.sources.r1.channels = c1
agent.sources.r1.spoolDir = /data/4G
agent.sources.r1.includePattern = ([^ ]*\.zip$)
agent.sources.r1.fileHeader = true
agent.sources.r1.channels = c1
agent.sources.r1.flumeBatchSize=10000
agent.sources.r1.useFlumeEventFormat=false
agent.sources.r1.restart=true
agent.sources.r1.batchSize=10000
agent.sources.r1.batchTimeout=3000
agent.sources.r1.channels=c1
# set sink1
agent.sinks.k1.channel = c1
agent.sinks.k1.type = avro
agent.sinks.k1.hostname = ynjz003
agent.sinks.k1.port = 52020
# set sink2
agent.sinks.k2.channel = c1
agent.sinks.k2.type = avro
agent.sinks.k2.hostname = ynjz004
agent.sinks.k2.port = 52020
# set sink3
agent.sinks.k3.channel = c1
agent.sinks.k3.type = avro
agent.sinks.k3.hostname = ynjz005
agent.sinks.k3.port = 52020
# set sink4
agent.sinks.k4.channel = c1
agent.sinks.k4.type = avro
agent.sinks.k4.hostname = ynjz006
agent.sinks.k4.port = 52020
# set sink5
agent.sinks.k5.channel = c1
agent.sinks.k5.type = avro
agent.sinks.k5.hostname = ynjz007
agent.sinks.k5.port = 52020
# set sink6
agent.sinks.k6.channel = c1
agent.sinks.k6.type = avro
agent.sinks.k6.hostname = ynjz008
agent.sinks.k6.port = 52020
# set sink7
agent.sinks.k7.channel = c1
agent.sinks.k7.type = avro
agent.sinks.k7.hostname = ynjz009
agent.sinks.k7.port = 52020
#set sink group
agent.sinkgroups.g1.sinks = k1 k2 k3 k4 k5 k6 k7
#set load_balance
agent.sinkgroups.g1.processor.type=load_balance
agent.sinkgroups.g1.processor.backoff=true
agent.sinkgroups.g1.processor.selector=random
在实际应用中多个client基本上一直,只有监控文件目录的配置不同即可agent.sources.r1.spoolDir = /data/4G
4. 启动flume-ng高可用集群
首先启动每个server,每个server只是配置文件flume-server-data.properties不同:
./bin/flume-ng agent --name agent --conf conf --conf-file conf/flume-server-data.properties -Dflume.root.logger=INFO,console > /data/ynjz/workspace/flume-server-data.log 2>&1 &
启动每个client,,每个server只是配置文件flume-client-data.properties不同:
./bin/flume-ng agent --name agent --conf conf --conf-file conf/flume-client-data.properties -Dflume.root.logger=INFO,console > /data/ynjz/workspace/flume-client-data.log 2>&1 &
在平时应用中,可以随时停止client,但停止了server没起而启动client会导致报错
Flume-ng高可用集群负载安装与配置的更多相关文章
- Flume 学习笔记之 Flume NG高可用集群搭建
Flume NG高可用集群搭建: 架构总图: 架构分配: 角色 Host 端口 agent1 hadoop3 52020 collector1 hadoop1 52020 collector2 had ...
- 大数据高可用集群环境安装与配置(09)——安装Spark高可用集群
1. 获取spark下载链接 登录官网:http://spark.apache.org/downloads.html 选择要下载的版本 2. 执行命令下载并安装 cd /usr/local/src/ ...
- Flume NG高可用集群搭建详解
.Flume NG简述 Flume NG是一个分布式,高可用,可靠的系统,它能将不同的海量数据收集,移动并存储到一个数据存储系统中.轻量,配置简单,适用于各种日志收集,并支持 Failover和负载均 ...
- 大数据高可用集群环境安装与配置(06)——安装Hadoop高可用集群
下载Hadoop安装包 登录 https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/ 镜像站,找到我们要安装的版本,点击进去复制下载链接 ...
- 大数据高可用集群环境安装与配置(07)——安装HBase高可用集群
1. 下载安装包 登录官网获取HBase安装包下载地址 https://hbase.apache.org/downloads.html 2. 执行命令下载并安装 cd /usr/local/src/ ...
- 大数据高可用集群环境安装与配置(03)——设置SSH免密登录
Hadoop的NameNode需要启动集群中所有机器的Hadoop守护进程,这个过程需要通过SSH登录来实现 Hadoop并没有提供SSH输入密码登录的形式,因此,为了能够顺利登录每台机器,需要将所有 ...
- 大数据高可用集群环境安装与配置(08)——安装Ganglia监控集群
1. 安装依赖包和软件 在所有服务器上输入命令进行安装操作 yum install epel-release -y yum install ganglia-web ganglia-gmetad gan ...
- 大数据高可用集群环境安装与配置(02)——配置ntp服务
NTP服务概述 NTP服务器[Network Time Protocol(NTP)]是用来使计算机时间同步化的一种协议,它可以使计算机对其服务器或时钟源(如石英钟,GPS等等)做同步化,它可以提供高精 ...
- 大数据高可用集群环境安装与配置(10)——安装Kafka高可用集群
1. 获取安装包下载链接 访问https://kafka.apache.org/downloads 找到kafka对应版本 需要与服务器安装的scala版本一致(运行spark-shell可以看到当前 ...
随机推荐
- Zabbix监控磁盘IO值
iostat取硬盘IO值. iostat -x 3 2 | grep vdb | sed -n '2p' | awk '{print $14}' 每3s取一次值,输出第二次vdb硬盘的负载值. 添加Z ...
- opencv+codeblocks +anaconda
study from : https://www.jianshu.com/p/c16b7c870356 #include <cstdio> #include <cstdlib> ...
- maven deploy 指定-DaltDeploymentRepository
运行deploy出现如下错误: deployment failed repository element was not specified in the POM inside distributio ...
- 01-oracle学习环境配置
1.安装oracle与SQL Developer oracle10g安装教程 2.创建表空间以及用户 表空间是存储数据文件的容器,由数据文件组成,数据库的所有系统数据和用户数据都必须存储在数据文件中. ...
- Spring Boot 2.x以后static下面的静态资源被拦截
今天创建一个新的Spring Boot项目,没注意到spring boot的版本,发现静态资源无法访问.百度一下发现好像是Spring Boot 2.0版本以后static目录不能直接访问. 接下来直 ...
- [C++] const与指针的关系
首先快速复习一些基础. 考虑下面的声明兼定义式: int p = 10; p的基础数据类型是int. 考虑下面的声明兼定义式: const int a = 10; a的基础数据类型是int,a是一个常 ...
- FT View SE联合Studio 5000仿真
前言:一个实际的自动化项目,都是综合性的,不仅需要PLC进行逻辑.顺序.运动等控制,还需要在上位机进行监视和操作.当没有物理PLC时,上位机软件就无法连接到实际的变量数据,开发出来的界面和功能无法验 ...
- Nodejs一键实现微信内打开网页url自动跳转外部浏览器访问的功能
前言 现如今微信对第三方推广链接的审核是越来越严格了,域名在微信中分享转发经常会被拦截,一旦被拦截用户就只能复制链接手动打开浏览器粘贴才能访问,要不然就是换个域名再推,周而复始.无论是哪一种情况都会面 ...
- javaFX的控制台实现
最近做了个javaFX的工具,想弄个控制台输出信息,准备用TextArea来模拟console,但直接操纵console对象的话不依赖这个项目的地方就无法输出信息到控制台了,至于log,以前弄过一个输 ...
- vuecli3 项目添加配置文件以及使用@映射、代理
在根目录下新建 vue.config.js 1.vue.config.js中配置路径别名方法 // vue.config.js module.exports = { configureWebpack: ...