Flume Environment Setup: Five Cases

http://flume.apache.org/FlumeUserGuide.html

A simple example

Here, we give an example configuration file, describing a single-node Flume deployment. This configuration lets a user generate events and subsequently logs them to the console.

# example.conf: A single-node Flume configuration

# Name the components on this agent

a1.sources = r1

a1.sinks = k1

a1.channels = c1

# Describe/configure the source

a1.sources.r1.type = netcat

a1.sources.r1.bind = localhost

a1.sources.r1.port = 44444

# Describe the sink

a1.sinks.k1.type = logger

# Use a channel which buffers events in memory

a1.channels.c1.type = memory

a1.channels.c1.capacity = 1000

a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel

a1.sources.r1.channels = c1

a1.sinks.k1.channel = c1

This configuration defines a single agent named a1. a1 has a source that listens for data on port 44444, a channel that buffers event data in memory, and a sink that logs event data to the console. The configuration file names the various components, then describes their types and configuration parameters. A given configuration file might define several named agents; when a given Flume process is launched a flag is passed telling it which named agent to manifest.

Given this configuration file, we can start Flume as follows:

$ bin/flume-ng agent --conf conf --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console

Note that in a full deployment we would typically include one more option: --conf=<conf-dir>. The <conf-dir> directory would include a shell script flume-env.sh and potentially a log4j properties file. In this example, we pass a Java option to force Flume to log to the console and we go without a custom environment script.

From a separate terminal, we can then telnet port 44444 and send Flume an event:

$ telnet localhost 44444

Trying 127.0.0.1...

Connected to localhost.localdomain (127.0.0.1).

Escape character is '^]'.

Hello world! <ENTER>

OK

The original Flume terminal will output the event in a log message.

12/06/19 15:32:19 INFO source.NetcatSource: Source starting

12/06/19 15:32:19 INFO source.NetcatSource: Created serverSocket:sun.nio.ch.ServerSocketChannelImpl[/127.0.0.1:44444]

12/06/19 15:32:34 INFO sink.LoggerSink: Event: { headers:{} body: 48 65 6C 6C 6F 20 77 6F 72 6C 64 21 0D          Hello world!. }

Congratulations - you’ve successfully configured and deployed a Flume agent! Subsequent sections cover agent configuration in much more detail.

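The concrete setup steps follow.

Flume Setup, Case 1: A Single Flume Agent

Install on node2.

1. Upload the Flume archive to /home/tools, extract it, and move the extracted directory to /home.

2. Rename the directory, then edit flume-env.sh to set the Java environment variable.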

vi flume-env.sh

3. Configure the Flume environment variables.

vi /etc/profile

source /etc/profile

Check the Flume version to confirm that the environment variables took effect.
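Steps 2 and 3 can be sketched roughly as follows. The install path /home/flume and the JDK path are assumptions for illustration only (the steps above do not state them), so substitute your own. The sketch only prints the lines to add; it does not modify any file.

```shell
# Assumed locations (hypothetical; adjust to your machine).
FLUME_HOME="${FLUME_HOME:-/home/flume}"
JAVA_HOME="${JAVA_HOME:-/usr/java/default}"

# Line to set in $FLUME_HOME/conf/flume-env.sh:
flume_env_line="export JAVA_HOME=$JAVA_HOME"

# Lines to append to /etc/profile:
profile_lines="export FLUME_HOME=$FLUME_HOME
export PATH=\$PATH:\$FLUME_HOME/bin"

echo "$flume_env_line"
echo "$profile_lines"

# After 'source /etc/profile', print the installed version if flume-ng is on PATH.
if command -v flume-ng >/dev/null 2>&1; then
  flume-ng version
fi
```

If flume-ng version prints a version banner, the PATH entry is working.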

4. Create the directory test_flume under /home, then create the Flume configuration file in it.

cd test_flume

vi flume1

5. Start the agent to test whether Flume was installed successfully.

flume-ng agent --conf /home/test_flume --conf-file /home/test_flume/flume1 --name a1 -Dflume.root.logger=INFO,console

Install telnet.

Type any text:

hi flume

Switch to the Flume terminal window to see the output.

To exit telnet, press Ctrl+] and then type quit.

Flume Setup, Case 2: Two Flume Agents as a Cluster

Install on node1 and node2.

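MemoryChannel configuration
capacity: the maximum number of events the channel can store; the default is 100.
transactionCapacity: the maximum number of events taken from a source or given to a sink per transaction; the default is also 100.
keep-alive: the timeout, in seconds, for adding an event to the channel or removing one from it.
byte** (byteCapacity, byteCapacityBufferPercentage): limits on the total bytes of events held in the channel; only the event body is counted.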
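As a rough illustration, the properties above could be set like this for a channel c1 on agent a1. The values and the file name memory-channel.properties are illustrative assumptions, not recommendations; the scratch directory defaults to /tmp/test_flume so the sketch runs anywhere.

```shell
# Scratch directory for illustration; the setup steps use /home/test_flume.
CONF_DIR="${CONF_DIR:-/tmp/test_flume}"
mkdir -p "$CONF_DIR"

cat > "$CONF_DIR/memory-channel.properties" <<'EOF'
a1.channels = c1
a1.channels.c1.type = memory
# Maximum number of events held in the channel
a1.channels.c1.capacity = 1000
# Maximum number of events per transaction (source put or sink take)
a1.channels.c1.transactionCapacity = 100
# Seconds to wait when adding or removing an event
a1.channels.c1.keep-alive = 3
# Upper bound on the summed event-body bytes held in the channel
a1.channels.c1.byteCapacity = 800000
# Percent of byteCapacity kept as headroom for event headers
a1.channels.c1.byteCapacityBufferPercentage = 20
EOF

cat "$CONF_DIR/memory-channel.properties"
```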

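1. On node1 and node2, upload the archive to /home/tools and extract it.

2. Set the Java environment variable in flume-env.sh under conf.

3. Configure the Flume environment variables in /etc/profile.

4. Create the test directory test_flume on node1 and node2, then create one configuration file on each: flume21 on node1 and flume22 on node2.

Create flume21 on node1.

Create flume22 on node2.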
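The contents of the two files are not reproduced in these notes. A minimal sketch of what they might look like follows, assuming node1 receives text on a netcat source and forwards it through an avro sink to an avro source on node2, which logs to the console. The port 60000 and the bind hostnames are assumptions. Both files are written to one scratch directory here only so the sketch is runnable; in the real setup each file lives on its own host.

```shell
# Scratch directory for illustration; on the real hosts the steps use /home/test_flume.
CONF_DIR="${CONF_DIR:-/tmp/test_flume}"
mkdir -p "$CONF_DIR"

# flume21 (belongs on node1): netcat source -> memory channel -> avro sink
cat > "$CONF_DIR/flume21" <<'EOF'
a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = netcat
a1.sources.r1.bind = node1
a1.sources.r1.port = 44444

# Forward events to the avro source on node2 (port 60000 is an assumption)
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = node2
a1.sinks.k1.port = 60000

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
EOF

# flume22 (belongs on node2): avro source -> memory channel -> logger sink
cat > "$CONF_DIR/flume22" <<'EOF'
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Must match the hostname and port of the avro sink on node1
a1.sources.r1.type = avro
a1.sources.r1.bind = node2
a1.sources.r1.port = 60000

a1.sinks.k1.type = logger

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
EOF
```

The avro sink on node1 needs a listening avro source to connect to, which is why the downstream agent on node2 is started first.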

5. Start Flume on node1 and node2. Because node2 is the downstream agent, start Flume on node2 first, then on node1.

Start Flume on node2 first (general form, then the command used in this setup):
flume-ng agent -n a1 -c conf -f avro.conf -Dflume.root.logger=INFO,console
flume-ng agent -n a1 -c conf -f /home/test_flume/flume22 -Dflume.root.logger=INFO,console
Then start Flume on node1 (general form, then the command used in this setup):
flume-ng agent -n a1 -c conf -f simple.conf2 -Dflume.root.logger=INFO,console
flume-ng agent -n a1 -c conf -f /home/test_flume/flume21 -Dflume.root.logger=INFO,console

node2:

node1:

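6. Open telnet to send test input; the events are printed in the Flume console on node2.

Flume Setup, Case 3: How to Monitor Changes to a File

Install on node2.

1. On node2, upload the archive to /home/tools and extract it.

2. Set the Java environment variable in flume-env.sh under conf.

3. Configure the Flume environment variables in /etc/profile.

4. Create the test directory test_flume on node2, then create the configuration file flume3 in it.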

mkdir test_flume

vi flume3
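The contents of flume3 are not shown in these notes. A minimal sketch, assuming an exec source that follows the file with tail -F and a logger sink, might look like this. The file path matches the flume.exec.log file created in the test step; everything else is an assumption.

```shell
# Scratch directory for illustration; the steps use /home/test_flume.
CONF_DIR="${CONF_DIR:-/tmp/test_flume}"
mkdir -p "$CONF_DIR"

cat > "$CONF_DIR/flume3" <<'EOF'
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Run tail -F and turn each new line of the file into an event
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /home/test_flume/flume.exec.log

a1.sinks.k1.type = logger

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
EOF
```

tail -F (capital F) keeps following the file by name, so the source survives the file being recreated or rotated.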

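5. Start Flume on node2.

Start Flume (general form, then the command used in this setup):
flume-ng agent -n a1 -c conf -f exec.conf -Dflume.root.logger=INFO,console
flume-ng agent -n a1 -c conf -f /home/test_flume/flume3 -Dflume.root.logger=INFO,console

6. Test.

Create an empty file under /home/test_flume for the demonstration:

touch flume.exec.log

Append data in a loop: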

for i in {1..50}; do echo "$i hi flume" >> flume.exec.log ; sleep 0.1; done

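Flume Setup, Case 4: How to Monitor Changes to a Directory

Install on node2.

1. On node2, upload the archive to /home/tools and extract it.

2. Set the Java environment variable in flume-env.sh under conf.

3. Configure the Flume environment variables in /etc/profile.

4. Create the test directory test_flume on node2, then create the configuration file flume4 in it.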

mkdir test_flume

vi flume4
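The contents of flume4 are not shown in these notes. A minimal sketch, assuming a spooling directory source and a logger sink, might look like this. The watched directory /home/logs is a hypothetical path chosen for illustration.

```shell
# Scratch directory for illustration; the steps use /home/test_flume.
CONF_DIR="${CONF_DIR:-/tmp/test_flume}"
mkdir -p "$CONF_DIR"

cat > "$CONF_DIR/flume4" <<'EOF'
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Watch a directory and ingest every file dropped into it
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/logs
# Add a header holding the absolute path of the originating file
a1.sources.r1.fileHeader = true

a1.sinks.k1.type = logger

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
EOF
```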

5. Start Flume on node2.

6. Test.
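One possible way to run this test, assuming the agent was started in the same way as in the other cases and is watching a spooling directory. The directory path below is a stand-in that defaults to a scratch location; it has to match the spoolDir value in flume4. The file is written elsewhere first and then moved in, because the spooling directory source expects files to be complete and unchanged once they appear.

```shell
# Assumed start command, following the pattern of the other cases:
#   flume-ng agent -n a1 -c conf -f /home/test_flume/flume4 -Dflume.root.logger=INFO,console

# Must match spoolDir in the agent configuration (scratch default for illustration).
SPOOL_DIR="${SPOOL_DIR:-/tmp/flume_spool}"
mkdir -p "$SPOOL_DIR"

# Write the file outside the watched directory first.
staging="$(mktemp)"
for i in 1 2 3; do
  echo "$i hi flume"
done > "$staging"

# Move the finished file in; the move is atomic only within one filesystem.
target="$SPOOL_DIR/test-$(date +%s).log"
mv "$staging" "$target"
ls -l "$SPOOL_DIR"
```

With a running agent, the file's lines should appear in the Flume console, and the source renames an ingested file with the .COMPLETED suffix by default.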

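Flume Setup, Case 5: How to Define an HDFS Sink

Install on node2.

Flume Setup, Case 5: Configuration Options Explained

1. Date formats in Flume

When are they used?

When Flume collects data, it can create directories by time. For example, data produced today goes into a directory named 20170216, and yesterday's data goes under 20170215.

Note!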
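One point worth keeping in mind with time-based paths: according to the Flume User Guide, the time escapes in hdfs.path need a timestamp header on each event unless hdfs.useLocalTimeStamp is set to true. The sketch below shows the naming scheme under that assumption. The escapes %Y, %m, and %d behave like the same specifiers in the shell's date command; the /flume prefix is borrowed from the hadoop fs commands at the end of this document, and the agent and sink names follow the earlier example.

```shell
# Directory name that the pattern %Y%m%d yields for the current day, e.g. 20170216
day_dir="$(date +%Y%m%d)"
echo "/flume/$day_dir"

# Corresponding sink settings (a sketch, not the original configuration)
cat <<'EOF'
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/%Y%m%d
a1.sinks.k1.hdfs.useLocalTimeStamp = true
EOF
```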

2. How does Flume find HDFS?

If the sink is configured as hdfs, Flume locates HDFS through the environment variables configured on the system.

3. When does Flume roll over to a new file?

Rolling is controlled by time interval, file size, and event count:

Name	Default	Description
hdfs.rollInterval	30	Number of seconds to wait before rolling current file (0 = never roll based on time interval)
hdfs.rollSize	1024	File size to trigger roll, in bytes (0: never roll based on file size)
hdfs.rollCount	10	Number of events written to file before it rolled (0 = never roll based on number of events)

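4. After how long without activity does Flume close a temporary file and turn it into a finished file?

Name	Default	Description
hdfs.idleTimeout	0	Timeout after which inactive files get closed (0 = disable automatic closing of idle files)

5. How often is a new directory created? (For example, a new directory every 10 s.)

The timestamp is rounded, but only ever down, never up.

(For example, with a round value of 5 minutes, minute 57 is assigned to 55; minutes 5, 6, 7, 8, 9 share one directory and minutes 10, 11, 12, 13, 14 share the next.)

Name	Default	Description
hdfs.round	false	Should the timestamp be rounded down (if true, affects all time based escape sequences except %t)
hdfs.roundValue	1	Rounded down to the highest multiple of this (in the unit configured using `hdfs.roundUnit`), less than current time.
hdfs.roundUnit	second	The unit of the round down value - `second`, `minute` or `hour`.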
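The round-down rule is plain integer arithmetic. The helper below is a hypothetical illustration, not part of Flume; it mirrors what hdfs.round = true with hdfs.roundValue = 5 and hdfs.roundUnit = minute does to the minute field.

```shell
# Round a minute value down to the nearest multiple of round_value.
round_down() {
  minute=$1
  round_value=$2
  # Integer division discards the remainder, so the result never rounds up.
  echo $(( minute / round_value * round_value ))
}

round_down 57 5   # prints 55
round_down 9 5    # prints 5
round_down 10 5   # prints 10
```

Minutes 5 through 9 all map to 5, and minutes 10 through 14 all map to 10, which is why each group lands in the same directory.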

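1. On node2, upload the archive to /home/tools and extract it.

2. Set the Java environment variable in flume-env.sh under conf.

3. Configure the Flume environment variables in /etc/profile.

4. Create the test directory test_flume on node2, then create the configuration file flume5 in it.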

mkdir test_flume

vi flume5
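The contents of flume5 are not shown in these notes. The sketch below combines the options explained above into one possible configuration. The netcat source is reused from Case 1 as an assumption, the /flume path prefix comes from the hadoop fs commands in the test step, and the specific values (idle timeout of 60 s, 5-minute rounding) are illustrative only.

```shell
# Scratch directory for illustration; the steps use /home/test_flume.
CONF_DIR="${CONF_DIR:-/tmp/test_flume}"
mkdir -p "$CONF_DIR"

cat > "$CONF_DIR/flume5" <<'EOF'
a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

a1.sinks.k1.type = hdfs
# One directory per day, one subdirectory per rounded hour and minute
a1.sinks.k1.hdfs.path = /flume/%Y%m%d/%H%M
# Use the agent's clock instead of requiring a timestamp header
a1.sinks.k1.hdfs.useLocalTimeStamp = true
# Write plain event bodies rather than the default SequenceFile
a1.sinks.k1.hdfs.fileType = DataStream
# Roll on whichever limit is reached first (0 would disable a limit)
a1.sinks.k1.hdfs.rollInterval = 30
a1.sinks.k1.hdfs.rollSize = 1024
a1.sinks.k1.hdfs.rollCount = 10
# Close a file that has received no events for 60 seconds
a1.sinks.k1.hdfs.idleTimeout = 60
# Round the timestamp down to 5-minute buckets
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 5
a1.sinks.k1.hdfs.roundUnit = minute

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
EOF
```

Because hdfs.path has no scheme or host, the sink relies on the default file system from the Hadoop configuration Flume finds on the machine.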

5. Start Flume on node2.

6. Test.

View the files in HDFS:

hadoop fs -ls /flume/...

hadoop fs -get /flume/...