Notes on some simple Flume HDFS configuration

 ############################################

 #  producer config

 ############################################

 #agent section

 producer.sources = s

 producer.channels = c c1 c2

 producer.sinks = r h es

 #source section

 producer.sources.s.type =exec

 producer.sources.s.command = tail -f /usr/local/nginx/logs/test1.log

 #producer.sources.s.type = spooldir

 #producer.sources.s.spoolDir = /usr/local/nginx/logs/

 #producer.sources.s.fileHeader = true

 producer.sources.s.channels = c c1 c2
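
 # A source that lists several channels copies every event to all of them by default

 # (the replicating channel selector). Spelling the default out is optional; a sketch:

 #producer.sources.s.selector.type = replicating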

 producer.sources.s.interceptors = i

 # The match is case-sensitive (no ignore-case option)

 # The backslash is doubled because the properties loader drops a single backslash before "."

 producer.sources.s.interceptors.i.regex = .*\\.(css|js|jpg|jpeg|png|gif|ico).*

 producer.sources.s.interceptors.i.type = org.apache.flume.interceptor.RegexFilteringInterceptor$Builder

 # Exclude (drop) events that match the regex

 producer.sources.s.interceptors.i.excludeEvents = true
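
 # The interceptor has no ignore-case property, but the pattern is a Java regex, so the inline

 # flag (?i) should make it case-insensitive. A sketch, assuming the interceptor compiles the

 # pattern unchanged (untested here), which would also drop requests such as /logo.PNG:

 #producer.sources.s.interceptors.i.regex = (?i).*\\.(css|js|jpg|jpeg|png|gif|ico).*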

 ############################################

 #   avro config

 ############################################

 producer.channels.c.type = memory

 #Timeout in seconds for adding or removing an event

 producer.channels.c.keep-alive= 30

 producer.channels.c.capacity = 10000

 producer.channels.c.transactionCapacity = 10000

 producer.channels.c.byteCapacityBufferPercentage = 20

 producer.channels.c.byteCapacity = 800000

 producer.sinks.r.channel = c

 producer.sinks.r.type = avro

 producer.sinks.r.hostname  = 127.0.0.1

 producer.sinks.r.port = 10101
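
 # The avro sink above needs an avro source listening on 127.0.0.1:10101.

 # A minimal sketch of the receiving side, which belongs in that agent's own config file.

 # The agent name "collector" and source name "in" are made up for illustration;

 # its channels and sinks are omitted:

 #collector.sources = in

 #collector.sources.in.type = avro

 #collector.sources.in.bind = 0.0.0.0

 #collector.sources.in.port = 10101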

 ############################################

 #   hdfs config

 ############################################

 producer.channels.c1.type = memory

 #Timeout in seconds for adding or removing an event

 producer.channels.c1.keep-alive= 30

 producer.channels.c1.capacity = 10000

 producer.channels.c1.transactionCapacity = 10000

 producer.channels.c1.byteCapacityBufferPercentage = 20

 producer.channels.c1.byteCapacity = 800000

 producer.sinks.h.channel = c1

 producer.sinks.h.type = hdfs

 # HDFS directory path

 producer.sinks.h.hdfs.path = hdfs://127.0.0.1/tmp/flume/%Y/%m/%d

 # File name prefix

 producer.sinks.h.hdfs.filePrefix=nginx-%Y-%m-%d-%H

 producer.sinks.h.hdfs.fileType = DataStream

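 # Required when the path or prefix uses time escapes (%Y, %m, ...): events from this source carry no timestamp header, and without one the sink throws an error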

 producer.sinks.h.hdfs.useLocalTimeStamp = true

 producer.sinks.h.hdfs.writeFormat = Text

 #Number of seconds to wait before rolling current file (0 = never roll based on time interval)

 producer.sinks.h.hdfs.rollInterval=0

 #File size to trigger roll, in bytes (0: never roll based on file size)

 producer.sinks.h.hdfs.rollSize = 0

 #Number of events written to file before it is rolled (0 = never roll based on number of events)

 producer.sinks.h.hdfs.rollCount = 0
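
 # With rollInterval, rollSize and rollCount all set to 0, none of the three triggers ever rolls the file.

 # If periodic rolling is wanted, a possible alternative (the values are assumptions, not from the original setup):

 # roll every hour or at 128 MiB, whichever comes first, and close files idle for 60 seconds

 #producer.sinks.h.hdfs.rollInterval = 3600

 #producer.sinks.h.hdfs.rollSize = 134217728

 #producer.sinks.h.hdfs.idleTimeout = 60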

 #Number of events written to file before it is flushed to HDFS

 producer.sinks.h.hdfs.batchSize=1000

 #Number of threads per HDFS sink for HDFS IO ops (open, write, etc.)

 producer.sinks.h.hdfs.threadsPoolSize=15

 #Number of milliseconds allowed for HDFS operations, such as open, write, flush, close. Increase this if many HDFS operations are timing out.

 producer.sinks.h.hdfs.callTimeout=30000

 #hdfs.round (default false): should the timestamp be rounded down (if true, affects all time-based escape sequences except %t)
 #hdfs.roundValue (default 1): rounded down to the highest multiple of this (in the unit configured using hdfs.roundUnit), less than current time
 #hdfs.roundUnit (default second): the unit of the round-down value: second, minute or hour
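
 # Sketch of the rounding options in use (an assumption, not part of the original setup).

 # They only matter when the path or prefix contains escapes finer than the rounding unit, such as %M;

 # for example, a path ending in /%Y/%m/%d/%H%M would then get one directory per 10 minutes:

 #producer.sinks.h.hdfs.round = true

 #producer.sinks.h.hdfs.roundValue = 10

 #producer.sinks.h.hdfs.roundUnit = minute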

 ############################################

 #   elasticsearch config

 ############################################

 producer.channels.c2.type = memory

 #Timeout in seconds for adding or removing an event

 producer.channels.c2.keep-alive= 30

 producer.channels.c2.capacity = 10000

 producer.channels.c2.transactionCapacity = 10000

 producer.channels.c2.byteCapacityBufferPercentage = 20

 producer.channels.c2.byteCapacity = 800000

 producer.sinks.es.channel = c2

 producer.sinks.es.type = org.apache.flume.sink.elasticsearch.ElasticSearchSink

 producer.sinks.es.hostNames = 127.0.0.1:9300

 #Name of the ElasticSearch cluster to connect to

 producer.sinks.es.clusterName = sunxucool

 #Number of events to be written per txn.

 producer.sinks.es.batchSize = 1000

 #The name of the index which the date will be appended to. Example ‘flume’ -> ‘flume-yyyy-MM-dd’

 producer.sinks.es.indexName = flume_es

 #The type to index the document to, defaults to ‘log’

 producer.sinks.es.indexType = test

 producer.sinks.es.serializer = org.apache.flume.sink.elasticsearch.ElasticSearchLogStashEventSerializer
