Flume sink to HDFS: the file system keeps producing new files, the output is garbled, and the file-roll settings have no effect?

Problem description

Solution

First delete the data under this HDFS directory. Then edit the configuration file flume-conf.properties and collect the data again.

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#  http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.

# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'agent'

agent1.sources = spool-source1
agent1.sinks = hdfs-sink1
agent1.channels = ch1

# Define and configure a spooling directory source
agent1.sources.spool-source1.channels=ch1
agent1.sources.spool-source1.type=spooldir
agent1.sources.spool-source1.spoolDir=/home/hadoop/data/flume/sqooldir//--/
agent1.sources.spool-source1.ignorePattern=event(_\d{}\-d{}\-d{}\_d{}\_d{})?\.log(\.COMPLETED)?
agent1.sources.spool-source1.deserializer.maxLineLength=

# Configure channel
agent1.channels.ch1.type = file
agent1.channels.ch1.checkpointDir = /home/hadoop/data/flume/checkpointDir
agent1.channels.ch1.dataDirs = /home/hadoop/data/flume/dataDirs

# Define and configure an HDFS sink
agent1.sinks.hdfs-sink1.channel = ch1
agent1.sinks.hdfs-sink1.type = hdfs
agent1.sinks.hdfs-sink1.hdfs.path = hdfs://master:9000/flume/%Y%m%d
agent1.sinks.hdfs-sink1.hdfs.useLocalTimeStamp = true
agent1.sinks.hdfs-sink1.hdfs.rollInterval =
agent1.sinks.hdfs-sink1.hdfs.rollSize =
agent1.sinks.hdfs-sink1.hdfs.rollCount =
agent1.sinks.hdfs-sink1.hdfs.minBlockReplicas=
agent1.sinks.hdfs-sink1.hdfs.idleTimeout=
#agent1.sinks.hdfs-sink1.hdfs.codeC = snappy
agent1.sinks.hdfs-sink1.hdfs.fileType=DataStream
#agent1.sinks.hdfs-sink1.hdfs.writeFormat=Text

# For each one of the sources, the type is defined
#agent.sources.seqGenSrc.type = seq

# The channel can be defined as follows.
#agent.sources.seqGenSrc.channels = memoryChannel

# Each sink's type must be defined
#agent.sinks.loggerSink.type = logger

# Specify the channel the sink should use
#agent.sinks.loggerSink.channel = memoryChannel

# Each channel's type is defined.
#agent.channels.memoryChannel.type = memory

# Other config values specific to each type of channel(sink or source)
# can be defined as well
# In this case, it specifies the capacity of the memory channel
#agent.channels.memoryChannel.capacity =
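
As a rough illustration of how the blank roll settings in the HDFS sink section might be filled in, the fragment below shows one possible combination. The numbers are assumptions chosen for illustration, not the values from the original configuration. The property names are those of the Flume HDFS sink, and `agent1` and `hdfs-sink1` are reused from the configuration above.

```properties
# Illustrative values only (assumptions, not the author's original settings).

# Roll by time: close the current file after this many seconds (0 disables time-based rolling).
agent1.sinks.hdfs-sink1.hdfs.rollInterval = 3600
# Roll by size: close the current file after this many bytes (0 disables size-based rolling).
agent1.sinks.hdfs-sink1.hdfs.rollSize = 134217728
# Roll by event count: 0 disables count-based rolling.
agent1.sinks.hdfs-sink1.hdfs.rollCount = 0
# Treat one replica per block as sufficient, so that an under-replicated
# block is less likely to force extra rolls that bypass the settings above.
agent1.sinks.hdfs-sink1.hdfs.minBlockReplicas = 1
# Close a file that has received no events for this many seconds (0 disables).
agent1.sinks.hdfs-sink1.hdfs.idleTimeout = 60
# Write raw event bodies instead of the default SequenceFile,
# which looks garbled when viewed as plain text.
agent1.sinks.hdfs-sink1.hdfs.fileType = DataStream
```

If the three roll properties are left unset, the sink falls back to its defaults of 30 seconds, 1024 bytes and 10 events, and rolls on whichever limit is reached first. That alone is enough to produce a steady stream of small files, so it is worth setting all three explicitly and checking the values against the official Flume User Guide for your version.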

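A tip: for configuration files such as Flume's, it is best to read the official documentation and learn to extend from there. Do not limit yourself to other people's blog posts, although they are fine as a reference. The official documentation is the authoritative source.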

[hadoop@master sqooldir]$ $HADOOP_HOME/bin/hadoop fs -rm -r /flume/
// :: WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
// :: INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval =  minutes, Emptier interval =  minutes.
Deleted /flume/
[hadoop@master sqooldir]$

Restart Flume

[hadoop@master flume]$ pwd
/home/hadoop/app/flume
[hadoop@master flume]$ bin/flume-ng agent -n agent1 -f conf/flume-conf.properties

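If your problem also involves the replication factor, you will need to sort that out yourself: change the following property in hdfs-site.xml under $HADOOP_HOME/etc/hadoop/ (it must be changed on master, slave1 and slave2 alike).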

<property>
    <name>dfs.replication</name>
    <value></value>
    <description>Set to  for pseudo-distributed mode,Set to  for distributed mode,Set to  for distributed mode.</description>
</property>

Remember to restart the Hadoop cluster afterwards.
