Flume sink core class structure

1 Core interface: Sink

org.apache.flume.Sink

  /**
   * <p>Requests the sink to attempt to consume data from attached channel</p>
   * <p><strong>Note</strong>: This method should be consuming from the channel
   * within the bounds of a Transaction. On successful delivery, the transaction
   * should be committed, and on failure it should be rolled back.
   * @return READY if 1 or more Events were successfully delivered, BACKOFF if
   * no data could be retrieved from the channel feeding this sink
   * @throws EventDeliveryException In case of any kind of failure to
   * deliver data to the next hop destination.
   */
  public Status process() throws EventDeliveryException;

  public static enum Status {
    READY, BACKOFF
  }

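process() is the core method. It returns a Status, which has only two values: READY and BACKOFF. The caller acts on this return value, as shown later.
This is also the interface you implement to extend Flume with a custom sink, for example KuduSink.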
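The transaction contract described in the Javadoc (take events inside a transaction, commit on successful delivery, roll back on failure, return BACKOFF when nothing could be taken) can be sketched as below. This is a self-contained illustration, not Flume code: ToyChannel, BatchingSink, DeliveryException and the failNext switch are hypothetical stand-ins for Channel, a real sink such as KuduSink, and EventDeliveryException. In real Flume the begin/commit/rollback calls go through the Transaction returned by channel.getTransaction(); here they are folded into the toy channel for brevity.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Illustrative stand-ins only; these are NOT the real Flume types.
enum Status { READY, BACKOFF }

// Stand-in for org.apache.flume.EventDeliveryException (a checked exception).
class DeliveryException extends Exception {
  DeliveryException(String msg, Throwable cause) { super(msg, cause); }
}

// Toy in-memory channel. For brevity the transaction calls live on the channel;
// real Flume code obtains a Transaction via channel.getTransaction().
class ToyChannel {
  private final Deque<String> queue = new ArrayDeque<>();
  private final Deque<String> taken = new ArrayDeque<>(); // events taken in the open transaction

  void put(String event) { queue.addLast(event); }
  int size() { return queue.size(); }

  void begin() { taken.clear(); }

  String take() {
    String event = queue.pollFirst();
    if (event != null) taken.addLast(event);
    return event; // null means the channel is empty
  }

  void commit() { taken.clear(); }

  void rollback() {
    // Return taken events to the head of the queue, preserving their order.
    while (!taken.isEmpty()) queue.addFirst(taken.pollLast());
  }
}

// Hypothetical sink following the process() contract from the Javadoc.
class BatchingSink {
  private final ToyChannel channel;
  private final int batchSize;
  final List<String> delivered = new ArrayList<>(); // stands in for the next hop
  boolean failNext = false;                          // simulates one downstream failure

  BatchingSink(ToyChannel channel, int batchSize) {
    this.channel = channel;
    this.batchSize = batchSize;
  }

  Status process() throws DeliveryException {
    channel.begin();
    try {
      List<String> batch = new ArrayList<>();
      for (int i = 0; i < batchSize; i++) {
        String event = channel.take();
        if (event == null) break; // channel drained
        batch.add(event);
      }
      if (failNext) {
        failNext = false;
        throw new IllegalStateException("next hop unavailable");
      }
      delivered.addAll(batch);
      channel.commit();
      // BACKOFF tells the runner there was nothing to do, so it will sleep.
      return batch.isEmpty() ? Status.BACKOFF : Status.READY;
    } catch (RuntimeException e) {
      channel.rollback(); // failed delivery: events go back to the channel
      throw new DeliveryException("Failed to deliver batch", e);
    }
  }
}
```

Note that an empty take still commits and then returns BACKOFF; only a failed delivery rolls back, so no event is lost when the next hop is down.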

2 Sink wrapper: SinkProcessor

org.apache.flume.SinkProcessor

/**
 * <p>
 * Interface for a device that allows abstraction of the behavior of multiple
 * sinks, always assigned to a SinkRunner
 * </p>
 * <p>
 * A sink processors {@link SinkProcessor#process()} method will only be
 * accessed by a single runner thread. However configuration methods
 * such as {@link Configurable#configure} may be concurrently accessed.
 *
 * @see org.apache.flume.Sink
 * @see org.apache.flume.SinkRunner
 * @see org.apache.flume.sink.SinkGroup
 */
public interface SinkProcessor extends LifecycleAware, Configurable {

  /**
   * <p>Handle a request to poll the owned sinks.</p>
   *
   * <p>The processor is expected to call {@linkplain Sink#process()} on
   *  whatever sink(s) appropriate, handling failures as appropriate and
   *  throwing {@link EventDeliveryException} when there is a failure to
   *  deliver any events according to the delivery policy defined by the
   *  sink processor implementation. See specific implementations of this
   *  interface for delivery behavior and policies.</p>
   *
   * @return Returns {@code READY} if events were successfully consumed,
   * or {@code BACKOFF} if no events were available in the channel to consume.
   * @throws EventDeliveryException if the behavior guaranteed by the processor
   * couldn't be carried out.
   */
  Status process() throws EventDeliveryException;

This interface wraps the processing of either a single sink or a sink group. Commonly used implementations:

1) Single sink

org.apache.flume.sink.DefaultSinkProcessor

  @Override
  public Status process() throws EventDeliveryException {
    return sink.process();
  }

DefaultSinkProcessor.process() simply delegates to the process() of the single sink it wraps.

2) Sink group

org.apache.flume.sink.LoadBalancingSinkProcessor
org.apache.flume.sink.FailoverSinkProcessor
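To illustrate what a group processor adds on top of the plain delegation in DefaultSinkProcessor, here is a hypothetical round-robin processor. It is loosely modeled on the round-robin selection mode of LoadBalancingSinkProcessor but is not its code: the real class also offers a random selector and optional backoff of failed sinks, and FailoverSinkProcessor instead picks sinks by configured priority. SimpleSink, RoundRobinGroupProcessor and DeliveryException are illustrative stand-ins, not Flume types.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-ins only; these are NOT the real Flume types.
enum Status { READY, BACKOFF }

class DeliveryException extends Exception {
  DeliveryException(String msg) { super(msg); }
}

// Minimal stand-in for org.apache.flume.Sink.
interface SimpleSink {
  Status process() throws DeliveryException;
}

// Hypothetical group processor: rotates the starting sink on every call and
// returns the first result that does not throw. Fails only if all sinks fail.
class RoundRobinGroupProcessor {
  private final List<SimpleSink> sinks;
  private int next = 0; // index of the sink tried first on the next call

  RoundRobinGroupProcessor(List<SimpleSink> sinks) {
    if (sinks.isEmpty()) throw new IllegalArgumentException("sink group is empty");
    this.sinks = new ArrayList<>(sinks);
  }

  Status process() throws DeliveryException {
    int n = sinks.size();
    int start = next;
    next = (next + 1) % n;
    for (int i = 0; i < n; i++) {
      SimpleSink sink = sinks.get((start + i) % n);
      try {
        return sink.process(); // READY or BACKOFF both count as "handled"
      } catch (DeliveryException | RuntimeException e) {
        // This sink failed; try the next one in the rotation.
      }
    }
    throw new DeliveryException("All sinks failed to process");
  }
}
```

Because the processor itself implements the same process() contract, SinkRunner can drive a whole group exactly as it drives a single sink.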

3 The caller of sinks: SinkRunner

org.apache.flume.SinkRunner

/**
 * <p>
 * A driver for {@linkplain Sink sinks} that polls them, attempting to
 * {@linkplain Sink#process() process} events if any are available in the
 * {@link Channel}.
 * </p>
 *
 * <p>
 * Note that, unlike {@linkplain Source sources}, all sinks are polled.
 * </p>
 *
 * @see org.apache.flume.Sink
 * @see org.apache.flume.SourceRunner
 */
public class SinkRunner implements LifecycleAware {

...

  private static final long backoffSleepIncrement = 1000;
  private static final long maxBackoffSleep = 5000;

org.apache.flume.SinkRunner.PollingRunner

  public static class PollingRunner implements Runnable {

    private SinkProcessor policy;
    private AtomicBoolean shouldStop;
    private CounterGroup counterGroup;

    @Override
    public void run() {
      logger.debug("Polling sink runner starting");

      while (!shouldStop.get()) {
        try {
          if (policy.process().equals(Sink.Status.BACKOFF)) {
            counterGroup.incrementAndGet("runner.backoffs");
            Thread.sleep(Math.min(
                counterGroup.incrementAndGet("runner.backoffs.consecutive")
                * backoffSleepIncrement, maxBackoffSleep));
          } else {
            counterGroup.set("runner.backoffs.consecutive", 0L);
          }
        } catch (InterruptedException e) {
          logger.debug("Interrupted while processing an event. Exiting.");
          counterGroup.incrementAndGet("runner.interruptions");
        } catch (Exception e) {
          logger.error("Unable to deliver event. Exception follows.", e);
          if (e instanceof EventDeliveryException) {
            counterGroup.incrementAndGet("runner.deliveryErrors");
          } else {
            counterGroup.incrementAndGet("runner.errors");
          }
          try {
            Thread.sleep(maxBackoffSleep);
          } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
          }
        }
      }
      logger.debug("Polling runner exiting. Metrics:{}", counterGroup);
    }
  }

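Whether process() returns BACKOFF or throws an exception, the runner thread sleeps before polling again: after each consecutive BACKOFF the sleep grows by backoffSleepIncrement (1 s) up to maxBackoffSleep (5 s), and after any exception it sleeps the full maxBackoffSleep. As a result, a Flume sink that runs into a lot of bad data (so process() keeps throwing), or a custom sink that keeps returning BACKOFF, becomes very slow.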
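To make the cost concrete, the sketch below reproduces the sleep arithmetic of PollingRunner using the two constants from the quoted SinkRunner excerpt (1000 ms increment, 5000 ms cap; other Flume versions may use different values). BackoffSchedule and its methods are hypothetical helpers written for this illustration, not part of Flume.

```java
// Hypothetical helper reproducing PollingRunner's sleep arithmetic.
class BackoffSchedule {
  // Values as in the quoted SinkRunner excerpt.
  static final long BACKOFF_SLEEP_INCREMENT = 1000;
  static final long MAX_BACKOFF_SLEEP = 5000;

  // Sleep after the n-th consecutive BACKOFF (n starts at 1, because the
  // runner calls incrementAndGet on the counter before multiplying).
  static long sleepAfterBackoff(long consecutive) {
    return Math.min(consecutive * BACKOFF_SLEEP_INCREMENT, MAX_BACKOFF_SLEEP);
  }

  // Any exception from process() always costs the full cap.
  static long sleepAfterException() {
    return MAX_BACKOFF_SLEEP;
  }

  // Total sleep over n consecutive BACKOFF results.
  static long totalBackoffSleep(int n) {
    long total = 0;
    for (int i = 1; i <= n; i++) total += sleepAfterBackoff(i);
    return total;
  }

  public static void main(String[] args) {
    for (int n = 1; n <= 7; n++) {
      System.out.println("backoff #" + n + " -> sleep " + sleepAfterBackoff(n) + " ms");
    }
    System.out.println("10 consecutive backoffs -> " + totalBackoffSleep(10) + " ms");
    System.out.println("10 consecutive exceptions -> " + 10 * sleepAfterException() + " ms");
  }
}
```

With these values, 10 consecutive BACKOFF results cost 1 + 2 + 3 + 4 + 6 × 5 = 40 s of sleep, and 10 consecutive exceptions cost 10 × 5 = 50 s, during which the sink does no work. A single READY result resets the BACKOFF counter to 0, but it does not shorten the fixed sleep after an exception.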
