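Storm source code analysis: spout (backtype.storm.spout)
1. The ISpout interface
ISpout is the core interface for implementing a spout. A spout is responsible for feeding messages into the topology and for tracking those messages.
For a spout to track a message it has emitted, the message must be given a message-id. The message-id can be of any type, but if it is omitted or set to null, Storm does not track that message.
Note that Storm calls ack, fail, and nextTuple on a single thread, so there is no need for mutual exclusion between them. For the same reason, none of these methods should block.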
/**
* ISpout is the core interface for implementing spouts. A Spout is responsible
* for feeding messages into the topology for processing. For every tuple emitted by
* a spout, Storm will track the (potentially very large) DAG of tuples generated
* based on a tuple emitted by the spout. When Storm detects that every tuple in
* that DAG has been successfully processed, it will send an ack message to the Spout.
*
* <p>If a tuple fails to be fully processed within the configured timeout for the
* topology (see {@link backtype.storm.Config}), Storm will send a fail message to the spout
* for the message.</p>
*
* <p> When a Spout emits a tuple, it can tag the tuple with a message id. The message id
* can be any type. When Storm acks or fails a message, it will pass back to the
* spout the same message id to identify which tuple it's referring to. If the spout leaves out
* the message id, or sets it to null, then Storm will not track the message and the spout
* will not receive any ack or fail callbacks for the message.</p>
*
* <p>Storm executes ack, fail, and nextTuple all on the same thread. This means that an implementor
* of an ISpout does not need to worry about concurrency issues between those methods. However, it
* also means that an implementor must ensure that nextTuple is non-blocking: otherwise
* the method could block acks and fails that are pending to be processed.</p>
*/
public interface ISpout extends Serializable {
/**
* Called when a task for this component is initialized within a worker on the cluster.
* It provides the spout with the environment in which the spout executes.
*
* <p>This includes the:</p>
*
* @param conf The Storm configuration for this spout. This is the configuration provided to the topology merged in with cluster configuration on this machine.
* @param context This object can be used to get information about this task's place within the topology, including the task id and component id of this task, input and output information, etc.
* @param collector The collector is used to emit tuples from this spout. Tuples can be emitted at any time, including the open and close methods. The collector is thread-safe and should be saved as an instance variable of this spout object.
*/
void open(Map conf, TopologyContext context, SpoutOutputCollector collector);

/**
* Called when an ISpout is going to be shut down. There is no guarantee that close
* will be called, because the supervisor kill -9's worker processes on the cluster.
*
* <p>The one context where close is guaranteed to be called is when a topology is
* killed when running Storm in local mode.</p>
*/
void close();

/**
* Called when a spout has been activated out of a deactivated mode.
* nextTuple will be called on this spout soon. A spout can become activated
* after having been deactivated when the topology is manipulated using the
* `storm` client.
*/
void activate();

/**
* Called when a spout has been deactivated. nextTuple will not be called while
* a spout is deactivated. The spout may or may not be reactivated in the future.
*/
void deactivate();

/**
* When this method is called, Storm is requesting that the Spout emit tuples to the
* output collector. This method should be non-blocking, so if the Spout has no tuples
* to emit, this method should return. nextTuple, ack, and fail are all called in a tight
* loop in a single thread in the spout task. When there are no tuples to emit, it is courteous
* to have nextTuple sleep for a short amount of time (like a single millisecond)
* so as not to waste too much CPU.
*/
void nextTuple();

/**
* Storm has determined that the tuple emitted by this spout with the msgId identifier
* has been fully processed. Typically, an implementation of this method will take that
* message off the queue and prevent it from being replayed.
*/
void ack(Object msgId);

/**
* The tuple emitted by this spout with the msgId identifier has failed to be
* fully processed. Typically, an implementation of this method will put that
* message back on the queue to be replayed at a later time.
*/
void fail(Object msgId);
}
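The contract above (tag each emitted tuple with a message-id, drop it on ack, re-queue it on fail, never block in nextTuple) can be sketched without a Storm dependency. The sketch below is only an illustration under stated assumptions: `SketchCollector`, `ReplayingSpoutSketch`, and `ReplayingSpoutDemo` are hypothetical names that do not exist in Storm, and the collector is a stand-in for the emit side of SpoutOutputCollector. Only the method names nextTuple, ack, and fail mirror ISpout.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical stand-in for the emit side of SpoutOutputCollector,
// so that the sketch compiles and runs without Storm on the classpath.
interface SketchCollector {
    void emit(List<Object> tuple, Object messageId);
}

// Hypothetical spout-like class; nextTuple/ack/fail mirror the ISpout method names.
class ReplayingSpoutSketch {
    private final Queue<String> queue = new ArrayDeque<>();      // messages waiting to be emitted
    private final Map<Object, String> pending = new HashMap<>(); // emitted, not yet acked, keyed by message-id
    private final SketchCollector collector;
    private long nextId = 0;

    ReplayingSpoutSketch(SketchCollector collector, List<String> messages) {
        this.collector = collector;
        queue.addAll(messages);
    }

    // Non-blocking: returns immediately when there is nothing to emit.
    // (The ISpout Javadoc suggests a very short sleep here in a real spout to save CPU.)
    void nextTuple() {
        String msg = queue.poll();
        if (msg == null) {
            return;
        }
        Object msgId = nextId++;      // a non-null message-id is what makes the tuple tracked
        pending.put(msgId, msg);
        List<Object> tuple = new ArrayList<>();
        tuple.add(msg);
        collector.emit(tuple, msgId);
    }

    // Fully processed: forget the message so it is never replayed.
    void ack(Object msgId) {
        pending.remove(msgId);
    }

    // Failed or timed out: put the message back on the queue for a later replay.
    void fail(Object msgId) {
        String msg = pending.remove(msgId);
        if (msg != null) {
            queue.add(msg);
        }
    }

    int pendingCount() { return pending.size(); }
    int queuedCount() { return queue.size(); }
}

class ReplayingSpoutDemo {
    public static void main(String[] args) {
        List<Object> emittedIds = new ArrayList<>();
        ReplayingSpoutSketch spout = new ReplayingSpoutSketch(
                (tuple, id) -> emittedIds.add(id), Arrays.asList("a", "b"));
        spout.nextTuple();              // emits "a"
        spout.nextTuple();              // emits "b"
        spout.ack(emittedIds.get(0));   // "a" is done
        spout.fail(emittedIds.get(1));  // "b" goes back on the queue
        spout.nextTuple();              // "b" is emitted again under a fresh id
        System.out.println("pending=" + spout.pendingCount() + " queued=" + spout.queuedCount());
    }
}
```

Because all three methods run on one thread, the plain HashMap and ArrayDeque need no locking in this sketch. A spout that reads from an external source would still have to make sure the read itself does not block.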
2. SpoutOutputCollector
Exposes the API through which a spout emits tuples.
Unlike a bolt's output collector, a spout's output collector can attach a message-id to a tuple, which lets the spout track that message.
emit
List<Integer> emit(String streamId, List<Object> tuple, Object messageId)
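emit takes three parameters: the target stream id, the tuple, and the message-id.
If no stream id is given, the tuple is sent to the default stream, Utils.DEFAULT_STREAM_ID.
If the message-id is null, Storm does not track the message.
It returns one value: the ids of the tasks the tuple was ultimately sent to.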
emitDirect
void emitDirect(int taskId, String streamId, List<Object> tuple, Object messageId)
Used with direct grouping: the receiving task is specified directly by its task id.
/**
* This output collector exposes the API for emitting tuples from an {@link backtype.storm.topology.IRichSpout}.
* The main difference between this output collector and {@link OutputCollector}
* for {@link backtype.storm.topology.IRichBolt} is that spouts can tag messages with ids so that they can be
* acked or failed later on. This is the Spout portion of Storm's API to
* guarantee that each message is fully processed at least once.
*/
public class SpoutOutputCollector implements ISpoutOutputCollector {
ISpoutOutputCollector _delegate;

public SpoutOutputCollector(ISpoutOutputCollector delegate) {
_delegate = delegate;
}

/**
* Emits a new tuple to the specified output stream with the given message ID.
* When Storm detects that this tuple has been fully processed, or has failed
* to be fully processed, the spout will receive an ack or fail callback respectively
* with the messageId as long as the messageId was not null. If the messageId was null,
* Storm will not track the tuple and no callback will be received. The emitted values must be
* immutable.
*
* @return the list of task ids that this tuple was sent to
*/
public List<Integer> emit(String streamId, List<Object> tuple, Object messageId) {
return _delegate.emit(streamId, tuple, messageId);
}

/**
* Emits a new tuple to the default output stream with the given message ID.
* When Storm detects that this tuple has been fully processed, or has failed
* to be fully processed, the spout will receive an ack or fail callback respectively
* with the messageId as long as the messageId was not null. If the messageId was null,
* Storm will not track the tuple and no callback will be received. The emitted values must be
* immutable.
*
* @return the list of task ids that this tuple was sent to
*/
public List<Integer> emit(List<Object> tuple, Object messageId) {
return emit(Utils.DEFAULT_STREAM_ID, tuple, messageId);
}

/**
* Emits a tuple to the default output stream with a null message id. Storm will
* not track this message so ack and fail will never be called for this tuple. The
* emitted values must be immutable.
*/
public List<Integer> emit(List<Object> tuple) {
return emit(tuple, null);
}

/**
* Emits a tuple to the specified output stream with a null message id. Storm will
* not track this message so ack and fail will never be called for this tuple. The
* emitted values must be immutable.
*/
public List<Integer> emit(String streamId, List<Object> tuple) {
return emit(streamId, tuple, null);
}

/**
* Emits a tuple to the specified task on the specified output stream. This output
* stream must have been declared as a direct stream, and the specified task must
* use a direct grouping on this stream to receive the message. The emitted values must be
* immutable.
*/
public void emitDirect(int taskId, String streamId, List<Object> tuple, Object messageId) {
_delegate.emitDirect(taskId, streamId, tuple, messageId);
}

/**
* Emits a tuple to the specified task on the default output stream. This output
* stream must have been declared as a direct stream, and the specified task must
* use a direct grouping on this stream to receive the message. The emitted values must be
* immutable.
*/
public void emitDirect(int taskId, List<Object> tuple, Object messageId) {
emitDirect(taskId, Utils.DEFAULT_STREAM_ID, tuple, messageId);
}

/**
* Emits a tuple to the specified task on the specified output stream. This output
* stream must have been declared as a direct stream, and the specified task must
* use a direct grouping on this stream to receive the message. The emitted values must be
* immutable.
*
* <p> Because no message id is specified, Storm will not track this message
* so ack and fail will never be called for this tuple.</p>
*/
public void emitDirect(int taskId, String streamId, List<Object> tuple) {
emitDirect(taskId, streamId, tuple, null);
}

/**
* Emits a tuple to the specified task on the default output stream. This output
* stream must have been declared as a direct stream, and the specified task must
* use a direct grouping on this stream to receive the message. The emitted values must be
* immutable.
*
* <p> Because no message id is specified, Storm will not track this message
* so ack and fail will never be called for this tuple.</p>
*/
public void emitDirect(int taskId, List<Object> tuple) {
emitDirect(taskId, tuple, null);
}

@Override
public void reportError(Throwable error) {
_delegate.reportError(error);
}
}
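The listing above shows that every emit overload ends up in a single three-argument call on _delegate, with the missing arguments filled in as the default stream and a null message-id. The following is a minimal sketch of that funnelling, runnable without Storm. `EmitDelegate`, `RecordingDelegate`, `DelegatingCollectorSketch`, and `DelegatingCollectorDemo` are hypothetical names used only for illustration, and the string "default" is assumed here as the value of Utils.DEFAULT_STREAM_ID.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for ISpoutOutputCollector (emit only).
interface EmitDelegate {
    List<Integer> emit(String streamId, List<Object> tuple, Object messageId);
}

// Hypothetical delegate that records what it was asked to emit.
class RecordingDelegate implements EmitDelegate {
    final List<String> calls = new ArrayList<>();

    @Override
    public List<Integer> emit(String streamId, List<Object> tuple, Object messageId) {
        calls.add(streamId + "/" + messageId);
        return Collections.singletonList(7); // pretend that task 7 received the tuple
    }
}

// Mirrors the overload chaining of SpoutOutputCollector's emit methods.
class DelegatingCollectorSketch {
    // Assumption: stands in for Utils.DEFAULT_STREAM_ID.
    static final String DEFAULT_STREAM_ID = "default";

    private final EmitDelegate delegate;

    DelegatingCollectorSketch(EmitDelegate delegate) {
        this.delegate = delegate;
    }

    // The only overload that talks to the delegate.
    List<Integer> emit(String streamId, List<Object> tuple, Object messageId) {
        return delegate.emit(streamId, tuple, messageId);
    }

    // No stream id: use the default stream.
    List<Integer> emit(List<Object> tuple, Object messageId) {
        return emit(DEFAULT_STREAM_ID, tuple, messageId);
    }

    // No message-id: the tuple is untracked.
    List<Integer> emit(List<Object> tuple) {
        return emit(tuple, null);
    }

    // Explicit stream, no message-id: also untracked.
    List<Integer> emit(String streamId, List<Object> tuple) {
        return emit(streamId, tuple, null);
    }
}

class DelegatingCollectorDemo {
    public static void main(String[] args) {
        RecordingDelegate delegate = new RecordingDelegate();
        DelegatingCollectorSketch collector = new DelegatingCollectorSketch(delegate);
        List<Object> tuple = new ArrayList<>();
        tuple.add("word");
        collector.emit(tuple);           // default stream, untracked
        collector.emit("s1", tuple, 42); // explicit stream, tracked under id 42
        System.out.println(delegate.calls); // prints [default/null, s1/42]
    }
}
```

Keeping the real work in one delegate call means the executor can supply its own ISpoutOutputCollector implementation, while the convenience overloads stay in one place.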