Storm Series (3): The Topology Submission Process

Example submission code:

1	public static void main(String[] args) throws Exception {
2	    TopologyBuilder builder = new TopologyBuilder();
3	    builder.setSpout("random", new RandomWordSpout(), 2);
4	    builder.setBolt("transfer", new TransferBolt(), 4).shuffleGrouping("random");
5	    builder.setBolt("writer", new WriterBolt(), 4).fieldsGrouping("transfer", new Fields("word"));
6	    Config conf = new Config();
7	    conf.setNumWorkers(4); // start 4 worker processes
8	    conf.setNumAckers(1);  // run one acker thread
9	    conf.setDebug(true);   // log every emitted tuple, including system messages
10	    StormSubmitter.submitTopology("test", conf, builder.createTopology());
11	}
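Lines 3 to 9 ask for 2 + 4 + 4 component executors plus 1 acker, spread over 4 workers. The plain-Java sketch below makes those numbers concrete. It is only a simplified model. It assumes one executor per unit of parallelism hint and deals executors round-robin over the workers, which approximates an even spread but is not Storm's scheduler code. It also ignores any system executors Storm may add per worker. The class ExecutorLayout is hypothetical, and the acker is simply labelled "__acker".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: one executor per unit of parallelism hint, dealt round-robin
// across workers. This is NOT Storm's scheduler; system executors are ignored.
class ExecutorLayout {
    static List<List<String>> distribute(Map<String, Integer> parallelism, int numWorkers) {
        List<List<String>> workers = new ArrayList<>();
        for (int w = 0; w < numWorkers; w++) {
            workers.add(new ArrayList<>());
        }
        int next = 0;
        for (Map.Entry<String, Integer> e : parallelism.entrySet()) {
            for (int i = 0; i < e.getValue(); i++) {
                workers.get(next % numWorkers).add(e.getKey() + "#" + i);
                next++;
            }
        }
        return workers;
    }

    // Parallelism values taken from the submission code above.
    static Map<String, Integer> exampleParallelism() {
        Map<String, Integer> p = new LinkedHashMap<>();
        p.put("random", 2);   // setSpout(..., 2)
        p.put("transfer", 4); // setBolt(..., 4)
        p.put("writer", 4);   // setBolt(..., 4)
        p.put("__acker", 1);  // conf.setNumAckers(1)
        return p;
    }

    public static void main(String[] args) {
        List<List<String>> layout = distribute(exampleParallelism(), 4); // conf.setNumWorkers(4)
        for (int w = 0; w < layout.size(); w++) {
            System.out.println("worker " + w + ": " + layout.get(w));
        }
    }
}
```

Under this model the 11 executors land 3, 3, 3 and 2 per worker.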

1. Build the TopologyBuilder object builder, which is used to configure each component (bolts and spouts). The main fields of TopologyBuilder are defined as follows:

public class TopologyBuilder {

    // every bolt of the submitted topology is stored in _bolts
    private Map<String, IRichBolt> _bolts = new HashMap<String, IRichBolt>();

    // every spout of the submitted topology is stored in _spouts
    private Map<String, IRichSpout> _spouts = new HashMap<String, IRichSpout>();

    // the ComponentCommon of every spout and bolt is stored in _commons
    private Map<String, ComponentCommon> _commons = new HashMap<String, ComponentCommon>();

    ....................................
}

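2. Line 3 of the submission code configures a spout whose id is random, whose IRichSpout object is RandomWordSpout, and whose parallelism hint is 2 (two threads running two tasks). The source of setSpout is as follows: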

public SpoutDeclarer setSpout(String id, IRichSpout spout, Number parallelism_hint) {
    validateUnusedId(id);
    initCommon(id, spout, parallelism_hint);
    _spouts.put(id, spout);
    return new SpoutGetter(id);
}

validateUnusedId: checks that the given id is unique and throws an exception if it is already in use;

initCommon: builds a ComponentCommon object, initializes it, and finally puts it into _commons (the Map defined in TopologyBuilder above);
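The sequence of steps in setSpout (validate the id, record common metadata, store the component) can be reproduced with a toy builder. MiniBuilder below is hypothetical and not Storm's class. Components are plain Objects, "_commons" is reduced to a map of parallelism hints, the setters return nothing instead of a declarer, and the exception messages are illustrative. The point it shows is that spouts and bolts share one id namespace, so a bolt cannot reuse a spout's id.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, stripped-down builder (not Storm's TopologyBuilder) that follows the
// same order of steps as setSpout/setBolt: validate id, record common data, store component.
class MiniBuilder {
    private final Map<String, Object> spouts = new HashMap<>();
    private final Map<String, Object> bolts = new HashMap<>();
    // Stand-in for _commons: only the parallelism hint is kept.
    private final Map<String, Number> commons = new HashMap<>();

    // Same idea as validateUnusedId: spouts and bolts share a single id namespace.
    private void validateUnusedId(String id) {
        if (bolts.containsKey(id)) {
            throw new IllegalArgumentException("Bolt has already been declared for id " + id);
        }
        if (spouts.containsKey(id)) {
            throw new IllegalArgumentException("Spout has already been declared for id " + id);
        }
    }

    void setSpout(String id, Object spout, Number parallelismHint) {
        validateUnusedId(id);
        commons.put(id, parallelismHint); // the initCommon step, reduced to one field
        spouts.put(id, spout);
    }

    void setBolt(String id, Object bolt, Number parallelismHint) {
        validateUnusedId(id);
        commons.put(id, parallelismHint);
        bolts.put(id, bolt);
    }

    Number parallelismOf(String id) {
        return commons.get(id);
    }

    public static void main(String[] args) {
        MiniBuilder builder = new MiniBuilder();
        builder.setSpout("random", "RandomWordSpout", 2);
        builder.setBolt("transfer", "TransferBolt", 4);
        try {
            builder.setBolt("random", "AnotherBolt", 4); // id already used by the spout
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```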

Source of initCommon:

private void initCommon(String id, IComponent component, Number parallelism) {
    ComponentCommon common = new ComponentCommon();

    // set the message sources and their grouping (empty until a grouping is declared)
    common.set_inputs(new HashMap<GlobalStreamId, Grouping>());

    if(parallelism!=null)
        // set the parallelism hint
        common.set_parallelism_hint(parallelism.intValue());

    Map conf = component.getComponentConfiguration();

    if(conf!=null)
        // set the component's configuration parameters
        common.set_json_conf(JSONValue.toJSONString(conf));

    _commons.put(id, common);
}
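The following is a dependency-free sketch of the same logic. SimpleComponent and SimpleCommon are hypothetical stand-ins for IComponent and ComponentCommon. The article's source uses JSONValue.toJSONString from json-simple, which is swapped here for a tiny hand-written encoder that only handles flat maps of strings, numbers and booleans and does no escaping. As in the original, a null parallelism or a null component configuration simply leaves the corresponding field unset.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical stand-in for IComponent: only the configuration hook matters here.
interface SimpleComponent {
    Map<String, Object> getComponentConfiguration(); // may return null
}

// Hypothetical stand-in for ComponentCommon.
class SimpleCommon {
    Map<String, String> inputs;   // placeholder for Map<GlobalStreamId, Grouping>
    Integer parallelismHint;      // null means "not set"
    String jsonConf;              // null means "no component-level configuration"
}

class InitCommonSketch {
    static final Map<String, SimpleCommon> commons = new HashMap<>();

    // Minimal JSON encoder for flat maps (no escaping); replaces json-simple in this sketch.
    static String toJson(Map<String, Object> conf) {
        StringJoiner sj = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, Object> e : conf.entrySet()) {
            Object v = e.getValue();
            String encoded = (v instanceof String) ? "\"" + v + "\"" : String.valueOf(v);
            sj.add("\"" + e.getKey() + "\":" + encoded);
        }
        return sj.toString();
    }

    static void initCommon(String id, SimpleComponent component, Number parallelism) {
        SimpleCommon common = new SimpleCommon();
        common.inputs = new HashMap<>(); // empty until a grouping is declared
        if (parallelism != null) {
            common.parallelismHint = parallelism.intValue();
        }
        Map<String, Object> conf = component.getComponentConfiguration();
        if (conf != null) {
            common.jsonConf = toJson(conf);
        }
        commons.put(id, common);
    }

    public static void main(String[] args) {
        Map<String, Object> conf = new LinkedHashMap<>();
        conf.put("topology.debug", true);
        initCommon("random", () -> conf, 2);
        initCommon("transfer", () -> null, 4);
        System.out.println(commons.get("random").jsonConf);   // component had a config
        System.out.println(commons.get("transfer").jsonConf); // component had none
    }
}
```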

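ComponentCommon mainly holds the following four fields:

private Map<GlobalStreamId,Grouping> inputs; // message sources and their grouping

Here GlobalStreamId identifies where messages come from (componentId is the component they belong to, streamId is the identifier of the stream), and Grouping determines how those messages are grouped.

private Map<String,StreamInfo> streams; // declared output streams

StreamInfo records a stream's list of output fields and whether it is a direct stream.

private int parallelism_hint; // parallelism hint

private String json_conf; // other configuration parameters (must be in JSON format)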
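Here is what these four fields would contain for the example topology. The record types below are hypothetical and are not Storm's thrift classes. StreamRef plays the role of GlobalStreamId, GroupingSpec plays Grouping, and StreamDecl plays StreamInfo. The stream id "default" is the value of Utils.DEFAULT_STREAM_ID. The article does not show the output declarations of TransferBolt or WriterBolt, so the sketch assumes that transfer emits a single field "word" and that writer emits nothing.

```java
import java.util.List;
import java.util.Map;

// Hypothetical record-based model of the four ComponentCommon fields (not Storm's classes).
record StreamRef(String componentId, String streamId) {}        // plays the GlobalStreamId role
record GroupingSpec(String kind, List<String> fields) {}         // plays the Grouping role
record StreamDecl(List<String> outputFields, boolean direct) {}  // plays the StreamInfo role
record CommonModel(Map<StreamRef, GroupingSpec> inputs,
                   Map<String, StreamDecl> streams,
                   int parallelismHint,
                   String jsonConf) {}

class CommonModelDemo {
    static final String DEFAULT_STREAM = "default"; // same value as Utils.DEFAULT_STREAM_ID

    // "transfer" reads random's default stream with shuffle grouping; assumed to emit "word".
    static CommonModel transfer() {
        return new CommonModel(
                Map.of(new StreamRef("random", DEFAULT_STREAM), new GroupingSpec("shuffle", List.of())),
                Map.of(DEFAULT_STREAM, new StreamDecl(List.of("word"), false)),
                4, null);
    }

    // "writer" reads transfer's default stream grouped by "word"; outputs assumed empty.
    static CommonModel writer() {
        return new CommonModel(
                Map.of(new StreamRef("transfer", DEFAULT_STREAM), new GroupingSpec("fields", List.of("word"))),
                Map.of(),
                4, null);
    }

    public static void main(String[] args) {
        System.out.println("transfer: " + transfer());
        System.out.println("writer:   " + writer());
    }
}
```

Records give value-based equals and hashCode, which is what allows a (componentId, streamId) pair to serve as a map key, the same way GlobalStreamId keys the inputs map.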

3. Source of SpoutGetter:

protected class SpoutGetter extends ConfigGetter<SpoutDeclarer> implements SpoutDeclarer {

    public SpoutGetter(String id) {
        super(id);
    }
}

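ConfigGetter and SpoutGetter are both implemented inside TopologyBuilder. ConfigGetter sets configuration items from within the program, overriding the default items; the items are stored as JSON (in essence, ConfigGetter changes the json_conf value of the corresponding ComponentCommon object).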
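The following is a minimal sketch of the override behaviour just described. It models settings declared through the fluent API being merged over a component's own configuration, with later values winning. ConfigOverrideSketch and its addConfiguration method are hypothetical. The sketch works on a Map directly instead of parsing and re-serializing json_conf. The keys are real Storm configuration names, chosen only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the ConfigGetter idea: per-component settings declared through
// a fluent API are merged over the component's own configuration, later values winning.
class ConfigOverrideSketch {
    private final Map<String, Object> componentConf = new LinkedHashMap<>();

    ConfigOverrideSketch(Map<String, Object> defaults) {
        if (defaults != null) {
            componentConf.putAll(defaults);
        }
    }

    // Returns this so calls can be chained, like the declarer returned by setSpout/setBolt.
    ConfigOverrideSketch addConfiguration(String key, Object value) {
        componentConf.put(key, value); // overwrites an existing key, keeps its position
        return this;
    }

    Map<String, Object> effectiveConf() {
        return componentConf;
    }

    public static void main(String[] args) {
        Map<String, Object> defaults = new LinkedHashMap<>();
        defaults.put("topology.debug", false);
        defaults.put("topology.max.spout.pending", 100);
        ConfigOverrideSketch getter = new ConfigOverrideSketch(defaults)
                .addConfiguration("topology.debug", true)  // overrides the default
                .addConfiguration("topology.tasks", 4);    // adds a new key
        System.out.println(getter.effectiveConf());
    }
}
```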

4. Line 4 of the submission code defines a bolt whose id is transfer, whose IRichBolt object is TransferBolt, and whose parallelism hint is 4.

Source of setBolt:

public BoltDeclarer setBolt(String id, IRichBolt bolt, Number parallelism_hint) {
    validateUnusedId(id);
    initCommon(id, bolt, parallelism_hint);
    _bolts.put(id, bolt);
    return new BoltGetter(id);
}

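setBolt differs from setSpout only in where the component is stored (_bolts instead of _spouts) and in what it returns (a BoltGetter instead of a SpoutGetter);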

Partial source of BoltGetter:

protected class BoltGetter extends ConfigGetter<BoltDeclarer> implements BoltDeclarer {

    private String _boltId;

    public BoltGetter(String boltId) {
        super(boltId);
        _boltId = boltId;
    }

    public BoltDeclarer shuffleGrouping(String componentId) {
        return shuffleGrouping(componentId, Utils.DEFAULT_STREAM_ID);
    }

    public BoltDeclarer fieldsGrouping(String componentId, Fields fields) {
        return fieldsGrouping(componentId, Utils.DEFAULT_STREAM_ID, fields);
    }

    public BoltDeclarer fieldsGrouping(String componentId, String streamId, Fields fields) {
        return grouping(componentId, streamId, Grouping.fields(fields.toList()));
    }

    public BoltDeclarer shuffleGrouping(String componentId, String streamId) {
        return grouping(componentId, streamId, Grouping.shuffle(new NullStruct()));
    }

    private BoltDeclarer grouping(String componentId, String streamId, Grouping grouping) {
        _commons.get(_boltId).put_to_inputs(new GlobalStreamId(componentId, streamId), grouping);
        return this;
    }

    .........................................
}

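BoltGetter extends ConfigGetter and implements the BoltDeclarer interface, providing the grouping methods declared in BoltDeclarer (InputDeclarer), such as fieldsGrouping and shuffleGrouping. Each grouping method essentially looks up the ComponentCommon object for the bolt's boltId in _commons and adds an entry to its inputs field;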
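The overload funnel above can be mirrored without Storm on the classpath. In the source, every grouping method ends in the private grouping(), which writes one (source stream, grouping) entry into the bolt's inputs and returns this for chaining. GroupingSketch and its inner BoltGetter below are hypothetical simplifications. Inputs are keyed by a "componentId:streamId" string, and groupings are plain descriptive strings instead of Storm's Grouping objects.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of how the grouping overloads record subscriptions: every overload
// funnels into grouping(), which writes into the bolt's inputs map and returns this.
class GroupingSketch {
    static final String DEFAULT_STREAM_ID = "default";

    // boltId -> ("componentId:streamId" -> grouping description)
    final Map<String, Map<String, String>> inputsByBolt = new HashMap<>();

    BoltGetter setBolt(String id) {
        inputsByBolt.put(id, new LinkedHashMap<>());
        return new BoltGetter(id);
    }

    class BoltGetter {
        private final String boltId;

        BoltGetter(String boltId) {
            this.boltId = boltId;
        }

        BoltGetter shuffleGrouping(String componentId) {
            return shuffleGrouping(componentId, DEFAULT_STREAM_ID);
        }

        BoltGetter shuffleGrouping(String componentId, String streamId) {
            return grouping(componentId, streamId, "shuffle");
        }

        BoltGetter fieldsGrouping(String componentId, List<String> fields) {
            return fieldsGrouping(componentId, DEFAULT_STREAM_ID, fields);
        }

        BoltGetter fieldsGrouping(String componentId, String streamId, List<String> fields) {
            return grouping(componentId, streamId, "fields" + fields);
        }

        private BoltGetter grouping(String componentId, String streamId, String grouping) {
            inputsByBolt.get(boltId).put(componentId + ":" + streamId, grouping);
            return this; // allows chaining several subscriptions on one bolt
        }
    }

    public static void main(String[] args) {
        GroupingSketch builder = new GroupingSketch();
        builder.setBolt("transfer").shuffleGrouping("random");
        builder.setBolt("writer").fieldsGrouping("transfer", List.of("word"));
        System.out.println(builder.inputsByBolt);
    }
}
```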

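5. The steps above complete the configuration of the bolts and spouts (lines 2 to 5 of the submission code). Lines 6 to 9 configure the runtime environment, and line 10 submits the topology to the cluster for execution. builder.createTopology builds the StormTopology object; its source is as follows: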

public StormTopology createTopology() {
    Map<String, Bolt> boltSpecs = new HashMap<String, Bolt>();
    Map<String, SpoutSpec> spoutSpecs = new HashMap<String, SpoutSpec>();

    for(String boltId: _bolts.keySet()) {
        IRichBolt bolt = _bolts.get(boltId);
        ComponentCommon common = getComponentCommon(boltId, bolt);
        boltSpecs.put(boltId, new Bolt(ComponentObject.serialized_java(Utils.serialize(bolt)), common));
    }

    for(String spoutId: _spouts.keySet()) {
        IRichSpout spout = _spouts.get(spoutId);
        ComponentCommon common = getComponentCommon(spoutId, spout);
        spoutSpecs.put(spoutId, new SpoutSpec(ComponentObject.serialized_java(Utils.serialize(spout)), common));
    }

    return new StormTopology(spoutSpecs, boltSpecs, new HashMap<String, StateSpoutSpec>());
}

The source above does two main things:

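For each boltId, it fetches the bolt object from _bolts, calls getComponentCommon to set the streams field (the output field lists and whether each stream is direct) of the corresponding ComponentCommon object, and finally puts the serialized bolt together with its common into the boltSpecs map.
For each spoutId, it fetches the spout object from _spouts, calls getComponentCommon to set the streams field of the corresponding ComponentCommon object in the same way, and finally puts the serialized spout together with its common into the spoutSpecs map.
These two steps package every configured component into the StormTopology object, which is then submitted to the cluster to run.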
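The two loops above can be approximated in plain Java. Each component is serialized with standard Java serialization, following the same approach as Utils.serialize(bolt) in the source, and is then paired with the output streams it declares. SketchComponent, Spec, and RandomWordComponent are hypothetical. declareOutputs stands in for what getComponentCommon collects through declareOutputFields. Deserializing the bytes afterwards shows why every spout and bolt must be Serializable: the cluster receives a byte copy of the object, not the object itself.

```java
import java.io.*;
import java.util.*;

// Hypothetical stand-in for IRichBolt/IRichSpout: serializable and able to report outputs.
interface SketchComponent extends Serializable {
    // streamId -> output fields, analogous to what declareOutputFields reports
    Map<String, List<String>> declareOutputs();
}

// Hypothetical stand-in for Bolt/SpoutSpec: serialized object plus its declared streams.
record Spec(byte[] serializedComponent, Map<String, List<String>> streams) {}

class CreateTopologySketch {
    // Hypothetical example component standing in for RandomWordSpout.
    static class RandomWordComponent implements SketchComponent {
        private static final long serialVersionUID = 1L;
        final String[] words = {"storm", "topology"};

        public Map<String, List<String>> declareOutputs() {
            return Map.of("default", List.of("word"));
        }
    }

    // Plain Java serialization of a component to bytes.
    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    static Object deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        }
    }

    // One loop of createTopology: serialize each component and pair it with its streams.
    static Map<String, Spec> buildSpecs(Map<String, SketchComponent> components) throws IOException {
        Map<String, Spec> specs = new HashMap<>();
        for (Map.Entry<String, SketchComponent> e : components.entrySet()) {
            SketchComponent c = e.getValue();
            specs.put(e.getKey(), new Spec(serialize(c), c.declareOutputs()));
        }
        return specs;
    }

    public static void main(String[] args) throws Exception {
        Map<String, SketchComponent> spouts = new HashMap<>();
        spouts.put("random", new RandomWordComponent());
        Spec spec = buildSpecs(spouts).get("random");
        RandomWordComponent copy = (RandomWordComponent) deserialize(spec.serializedComponent());
        System.out.println(spec.streams() + " " + Arrays.toString(copy.words));
    }
}
```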
