Storm Source Code Analysis (3): The Topology Startup Process

A Storm topology is started by running: storm jar topology1.jar MAINCLASS ARG1 ARG2

The previous article analyzed how the storm script parses this command, so this one focuses on what happens when topology1.jar itself runs.

We will use ExclamationTopology from storm-starter as the running example:

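// Pre-1.0 Storm package names (backtype.storm.*), matching the Storm version analyzed here
import java.util.Map;

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.StormSubmitter;
import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.testing.TestWordSpout;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;
import backtype.storm.utils.Utils;

public class ExclamationTopology {

  // A bolt that appends "!!!" to the first field of every incoming tuple
  public static class ExclamationBolt extends BaseRichBolt {
    OutputCollector _collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
      // Called on the worker when the task starts; keep the collector for execute()
      _collector = collector;
    }

    @Override
    public void execute(Tuple tuple) {
      // Emit a new tuple anchored to the input, then ack the input
      _collector.emit(tuple, new Values(tuple.getString(0) + "!!!"));
      _collector.ack(tuple);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
      declarer.declare(new Fields("word"));
    }
  }

  public static void main(String[] args) throws Exception {
    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("word", new TestWordSpout(), 10);
    builder.setBolt("exclaim1", new ExclamationBolt(), 3).shuffleGrouping("word");
    builder.setBolt("exclaim2", new ExclamationBolt(), 2).shuffleGrouping("exclaim1");

    Config conf = new Config();
    conf.setDebug(true);

    if (args != null && args.length > 0) {
      // Cluster mode: args[0] is the topology name
      conf.setNumWorkers(3);
      StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
    } else {
      // Local mode: run in-process for 10 seconds, then tear everything down
      LocalCluster cluster = new LocalCluster();
      cluster.submitTopology("test", conf, builder.createTopology());
      Utils.sleep(10000);
      cluster.killTopology("test");
      cluster.shutdown();
    }
  }
}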

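As the code shows, starting a topology takes three steps:

(1) Create a TopologyBuilder and register the spouts (data sources) and the bolts (processing components).

(2) Create a Config and set configuration options.

(3) Submit the topology.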

Creating the TopologyBuilder

Constructing a TopologyBuilder object is trivial, so let's start with setSpout():

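public SpoutDeclarer setSpout(String id, IRichSpout spout, Number parallelism_hint) {
    validateUnusedId(id);                    // fail fast if the component id is taken
    initCommon(id, spout, parallelism_hint); // build and record the ComponentCommon
    _spouts.put(id, spout);                  // remember the spout object itself
    return new SpoutGetter(id);
}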

First, validateUnusedId() checks whether the componentId has already been used; if it has, an exception is thrown immediately.

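Next, initCommon() initializes the component's common settings: it creates a ComponentCommon object, fills in its fields (an empty inputs map, the parallelism hint, and the component-specific configuration as JSON), and stores it in TopologyBuilder's member Map<String, ComponentCommon> _commons, keyed by the componentId ("word" here). The code is: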

private void initCommon(String id, IComponent component, Number parallelism) {
    ComponentCommon common = new ComponentCommon();
    common.set_inputs(new HashMap<GlobalStreamId, Grouping>()); // filled in later by groupings
    if(parallelism!=null) common.set_parallelism_hint(parallelism.intValue());
    Map conf = component.getComponentConfiguration();
    if(conf!=null) common.set_json_conf(JSONValue.toJSONString(conf)); // per-component config as JSON
    _commons.put(id, common);
}

ComponentCommon is defined with Thrift, in storm.thrift:

struct ComponentCommon {
  1: required map<GlobalStreamId, Grouping> inputs;
  2: required map<string, StreamInfo> streams; //key is stream id
  3: optional i32 parallelism_hint; //how many threads across the cluster should be dedicated to this component

  // component specific configuration respects:
  // topology.debug: false
  // topology.max.task.parallelism: null // can replace isDistributed with this
  // topology.max.spout.pending: null
  // topology.kryo.register // this is the only additive one

  // component specific configuration
  4: optional string json_conf;
}

Finally, setSpout() records the spout itself in TopologyBuilder's member Map<String, IRichSpout> _spouts, again keyed by the componentId ("word" here).

setBolt() works the same way as setSpout(): it records a ComponentCommon in _commons under the componentId ("exclaim1" here), and records the bolt in TopologyBuilder's member Map<String, IRichBolt> _bolts under the same key.
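To make this bookkeeping concrete, here is a minimal sketch of what setSpout() and setBolt() record. MiniTopologyBuilder and MiniComponentCommon are hypothetical, simplified stand-ins (plain Objects instead of IRichSpout/IRichBolt, no Thrift types), not Storm's actual classes; they only mirror the duplicate-id check and the _commons/_spouts/_bolts maps described above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the Thrift ComponentCommon struct (only two fields kept).
class MiniComponentCommon {
    Integer parallelismHint; // mirrors the optional parallelism_hint field
    String jsonConf;         // mirrors the optional json_conf field
}

// Hypothetical, simplified model of TopologyBuilder's bookkeeping.
class MiniTopologyBuilder {
    final Map<String, MiniComponentCommon> commons = new HashMap<>();
    final Map<String, Object> spouts = new HashMap<>();
    final Map<String, Object> bolts = new HashMap<>();

    void setSpout(String id, Object spout, Number parallelismHint) {
        validateUnusedId(id);
        initCommon(id, parallelismHint);
        spouts.put(id, spout);
    }

    void setBolt(String id, Object bolt, Number parallelismHint) {
        validateUnusedId(id);
        initCommon(id, parallelismHint);
        bolts.put(id, bolt);
    }

    // Spouts and bolts share one id namespace, so check both maps.
    private void validateUnusedId(String id) {
        if (spouts.containsKey(id) || bolts.containsKey(id)) {
            throw new IllegalArgumentException("Component id already used: " + id);
        }
    }

    // Create the per-component common settings and index them by id.
    private void initCommon(String id, Number parallelismHint) {
        MiniComponentCommon common = new MiniComponentCommon();
        if (parallelismHint != null) common.parallelismHint = parallelismHint.intValue();
        commons.put(id, common);
    }
}
```

Keeping one map per component kind plus a shared commons map lets a later pass walk spouts and bolts separately while looking up their common settings by id.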

Then the chained .shuffleGrouping("word") call invokes shuffleGrouping() on the BoltDeclarer returned by setBolt().

This eventually ends up in grouping(). No streamId is given here, so the default stream ID, "default" (Utils.DEFAULT_STREAM_ID), is used:

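public BoltDeclarer shuffleGrouping(String componentId) {
    // No stream given: subscribe to the component's "default" stream
    return shuffleGrouping(componentId, Utils.DEFAULT_STREAM_ID);
}

public BoltDeclarer shuffleGrouping(String componentId, String streamId) {
    // Grouping is a Thrift union; set its "shuffle" member to an empty NullStruct
    return grouping(componentId, streamId, Grouping.shuffle(new NullStruct()));
}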

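The last argument passed to grouping() is a Grouping object whose shuffle member is set to a NullStruct. Grouping is a union defined in storm.thrift; Thrift generates the corresponding Java class, and each member of the union is one grouping strategy (fields, shuffle, all, and so on).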

private BoltDeclarer grouping(String componentId, String streamId, Grouping grouping) {
    // Register (componentId, streamId) as an input of the bolt being declared
    _commons.get(_boltId).put_to_inputs(new GlobalStreamId(componentId, streamId), grouping);
    return this;
}

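grouping() looks up the ComponentCommon stored in _commons under the bolt's componentId and adds an entry to its inputs map, recording where the bolt's input comes from. For the first setBolt(), it fetches the ComponentCommon for "exclaim1" and adds the key GlobalStreamId("word", "default") with a shuffle grouping, so the "default" stream of the "word" spout becomes the input of the first bolt.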
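The wiring that grouping() performs can be sketched as follows. StreamKey stands in for the Thrift GlobalStreamId, and the grouping is reduced to a string such as "shuffle" instead of the Thrift Grouping union; both are simplifications. Also, Storm stores the inputs map in the bolt's ComponentCommon inside _commons, whereas this hypothetical MiniBoltDeclarer keeps it on the declarer itself.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for the Thrift GlobalStreamId struct: (componentId, streamId).
final class StreamKey {
    final String componentId;
    final String streamId;

    StreamKey(String componentId, String streamId) {
        this.componentId = componentId;
        this.streamId = streamId;
    }

    // Value equality is required because StreamKey is used as a HashMap key.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof StreamKey)) return false;
        StreamKey k = (StreamKey) o;
        return componentId.equals(k.componentId) && streamId.equals(k.streamId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(componentId, streamId);
    }
}

// Hypothetical stand-in for BoltDeclarer: each grouping call adds one input entry.
class MiniBoltDeclarer {
    static final String DEFAULT_STREAM_ID = "default";
    final String boltId;
    final Map<StreamKey, String> inputs = new HashMap<>(); // value: grouping kind

    MiniBoltDeclarer(String boltId) {
        this.boltId = boltId;
    }

    MiniBoltDeclarer shuffleGrouping(String componentId) {
        return shuffleGrouping(componentId, DEFAULT_STREAM_ID);
    }

    MiniBoltDeclarer shuffleGrouping(String componentId, String streamId) {
        return grouping(componentId, streamId, "shuffle");
    }

    private MiniBoltDeclarer grouping(String componentId, String streamId, String grouping) {
        inputs.put(new StreamKey(componentId, streamId), grouping);
        return this; // returning this allows chaining several groupings
    }
}
```

Returning the declarer from every grouping call is what lets a bolt subscribe to several upstream streams in one fluent chain.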

Creating the Config

Config is simple: it extends HashMap (so it is itself a Map) and adds configuration entries to itself through setXxx() methods.

This example calls two of these setters.

conf.setDebug(true); puts the entry ("topology.debug" -> true) into the map, turning on debug mode.

conf.setNumWorkers(3); likewise puts the entry ("topology.workers" -> 3) into the map, setting the number of workers to 3.
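Because Config is itself a map, its setters are thin wrappers around put(). The hypothetical MiniConfig below illustrates the pattern; the key strings are the ones quoted above, while the class is only an illustration, not Storm's Config.

```java
import java.util.HashMap;

// Hypothetical imitation of Storm's Config: the map itself holds the configuration.
class MiniConfig extends HashMap<String, Object> {
    private static final long serialVersionUID = 1L;

    static final String TOPOLOGY_DEBUG = "topology.debug";
    static final String TOPOLOGY_WORKERS = "topology.workers";

    // Each setter just stores a well-known key; values stay typed (Boolean, Integer).
    void setDebug(boolean isOn) {
        put(TOPOLOGY_DEBUG, isOn);
    }

    void setNumWorkers(int workers) {
        put(TOPOLOGY_WORKERS, workers);
    }
}
```

Since the configuration is an ordinary map, it can later be merged with other maps via putAll() and serialized to JSON without any conversion step.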

Submitting the Topology (the key part)

StormSubmitter.submitTopology(args[0], conf, builder.createTopology());

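(1) createTopology
builder.createTopology() turns the TopologyBuilder built earlier into a StormTopology object: each spout and bolt is serialized and paired with its ComponentCommon. getComponentCommon() also calls the component's declareOutputFields() to fill in the streams field of ComponentCommon, which initCommon() left unset.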

public StormTopology createTopology() {
    Map<String, Bolt> boltSpecs = new HashMap<String, Bolt>();
    Map<String, SpoutSpec> spoutSpecs = new HashMap<String, SpoutSpec>();
    for(String boltId: _bolts.keySet()) {
        IRichBolt bolt = _bolts.get(boltId);
        ComponentCommon common = getComponentCommon(boltId, bolt);
        boltSpecs.put(boltId, new Bolt(ComponentObject.serialized_java(Utils.serialize(bolt)), common));
    }
    for(String spoutId: _spouts.keySet()) {
        IRichSpout spout = _spouts.get(spoutId);
        ComponentCommon common = getComponentCommon(spoutId, spout);
        spoutSpecs.put(spoutId, new SpoutSpec(ComponentObject.serialized_java(Utils.serialize(spout)), common));
    }
    return new StormTopology(spoutSpecs,
                             boltSpecs,
                             new HashMap<String, StateSpoutSpec>());
}

Note that StormTopology is also a struct defined in storm.thrift:

struct StormTopology {
  //ids must be unique across maps
  // #workers to use is in conf
  1: required map<string, SpoutSpec> spouts;
  2: required map<string, Bolt> bolts;
  3: required map<string, StateSpoutSpec> state_spouts;
}
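createTopology() stores every spout and bolt as serialized Java bytes (ComponentObject.serialized_java(Utils.serialize(...))), which is why components must be Serializable. The sketch below assumes plain JDK object serialization via ObjectOutputStream, which is what Utils.serialize is assumed to use in this Storm version; SuffixBolt and ComponentSerde are hypothetical names. It also shows why runtime state such as an OutputCollector is only assigned in prepare(): a transient field does not survive the trip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical component; only its configuration (suffix) should be shipped.
class SuffixBolt implements Serializable {
    private static final long serialVersionUID = 1L;
    final String suffix;
    transient Object collector; // runtime-only state, skipped by serialization

    SuffixBolt(String suffix) {
        this.suffix = suffix;
    }
}

// Hypothetical helper: Java object serialization to and from a byte[].
class ComponentSerde {
    static byte[] serialize(Object obj) {
        try (ByteArrayOutputStream bos = new ByteArrayOutputStream();
             ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
            oos.flush(); // push buffered data into bos before reading it
            return bos.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    static Object deserialize(byte[] bytes) {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }
}
```

On the receiving side the class definition must be available, which is one reason the whole topology jar is uploaded to Nimbus before the topology is submitted.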

(2) submitTopology
Submitting to the cluster goes through StormSubmitter.submitTopology():

public static void submitTopology(String name, Map stormConf, StormTopology topology) throws AlreadyAliveException, InvalidTopologyException {
    if(!Utils.isValidConf(stormConf)) {
        throw new IllegalArgumentException("Storm conf is not valid. Must be json-serializable");
    }
    stormConf = new HashMap(stormConf);
    stormConf.putAll(Utils.readCommandLineOpts());
    Map conf = Utils.readStormConfig();
    conf.putAll(stormConf);
    try {
        String serConf = JSONValue.toJSONString(stormConf);
        if(localNimbus!=null) {
            LOG.info("Submitting topology " + name + " in local mode");
            localNimbus.submitTopology(name, null, serConf, topology);
        } else {
            submitJar(conf);
            NimbusClient client = NimbusClient.getConfiguredClient(conf);
            try {
                LOG.info("Submitting topology " +  name + " in distributed mode with conf " + serConf);
                client.getClient().submitTopology(name, submittedJar, serConf, topology);
            } finally {
                client.close();
            }
        }
        LOG.info("Finished submitting topology: " +  name);
    } catch(TException e) {
        throw new RuntimeException(e);
    }
}

The flow is as follows:

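1) Check that the supplied configuration is valid (it must be JSON-serializable). Then merge in the command-line options (the storm.options system property) to form stormConf, and read defaults.yaml plus storm.yaml into conf, with stormConf layered on top. Note that only stormConf is serialized and sent to Nimbus; the fully merged conf is used locally, for example to locate the Nimbus host.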

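2) Call submitJar(conf) to upload the jar to the master (Nimbus). The upload location is cached in the static field submittedJar, so the jar is uploaded at most once per JVM.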

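a. The jar to upload is taken from the storm.jar system property, which the storm client script sets when it launches the JVM. If it is missing, submitJar fails and asks you to submit through the storm script.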

b. Read Config.NIMBUS_HOST and Config.NIMBUS_THRIFT_PORT and create a NimbusClient. NimbusClient wraps an RPC client for Nimbus, which is a Thrift-based RPC server; the constructor creates the client and immediately opens the connection to the server.

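c. Call client.getClient().beginFileUpload() to announce the upload; Nimbus replies with the location to upload to. The jar is then read chunk by chunk, and each chunk is sent to that location with uploadChunk. Once all the data has been sent, finishFileUpload tells Nimbus that the file at that location is complete, and the RPC connection is closed. beginFileUpload, uploadChunk, and finishFileUpload are all methods of service Nimbus in storm.thrift, and the Nimbus.Iface interface is implemented in nimbus.clj.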

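d. After the jar has been uploaded, submitTopology creates a new RPC client and calls submitTopology on Nimbus, another method of service Nimbus in storm.thrift. This call effectively tells Nimbus to run the topology; it carries the topology name, the uploaded jar location, the serialized configuration, and the StormTopology object.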
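The configuration layering in step 1) comes down to the order of the putAll calls: later maps override earlier ones. A minimal sketch, where ConfMerge is a hypothetical helper and its three parameters stand in for Utils.readStormConfig(), the topology's Config, and Utils.readCommandLineOpts():

```java
import java.util.HashMap;
import java.util.Map;

class ConfMerge {
    // Mirrors the merge order in submitTopology: later putAll calls win.
    //   defaultsAndStormYaml: stands in for Utils.readStormConfig()
    //   topologyConf:         the Config built in main()
    //   commandLineOpts:      stands in for Utils.readCommandLineOpts()
    static Map<String, Object> merge(Map<String, Object> defaultsAndStormYaml,
                                     Map<String, Object> topologyConf,
                                     Map<String, Object> commandLineOpts) {
        Map<String, Object> stormConf = new HashMap<>(topologyConf); // copy, never mutate caller's map
        stormConf.putAll(commandLineOpts);                           // command line overrides code
        Map<String, Object> conf = new HashMap<>(defaultsAndStormYaml);
        conf.putAll(stormConf);                                      // topology settings override yaml
        return conf;
    }
}
```

So a key passed through storm.options overrides the same key set in code, which in turn overrides storm.yaml and defaults.yaml.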

private static void submitJar(Map conf) {
    if(submittedJar==null) {
        LOG.info("Jar not uploaded to master yet. Submitting jar...");
        String localJar = System.getProperty("storm.jar");
        submittedJar = submitJar(conf, localJar);
    } else {
        LOG.info("Jar already uploaded to master. Not submitting jar.");
    }
}

public static String submitJar(Map conf, String localJar) {
    if(localJar==null) {
        throw new RuntimeException("Must submit topologies using the 'storm' client script so that StormSubmitter knows which jar to upload.");
    }
    NimbusClient client = NimbusClient.getConfiguredClient(conf);
    try {
        String uploadLocation = client.getClient().beginFileUpload();
        LOG.info("Uploading topology jar " + localJar + " to assigned location: " + uploadLocation);
        BufferFileInputStream is = new BufferFileInputStream(localJar);
        while(true) {
            byte[] toSubmit = is.read();
            if(toSubmit.length==0) break;
            client.getClient().uploadChunk(uploadLocation, ByteBuffer.wrap(toSubmit));
        }
        client.getClient().finishFileUpload(uploadLocation);
        LOG.info("Successfully uploaded topology jar to assigned location: " + uploadLocation);
        return uploadLocation;
    } catch(Exception e) {
        throw new RuntimeException(e);
    } finally {
        client.close();
    }
}
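The upload protocol (begin, a loop of chunks, finish) can be exercised without a cluster. FakeNimbusUploads is a hypothetical in-memory stand-in for the three Nimbus RPCs, and ChunkedUploader mirrors the loop in submitJar using a plain InputStream instead of Storm's BufferFileInputStream; the location strings it hands out are made up.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for Nimbus's three upload RPCs.
class FakeNimbusUploads {
    private final Map<String, ByteArrayOutputStream> inProgress = new HashMap<>();
    private final Map<String, byte[]> finished = new HashMap<>();
    private int nextId = 0;

    // Hand out a fresh (made-up) location for the client to upload to.
    String beginFileUpload() {
        String location = "upload-" + (nextId++);
        inProgress.put(location, new ByteArrayOutputStream());
        return location;
    }

    // Append one chunk to the file being assembled at that location.
    void uploadChunk(String location, ByteBuffer chunk) {
        ByteArrayOutputStream out = inProgress.get(location);
        if (out == null) throw new IllegalStateException("Unknown upload: " + location);
        byte[] bytes = new byte[chunk.remaining()];
        chunk.get(bytes);
        out.write(bytes, 0, bytes.length);
    }

    // Mark the file complete; no more chunks are accepted for it.
    void finishFileUpload(String location) {
        ByteArrayOutputStream out = inProgress.remove(location);
        if (out == null) throw new IllegalStateException("Unknown upload: " + location);
        finished.put(location, out.toByteArray());
    }

    byte[] contents(String location) {
        return finished.get(location);
    }
}

class ChunkedUploader {
    // Same shape as submitJar: begin, send chunks until the input is exhausted, finish.
    static String upload(FakeNimbusUploads nimbus, InputStream in, int chunkSize) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
        String location = nimbus.beginFileUpload();
        byte[] buf = new byte[chunkSize];
        try {
            int n;
            while ((n = in.read(buf)) != -1) {
                // Wrap only the n bytes actually read; the buffer is reused next time.
                nimbus.uploadChunk(location, ByteBuffer.wrap(buf, 0, n));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        nimbus.finishFileUpload(location);
        return location;
    }
}
```

Splitting the jar into chunks presumably keeps each RPC message small, so neither side has to hold the whole jar in a single Thrift call.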

public static NimbusClient getConfiguredClient(Map conf) {
    String nimbusHost = (String) conf.get(Config.NIMBUS_HOST);
    int nimbusPort = Utils.getInt(conf.get(Config.NIMBUS_THRIFT_PORT));
    return new NimbusClient(nimbusHost, nimbusPort);
}

public NimbusClient(String host, int port) {
    try {
        if(host==null) {
            throw new IllegalArgumentException("Nimbus host is not set");
        }
        conn = new TFramedTransport(new TSocket(host, port)); // create the RPC transport
        client = new Nimbus.Client(new TBinaryProtocol(conn)); // create the RPC client
        conn.open(); // open the connection
    } catch(TException e) {
        throw new RuntimeException(e);
    }
}
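NimbusClient wraps the socket in a TFramedTransport, which sends each Thrift message as a frame: a 4-byte big-endian length followed by exactly that many bytes. The idea can be sketched with plain JDK streams; Framing is a hypothetical class for illustration and is not Thrift's implementation (which also encodes the message body with TBinaryProtocol).

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch of length-prefixed framing, the idea behind TFramedTransport.
class Framing {
    // Prefix the payload with its length as a 4-byte big-endian int.
    static byte[] frame(byte[] payload) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            out.writeInt(payload.length); // DataOutputStream writes ints big-endian
            out.write(payload);
            out.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Read one frame: the length first, then exactly that many payload bytes.
    static byte[] readFrame(DataInputStream in) {
        try {
            int length = in.readInt();
            if (length < 0) throw new IOException("Negative frame size: " + length);
            byte[] payload = new byte[length];
            in.readFully(payload); // blocks until the whole frame has arrived
            return payload;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the receiver knows each message's size up front, it can read a complete request before decoding it, rather than parsing a stream of unknown length.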

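At this point, the jar containing the topology has been uploaded and submitted to Nimbus. The next step is for Nimbus to distribute the topology to the supervisors for execution, which the following articles in this series will analyze.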
