Storm Source Code Analysis: The Relationship Between Component, Executor, and Task

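Number of executors (threads) per component
This is the num-executors value kept in StormBase (read back as :component->executors in compute-executors below). It corresponds to the parallelism you give each component in your topology code (through setBolt and setSpout).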

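Mapping between components and tasks (storm-task-info)
You do not have to specify a task count; by default tasks and executors are 1:1.
You can also set TOPOLOGY_TASKS through ComponentConfigurationDeclarer#setNumTasks().
storm-task-info first reads all components.
For each component it reads json_conf from ComponentCommon, then reads from it the TOPOLOGY_TASKS value set above.
Finally it generates task ids from an increasing sequence, which yields the mapping between tasks and components.
If the task count is not set, it equals the executor count and the later assignment is trivial; otherwise the tasks have to be distributed across the executors.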

(defn storm-task-info
  "Returns map from task -> component id"
  [^StormTopology user-topology storm-conf]
  (->> (system-topology! storm-conf user-topology)
       all-components                                          ; {component-id component}
       (map-val (comp #(get % TOPOLOGY-TASKS) component-conf)) ; {component-id num-tasks}
       (sort-by first)                                         ; order by component id
       (mapcat (fn [[c num-tasks]] (repeat num-tasks c)))      ; one entry per task
       (map (fn [id comp] [id comp]) (iterate (comp int inc) (int 1))) ; attach task ids 1, 2, 3, ...
       (into {})))

system-topology! is built first because it adds the system components (acker, system bolt, metrics bolt). These are an indispensable part of every topology, so the user-defined topology alone is not enough.

Next, all components are taken out of the topology.

(defn all-components [^StormTopology topology]
  (apply merge {}
         (for [f thrift/STORM-TOPOLOGY-FIELDS]
           (.getFieldValue topology f))))

thrift/STORM-TOPOLOGY-FIELDS is used to read each field id from the StormTopology metadata; the value of every field is fetched and merged.

The result is therefore the following three maps merged into one:

struct StormTopology {
  //ids must be unique across maps
  // #workers to use is in conf
  1: required map<string, SpoutSpec> spouts;
  2: required map<string, Bolt> bolts;
  3: required map<string, StateSpoutSpec> state_spouts;
}

map-val then applies the following to each component in the map:

take the component's ComponentCommon object (.get_common), read its json_conf, and finally read TOPOLOGY-TASKS from that conf.

(defn component-conf [component]
  (->> component
       .get_common
       .get_json_conf
       from-json))

struct ComponentCommon {
  1: required map<GlobalStreamId, Grouping> inputs;
  2: required map<string, StreamInfo> streams; //key is stream id
  3: optional i32 parallelism_hint; //how many threads across the cluster should be dedicated to this component
  // component specific configuration
  4: optional string json_conf;
}

The output is {component-id num-tasks}. It is sorted by component id and then passed through mapcat:

{c1 3, c2 2, c3 1} -> (c1 c1 c1 c2 c2 c3)

Increasing ids are then attached and the pairs are poured into a map: {1 c1, 2 c1, 3 c1, 4 c2, 5 c2, 6 c3}
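
The numbering step can be tried in isolation. The following is only a sketch: number-tasks is a hypothetical helper, not part of Storm, and it takes a plain {component-id num-tasks} map instead of a real StormTopology.

```clojure
;; Hypothetical stand-in for the numbering part of storm-task-info.
;; Input is a plain map of component id -> task count (an assumption;
;; the real function derives this map from the topology's json_conf).
(defn number-tasks
  [component->num-tasks]
  (->> component->num-tasks
       (sort-by first)                                     ; deterministic order by component id
       (mapcat (fn [[c num-tasks]] (repeat num-tasks c)))  ; repeat each id once per task
       (map vector (iterate inc 1))                        ; pair with task ids 1, 2, 3, ...
       (into {})))

(number-tasks {"c1" 3, "c2" 2, "c3" 1})
;; => {1 "c1", 2 "c1", 3 "c1", 4 "c2", 5 "c2", 6 "c3"}
```

Because the components are sorted before numbering, the tasks of one component always receive consecutive ids, which is what later allows an executor to be described by a [first last] task range.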

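Assignment of tasks to executors within a topology (compute-executors)

The steps above have produced component->executors and component->tasks. Tasks are now distributed to executors according to each component's task count and executor count.

The default is a 1:1 assignment, but if a task count was set the tasks are split across the executors.

For example, c1 with 2 executors and 3 tasks [1 2 3] is split into ['(1 2) '(3)].

Finally, to-executor-id describes each executor by its range of task ids ([(first task-ids) (last task-ids)]).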

(defn- compute-executors [nimbus storm-id]
  (let [conf (:conf nimbus)
        storm-base (.storm-base (:storm-cluster-state nimbus) storm-id nil)
        component->executors (:component->executors storm-base) ; executor (thread) count configured for each component, read from storm-base
        storm-conf (read-storm-conf conf storm-id)
        topology (read-storm-topology conf storm-id)
        task->component (storm-task-info topology storm-conf)]
    (->> (storm-task-info topology storm-conf)
         reverse-map ; {"c1" [1 2 3], "c2" [4 5], "c3" [6]}
         (map-val sort)
         (join-maps component->executors) ; {"c1" '(2 [1 2 3]), "c2" '(2 [4 5]), "c3" '(1 [6])}
         (map-val (partial apply partition-fixed)) ; {"c1" ['(1 2) '(3)], "c2" ['(4) '(5)], "c3" ['(6)]}
         (mapcat second) ; ((1 2) (3) (4) (5) (6))
         (map to-executor-id)))) ; ([1 2] [3 3] [4 4] [5 5] [6 6])
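
To follow the intermediate values without a running Nimbus, the pipeline can be replayed on plain maps. This is a sketch under assumptions: map-val, reverse-map, join-maps, split-into-chunks and executors-for are simplified stand-ins written for this example, not the Storm utilities themselves, and the extra sort-by is added only to make the output order deterministic.

```clojure
;; All helpers below are simplified, hypothetical stand-ins for Storm's utilities.
(defn map-val [f m]
  (into {} (for [[k v] m] [k (f v)])))

(defn reverse-map
  "{1 \"c1\", 2 \"c1\"} -> {\"c1\" [1 2]}"
  [m]
  (reduce (fn [acc [k v]] (update acc v (fnil conj []) k)) {} m))

(defn join-maps
  "Collects the values stored under each key: {k (v-from-m1 v-from-m2 ...)}"
  [& maps]
  (into {} (for [k (distinct (mapcat keys maps))]
             [k (map #(get % k) maps)])))

(defn split-into-chunks
  "Splits coll into at most n chunks; earlier chunks take the leftover elements."
  [n coll]
  (if (zero? n)
    []
    (let [total (count coll)
          base  (quot total n)
          extra (rem total n)
          sizes (remove zero? (concat (repeat extra (inc base))
                                      (repeat (- n extra) base)))]
      (loop [result [] sizes sizes data coll]
        (if (empty? sizes)
          result
          (let [[c more] (split-at (first sizes) data)]
            (recur (conj result c) (rest sizes) more)))))))

(defn to-executor-id [task-ids]
  [(first task-ids) (last task-ids)])

(defn executors-for
  [component->executors task->component]
  (->> task->component
       reverse-map                                   ; {"c1" [1 2 3], "c2" [4 5], "c3" [6]}
       (map-val sort)
       (join-maps component->executors)              ; {"c1" (2 (1 2 3)), ...}
       (map-val (partial apply split-into-chunks))   ; {"c1" [(1 2) (3)], ...}
       (sort-by first)                               ; added here for a stable order
       (mapcat second)                               ; ((1 2) (3) (4) (5) (6))
       (map to-executor-id)))

(executors-for {"c1" 2, "c2" 2, "c3" 1}
               {1 "c1", 2 "c1", 3 "c1", 4 "c2", 5 "c2", 6 "c3"})
;; => ([1 2] [3 3] [4 4] [5 5] [6 6])
```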

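partition-fixed splits aseq into max-num-chunks chunks.

The idea:

7 divided by 3 is 2 with remainder 1.

So there are 3 chunks of 2 elements each, with 1 element left over.

The leftover element goes into the first chunk,

which gives 1 chunk of 2+1 elements and (3-1) chunks of 2 elements.

This is computed by (integer-divided 7 3), which yields the size/count pairs [3 1] and [2 2] (keyed by chunk size, which is why the code can dissoc the size-0 entry). It is hard to follow at first because the function is poorly named: it does not just divide, it already works out the split.

The result means 1 chunk of 3 elements and 2 chunks of 2 elements.

After that, a loop with split-at cuts the sequence accordingly.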

(defn partition-fixed
  "(partition-fixed 3 '(1 2 3 4 5 6 7)) => [(1 2 3) (4 5) (6 7)]"
  [max-num-chunks aseq]
  (if (zero? max-num-chunks)
    []
    (let [chunks (->> (integer-divided (count aseq) max-num-chunks)
                      (#(dissoc % 0))                               ; drop empty chunks
                      (sort-by (comp - first))                      ; largest chunk size first
                      (mapcat (fn [[size amt]] (repeat amt size))))] ; e.g. (3 2 2)
      (loop [result []
             [chunk & rest-chunks] chunks
             data aseq]
        (if (nil? chunk)
          result
          (let [[c rest-data] (split-at chunk data)]
            (recur (conj result c)
                   rest-chunks
                   rest-data)))))))
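
The arithmetic behind the chunk sizes can be checked on its own. The sketch below is an assumption about what integer-divided computes, reconstructed from the description above and from how partition-fixed uses its result; integer-divided-sketch and chunk-sizes are hypothetical names, not Storm functions.

```clojure
;; Hypothetical reconstruction: returns a map of chunk size -> number of chunks.
(defn integer-divided-sketch
  [sum num-pieces]
  (let [base      (quot sum num-pieces)      ; size of the smaller chunks
        num-inc   (rem sum num-pieces)       ; how many chunks get one extra element
        num-bases (- num-pieces num-inc)]    ; how many chunks keep the base size
    (if (zero? num-inc)
      {base num-bases}
      {base num-bases, (inc base) num-inc})))

;; The same post-processing that partition-fixed applies to that map.
(defn chunk-sizes
  [max-num-chunks total]
  (->> (integer-divided-sketch total max-num-chunks)
       (#(dissoc % 0))                               ; a size-0 chunk would be empty
       (sort-by (comp - first))                      ; larger chunks first
       (mapcat (fn [[size amt]] (repeat amt size)))))

(integer-divided-sketch 7 3) ;; => {2 2, 3 1}
(chunk-sizes 3 7)            ;; => (3 2 2)
(chunk-sizes 2 1)            ;; => (1)
```

The last call shows why the size-0 entry is removed: with more chunks requested than elements available, the surplus chunks would otherwise come out empty.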

Mapping between executors and components within a topology (compute-executor->component): a join of the executor->task relation with the task->component relation.

(defn- compute-executor->component [nimbus storm-id]
  (let [conf (:conf nimbus)
        executors (compute-executors nimbus storm-id)
        topology (read-storm-topology conf storm-id)
        storm-conf (read-storm-conf conf storm-id)
        task->component (storm-task-info topology storm-conf)
        ;; every task of an executor belongs to the same component,
        ;; so looking up the first task id is enough
        executor->component (into {} (for [executor executors
                                           :let [start-task (first executor)
                                                 component (task->component start-task)]]
                                       {executor component}))]
    executor->component)) ; {[1 2] "c1", [3 3] "c1", [4 4] "c2", [5 5] "c2", [6 6] "c3"}

The end goal is the executor->component mapping, which the later assignment step uses; each executor is identified by its task range [start-task end-task].
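
The join itself is small enough to run on literal data. In this sketch, executor->component-sketch is a hypothetical helper that takes the executors and the task->component map directly, instead of reading them from Nimbus.

```clojure
;; Hypothetical helper: joins executor task ranges with the task -> component map.
(defn executor->component-sketch
  [executors task->component]
  (into {} (for [executor executors
                 :let [start-task (first executor)]]   ; first task id of the range
             [executor (task->component start-task)])))

(executor->component-sketch
  [[1 2] [3 3] [4 4] [5 5] [6 6]]
  {1 "c1", 2 "c1", 3 "c1", 4 "c2", 5 "c2", 6 "c3"})
;; => {[1 2] "c1", [3 3] "c1", [4 4] "c2", [5 5] "c2", [6 6] "c3"}
```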
