Spark 2.1 Source Code Analysis, Part 2: Tracing a Job's Execution from SparkPi
Starting from the action in SparkPi, choose Run – Debug SparkPi to step through the code. Useful debugger shortcuts:
F8: Step Over
F7: Step Into
Right-click: Run to Cursor
Ctrl+B: Go to Definition
Navigate – Back and Forward
SparkPi:
val count = spark.sparkContext.parallelize(1 until n, slices).map { i =>
  val x = random * 2 - 1
  val y = random * 2 - 1
  if (x*x + y*y < 1) 1 else 0
}.reduce(_ + _)
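For context, a minimal self-contained version of the example looks roughly like this. This is a sketch, not a verbatim copy of the Spark examples module: the object name, the local[*] master, and the default slice count are placeholders chosen for illustration.
import scala.math.random
import org.apache.spark.sql.SparkSession

// Minimal SparkPi-style sketch: the reduce(_ + _) at the end is the action
// whose execution path this article traces.
object SparkPiSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("SparkPiSketch").master("local[*]").getOrCreate()
    val slices = if (args.length > 0) args(0).toInt else 2
    val n = math.min(100000L * slices, Int.MaxValue).toInt  // avoid overflow
    val count = spark.sparkContext.parallelize(1 until n, slices).map { _ =>
      val x = random * 2 - 1
      val y = random * 2 - 1
      if (x * x + y * y < 1) 1 else 0
    }.reduce(_ + _)
    println(s"Pi is roughly ${4.0 * count / (n - 1)}")
    spark.stop()
  }
}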
RDD:
/**
 * Reduces the elements of this RDD using the specified commutative and
 * associative binary operator.
 */
def reduce(f: (T, T) => T): T = withScope {
  val cleanF = sc.clean(f)
  // Run the cleaned function on a single partition
  val reducePartition: Iterator[T] => Option[T] = iter => {
    if (iter.hasNext) {
      Some(iter.reduceLeft(cleanF))
    } else {
      None
    }
  }
  var jobResult: Option[T] = None
  // Merge the results of all partitions
  val mergeResult = (index: Int, taskResult: Option[T]) => {
    if (taskResult.isDefined) {
      jobResult = jobResult match {
        case Some(value) => Some(f(value, taskResult.get))
        case None => taskResult
      }
    }
  }
  sc.runJob(this, reducePartition, mergeResult)
  // Get the final result out of our Option, or throw an exception if the RDD was empty
  jobResult.getOrElse(throw new UnsupportedOperationException("empty collection"))
}
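The division of labor between reducePartition and mergeResult is easy to see in isolation. The following is a plain-Scala sketch with no Spark dependency; the hard-coded "partitions" and the sequential loop standing in for runJob are assumptions made purely for illustration.
// Two-level reduce, as in RDD.reduce: each "partition" is reduced locally,
// then the per-partition results are folded into jobResult.
object ReducePatternSketch {
  def main(args: Array[String]): Unit = {
    val f: (Int, Int) => Int = _ + _
    val partitions: Seq[Iterator[Int]] =
      Seq(Iterator(1, 2, 3), Iterator.empty, Iterator(4, 5))  // hypothetical partitions

    val reducePartition: Iterator[Int] => Option[Int] =
      iter => if (iter.hasNext) Some(iter.reduceLeft(f)) else None

    var jobResult: Option[Int] = None
    val mergeResult = (index: Int, taskResult: Option[Int]) => {
      if (taskResult.isDefined) {
        jobResult = jobResult match {
          case Some(value) => Some(f(value, taskResult.get))
          case None        => taskResult
        }
      }
    }

    // Stand-in for runJob: apply reducePartition to every partition and
    // feed each result to mergeResult as it arrives.
    partitions.zipWithIndex.foreach { case (iter, i) => mergeResult(i, reducePartition(iter)) }

    println(jobResult.getOrElse(throw new UnsupportedOperationException("empty collection"))) // 15
  }
}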
SparkContext:
/**
 * Run a job on all partitions in an RDD and pass the results to a handler function.
 */
def runJob[T, U: ClassTag](
    rdd: RDD[T],
    processPartition: Iterator[T] => U,
    resultHandler: (Int, U) => Unit)
{
  // Wrap the per-partition function so it also accepts a TaskContext
  val processFunc = (context: TaskContext, iter: Iterator[T]) => processPartition(iter)
  runJob[T, U](rdd, processFunc, 0 until rdd.partitions.length, resultHandler)
}
SparkContext:
/**
 * Run a function on a given set of partitions in an RDD and pass the results to the given
 * handler function. This is the main entry point for all actions in Spark.
 */
def runJob[T, U: ClassTag](
    rdd: RDD[T],
    func: (TaskContext, Iterator[T]) => U,
    partitions: Seq[Int],
    resultHandler: (Int, U) => Unit): Unit = {
  // Fail fast if the driver has already called sc.stop()
  if (stopped.get()) {
    throw new IllegalStateException("SparkContext has been shutdown")
  }
  val callSite = getCallSite
  val cleanedFunc = clean(func)
  logInfo("Starting job: " + callSite.shortForm)
  if (conf.getBoolean("spark.logLineage", false)) {
    logInfo("RDD's recursive dependencies:\n" + rdd.toDebugString)
  }
  // cleanedFunc: the per-partition processing function
  // partitions: the partition ids to run on
  // resultHandler: the callback applied to each partition's result
  dagScheduler.runJob(rdd, cleanedFunc, partitions, callSite, resultHandler, localProperties.get)
  progressBar.foreach(_.finishAll())
  // Note: the checkpoint, if any, is materialized here after the job finishes
  rdd.doCheckpoint()
}
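rdd.doCheckpoint() only writes anything if checkpointing was requested on the RDD (or one of its ancestors) before the action ran. A minimal usage sketch, assuming the same SparkSession spark as in the SparkPi example above and a placeholder checkpoint directory:
// Checkpointing is performed by doCheckpoint() after an action,
// but only for RDDs previously marked with checkpoint().
val sc = spark.sparkContext
sc.setCheckpointDir("/tmp/spark-checkpoints")   // placeholder path
val data = sc.parallelize(1 to 1000, 4)
data.checkpoint()                               // mark the RDD for checkpointing
val sum = data.reduce(_ + _)                    // runs the job, then doCheckpoint() writes the data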
DAGScheduler:
/**
 * Run an action job on the given RDD and pass all the results to the resultHandler function as
 * they arrive.
 *
 * @param rdd target RDD to run tasks on
 * @param func a function to run on each partition of the RDD
 * @param partitions set of partitions to run on; some jobs may not want to compute on all
 *   partitions of the target RDD, e.g. for operations like first()
 * @param callSite where in the user program this job was called
 * @param resultHandler callback to pass each result to
 * @param properties scheduler properties to attach to this job, e.g. fair scheduler pool name
 *
 * @throws Exception when the job fails
 */
def runJob[T, U](
    rdd: RDD[T],
    func: (TaskContext, Iterator[T]) => U,
    partitions: Seq[Int],
    callSite: CallSite,
    resultHandler: (Int, U) => Unit,
    properties: Properties): Unit = {
  val start = System.nanoTime
  val waiter = submitJob(rdd, func, partitions, callSite, resultHandler, properties)
  // Note: Do not call Await.ready(future) because that calls `scala.concurrent.blocking`,
  // which causes concurrent SQL executions to fail if a fork-join pool is used. Note that
  // due to idiosyncrasies in Scala, `awaitPermission` is not actually used anywhere so it's
  // safe to pass in null here. For more detail, see SPARK-13747.
  val awaitPermission = null.asInstanceOf[scala.concurrent.CanAwait]
  waiter.completionFuture.ready(Duration.Inf)(awaitPermission)
  waiter.completionFuture.value.get match {
    case scala.util.Success(_) =>
      logInfo("Job %d finished: %s, took %f s".format
        (waiter.jobId, callSite.shortForm, (System.nanoTime - start) / 1e9))
    case scala.util.Failure(exception) =>
      logInfo("Job %d failed: %s, took %f s".format
        (waiter.jobId, callSite.shortForm, (System.nanoTime - start) / 1e9))
      // SPARK-8644: Include user stack trace in exceptions coming from DAGScheduler.
      val callerStackTrace = Thread.currentThread().getStackTrace.tail
      exception.setStackTrace(exception.getStackTrace ++ callerStackTrace)
      throw exception
  }
}
DAGScheduler:
/**
 * Submit an action job to the scheduler.
 *
 * @param rdd target RDD to run tasks on
 * @param func a function to run on each partition of the RDD
 * @param partitions set of partitions to run on; some jobs may not want to compute on all
 *   partitions of the target RDD, e.g. for operations like first()
 * @param callSite where in the user program this job was called
 * @param resultHandler callback to pass each result to
 * @param properties scheduler properties to attach to this job, e.g. fair scheduler pool name
 *
 * @return a JobWaiter object that can be used to block until the job finishes executing
 *         or can be used to cancel the job.
 *
 * @throws IllegalArgumentException when partitions ids are illegal
 */
def submitJob[T, U](
    rdd: RDD[T],
    func: (TaskContext, Iterator[T]) => U,
    partitions: Seq[Int],
    callSite: CallSite,
    resultHandler: (Int, U) => Unit,
    properties: Properties): JobWaiter[U] = {
  // Check to make sure we are not launching a task on a partition that does not exist.
  val maxPartitions = rdd.partitions.length
  partitions.find(p => p >= maxPartitions || p < 0).foreach { p =>
    throw new IllegalArgumentException(
      "Attempting to access a non-existent partition: " + p + ". " +
      "Total number of partitions: " + maxPartitions)
  }
  val jobId = nextJobId.getAndIncrement()
  if (partitions.size == 0) {
    // Return immediately if the job is running 0 tasks
    return new JobWaiter[U](this, jobId, 0, resultHandler)
  }
  assert(partitions.size > 0)
  val func2 = func.asInstanceOf[(TaskContext, Iterator[_]) => _]
  // Wrap resultHandler (i.e. the mergeResult from reduce) into the JobWaiter
  val waiter = new JobWaiter(this, jobId, partitions.size, resultHandler)
  // Put the event into the event queue. The event thread will process it later.
  eventProcessLoop.post(JobSubmitted(
    jobId, rdd, func2, partitions.toArray, callSite, waiter,
    SerializationUtils.clone(properties)))
  waiter
}
private[scheduler] val eventProcessLoop = new DAGSchedulerEventProcessLoop(this)
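eventProcessLoop follows a simple post-to-queue, single-consumer-thread pattern: post() only enqueues, and a dedicated event thread dequeues and handles one event at a time (JobSubmitted among them). The following is a stripped-down sketch of that idea, not the real DAGSchedulerEventProcessLoop; all names here are simplified stand-ins.
import java.util.concurrent.LinkedBlockingDeque

// Simplified stand-in for the EventLoop pattern used by DAGSchedulerEventProcessLoop.
abstract class SimpleEventLoop[E](name: String) {
  private val eventQueue = new LinkedBlockingDeque[E]()
  @volatile private var stopped = false

  private val eventThread = new Thread(name) {
    setDaemon(true)
    override def run(): Unit = {
      try {
        while (!stopped) {
          onReceive(eventQueue.take())   // take() blocks until an event is posted
        }
      } catch {
        case _: InterruptedException =>  // stop() interrupts the blocked take()
      }
    }
  }

  def start(): Unit = eventThread.start()
  def stop(): Unit = { stopped = true; eventThread.interrupt() }
  def post(event: E): Unit = eventQueue.put(event)   // called from the submitting thread

  protected def onReceive(event: E): Unit            // e.g. handle a JobSubmitted-like event
}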
After the job has executed, control returns to DAGScheduler.runJob, which logs whether the job succeeded or failed.
By this point the JobWaiter has already invoked mergeResult for each partition. Because mergeResult is a closure
that captures the jobResult variable defined in RDD.reduce, the merged result has already been written back into that variable.
Unwinding the call stack back to RDD.reduce, the final line jobResult.getOrElse(throw new UnsupportedOperationException("empty collection"))
then returns jobResult as the result of the action.
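The "result comes back through a closure" point is easy to see in isolation: the handler mutates a variable captured from its enclosing scope, so the caller only has to read that variable once all callbacks have fired. A minimal sketch, not Spark code, with the fixed result list standing in for per-partition results:
// Result delivery via closure capture: the callback writes into a var defined
// in the enclosing method, exactly how mergeResult fills jobResult in RDD.reduce.
def runWithCallback(): Int = {
  var jobResult: Option[Int] = None
  val handler = (partitionResult: Int) => {              // closure capturing jobResult
    jobResult = Some(jobResult.getOrElse(0) + partitionResult)
  }
  Seq(10, 20, 30).foreach(handler)                       // stand-in for the scheduler's callbacks
  jobResult.getOrElse(throw new UnsupportedOperationException("empty collection"))
}
// runWithCallback() == 60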
JobWaiter:
/**
* An object that waits for a DAGScheduler job to complete. As tasks finish, it passes their
* results to the given handler function.
*/
JobWaiter waits asynchronously for the job to complete. As each task finishes, it invokes the mergeResult passed in from reduce to fold that partition's result into the final value.
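The waiting side can be sketched with a Promise plus a remaining-task counter. This is a simplified illustration of the JobWaiter idea, not the actual Spark class; the class and method names are stand-ins.
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Future, Promise}

// Simplified JobWaiter-like sketch: each finished task delivers its result to
// resultHandler; when the last task finishes, the completion future is resolved.
class SimpleJobWaiter[T](totalTasks: Int, resultHandler: (Int, T) => Unit) {
  private val finishedTasks = new AtomicInteger(0)
  private val jobPromise: Promise[Unit] =
    if (totalTasks == 0) Promise.successful(()) else Promise[Unit]()

  def completionFuture: Future[Unit] = jobPromise.future

  def taskSucceeded(index: Int, result: T): Unit = {
    synchronized { resultHandler(index, result) }        // e.g. mergeResult from reduce
    if (finishedTasks.incrementAndGet() == totalTasks) {
      jobPromise.success(())
    }
  }

  def jobFailed(exception: Exception): Unit = {
    jobPromise.tryFailure(exception)                     // no-op if already completed
  }
}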