Preface: These are my notes from studying the Spark source code and its internals; I also hope they can be of some help to newcomers. I am still fairly new to this myself, so if you find omissions or mistakes, please comment on the original post or email me at tongzhenguotongzhenguo@gmail.com

Summary:

  1. DAGScheduler, the core of job scheduling
  2. The DAGScheduler class
    2.1 DAGScheduler
    2.2 ActiveJob
    2.3 Stage
    2.4 Task
  3. Workflow
    3.1 Dividing stages
    3.2 Generating the job and submitting stages
    3.3 Submitting task sets
    3.4 Monitoring job and task completion
    3.5 Fetching task results

Content:

1. DAGScheduler, the core of job scheduling

  User code is a series of operations on RDDs. At run time these operations are evaluated lazily: not every RDD operation causes Spark to submit an actual job to the cluster. In general, only operations that return data to the driver or write to external storage trigger real computation (the action operators); the transformation operators merely build new RDDs that record their dependencies.

   In these RDD actions (such as count and collect), runJob is called automatically to submit the job; the user never submits it explicitly (for this part, see the post "Spark DAGSheduler生成Stage过程分析实验").
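  For example (a minimal sketch, not from the original post; it assumes an existing SparkContext sc), the transformations below only build the RDD lineage, and nothing runs until the count() action internally calls SparkContext.runJob, which hands the job to the DAGScheduler:

    // Transformations are lazy: they only record dependencies between RDDs.
    val lines = sc.textFile("input.txt")
    val lens  = lines.map(_.length)

    // The action triggers the actual work: count() calls sc.runJob(...),
    // which in turn calls dagScheduler.runJob(...).
    val total = lens.count()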

  The two main entry points for job scheduling are submitJob and runJob. The difference is that submitJob returns a JobWaiter object, which can be used in asynchronous calls to check whether the job has finished or to cancel it, while runJob calls submitJob internally and blocks until the job completes (or fails). The source for both is shown below:

submitJob

    /**
     * Submit an action job to the scheduler.
     *
     * @param rdd target RDD to run tasks on
     * @param func a function to run on each partition of the RDD
     * @param partitions set of partitions to run on; some jobs may not want to compute on all
     *   partitions of the target RDD, e.g. for operations like first()
     * @param callSite where in the user program this job was called
     * @param resultHandler callback to pass each result to
     * @param properties scheduler properties to attach to this job, e.g. fair scheduler pool name
     *
     * @return a JobWaiter object that can be used to block until the job finishes executing
     *         or can be used to cancel the job.
     *
     * @throws IllegalArgumentException when partitions ids are illegal
     */
    def submitJob[T, U](
        rdd: RDD[T],
        func: (TaskContext, Iterator[T]) => U,
        partitions: Seq[Int],
        callSite: CallSite,
        resultHandler: (Int, U) => Unit,
        properties: Properties): JobWaiter[U] = {
      // Check to make sure we are not launching a task on a partition that does not exist.
      val maxPartitions = rdd.partitions.length
      partitions.find(p => p >= maxPartitions || p < 0).foreach { p =>
        throw new IllegalArgumentException(
          "Attempting to access a non-existent partition: " + p + ". " +
            "Total number of partitions: " + maxPartitions)
      }

      val jobId = nextJobId.getAndIncrement()
      if (partitions.size == 0) {
        // Return immediately if the job is running 0 tasks
        return new JobWaiter[U](this, jobId, 0, resultHandler)
      }

      assert(partitions.size > 0)
      val func2 = func.asInstanceOf[(TaskContext, Iterator[_]) => _]
      val waiter = new JobWaiter(this, jobId, partitions.size, resultHandler)
      eventProcessLoop.post(JobSubmitted(
        jobId, rdd, func2, partitions.toArray, callSite, waiter,
        SerializationUtils.clone(properties)))
      waiter
    }

runJob  

    /**
     * Run an action job on the given RDD and pass all the results to the resultHandler function as
     * they arrive.
     *
     * @param rdd target RDD to run tasks on
     * @param func a function to run on each partition of the RDD
     * @param partitions set of partitions to run on; some jobs may not want to compute on all
     *   partitions of the target RDD, e.g. for operations like first()
     * @param callSite where in the user program this job was called
     * @param resultHandler callback to pass each result to
     * @param properties scheduler properties to attach to this job, e.g. fair scheduler pool name
     *
     * @throws Exception when the job fails
     */
    def runJob[T, U](
        rdd: RDD[T],
        func: (TaskContext, Iterator[T]) => U,
        partitions: Seq[Int],
        callSite: CallSite,
        resultHandler: (Int, U) => Unit,
        properties: Properties): Unit = {
      val start = System.nanoTime
      val waiter = submitJob(rdd, func, partitions, callSite, resultHandler, properties)

      val awaitPermission = null.asInstanceOf[scala.concurrent.CanAwait]
      waiter.completionFuture.ready(Duration.Inf)(awaitPermission)
      waiter.completionFuture.value.get match {
        case scala.util.Success(_) =>
          logInfo("Job %d finished: %s, took %f s".format
            (waiter.jobId, callSite.shortForm, (System.nanoTime - start) / 1e9))
        case scala.util.Failure(exception) =>
          logInfo("Job %d failed: %s, took %f s".format
            (waiter.jobId, callSite.shortForm, (System.nanoTime - start) / 1e9))
          // SPARK-8644: Include user stack trace in exceptions coming from DAGScheduler.
          val callerStackTrace = Thread.currentThread().getStackTrace.tail
          exception.setStackTrace(exception.getStackTrace ++ callerStackTrace)
          throw exception
      }
    }
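  The asynchronous RDD actions build on submitJob in exactly this way: they return a FutureAction backed by the JobWaiter, so the caller can wait for the result or cancel the job. A small usage sketch (not from the original post; it assumes an existing SparkContext sc):

    import scala.concurrent.Await
    import scala.concurrent.duration.Duration

    // countAsync() submits the job through DAGScheduler.submitJob and returns
    // a FutureAction[Long] immediately, without blocking the caller.
    val future = sc.parallelize(1 to 1000000, 8).countAsync()

    // The caller may cancel the job through the FutureAction...
    // future.cancel()

    // ...or block for the result, much like runJob does internally.
    val n = Await.result(future, Duration.Inf)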

  One of the DAGScheduler's most important responsibilities is stage-oriented logical scheduling: it first builds the DAG of stages and then submits those stages to the TaskScheduler.

    /**
     * The high-level scheduling layer that implements stage-oriented scheduling. It computes a DAG of
     * stages for each job, keeps track of which RDDs and stage outputs are materialized, and finds a
     * minimal schedule to run the job. It then submits stages as TaskSets to an underlying
     * TaskScheduler implementation that runs them on the cluster.

2. The DAGScheduler class

  Q: When is the DAGScheduler created?

  A: The DAGScheduler is instantiated while the SparkContext is being initialized; there is one DAGScheduler per SparkContext.
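  A condensed sketch of the relevant part of SparkContext initialization (not quoted from the original post; field and method names are simplified and vary slightly across Spark versions):

    // Inside SparkContext's constructor (simplified):
    val (sched, ts) = SparkContext.createTaskScheduler(this, master, deployMode)
    _schedulerBackend = sched
    _taskScheduler = ts
    _dagScheduler = new DAGScheduler(this)   // one DAGScheduler per SparkContext

    // The TaskScheduler is started only after the DAGScheduler has been wired up.
    _taskScheduler.start()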

Some related concepts are introduced below:

ActiveJob: jobs are represented by the ActiveJob class. Based on its finalStage, an ActiveJob is one of two types: a result job (whose finalStage is a ResultStage) or a map-stage job (whose finalStage is a ShuffleMapStage, used mainly in query planning). The ActiveJob class is shown below:

    /*
     * Jobs are represented by the ActiveJob class. Based on its finalStage, an ActiveJob
     * is one of two types: a result job (finalStage is a ResultStage) or a map-stage job
     * (finalStage is a ShuffleMapStage, used mainly in query planning).
     */
    private[spark] class ActiveJob(
        val jobId: Int,
        val finalStage: Stage,
        val callSite: CallSite,
        val listener: JobListener,
        val properties: Properties) {

      /**
       * Number of partitions we need to compute for this job. Note that result stages may not need
       * to compute all partitions in their target RDD, for actions like first() and lookup().
       */
      val numPartitions = finalStage match {
        case r: ResultStage => r.partitions.length
        case m: ShuffleMapStage => m.rdd.partitions.length
      }

      /** Which partitions of the stage have finished */
      val finished = Array.fill[Boolean](numPartitions)(false)

      var numFinished = 0
    }

Stage: a Stage is a set of parallel tasks; stages are split at shuffle boundaries. Accordingly there are two kinds of stage: a shuffle map stage and a result stage. The Stage class is shown below:

    /*
     * A Stage is a set of parallel tasks; stages are split at shuffle boundaries.
     * Stages come in two kinds:
     *   a shuffle map stage
     *   a result stage
     */

    private[scheduler] abstract class Stage(
        val id: Int,
        val rdd: RDD[_],
        val numTasks: Int,
        val parents: List[Stage],
        val firstJobId: Int,
        val callSite: CallSite)
      extends Logging {

      val numPartitions = rdd.partitions.length

      /** Set of jobs that this stage belongs to. */
      val jobIds = new HashSet[Int]

      val pendingPartitions = new HashSet[Int]

      /** The ID to use for the next new attempt for this stage. */
      private var nextAttemptId: Int = 0

      val name: String = callSite.shortForm
      val details: String = callSite.longForm

      /**
       * Pointer to the [StageInfo] object for the most recent attempt. This needs to be initialized
       * here, before any attempts have actually been created, because the DAGScheduler uses this
       * StageInfo to tell SparkListeners when a job starts (which happens before any stage attempts
       * have been created).
       */
      private var _latestInfo: StageInfo = StageInfo.fromStage(this, nextAttemptId)

      /**
       * Set of stage attempt IDs that have failed with a FetchFailure. We keep track of these
       * failures in order to avoid endless retries if a stage keeps failing with a FetchFailure.
       * We keep track of each attempt ID that has failed to avoid recording duplicate failures if
       * multiple tasks from the same stage attempt fail (SPARK-5945).
       */
      private val fetchFailedAttemptIds = new HashSet[Int]

      private[scheduler] def clearFailures() : Unit = {
        fetchFailedAttemptIds.clear()
      }

      /**
       * Check whether we should abort the failedStage due to multiple consecutive fetch failures.
       *
       * This method updates the running set of failed stage attempts and returns
       * true if the number of failures exceeds the allowable number of failures.
       */
      private[scheduler] def failedOnFetchAndShouldAbort(stageAttemptId: Int): Boolean = {
        fetchFailedAttemptIds.add(stageAttemptId)
        fetchFailedAttemptIds.size >= Stage.MAX_CONSECUTIVE_FETCH_FAILURES
      }

      /** Creates a new attempt for this stage by creating a new StageInfo with a new attempt ID. */
      def makeNewStageAttempt(
          numPartitionsToCompute: Int,
          taskLocalityPreferences: Seq[Seq[TaskLocation]] = Seq.empty): Unit = {
        val metrics = new TaskMetrics
        metrics.register(rdd.sparkContext)
        _latestInfo = StageInfo.fromStage(
          this, nextAttemptId, Some(numPartitionsToCompute), metrics, taskLocalityPreferences)
        nextAttemptId += 1
      }

      /** Returns the StageInfo for the most recent attempt for this stage. */
      def latestInfo: StageInfo = _latestInfo

      override final def hashCode(): Int = id

      override final def equals(other: Any): Boolean = other match {
        case stage: Stage => stage != null && stage.id == id
        case _ => false
      }

      /** Returns the sequence of partition ids that are missing (i.e. needs to be computed). */
      def findMissingPartitions(): Seq[Int]
    }

    private[scheduler] object Stage {
      // The number of consecutive failures allowed before a stage is aborted
      val MAX_CONSECUTIVE_FETCH_FAILURES = 4
    }

Task: there are likewise two task classes, ShuffleMapTask and ResultTask. The former runs the task and writes its output to the shuffle partitions; the latter runs the task and sends its output back to the driver application. (Task execution itself will be analyzed in a later post.)

Other notes from the DAGScheduler class comment:

    * - Cache tracking: the DAGScheduler figures out which RDDs are cached to avoid recomputing them
    *   and likewise remembers which shuffle map stages have already produced output files to avoid
    *   redoing the map side of a shuffle.
    *
    * - Preferred locations: the DAGScheduler also computes where to run each task in a stage based
    *   on the preferred locations of its underlying RDDs, or the location of cached or shuffle data.
    *
    * - Cleanup: all data structures are cleared when the running jobs that depend on them finish,
    *   to prevent memory leaks in a long-running application.
    *

  The DAGScheduler maintains a number of internal mappings between tasks, stages, and jobs. Worth noting are the different sets into which stages are grouped according to their execution state; they are helpful when reading the submitStage method later.

    private[scheduler] val nextJobId = new AtomicInteger(0)
    private[scheduler] def numTotalJobs: Int = nextJobId.get()
    private val nextStageId = new AtomicInteger(0)

    private[scheduler] val jobIdToStageIds = new HashMap[Int, HashSet[Int]]
    private[scheduler] val stageIdToStage = new HashMap[Int, Stage]
    private[scheduler] val shuffleToMapStage = new HashMap[Int, ShuffleMapStage]
    private[scheduler] val jobIdToActiveJob = new HashMap[Int, ActiveJob]

    // Stages we need to run whose parents aren't done
    private[scheduler] val waitingStages = new HashSet[Stage]

    // Stages we are running right now
    private[scheduler] val runningStages = new HashSet[Stage]

    // Stages that must be resubmitted due to fetch failures
    private[scheduler] val failedStages = new HashSet[Stage]

    private[scheduler] val activeJobs = new HashSet[ActiveJob]

3. Workflow

Workflow diagram: (the figure from the original post is not reproduced here)

3.1 Dividing stages

  Spark creates stages by cutting the RDD graph at shuffle boundaries. Operations with narrow dependencies (e.g. map(), filter()) are pipelined together into the task set of a single stage, whereas operations with wide dependencies require multiple stages. In the end, the only dependencies left between stages are shuffle dependencies.

  The actual computation happens in RDD.compute(), implemented by each concrete RDD, e.g. MappedRDD, FilteredRDD, and so on.

  When an operation triggers computation and a job is submitted to the DAGScheduler, the DAGScheduler starts from the RDD at the very end of the dependency chain, walks the whole chain, divides it into stages, and determines the dependencies between those stages. Stage boundaries are decided by ShuffleDependency: whenever computing an RDD requires its input data to be shuffled, the RDD carrying that shuffle dependency is used as the input for building a new stage. Dividing stages this way guarantees that data with dependencies is processed in the correct order. A simple experiment on this is described in the post "Spark DAGSheduler生成Stage过程分析实验".

  Take groupByKey as an example. The operation actually returns a ShuffledRDD. When the DAGScheduler reaches this ShuffledRDD and sees that its dependency is a ShuffleDependency, it uses the ShuffledRDD's parent RDD together with the ShuffleDependency to build a new stage; how the output of that stage is partitioned is determined by the Partitioner carried by the ShuffleDependency.

  Note that although the stage is built from the ShuffleDependency and its associated ShuffledRDD, the data processed by that stage is computed starting from the ShuffledRDD's parent RDD; only the placement of the final output follows what is described by the ShuffleDependency. The computation of the ShuffledRDD itself (which is really just the process of fetching the shuffle output) happens in the next stage.
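  A minimal sketch (not from the original post; it assumes an existing SparkContext sc) of a job that is divided into exactly two stages:

    // Stage 0 (ShuffleMapStage): textFile -> flatMap -> map are narrow dependencies and are
    // pipelined into one set of tasks; the stage ends at the shuffle introduced by groupByKey,
    // and its ShuffleMapTasks write shuffle output files.
    // Stage 1 (ResultStage): starts from the ShuffledRDD produced by groupByKey, fetches the
    // shuffle output, and runs the collect() action; its tasks are ResultTasks.
    val grouped = sc.textFile("words.txt")
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .groupByKey()       // ShuffleDependency: stage boundary
      .collect()          // action: submits the job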

  (The original post shows a stage-division diagram here; the figure is not reproduced.)

3.2 Generating the job and submitting stages

  The previous step yields one or more stages with dependencies between them. The stage associated with the RDD that directly triggered the job becomes the final stage, and a Job instance is created for it; this relationship is further recorded in the resultStageToJob map so that, when the stage completes, follow-up work such as reporting status and cleaning up job data can be done. When a stage is submitted, the DAGScheduler first checks whether the results of all of its parent stages are available. If they all are, the stage itself is submitted; if any parent's results are missing, the missing parents are submitted recursively instead. Every stage that could not be submitted during this recursion because its dependencies were not ready is placed in the waitingStages set, to be submitted later.

  When are the stages in waitingStages resubmitted? When a task belonging to an intermediate stage (such tasks are ShuffleMapTasks) finishes, the DAGScheduler checks whether all tasks of that stage have completed. If they have, the DAGScheduler rescans all stages in waitingStages, checks whether any of their dependent stages are still incomplete, and submits those stages whose dependencies are now all satisfied.

  In addition, after each pass of the DAGScheduler event loop, the scheduler also scans the waiting list (waitingStages) and the failed list (failedStages) and submits any stages that have become ready.
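  For reference, that rescan looked roughly like the following in Spark releases of this era (a sketch, not quoted from the original post; newer versions replaced it with submitWaitingChildStages, which only rescans the children of the stage that just finished):

    private def submitWaitingStages() {
      logTrace("Checking for newly runnable parent stages")
      val waitingStagesCopy = waitingStages.toArray
      waitingStages.clear()
      // Re-run submitStage on every waiting stage; those whose parents are now complete
      // will be submitted, the rest are put back into waitingStages by submitStage.
      for (stage <- waitingStagesCopy.sortBy(_.firstJobId)) {
        submitStage(stage)
      }
    }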

  Below is the submitStage code (embedded as a screenshot in the original post, so it is not reproduced verbatim here).
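  The following sketch reproduces submitStage as it appears in the Spark source of roughly this vintage (details may differ slightly between versions); it implements exactly the logic described above:

    /** Submits stage, but first recursively submits any missing parents. */
    private def submitStage(stage: Stage) {
      val jobId = activeJobForStage(stage)
      if (jobId.isDefined) {
        logDebug("submitStage(" + stage + ")")
        if (!waitingStages(stage) && !runningStages(stage) && !failedStages(stage)) {
          val missing = getMissingParentStages(stage).sortBy(_.id)
          if (missing.isEmpty) {
            // All parent results are available: submit this stage's tasks.
            logInfo("Submitting " + stage + " (" + stage.rdd + "), which has no missing parents")
            submitMissingTasks(stage, jobId.get)
          } else {
            // Otherwise submit the missing parents first and park this stage in waitingStages.
            for (parent <- missing) {
              submitStage(parent)
            }
            waitingStages += stage
          }
        }
      } else {
        abortStage(stage, "No active job for stage " + stage.id, None)
      }
    }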

  

3.3 Submitting task sets

  Each stage submission is ultimately turned into the submission of a TaskSet. The DAGScheduler submits the TaskSet through the TaskScheduler interface, and the TaskScheduler creates a TaskSetManager instance to manage the lifecycle of that TaskSet. At this point the DAGScheduler's work of submitting the stage is done. The concrete TaskScheduler implementation, once it obtains compute resources, then uses the TaskSetManager to schedule the individual tasks onto the corresponding executor nodes.
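  The hand-off lives at the end of submitMissingTasks; a condensed sketch (not quoted from the original post, and the construction of the individual ShuffleMapTask / ResultTask objects is omitted because their constructor arguments differ across Spark versions):

    // tasks: one ShuffleMapTask per missing partition for a ShuffleMapStage,
    // or one ResultTask per missing partition for a ResultStage.
    if (tasks.size > 0) {
      stage.pendingPartitions ++= tasks.map(_.partitionId)
      // The whole batch is wrapped in a TaskSet and handed to the TaskScheduler,
      // which wraps it in a TaskSetManager and schedules the tasks onto executors.
      taskScheduler.submitTasks(new TaskSet(
        tasks.toArray, stage.id, stage.latestInfo.attemptId, jobId, properties))
      stage.latestInfo.submissionTime = Some(clock.getTimeMillis())
    }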

3.4 Monitoring job and task completion

  To make sure that interdependent jobs and stages are scheduled and executed in order, the DAGScheduler must monitor the completion of the current jobs, stages, and even individual tasks. It does so by exposing a set of callback functions to the outside (mainly to the TaskScheduler); these callbacks cover task start, completion, and failure as well as task set failure, and from this task lifecycle information the DAGScheduler maintains the state of the corresponding jobs and stages.

    private val messageScheduler =
      ThreadUtils.newDaemonSingleThreadScheduledExecutor("dag-scheduler-message")

    private[scheduler] val eventProcessLoop = new DAGSchedulerEventProcessLoop(this)
    taskScheduler.setDAGScheduler(this)

    /**
     * Called by the TaskSetManager to report task's starting.
     */
    def taskStarted(task: Task[_], taskInfo: TaskInfo) {
      eventProcessLoop.post(BeginEvent(task, taskInfo))
    }

  Q: How does the DAGScheduler run internally? What drives its loop?

  A: The DAGScheduler's event loop is built on a message-passing mechanism (earlier Spark versions used an Akka actor; the version shown here uses Spark's own EventLoop). The DAGScheduler holds an eventProcessLoop to which callbacks such as taskStarted post DAGSchedulerEvents; these events cover job submission, task state changes, monitoring, and so on.

  Let's read through DAGSchedulerEventProcessLoop to see how this class handles its message events (DAGSchedulerEvent):

    private[scheduler] class DAGSchedulerEventProcessLoop(dagScheduler: DAGScheduler)
      extends EventLoop[DAGSchedulerEvent]("dag-scheduler-event-loop") with Logging {

      private[this] val timer = dagScheduler.metricsSource.messageProcessingTimer

      /**
       * The main event loop of the DAG scheduler.
       */
      override def onReceive(event: DAGSchedulerEvent): Unit = {
        val timerContext = timer.time()
        try {
          doOnReceive(event)
        } finally {
          timerContext.stop()
        }
      }

      private def doOnReceive(event: DAGSchedulerEvent): Unit = event match {
        case JobSubmitted(jobId, rdd, func, partitions, callSite, listener, properties) =>
          dagScheduler.handleJobSubmitted(jobId, rdd, func, partitions, callSite, listener, properties)

        case MapStageSubmitted(jobId, dependency, callSite, listener, properties) =>
          dagScheduler.handleMapStageSubmitted(jobId, dependency, callSite, listener, properties)

        case StageCancelled(stageId) =>
          dagScheduler.handleStageCancellation(stageId)

        case JobCancelled(jobId) =>
          dagScheduler.handleJobCancellation(jobId)

        case JobGroupCancelled(groupId) =>
          dagScheduler.handleJobGroupCancelled(groupId)

        case AllJobsCancelled =>
          dagScheduler.doCancelAllJobs()

        case ExecutorAdded(execId, host) =>
          dagScheduler.handleExecutorAdded(execId, host)

        case ExecutorLost(execId) =>
          dagScheduler.handleExecutorLost(execId, fetchFailed = false)

        case BeginEvent(task, taskInfo) =>
          dagScheduler.handleBeginEvent(task, taskInfo)

        case GettingResultEvent(taskInfo) =>
          dagScheduler.handleGetTaskResult(taskInfo)

        case completion: CompletionEvent =>
          dagScheduler.handleTaskCompletion(completion)

        case TaskSetFailed(taskSet, reason, exception) =>
          dagScheduler.handleTaskSetFailed(taskSet, reason, exception)

        case ResubmitFailedStages =>
          dagScheduler.resubmitFailedStages()
      }

      override def onError(e: Throwable): Unit = {
        logError("DAGSchedulerEventProcessLoop failed; shutting down SparkContext", e)
        try {
          dagScheduler.doCancelAllJobs()
        } catch {
          case t: Throwable => logError("DAGScheduler failed to cancel all jobs.", t)
        }
        dagScheduler.sc.stop()
      }

      override def onStop(): Unit = {
        // Cancel any active jobs in postStop hook
        dagScheduler.cleanUpAfterSchedulerStop()
      }
    }
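  DAGSchedulerEventProcessLoop gets its threading behaviour from its parent class, org.apache.spark.util.EventLoop. A condensed sketch of that class (simplified and not quoted from the original post; the real implementation adds error handling and lifecycle management):

    // Events posted from any thread are appended to a queue...
    private val eventQueue = new java.util.concurrent.LinkedBlockingDeque[DAGSchedulerEvent]()
    private val stopped = new java.util.concurrent.atomic.AtomicBoolean(false)

    def post(event: DAGSchedulerEvent): Unit = eventQueue.put(event)

    // ...and drained by a single daemon thread, which dispatches each event to onReceive
    // (and from there to doOnReceive above). This keeps DAGScheduler state single-threaded.
    private val eventThread = new Thread("dag-scheduler-event-loop") {
      setDaemon(true)
      override def run(): Unit = {
        while (!stopped.get) {
          onReceive(eventQueue.take())   // blocks until an event is posted
        }
      }
    }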

  In addition, the TaskScheduler uses callbacks to inform the DAGScheduler about executor lifecycle events. If an executor crashes, or loses contact with the driver for any reason, the output of the ShuffleMapTasks that ran on it is marked as unavailable. This changes the state of the corresponding stages, which in turn affects the state of the related jobs and may trigger resubmission of those stages to recompute the missing data.

3.5 Fetching task results

  Once a task has finished executing on an executor, its result has to be returned to the DAGScheduler in some form, and the form depends on the type of task.

  For tasks of the final stage (class ResultTask), what is returned to the DAGScheduler is the computation result itself. For a ShuffleMapTask, what is returned is a MapStatus object, which records where in the BlockManager the ShuffleMapTask's output is stored rather than the result itself; the tasks of the next stage use this location information to fetch their input data.

  Results returned by a ResultTask are further divided into two kinds depending on their size. If the result is small enough, it is placed directly inside a DirectTaskResult. If it exceeds a certain size (about 10MB by default), the executor first serializes the DirectTaskResult, stores the serialized bytes as a block in the BlockManager, and returns the BlockId inside an IndirectTaskResult to the TaskScheduler; the TaskScheduler then calls TaskResultGetter to take the BlockId out of the IndirectTaskResult and fetch the corresponding DirectTaskResult through the BlockManager. From the DAGScheduler's point of view all of this is transparent: what it receives is always the actual result of the task.
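  The executor-side decision is paraphrased below (a sketch condensed from Executor.run, not quoted from the original post; threshold names and defaults vary between Spark versions):

    // serializedDirectResult holds the serialized DirectTaskResult of this task.
    val resultSize = serializedDirectResult.limit
    val serializedResult =
      if (maxResultSize > 0 && resultSize > maxResultSize) {
        // Larger than spark.driver.maxResultSize: drop the data, return only metadata.
        ser.serialize(new IndirectTaskResult[Any](TaskResultBlockId(taskId), resultSize))
      } else if (resultSize > maxDirectResultSize) {
        // Too large to send directly (roughly 10MB in the versions discussed here):
        // put the bytes in the BlockManager and return only the BlockId.
        val blockId = TaskResultBlockId(taskId)
        env.blockManager.putBytes(blockId, serializedDirectResult, StorageLevel.MEMORY_AND_DISK_SER)
        ser.serialize(new IndirectTaskResult[Any](blockId, resultSize))
      } else {
        // Small enough: ship the DirectTaskResult bytes back with the status update.
        serializedDirectResult
      }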

    // This is a var so that we can reset it for testing purposes.
    private[spark] var taskResultGetter = new TaskResultGetter(sc.env, this)

The enqueueSuccessfulTask method of TaskResultGetter:

    def enqueueSuccessfulTask(
        taskSetManager: TaskSetManager,
        tid: Long,
        serializedData: ByteBuffer): Unit = {
      getTaskResultExecutor.execute(new Runnable {
        override def run(): Unit = Utils.logUncaughtExceptions {
          try {
            val (result, size) = serializer.get().deserialize[TaskResult[_]](serializedData) match {
              /*
               * Depending on the size of the task result there are two cases:
               * DirectTaskResult and IndirectTaskResult.
               * 1. If the result is small enough, it is placed directly in a DirectTaskResult.
               */
              case directResult: DirectTaskResult[_] =>
                if (!taskSetManager.canFetchMoreResults(serializedData.limit())) {
                  return
                }
                // deserialize "value" without holding any lock so that it won't block other threads.
                // We should call it here, so that when it's called again in
                // "TaskSetManager.handleSuccessfulTask", it does not need to deserialize the value.
                directResult.value()
                (directResult, serializedData.limit())

              /**
               * 2. If the result exceeds a certain size (about 10MB by default), the executor
               *    serializes the DirectTaskResult, stores the serialized bytes as a block in the
               *    BlockManager, and returns the resulting BlockId to the TaskScheduler inside an
               *    IndirectTaskResult. The TaskScheduler then uses TaskResultGetter to take the
               *    BlockId out of the IndirectTaskResult and fetch the corresponding
               *    DirectTaskResult through the BlockManager.
               */
              case IndirectTaskResult(blockId, size) =>
                if (!taskSetManager.canFetchMoreResults(size)) {
                  // dropped by executor if size is larger than maxResultSize
                  sparkEnv.blockManager.master.removeBlock(blockId)
                  return
                }
                logDebug("Fetching indirect task result for TID %s".format(tid))
                scheduler.handleTaskGettingResult(taskSetManager, tid)
                val serializedTaskResult = sparkEnv.blockManager.getRemoteBytes(blockId)
                if (!serializedTaskResult.isDefined) {
                  /* We won't be able to get the task result if the machine that ran the task failed
                   * between when the task ended and when we tried to fetch the result, or if the
                   * block manager had to flush the result. */
                  scheduler.handleFailedTask(
                    taskSetManager, tid, TaskState.FINISHED, TaskResultLost)
                  return
                }
                val deserializedResult = serializer.get().deserialize[DirectTaskResult[_]](
                  serializedTaskResult.get.toByteBuffer)
                sparkEnv.blockManager.master.removeBlock(blockId)
                (deserializedResult, size)
            }

            // Set the task result size in the accumulator updates received from the executors.
            // We need to do this here on the driver because if we did this on the executors then
            // we would have to serialize the result again after updating the size.
            result.accumUpdates = result.accumUpdates.map { a =>
              if (a.name == Some(InternalAccumulator.RESULT_SIZE)) {
                val acc = a.asInstanceOf[LongAccumulator]
                assert(acc.sum == 0L, "task result size should not have been set on the executors")
                acc.setValue(size.toLong)
                acc
              } else {
                a
              }
            }

            scheduler.handleSuccessfulTask(taskSetManager, tid, result)
          } catch {
            case cnf: ClassNotFoundException =>
              val loader = Thread.currentThread.getContextClassLoader
              taskSetManager.abort("ClassNotFound with classloader: " + loader)
            // Matching NonFatal so we don't catch the ControlThrowable from the "return" above.
            case NonFatal(ex) =>
              logError("Exception while getting task result", ex)
              taskSetManager.abort("Exception while getting task result: %s".format(ex))
          }
        }
      })
    }
