Spark Notes 7: DAGScheduler

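The earlier notes on SparkContext and RDD show that the real computation is always carried out by calling DAGScheduler's runJob method, which makes this a very important class. Before reading its implementation, it helps to know a little about the actor model: http://en.wikipedia.org/wiki/Actor_model http://www.slideshare.net/YungLinHo/introduction-to-actor-model-and-akka A rough idea of how actors are implemented is enough. It is also worth reading up first on DAG concepts and the related paper: http://en.wikipedia.org/wiki/Directed_acyclic_graph http://www.netlib.org/utk/people/JackDongarra/PAPERS/DAGuE_technical_report.pdf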

===========================Job submission flow======================================

DAGScheduler::submitJob --every action eventually calls submitJob

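-> send: JobSubmitted --sends a JobSubmitted message to the DAGScheduler's event actor, so the job is handled on the scheduler's single event-processing thread (see the DAGSchedulerEvent comment below)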

-> handleJobSubmitted --the DAGScheduler handles the received message

-> newStage --creates a stage (the job's final stage)

-> new ActiveJob --creates an ActiveJob that tracks this job

-> [runLocally] --a simple job is run directly on the local machine, which requires:

localExecutionEnabled && allowLocal && finalStage.parents.isEmpty && partitions.length == 1

->runLocally(job) --don't block the DAGScheduler event loop or other concurrent jobs

->runLocallyWithinThread(job) --the local job runs in a newly created thread, so it does not block the DAGScheduler

->TaskContext(job.finalStage.id, job.partitions(0), 0, runningLocally = true)

->result = job.func(taskContext, rdd.iterator(split, taskContext)) --runs the job

->job.listener.taskSucceeded(0, result) --notifies the listener of the job's result

->listenerBus.post(SparkListenerJobEnd(job.jobId, jobResult)) --announces that the job has ended

->submitStage(finalStage) -- Submits stage, but first recursively submits any missing parents

-> activeJobForStage --Finds the earliest-created active job that needs the stage; the lookup is done in jobIdToActiveJob

-> getMissingParentStages --if a stage depends on a shuffle map stage whose output is not yet available, that parent stage counts as missing

->waitingForVisit.push(stage.rdd)

->waitingForVisit.pop()

->getShuffleMapStage

->registerShuffleDependencies --registers the shuffles of all ancestor RDDs in shuffleToMapStage and mapOutputTracker

->getAncestorShuffleDependencies --returns a stack holding the ancestor shuffle dependencies

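->newOrUsedStage --creates a shuffle map stage for the RDD; if the shuffle is already known to mapOutputTracker, its existing map output locations are copied into the new stage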

->mapOutputTracker.getSerializedMapOutputStatuses(shuffleDep.shuffleId) or

->mapOutputTracker.registerShuffle(shuffleDep.shuffleId,rdd.partitions.size)

->shuffleToMapStage(currentShufDep.shuffleId) = stage --recorded in the DAGScheduler's hash map

->newOrUsedStage --creates the shuffle map stage for the current RDD's shuffle dependency

->shuffleToMapStage(shuffleDep.shuffleId) = stage --recorded in the DAGScheduler's hash map

->NarrowDependency ->waitingForVisit.push(narrowDep.rdd) --a narrow dependency does not start a new stage; the parent RDD is pushed onto the stack so that its own parents get examined

-> submitMissingTasks --Called when stage's parents are available and we can now do its task; the stage no longer has any missing dependencies.

-> stage.pendingTasks.clear() --clears the stage's set of pending tasks

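-> partitionsToCompute = ? --First figure out the indexes of partition ids to compute.

Find the partitions that still need to run. For a shuffle map stage, these are the partitions that have no map output location yet. For a result stage, these are the partitions whose result has not been received.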

->runningStages += stage --updates the record of running stages

->listenerBus.post(SparkListenerStageSubmitted(stage.latestInfo, properties)) --notifies listeners that the stage has been submitted

->Broadcasted binary for the task, used to dispatch tasks to executors. The serialized copy of the RDD is broadcast and each task deserializes it, which means each task gets a different copy of the RDD. This is necessary in Hadoop, where the JobConf/Configuration object is not thread-safe.

->// For ShuffleMapTask, serialize and broadcast (rdd, shuffleDep).

->// For ResultTask, serialize and broadcast (rdd, func).

->new ShuffleMapTask(stage.id, taskBinary, part, locs) --creates the tasks, one per partition to compute

->new ResultTask(stage.id, taskBinary, part, locs, id)

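-> Preemptively serialize a task to make sure it can be serialized. A non-serializable task is caught here and the stage is aborted, instead of the failure surfacing later inside one of the TaskScheduler implementations.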

->stage.pendingTasks ++= tasks

->taskScheduler.submitTasks --submits the tasks, wrapped in a TaskSet, to the taskScheduler

-> submitStage(parent) --(recursive) if the stage has missing parent stages, each of them is submitted first, and the stage itself is put in waitingStages

======================end=========================================
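The local fast path in the flow above (runLocally / runLocallyWithinThread) can be sketched in plain Scala. This is a minimal sketch under simplifying assumptions, not Spark's code. `LocalRunSketch`, `canRunLocally`, and the `onSuccess`/`onFailure` callbacks are hypothetical stand-ins for the real condition, `job.listener.taskSucceeded`, and `job.listener.jobFailed`. The sketch has no `TaskContext` and no listener bus.

```scala
object LocalRunSketch {
  /** Mirrors the condition checked in handleJobSubmitted: local execution enabled,
    * allowed by the caller, no parent stages, and exactly one partition. */
  def canRunLocally(localExecutionEnabled: Boolean, allowLocal: Boolean,
                    numParentStages: Int, numPartitions: Int): Boolean =
    localExecutionEnabled && allowLocal && numParentStages == 0 && numPartitions == 1

  /** Runs `func` over the single partition on a new thread so that the caller (in Spark,
    * the DAGScheduler event loop) is not blocked, then reports success or failure. */
  def runLocally[T, U](partition: Seq[T], func: Iterator[T] => U,
                       onSuccess: U => Unit, onFailure: Exception => Unit): Thread = {
    val thread = new Thread("local-job") {
      override def run(): Unit =
        try onSuccess(func(partition.iterator)) // ~ job.listener.taskSucceeded(0, result)
        catch { case e: Exception => onFailure(e) } // ~ job.listener.jobFailed(e)
    }
    thread.setDaemon(true)
    thread.start()
    thread
  }
}
```

The separate thread is what keeps a slow local job from stalling the single DAGScheduler event loop.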
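The getMissingParentStages step in the flow above can be sketched on a toy RDD graph. This sketch makes several simplifications:

- `Node`, `NarrowDep`, `ShuffleDep`, and the `available` set of shuffle ids are hypothetical.
- A missing parent stage is represented by its shuffle id.
- Caching is ignored.

The core idea is the same as in the flow: a shuffle dependency is a stage boundary, while a narrow dependency is simply pushed onto the `waitingForVisit` stack.

```scala
import scala.collection.mutable

object MissingParentsSketch {
  // Hypothetical, simplified RDD graph: each node lists its dependencies.
  sealed trait Dep
  final case class NarrowDep(rdd: Node) extends Dep
  final case class ShuffleDep(shuffleId: Int, rdd: Node) extends Dep
  final class Node(val id: Int, val deps: List[Dep])

  /** Shuffle ids of parent map stages whose output is not yet available, found by
    * walking narrow dependencies with an explicit stack (like waitingForVisit). */
  def missingParentShuffles(root: Node, available: Set[Int]): Set[Int] = {
    val missing = mutable.Set[Int]()
    val visited = mutable.Set[Int]()
    val waitingForVisit = mutable.Stack[Node](root)
    while (waitingForVisit.nonEmpty) {
      val rdd = waitingForVisit.pop()
      if (visited.add(rdd.id)) {
        rdd.deps.foreach {
          // Shuffle dependency = stage boundary: the parent map stage is missing
          // unless its output is already available.
          case ShuffleDep(shuffleId, _) =>
            if (!available.contains(shuffleId)) missing += shuffleId
          // Narrow dependency = same stage: keep walking up the lineage.
          case NarrowDep(parent) => waitingForVisit.push(parent)
        }
      }
    }
    missing.toSet
  }
}
```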
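The partitionsToCompute step in the flow above can be written out as two small functions. This sketch assumes the Spark 1.x rule:

- A shuffle map stage recomputes the partitions whose `outputLocs` entry is empty.
- A result stage recomputes the partitions that the job has not yet marked finished.

The plain `IndexedSeq` inputs are hypothetical simplifications of the real `Stage` and `ActiveJob` fields.

```scala
object PartitionsToComputeSketch {
  /** Shuffle map stage: partitions with no registered map output location yet.
    * outputLocs(i) lists where partition i's output lives (hypothetically, host names). */
  def forShuffleMapStage(outputLocs: IndexedSeq[List[String]]): Seq[Int] =
    outputLocs.indices.filter(i => outputLocs(i).isEmpty)

  /** Result stage: partitions of the job whose result has not been received yet. */
  def forResultStage(finished: IndexedSeq[Boolean]): Seq[Int] =
    finished.indices.filter(i => !finished(i))
}
```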
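Plain Java serialization can show why each task ends up with its own copy of the RDD. In this sketch, `java.util.HashMap` stands in for a non-thread-safe Hadoop `JobConf`, and `TaskBinarySketch` is a hypothetical helper. Spark itself uses its closure serializer and `sc.broadcast` rather than calling `ObjectOutputStream` directly. The object is serialized once, and every deserialization produces an independent object graph, so one task mutating its copy cannot affect another.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

object TaskBinarySketch {
  /** Serialize once; Spark would broadcast these bytes as the task binary. */
  def serialize(obj: java.io.Serializable): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(obj) finally out.close()
    bytes.toByteArray
  }

  /** Each task deserializes the shared bytes into its own, independent copy. */
  def deserialize[T](bytes: Array[Byte]): T = {
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes))
    try in.readObject().asInstanceOf[T] finally in.close()
  }
}
```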
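The preemptive serialization check in the flow above amounts to this: try to serialize one task up front, and turn a `NotSerializableException` into an error now rather than a crash later. The following is a minimal sketch using Java serialization. `SerializationCheckSketch.checkSerializable` is hypothetical. The `Left` message only imitates the "Task not serializable" text with which, as far as I know, the DAGScheduler aborts the stage.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object SerializationCheckSketch {
  /** Serialize the task once; return the serialized size, or an error message
    * if some object reachable from the task is not serializable. */
  def checkSerializable(task: AnyRef): Either[String, Int] =
    try {
      val bytes = new ByteArrayOutputStream()
      val out = new ObjectOutputStream(bytes)
      try out.writeObject(task) finally out.close()
      Right(bytes.size())
    } catch {
      case e: NotSerializableException => Left("Task not serializable: " + e.toString)
    }
}
```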

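There is one DAGScheduler per SparkContext. For each job it builds a DAG of stages, tracks which RDDs and stage outputs have been materialized, and looks for a minimal schedule to run the job. It then submits each stage as a TaskSet to the TaskScheduler, which runs the tasks on the cluster.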
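Running the recursive submitStage step from the job-submission flow on a toy stage graph shows the resulting order: parents are submitted first, and children wait. The sketch makes these hypothetical simplifications:

- `Stage` has a direct `parents` list and an `available` flag, instead of calling getMissingParentStages.
- Stages are tracked by id.
- `submittedOrder` stands in for calling submitMissingTasks.

The skip check against `waitingStages`/`runningStages` and the `sortBy(_.id)` are meant to mirror the Spark 1.x code, which additionally skips stages in `failedStages`.

```scala
import scala.collection.mutable

object SubmitStageSketch {
  // Hypothetical stage: its parent stages and whether its output is already available.
  final case class Stage(id: Int, parents: List[Stage], available: Boolean = false)

  val waitingStages = mutable.LinkedHashSet[Int]()
  val runningStages = mutable.LinkedHashSet[Int]()
  val submittedOrder = mutable.ArrayBuffer[Int]() // order in which tasks were submitted

  /** Submits the stage, but first recursively submits any missing parents. */
  def submitStage(stage: Stage): Unit =
    if (!waitingStages(stage.id) && !runningStages(stage.id)) {
      val missing = stage.parents.filterNot(_.available).sortBy(_.id)
      if (missing.isEmpty) {
        submittedOrder += stage.id // ~ submitMissingTasks(stage, jobId)
        runningStages += stage.id
      } else {
        missing.foreach(submitStage) // parents first
        waitingStages += stage.id    // resubmitted once its parents finish
      }
    }
}
```

Consider stage 3, which depends on stage 2, which in turn depends on stage 0 and on the already available stage 1. Only stage 0 is submitted right away. Stages 2 and 3 stay in `waitingStages` until their parents finish.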

/**
 * The high-level scheduling layer that implements stage-oriented scheduling. It computes a DAG of
 * stages for each job, keeps track of which RDDs and stage outputs are materialized, and finds a
 * minimal schedule to run the job. It then submits stages as TaskSets to an underlying
 * TaskScheduler implementation that runs them on the cluster.
 *
 * In addition to coming up with a DAG of stages, this class also determines the preferred
 * locations to run each task on, based on the current cache status, and passes these to the
 * low-level TaskScheduler. Furthermore, it handles failures due to shuffle output files being
 * lost, in which case old stages may need to be resubmitted. Failures *within* a stage that are
 * not caused by shuffle file loss are handled by the TaskScheduler, which will retry each task
 * a small number of times before cancelling the whole stage.
 *
 */

package org.apache.spark.scheduler
private[spark]
class DAGScheduler(
    private[scheduler] val sc: SparkContext,
    private[scheduler] val taskScheduler: TaskScheduler,
    listenerBus: LiveListenerBus,
    mapOutputTracker: MapOutputTrackerMaster,
    blockManagerMaster: BlockManagerMaster,
    env: SparkEnv,
    clock: Clock = SystemClock)
  extends Logging {

State machine (actor message handling):

private[scheduler] class DAGSchedulerEventProcessActor(dagScheduler: DAGScheduler)
  extends Actor with Logging {

  override def preStart() {
    // set DAGScheduler for taskScheduler to ensure eventProcessActor is always
    // valid when the messages arrive
    dagScheduler.taskScheduler.setDAGScheduler(dagScheduler)
  }

  /**
   * The main event loop of the DAG scheduler.
   */
  def receive = {
    case JobSubmitted(jobId, rdd, func, partitions, allowLocal, callSite, listener, properties) =>
      dagScheduler.handleJobSubmitted(jobId, rdd, func, partitions, allowLocal, callSite,
        listener, properties)

    case StageCancelled(stageId) =>
      dagScheduler.handleStageCancellation(stageId)

    case JobCancelled(jobId) =>
      dagScheduler.handleJobCancellation(jobId)

    case JobGroupCancelled(groupId) =>
      dagScheduler.handleJobGroupCancelled(groupId)

    case AllJobsCancelled =>
      dagScheduler.doCancelAllJobs()

    case ExecutorAdded(execId, host) =>
      dagScheduler.handleExecutorAdded(execId, host)

    case ExecutorLost(execId) =>
      dagScheduler.handleExecutorLost(execId)

    case BeginEvent(task, taskInfo) =>
      dagScheduler.handleBeginEvent(task, taskInfo)

    case GettingResultEvent(taskInfo) =>
      dagScheduler.handleGetTaskResult(taskInfo)

    case completion @ CompletionEvent(task, reason, _, _, taskInfo, taskMetrics) =>
      dagScheduler.handleTaskCompletion(completion)

    case TaskSetFailed(taskSet, reason) =>
      dagScheduler.handleTaskSetFailed(taskSet, reason)

    case ResubmitFailedStages =>
      dagScheduler.resubmitFailedStages()
  }
}

Important fields:

private val nextStageId = new AtomicInteger(0)

private[scheduler] val nextJobId = new AtomicInteger(0)
private[scheduler] val jobIdToStageIds = new HashMap[Int, HashSet[Int]]
private[scheduler] val stageIdToStage = new HashMap[Int, Stage]
private[scheduler] val shuffleToMapStage = new HashMap[Int, Stage]
private[scheduler] val jobIdToActiveJob = new HashMap[Int, ActiveJob]

// Stages we need to run whose parents aren't done
private[scheduler] val waitingStages = new HashSet[Stage]
// Stages we are running right now
private[scheduler] val runningStages = new HashSet[Stage]
// Stages that must be resubmitted due to fetch failures
private[scheduler] val failedStages = new HashSet[Stage]
private[scheduler] val activeJobs = new HashSet[ActiveJob]
// Contains the locations that each RDD's partitions are cached on
private val cacheLocs = new HashMap[Int, Array[Seq[TaskLocation]]]

private val dagSchedulerActorSupervisor =
  env.actorSystem.actorOf(Props(new DAGSchedulerActorSupervisor(this)))

// A closure serializer that we reuse.
// This is only safe because DAGScheduler runs in a single thread.
private val closureSerializer = SparkEnv.get.closureSerializer.newInstance()

private[scheduler] var eventProcessActor: ActorRef = _

private[scheduler] def handleJobSubmitted(jobId: Int,
    finalRDD: RDD[_],
    func: (TaskContext, Iterator[_]) => _,
    partitions: Array[Int],
    allowLocal: Boolean,
    callSite: CallSite,
    listener: JobListener,
    properties: Properties = null)
{
  // ... body omitted in these notes
}

/** Submits stage, but first recursively submits any missing parents. */
private def submitStage(stage: Stage) {
  // ... body omitted in these notes
}

/** Called when stage's parents are available and we can now do its task. */
private def submitMissingTasks(stage: Stage, jobId: Int) {
  // ... body omitted in these notes
}


/** Finds the earliest-created active job that needs the stage */
// TODO: Probably should actually find among the active jobs that need this
// stage the one with the highest priority (highest-priority pool, earliest created).
// That should take care of at least part of the priority inversion problem with
// cross-job dependencies.
private def activeJobForStage(stage: Stage): Option[Int] = {
  val jobsThatUseStage: Array[Int] = stage.jobIds.toArray.sorted
  jobsThatUseStage.find(jobIdToActiveJob.contains)
}
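Sorting the stage's job ids is enough to find the earliest-created job. Job ids come from `nextJobId.getAndIncrement()`, so a smaller id means an older job. The following standalone sketch performs the same lookup. The `Map[Int, String]` is a hypothetical stand-in for `jobIdToActiveJob: HashMap[Int, ActiveJob]`.

```scala
object ActiveJobForStageSketch {
  /** Earliest-created active job among those that use the stage: ids are handed out
    * in increasing order, so the smallest id that is still active is the oldest job. */
  def activeJobForStage(stageJobIds: Set[Int], jobIdToActiveJob: Map[Int, String]): Option[Int] =
    stageJobIds.toArray.sorted.find(jobIdToActiveJob.contains)
}
```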

/**
 * Types of events that can be handled by the DAGScheduler. The DAGScheduler uses an event queue
 * architecture where any thread can post an event (e.g. a task finishing or a new job being
 * submitted) but there is a single "logic" thread that reads these events and takes decisions.
 * This greatly simplifies synchronization.
 */
private[scheduler] sealed trait DAGSchedulerEvent
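The event-queue architecture described in this comment can be sketched without Akka. In Spark 1.x the queue is the mailbox of `eventProcessActor`. This sketch swaps in a `java.util.concurrent.LinkedBlockingQueue` and one plain thread, and the event types are a hypothetical subset. Any thread may `post`, but only the loop thread runs the handlers, so the state the handlers touch needs no locks.

```scala
import java.util.concurrent.LinkedBlockingQueue
import scala.collection.mutable.ArrayBuffer

object EventLoopSketch {
  // Hypothetical, trimmed-down event types (the real ones are matched in receive).
  sealed trait Event
  final case class JobSubmitted(jobId: Int) extends Event
  final case class JobCancelled(jobId: Int) extends Event
  case object Stop extends Event

  final class EventLoop {
    private val queue = new LinkedBlockingQueue[Event]()
    // Mutated only by the loop thread; read by others only after join().
    val handled = ArrayBuffer[String]()
    val handlerThreads = ArrayBuffer[String]()

    private val thread = new Thread("dag-scheduler-event-loop") {
      override def run(): Unit = {
        var running = true
        while (running) {
          queue.take() match {
            case JobSubmitted(id) => handled += s"submitted $id"
            case JobCancelled(id) => handled += s"cancelled $id"
            case Stop             => running = false
          }
          handlerThreads += Thread.currentThread().getName
        }
      }
    }

    def start(): Unit = thread.start()
    /** Any thread may post; posting only enqueues the event. */
    def post(e: Event): Unit = queue.put(e)
    def stopAndJoin(): Unit = { post(Stop); thread.join() }
  }
}
```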

/**
 * Asynchronously passes SparkListenerEvents to registered SparkListeners.
 *
 * Until start() is called, all posted events are only buffered. Only after this listener bus
 * has started will events be actually propagated to all attached listeners. This listener bus
 * is stopped when it receives a SparkListenerShutdown event, which is posted using stop().
 */
private[spark] class LiveListenerBus extends SparkListenerBus with Logging {
  // ... body omitted in these notes
}

/**
 * A SparkListenerEvent bus that relays events to its listeners
 */
private[spark] trait SparkListenerBus extends Logging {

  // SparkListeners attached to this event bus
  protected val sparkListeners = new ArrayBuffer[SparkListener]
    with mutable.SynchronizedBuffer[SparkListener]

  def addListener(listener: SparkListener) {
    sparkListeners += listener
  }

  /**
   * Post an event to all attached listeners.
   * This does nothing if the event is SparkListenerShutdown.
   */
  def postToAll(event: SparkListenerEvent) {
    // ... body omitted in these notes
  }

  /**
   * Apply the given function to all attached listeners, catching and logging any exception.
   */
  private def foreachListener(f: SparkListener => Unit): Unit = {
    sparkListeners.foreach { listener =>
      try {
        f(listener)
      } catch {
        case e: Exception =>
          logError(s"Listener ${Utils.getFormattedClassName(listener)} threw an exception", e)
      }
    }
  }

}
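The point of `foreachListener` is isolation: a listener that throws is logged and skipped, and the remaining listeners still receive the event. The following is a minimal sketch of that behavior. `Listener` and `Bus` are hypothetical, and errors are collected in a buffer instead of going to `logError`. The `SynchronizedBuffer` mixin, deprecated in later Scala versions, is replaced by taking a snapshot of the listener list under a lock.

```scala
import scala.collection.mutable.ArrayBuffer

object ListenerBusSketch {
  // Hypothetical minimal listener; the real SparkListener has one callback per event type.
  trait Listener { def onEvent(event: String): Unit }

  final class Bus {
    private val listeners = ArrayBuffer[Listener]()
    // Stand-in for logError: records which deliveries failed.
    val errors = ArrayBuffer[String]()

    def addListener(listener: Listener): Unit = synchronized { listeners += listener }

    /** Deliver the event to every listener; one failing listener does not stop the rest. */
    def postToAll(event: String): Unit = {
      val snapshot = synchronized { listeners.toList }
      snapshot.foreach { listener =>
        try listener.onEvent(event)
        catch {
          case e: Exception => errors += s"listener threw: ${e.getMessage}"
        }
      }
    }
  }
}
```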
