SchedulerBackend has two jobs: requesting resources, and executing and managing tasks.

For SparkDeploySchedulerBackend, which is built on the actor model, the main work is starting and managing two actors:
Deploy.Client Actor, responsible for requesting resources. It is created when SparkDeploySchedulerBackend is initialized; the Client then registers with the Master, which eventually leads to ExecutorBackends being created on the Workers (see "Spark源码分析 – Deploy"), and those ExecutorBackends in turn register themselves with the Driver Actor.
Driver Actor, responsible for task execution.
Because Spark was originally built on Mesos and the standalone mode was added later for compatibility, the interfaces in the Driver Actor follow the Mesos style. Under Mesos, resources would presumably be offered dynamically and tasks launched against those offers (a guess; I have not read that code yet).
For coarse-grained Mesos mode and Spark's standalone deploy mode this step is simplified: the resources are already allocated when the TaskScheduler is initialized, and the Driver Actor only has to schedule tasks onto those pre-allocated executors.
That is why the comment on makeOffers says "Make fake resource offers": no real resource offering happens there.
The key to how the Driver Actor dispatches tasks to the executors is scheduler.resourceOffers.
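
To make the control flow concrete, here is a small, self-contained sketch of the offer-to-launch cycle that the DriverActor drives. WorkerOffer and TaskDescription below only loosely mirror the real classes, and ToyScheduler is a stand-in for ClusterScheduler.resourceOffers, so treat it as an illustration rather than Spark code:

case class WorkerOffer(executorId: String, host: String, cores: Int)
case class TaskDescription(taskId: Long, executorId: String)

// Stand-in for ClusterScheduler: hand out at most `cores` pending tasks per offer (no locality).
class ToyScheduler(var pending: List[Long]) {
  def resourceOffers(offers: Seq[WorkerOffer]): Seq[Seq[TaskDescription]] =
    offers.map { o =>
      val (taken, rest) = pending.splitAt(o.cores)
      pending = rest
      taken.map(id => TaskDescription(id, o.executorId))
    }
}

object OfferFlowDemo extends App {
  val scheduler = new ToyScheduler(List(1L, 2L, 3L))
  // In standalone mode these offers are built from the executors that have already registered.
  val offers = Seq(WorkerOffer("exec-0", "host-a", 2), WorkerOffer("exec-1", "host-b", 4))
  // launchTasks would then send LaunchTask(task) to executorActor(task.executorId).
  for (tasks <- scheduler.resourceOffers(offers); task <- tasks)
    println("launch task " + task.taskId + " on " + task.executorId)
}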

SchedulerBackend

package org.apache.spark.scheduler.cluster
/**
* A backend interface for cluster scheduling systems that allows plugging in different ones under
* ClusterScheduler. We assume a Mesos-like model where the application gets resource offers as
* machines become available and can launch tasks on them.
*/
private[spark] trait SchedulerBackend {
  def start(): Unit
  def stop(): Unit
  def reviveOffers(): Unit
  def defaultParallelism(): Int

  // Memory used by each executor (in megabytes)
  protected val executorMemory: Int = SparkContext.executorMemoryRequested

  // TODO: Probably want to add a killTask too
}
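
Roughly how SparkContext wires a ClusterScheduler to a concrete backend in standalone mode, paraphrased from the 0.8-era createTaskScheduler logic rather than copied verbatim (variable names such as masterUrl are mine):

// Paraphrased wiring, not the verbatim SparkContext source.
val scheduler = new ClusterScheduler(sc)
val backend = new SparkDeploySchedulerBackend(scheduler, sc, masterUrl, appName)
scheduler.initialize(backend)   // the scheduler keeps the backend and later calls backend.reviveOffers()
scheduler.start()               // which in turn calls backend.start(), creating the DriverActor (and the Client)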

 

StandaloneSchedulerBackend

Used for coarse-grained Mesos mode and Spark's standalone deploy mode.

As you can see, its main purpose is to create and maintain the driverActor.

Most of the logic lives in the DriverActor.

/**
* A standalone scheduler backend, which waits for standalone executors to connect to it through
* Akka. These may be executed in a variety of ways, such as Mesos tasks for the coarse-grained
* Mesos mode or standalone processes for Spark's standalone deploy mode (spark.deploy.*).
*/
private[spark]
class StandaloneSchedulerBackend(scheduler: ClusterScheduler, actorSystem: ActorSystem)
  extends SchedulerBackend with Logging
{
  // Use an atomic variable to track total number of cores in the cluster for simplicity and speed
  var totalCoreCount = new AtomicInteger(0)

  class DriverActor(sparkProperties: Seq[(String, String)]) extends Actor {
    // ...analyzed below
  }

  var driverActor: ActorRef = null
  val taskIdsOnSlave = new HashMap[String, HashSet[String]]

  override def start() {
    val properties = new ArrayBuffer[(String, String)]
    val iterator = System.getProperties.entrySet.iterator
    while (iterator.hasNext) {
      val entry = iterator.next
      val (key, value) = (entry.getKey.toString, entry.getValue.toString)
      if (key.startsWith("spark.") && !key.equals("spark.hostPort")) {
        properties += ((key, value))
      }
    }
    driverActor = actorSystem.actorOf(   // the key step: create the driverActor
      Props(new DriverActor(properties)), name = StandaloneSchedulerBackend.ACTOR_NAME)
  }

  private val timeout = Duration.create(System.getProperty("spark.akka.askTimeout", "10").toLong, "seconds")

  override def stop() {
    try {
      if (driverActor != null) {
        val future = driverActor.ask(StopDriver)(timeout)   // shut down the driverActor
        Await.result(future, timeout)
      }
    } catch {
      case e: Exception =>
        throw new SparkException("Error stopping standalone scheduler's driver actor", e)
    }
  }

  override def reviveOffers() {
    driverActor ! ReviveOffers   // just forward a ReviveOffers event to the driverActor
  }

  override def defaultParallelism() = Option(System.getProperty("spark.default.parallelism"))
    .map(_.toInt).getOrElse(math.max(totalCoreCount.get(), 2))

  // Called by subclasses when notified of a lost worker
  def removeExecutor(executorId: String, reason: String) {
    try {
      val future = driverActor.ask(RemoveExecutor(executorId, reason))(timeout)
      Await.result(future, timeout)
    } catch {
      case e: Exception =>
        throw new SparkException("Error notifying standalone scheduler's driver actor", e)
    }
  }
}

DriverActor

The key function is makeOffers, which launches tasks on the executors. When is it called?

When a RegisterExecutor message arrives (a new executor brings new cores),

when a task StatusUpdate arrives (a finished task frees a core on that executor),

and when a ReviveOffers event is received, i.e. when new tasks are submitted and when the periodic delay-scheduling timer fires (once per second by default).

The periodic revive for delay scheduling is presumably about liveness: even when nothing in the cluster changes, the backend keeps re-offering the same resources so that tasks can still be launched, for example once a task's locality wait has expired. A toy illustration follows.
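
This toy, which is not Spark code, shows why re-offering unchanged resources matters: a task may decline a non-preferred executor until its locality wait expires, so something has to keep making offers. The hosts, wait time, and accepts helper are all made up for illustration:

object DelaySchedulingDemo extends App {
  val localityWaitMs = 3000L
  val submittedAt = System.currentTimeMillis()

  // Accept a non-local offer only after the locality wait has expired.
  def accepts(offerHost: String, preferredHost: String, now: Long): Boolean =
    offerHost == preferredHost || now - submittedAt > localityWaitMs

  println(accepts("host-b", "host-a", submittedAt + 1000))  // false: still holding out for host-a
  println(accepts("host-b", "host-a", submittedAt + 4000))  // true: wait expired, accept host-b
}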

  class DriverActor(sparkProperties: Seq[(String, String)]) extends Actor {
    private val executorActor = new HashMap[String, ActorRef]   // track every registered executor's ActorRef
    private val executorAddress = new HashMap[String, Address]
    private val executorHost = new HashMap[String, String]
    private val freeCores = new HashMap[String, Int]
    private val actorToExecutorId = new HashMap[ActorRef, String]
    private val addressToExecutorId = new HashMap[Address, String]

    override def preStart() {
      // Listen for remote client disconnection events, since they don't go through Akka's watch()
      context.system.eventStream.subscribe(self, classOf[RemoteClientLifeCycleEvent])

      // Periodically revive offers to allow delay scheduling to work
      val reviveInterval = System.getProperty("spark.scheduler.revive.interval", "1000").toLong
      context.system.scheduler.schedule(0.millis, reviveInterval.millis, self, ReviveOffers)
    }

    def receive = {
      case RegisterExecutor(executorId, hostPort, cores) =>   // sent by StandaloneExecutorBackend on startup
        Utils.checkHostPort(hostPort, "Host port expected " + hostPort)
        if (executorActor.contains(executorId)) {
          sender ! RegisterExecutorFailed("Duplicate executor ID: " + executorId)
        } else {
          logInfo("Registered executor: " + sender + " with ID " + executorId)
          sender ! RegisteredExecutor(sparkProperties)
          context.watch(sender)   // watch the executor actor
          executorActor(executorId) = sender
          executorHost(executorId) = Utils.parseHostPort(hostPort)._1
          freeCores(executorId) = cores
          executorAddress(executorId) = sender.path.address
          actorToExecutorId(sender) = executorId
          addressToExecutorId(sender.path.address) = executorId
          totalCoreCount.addAndGet(cores)
          makeOffers()
        }

      case StatusUpdate(executorId, taskId, state, data) =>
        scheduler.statusUpdate(taskId, state, data.value)
        if (TaskState.isFinished(state)) {
          freeCores(executorId) += 1
          makeOffers(executorId)
        }

      case ReviveOffers =>   // sent by StandaloneSchedulerBackend.reviveOffers and by the periodic timer
        makeOffers()

      case StopDriver =>
        sender ! true
        context.stop(self)

      case RemoveExecutor(executorId, reason) =>
        removeExecutor(executorId, reason)
        sender ! true

      case Terminated(actor) =>
        actorToExecutorId.get(actor).foreach(removeExecutor(_, "Akka actor terminated"))

      case RemoteClientDisconnected(transport, address) =>
        addressToExecutorId.get(address).foreach(removeExecutor(_, "remote Akka client disconnected"))

      case RemoteClientShutdown(transport, address) =>
        addressToExecutorId.get(address).foreach(removeExecutor(_, "remote Akka client shutdown"))
    }

    // Make fake resource offers on all executors
    def makeOffers() {
      launchTasks(scheduler.resourceOffers(
        executorHost.toArray.map {case (id, host) => new WorkerOffer(id, host, freeCores(id))}))
    }

    // Make fake resource offers on just one executor
    // Note that the WorkerOffers passed to scheduler.resourceOffers are generated statically from the
    // executors that have already registered, not obtained dynamically. Under Mesos the offers would
    // be received from the resource manager and only then passed to scheduler.resourceOffers.
    def makeOffers(executorId: String) {
      launchTasks(scheduler.resourceOffers(
        Seq(new WorkerOffer(executorId, executorHost(executorId), freeCores(executorId)))))
    }

    // Launch tasks returned by a set of resource offers
    def launchTasks(tasks: Seq[Seq[TaskDescription]]) {
      for (task <- tasks.flatten) {
        freeCores(task.executorId) -= 1
        executorActor(task.executorId) ! LaunchTask(task)   // "launching" = sending a LaunchTask event to the executor actor
      }
    }

    // Remove a disconnected slave from the cluster
    def removeExecutor(executorId: String, reason: String) {
      if (executorActor.contains(executorId)) {
        logInfo("Executor " + executorId + " disconnected, so removing it")
        val numCores = freeCores(executorId)
        actorToExecutorId -= executorActor(executorId)
        addressToExecutorId -= executorAddress(executorId)
        executorActor -= executorId
        executorHost -= executorId
        freeCores -= executorId
        totalCoreCount.addAndGet(-numCores)
        scheduler.executorLost(executorId, SlaveLost(reason))
      }
    }
  }
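
For reference, the other end of these messages is StandaloneExecutorBackend running inside each executor process. The fragment below is paraphrased from memory of the 0.8-era code rather than quoted verbatim, so treat the exact signatures as approximate: on startup it registers with the driver actor, and it turns LaunchTask messages into Executor.launchTask calls.

// Condensed, paraphrased sketch of StandaloneExecutorBackend (not the verbatim source).
override def preStart() {
  driver = context.actorFor(driverUrl)                      // driverUrl points at the DriverActor above
  driver ! RegisterExecutor(executorId, hostPort, cores)    // triggers the RegisterExecutor case shown above
  context.watch(driver)
}

def receive = {
  case RegisteredExecutor(sparkProperties) =>
    executor = new Executor(executorId, Utils.parseHostPort(hostPort)._1, sparkProperties)
  case RegisterExecutorFailed(message) =>
    logError("Slave registration failed: " + message); System.exit(1)
  case LaunchTask(taskDesc) =>
    executor.launchTask(this, taskDesc.taskId, taskDesc.serializedTask)
}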

 

SparkDeploySchedulerBackend

The key work here is creating and managing the Driver and Client actors.

private[spark] class SparkDeploySchedulerBackend(
    scheduler: ClusterScheduler,
    sc: SparkContext,
    master: String,
    appName: String)
  extends StandaloneSchedulerBackend(scheduler, sc.env.actorSystem)
  with ClientListener
  with Logging {

  var client: Client = null

  override def start() {
    super.start()   // StandaloneSchedulerBackend.start() creates the DriverActor

    // The endpoint for executors to talk to us
    val driverUrl = "akka://spark@%s:%s/user/%s".format(
      System.getProperty("spark.driver.host"), System.getProperty("spark.driver.port"),
      StandaloneSchedulerBackend.ACTOR_NAME)
    val args = Seq(driverUrl, "{{EXECUTOR_ID}}", "{{HOSTNAME}}", "{{CORES}}")
    // The command that ExecutorRunner will run on each Worker: it starts StandaloneExecutorBackend
    val command = Command(
      "org.apache.spark.executor.StandaloneExecutorBackend", args, sc.executorEnvs)
    val sparkHome = sc.getSparkHome().getOrElse(null)
    // Build the application description
    val appDesc = new ApplicationDescription(appName, maxCores, executorMemory, command, sparkHome,
      "http://" + sc.ui.appUIAddress)

    client = new Client(sc.env.actorSystem, master, appDesc, this)   // create the Client actor and start it
    client.start()
  }

  override def stop() {
    stopping = true
    super.stop()
    client.stop()
    if (shutdownCallback != null) {
      shutdownCallback(this)
    }
  }
}
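
The {{EXECUTOR_ID}}, {{HOSTNAME}} and {{CORES}} arguments are placeholders that the Worker-side ExecutorRunner fills in for each executor it launches. Below is a simplified, self-contained sketch of that substitution (not the verbatim ExecutorRunner code; the real method works on the runner's own fields, and the driver URL here is just a sample string):

object PlaceholderDemo extends App {
  // Replace variables such as {{EXECUTOR_ID}} in the command arguments before launching.
  def substituteVariables(argument: String, execId: Int, host: String, cores: Int): String =
    argument match {
      case "{{EXECUTOR_ID}}" => execId.toString
      case "{{HOSTNAME}}"    => host
      case "{{CORES}}"       => cores.toString
      case other             => other
    }

  val args = Seq("akka://spark@driver-host:7077/user/StandaloneScheduler",
    "{{EXECUTOR_ID}}", "{{HOSTNAME}}", "{{CORES}}")
  // e.g. for executor 3 with 4 cores on worker-1:
  println(args.map(substituteVariables(_, 3, "worker-1", 4)))
  // List(akka://spark@driver-host:7077/user/StandaloneScheduler, 3, worker-1, 4)
}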
