Spark技术内幕之任务调度:从SparkContext开始
SparkContext是开发Spark应用的入口,它负责和整个集群的交互,包括创建RDD,accumulators and broadcast variables。理解Spark的架构,需要从这个入口开始。下图是官网的架构图。
DriverProgram就是用户提交的程序,这里边定义了SparkContext的实例。SparkContext定义在core/src/main/scala/org/apache/spark/SparkContext.scala。
Spark默认的构造函数接受org.apache.spark.SparkConf, 通过这个参数我们可以自定义本次提交的参数,这个参数会覆盖系统的默认配置。
先上一张与SparkContext相关的类图:
下面是SparkContext非常重要的数据成员的定义:
// Create and start the scheduler
private[spark] var taskScheduler = SparkContext.createTaskScheduler(this, master)
private val heartbeatReceiver = env.actorSystem.actorOf(
Props(new HeartbeatReceiver(taskScheduler)), "HeartbeatReceiver")
@volatile private[spark] var dagScheduler: DAGScheduler = _
try {
dagScheduler = new DAGScheduler(this)
} catch {
case e: Exception => throw
new SparkException("DAGScheduler cannot be initialized due to %s".format(e.getMessage))
} // start TaskScheduler after taskScheduler sets DAGScheduler reference in DAGScheduler's
// constructor
taskScheduler.start()
通过createTaskScheduler,我们可以获得不同资源管理类型或者部署类型的调度器。看一下现在支持的部署方法:
/** Creates a task scheduler based on a given master URL. Extracted for testing. */
private def createTaskScheduler(sc: SparkContext, master: String): TaskScheduler = {
// Regular expression used for local[N] and local[*] master formats
val LOCAL_N_REGEX = """local\[([0-9]+|\*)\]""".r
// Regular expression for local[N, maxRetries], used in tests with failing tasks
val LOCAL_N_FAILURES_REGEX = """local\[([0-9]+|\*)\s*,\s*([0-9]+)\]""".r
// Regular expression for simulating a Spark cluster of [N, cores, memory] locally
val LOCAL_CLUSTER_REGEX = """local-cluster\[\s*([0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-9]+)\s*]""".r
// Regular expression for connecting to Spark deploy clusters
val SPARK_REGEX = """spark://(.*)""".r
// Regular expression for connection to Mesos cluster by mesos:// or zk:// url
val MESOS_REGEX = """(mesos|zk)://.*""".r
// Regular expression for connection to Simr cluster
val SIMR_REGEX = """simr://(.*)""".r // When running locally, don't try to re-execute tasks on failure.
val MAX_LOCAL_TASK_FAILURES = 1 master match {
case "local" =>
val scheduler = new TaskSchedulerImpl(sc, MAX_LOCAL_TASK_FAILURES, isLocal = true)
val backend = new LocalBackend(scheduler, 1)
scheduler.initialize(backend)
scheduler case LOCAL_N_REGEX(threads) =>
def localCpuCount = Runtime.getRuntime.availableProcessors()
// local[*] estimates the number of cores on the machine; local[N] uses exactly N threads.
val threadCount = if (threads == "*") localCpuCount else threads.toInt
val scheduler = new TaskSchedulerImpl(sc, MAX_LOCAL_TASK_FAILURES, isLocal = true)
val backend = new LocalBackend(scheduler, threadCount)
scheduler.initialize(backend)
scheduler case LOCAL_N_FAILURES_REGEX(threads, maxFailures) =>
def localCpuCount = Runtime.getRuntime.availableProcessors()
// local[*, M] means the number of cores on the computer with M failures
// local[N, M] means exactly N threads with M failures
val threadCount = if (threads == "*") localCpuCount else threads.toInt
val scheduler = new TaskSchedulerImpl(sc, maxFailures.toInt, isLocal = true)
val backend = new LocalBackend(scheduler, threadCount)
scheduler.initialize(backend)
scheduler case SPARK_REGEX(sparkUrl) =>
val scheduler = new TaskSchedulerImpl(sc)
val masterUrls = sparkUrl.split(",").map("spark://" + _)
val backend = new SparkDeploySchedulerBackend(scheduler, sc, masterUrls)
scheduler.initialize(backend)
scheduler case LOCAL_CLUSTER_REGEX(numSlaves, coresPerSlave, memoryPerSlave) =>
// Check to make sure memory requested <= memoryPerSlave. Otherwise Spark will just hang.
val memoryPerSlaveInt = memoryPerSlave.toInt
if (sc.executorMemory > memoryPerSlaveInt) {
throw new SparkException(
"Asked to launch cluster with %d MB RAM / worker but requested %d MB/worker".format(
memoryPerSlaveInt, sc.executorMemory))
} val scheduler = new TaskSchedulerImpl(sc)
val localCluster = new LocalSparkCluster(
numSlaves.toInt, coresPerSlave.toInt, memoryPerSlaveInt)
val masterUrls = localCluster.start()
val backend = new SparkDeploySchedulerBackend(scheduler, sc, masterUrls)
scheduler.initialize(backend)
backend.shutdownCallback = (backend: SparkDeploySchedulerBackend) => {
localCluster.stop()
}
scheduler case "yarn-standalone" | "yarn-cluster" =>
if (master == "yarn-standalone") {
logWarning(
"\"yarn-standalone\" is deprecated as of Spark 1.0. Use \"yarn-cluster\" instead.")
}
val scheduler = try {
val clazz = Class.forName("org.apache.spark.scheduler.cluster.YarnClusterScheduler")
val cons = clazz.getConstructor(classOf[SparkContext])
cons.newInstance(sc).asInstanceOf[TaskSchedulerImpl]
} catch {
// TODO: Enumerate the exact reasons why it can fail
// But irrespective of it, it means we cannot proceed !
case e: Exception => {
throw new SparkException("YARN mode not available ?", e)
}
}
val backend = try {
val clazz =
Class.forName("org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend")
val cons = clazz.getConstructor(classOf[TaskSchedulerImpl], classOf[SparkContext])
cons.newInstance(scheduler, sc).asInstanceOf[CoarseGrainedSchedulerBackend]
} catch {
case e: Exception => {
throw new SparkException("YARN mode not available ?", e)
}
}
scheduler.initialize(backend)
scheduler case "yarn-client" =>
val scheduler = try {
val clazz =
Class.forName("org.apache.spark.scheduler.cluster.YarnClientClusterScheduler")
val cons = clazz.getConstructor(classOf[SparkContext])
cons.newInstance(sc).asInstanceOf[TaskSchedulerImpl] } catch {
case e: Exception => {
throw new SparkException("YARN mode not available ?", e)
}
} val backend = try {
val clazz =
Class.forName("org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend")
val cons = clazz.getConstructor(classOf[TaskSchedulerImpl], classOf[SparkContext])
cons.newInstance(scheduler, sc).asInstanceOf[CoarseGrainedSchedulerBackend]
} catch {
case e: Exception => {
throw new SparkException("YARN mode not available ?", e)
}
} scheduler.initialize(backend)
scheduler case mesosUrl @ MESOS_REGEX(_) =>
MesosNativeLibrary.load()
val scheduler = new TaskSchedulerImpl(sc)
val coarseGrained = sc.conf.getBoolean("spark.mesos.coarse", false)
val url = mesosUrl.stripPrefix("mesos://") // strip scheme from raw Mesos URLs
val backend = if (coarseGrained) {
new CoarseMesosSchedulerBackend(scheduler, sc, url)
} else {
new MesosSchedulerBackend(scheduler, sc, url)
}
scheduler.initialize(backend)
scheduler case SIMR_REGEX(simrUrl) =>
val scheduler = new TaskSchedulerImpl(sc)
val backend = new SimrSchedulerBackend(scheduler, sc, simrUrl)
scheduler.initialize(backend)
scheduler case _ =>
throw new SparkException("Could not parse Master URL: '" + master + "'")
}
}
}
主要的逻辑从line 20开始。主要通过传入的Master URL来生成Scheduler 和 Scheduler backend。对于常见的Standalone的部署方式,我们看一下是生成的Scheduler 和 Scheduler backend:
case SPARK_REGEX(sparkUrl) =>
val scheduler = new TaskSchedulerImpl(sc)
val masterUrls = sparkUrl.split(",").map("spark://" + _)
val backend = new SparkDeploySchedulerBackend(scheduler, sc, masterUrls)
scheduler.initialize(backend)
scheduler
org.apache.spark.scheduler.TaskSchedulerImpl通过一个SchedulerBackend管理了所有的cluster的调度;它主要实现了通用的逻辑。对于系统刚启动时,需要理解两个接口,一个是initialize,一个是start。这个也是在SparkContext初始化时调用的:
def initialize(backend: SchedulerBackend) {
this.backend = backend
// temporarily set rootPool name to empty
rootPool = new Pool("", schedulingMode, 0, 0)
schedulableBuilder = {
schedulingMode match {
case SchedulingMode.FIFO =>
new FIFOSchedulableBuilder(rootPool)
case SchedulingMode.FAIR =>
new FairSchedulableBuilder(rootPool, conf)
}
}
schedulableBuilder.buildPools()
}
由此可见,初始化主要是SchedulerBackend的初始化,它主要时通过集群的配置来获得调度模式,现在支持的调度模式是FIFO和公平调度,默认的是FIFO。
// default scheduler is FIFO
private val schedulingModeConf = conf.get("spark.scheduler.mode", "FIFO")
val schedulingMode: SchedulingMode = try {
SchedulingMode.withName(schedulingModeConf.toUpperCase)
} catch {
case e: java.util.NoSuchElementException =>
throw new SparkException(s"Unrecognized spark.scheduler.mode: $schedulingModeConf")
}
start的实现如下:
override def start() {
backend.start()
if (!isLocal && conf.getBoolean("spark.speculation", false)) {
logInfo("Starting speculative execution thread")
import sc.env.actorSystem.dispatcher
sc.env.actorSystem.scheduler.schedule(SPECULATION_INTERVAL milliseconds,
SPECULATION_INTERVAL milliseconds) {
Utils.tryOrExit { checkSpeculatableTasks() }
}
}
}
主要是backend的启动。对于非本地模式,并且设置了spark.speculation为true,那么对于指定时间未返回的task将会启动另外的task来执行。其实对于一般的应用,这个的确可能会减少任务的执行时间,但是也浪费了集群的计算资源。因此对于离线应用来说,这个设置是不推荐的。
org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend是Standalone模式的SchedulerBackend。它的定义如下:
private[spark] class SparkDeploySchedulerBackend(
scheduler: TaskSchedulerImpl,
sc: SparkContext,
masters: Array[String])
extends CoarseGrainedSchedulerBackend(scheduler, sc.env.actorSystem)
with AppClientListener
with Logging {
看一下它的start:
override def start() {
super.start()
// The endpoint for executors to talk to us
val driverUrl = "akka.tcp://%s@%s:%s/user/%s".format(
SparkEnv.driverActorSystemName,
conf.get("spark.driver.host"),
conf.get("spark.driver.port"),
CoarseGrainedSchedulerBackend.ACTOR_NAME)
val args = Seq(driverUrl, "{{EXECUTOR_ID}}", "{{HOSTNAME}}", "{{CORES}}", "{{WORKER_URL}}")
val extraJavaOpts = sc.conf.getOption("spark.executor.extraJavaOptions")
.map(Utils.splitCommandString).getOrElse(Seq.empty)
val classPathEntries = sc.conf.getOption("spark.executor.extraClassPath").toSeq.flatMap { cp =>
cp.split(java.io.File.pathSeparator)
}
val libraryPathEntries =
sc.conf.getOption("spark.executor.extraLibraryPath").toSeq.flatMap { cp =>
cp.split(java.io.File.pathSeparator)
}
// Start executors with a few necessary configs for registering with the scheduler
val sparkJavaOpts = Utils.sparkJavaOpts(conf, SparkConf.isExecutorStartupConf)
val javaOpts = sparkJavaOpts ++ extraJavaOpts
val command = Command("org.apache.spark.executor.CoarseGrainedExecutorBackend",
args, sc.executorEnvs, classPathEntries, libraryPathEntries, javaOpts)
val appDesc = new ApplicationDescription(sc.appName, maxCores, sc.executorMemory, command,
sc.ui.appUIAddress, sc.eventLogger.map(_.logDir))
client = new AppClient(sc.env.actorSystem, masters, appDesc, this, conf)
client.start()
waitForRegistration()
}
接下来,我们将对TaskScheduler,SchedulerBackend和DAG Scheduler进行详解,来逐步揭开他们的神秘面纱。
Spark技术内幕之任务调度:从SparkContext开始的更多相关文章
- Spark技术内幕:Stage划分及提交源码分析
http://blog.csdn.net/anzhsoft/article/details/39859463 当触发一个RDD的action后,以count为例,调用关系如下: org.apache. ...
- Spark技术内幕: Task向Executor提交的源码解析
在上文<Spark技术内幕:Stage划分及提交源码分析>中,我们分析了Stage的生成和提交.但是Stage的提交,只是DAGScheduler完成了对DAG的划分,生成了一个计算拓扑, ...
- Spark技术内幕: Task向Executor提交的源代码解析
在上文<Spark技术内幕:Stage划分及提交源代码分析>中,我们分析了Stage的生成和提交.可是Stage的提交,仅仅是DAGScheduler完毕了对DAG的划分,生成了一个计算拓 ...
- Spark技术内幕:Master的故障恢复
Spark技术内幕:Master基于ZooKeeper的High Availability(HA)源码实现 详细阐述了使用ZK实现的Master的HA,那么Master是如何快速故障恢复的呢? 处于 ...
- Spark技术内幕:Executor分配详解
当用户应用new SparkContext后,集群就会为在Worker上分配executor,那么这个过程是什么呢?本文以Standalone的Cluster为例,详细的阐述这个过程.序列图如下: 1 ...
- Spark技术内幕:Client,Master和Worker 通信源码解析
http://blog.csdn.net/anzhsoft/article/details/30802603 Spark的Cluster Manager可以有几种部署模式: Standlone Mes ...
- 我的第一本著作:Spark技术内幕上市!
现在各大网站销售中! 京东:http://item.jd.com/11770787.html 当当:http://product.dangdang.com/23776595.html 亚马逊:http ...
- Spark技术内幕:Client,Master和Worker 通信源代码解析
Spark的Cluster Manager能够有几种部署模式: Standlone Mesos YARN EC2 Local 在向集群提交计算任务后,系统的运算模型就是Driver Program定义 ...
- Spark技术内幕:Stage划分及提交源代码分析
当触发一个RDD的action后.以count为例,调用关系例如以下: org.apache.spark.rdd.RDD#count org.apache.spark.SparkContext#run ...
随机推荐
- 为了异常安全(swap,share_ptr)——Effecive C++
互斥锁: 假设我们要在多线程中实现背景图片的控制: class PrettyMenu{ public: -- void changeBackground(std::istream& imgSr ...
- bzoj 3745: [Coci2015]Norma
Description Solution 考虑分治: 我们要统计跨越 \(mid\) 的区间的贡献 分最大值和最小值所在位置进行讨论: 设左边枚举到了 \(i\),左边 \([i,mid]\) 的最大 ...
- 【网络流】【BZOJ1070】【SCOI2007】修车
原题链接:http://www.lydsy.com/JudgeOnline/problem.php?id=1070 题意:问你如何分配老司机使得每部车的等待时间之和最短. 解题思路:本题不易正做,考虑 ...
- 2015 多校联赛 ——HDU5416(异或)
CRB has a tree, whose vertices are labeled by 1, 2, …, N. They are connected by N – 1 edges. Each ed ...
- ●BZOJ 4821 [Sdoi2017]相关分析
题链: http://www.lydsy.com/JudgeOnline/problem.php?id=4821 题解: 线段树是真的恶心,(也许是我的方法麻烦了一些吧)首先那个式子可以做如下化简: ...
- [bzoj4755][Jsoi2016]扭动的回文串
来自FallDream的博客,未经允许,请勿转载,谢谢. JYY有两个长度均为N的字符串A和B. 一个“扭动字符串S(i,j,k)由A中的第i个字符到第j个字符组成的子串与B中的第j个字符到第k个字符 ...
- Miller-Rabin,Pollard-Rho(BZOJ3667)
裸题直接做就好了. #include <cstdio> #include <cstdlib> #include <algorithm> using namespac ...
- django rest-framework 2.请求和响应
一.请求对象 REST 框架引入Request来扩展常规的HttpRequest,并提供了更灵活的请求解析.Request对象的核心功能是request.data属性. 导入方式: from rest ...
- xml 和数组的相互转化
数组转化为xml: function arrtoxml($arr,$dom=0,$item=0){ if (!$dom){ $dom = new DOMDocument("1.0" ...
- Unix系统使用的地址索引结构有什么特点?
使用地址索引表.索引节点中地址索引表标识出含有文件数据的磁盘块的分布情况.