Spark Notes 14: The Delay Scheduling Implementation in Spark

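Delay scheduling is implemented in the TaskSetManager class. Pending tasks are stored in four collections, one per locality level: hash maps keyed by executor, host, and rack, plus a list of tasks with no locality preference. When resources become available, one of the arguments of resourceOffer (maxLocality) gives the loosest locality level at which a task may be launched on the offered executor. If a pending task satisfies that constraint, it is taken from the collection with the best locality level that matches. In other words, both tasks and executors carry a locality level; among the tasks that match the offer, the one with the best locality is chosen.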

=====================Delay scheduling algorithm=============================

->TaskSetManager::resourceOffer(execId: String, host: String,maxLocality: TaskLocality.TaskLocality): Option[TaskDescription]

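->if (maxLocality != TaskLocality.NO_PREF) --if the offer is made at a level other than NO_PREF, delay scheduling applies

->allowedLocality = getAllowedLocalityLevel(curTime) --get the locality level the taskSet is currently allowed to run at; the result of getAllowedLocalityLevel changes over time

->if (allowedLocality > maxLocality) --if the level allowed by the taskSet is looser than the level of this offer

->allowedLocality = maxLocality --clamp it to maxLocality, so that no task farther away than the offer permits is considered

->task = findTask(execId, host, allowedLocality) --look for a task that satisfies the allowed locality level

->the search starts at the best locality level (PROCESS_LOCAL) and returns the first task that satisfies the constraint, which is therefore at the best level available

->task match case Some((index, taskLocality, speculative)) --a task was found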

-> val info = new TaskInfo(taskId, index, attemptNum, curTime, execId, host, taskLocality, speculative)

->if (maxLocality != TaskLocality.NO_PREF) // NO_PREF will not affect the variables related to delay scheduling

->currentLocalityIndex = getLocalityIndex(taskLocality) // Update our locality level for delay scheduling

->lastLaunchTime = curTime --record the time of the most recent task launch; needed when computing the current locality level

->addRunningTask(taskId) --add it to the set of running tasks

->logInfo("Starting %s (TID %d, %s, %s, %d bytes)"

->sched.dagScheduler.taskStarted(task, info) --notify the DAGScheduler that a task has started running

->eventProcessActor ! BeginEvent(task, taskInfo)

->return Some(new TaskDescription(taskId, execId, taskName, index, serializedTask)) --return the task

->case _ => return None --no task satisfies the locality requirement; return None

=====================end==================================

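myLocalityLevels: all locality levels that are currently valid for this taskSet

localityWaits: the wait time for each of those locality levels

currentLocalityIndex: index of the current locality level, which advances as the wait time elapses

pendingTasksForExecutor: tasks at the PROCESS_LOCAL (executor process) level

pendingTasksForHost: tasks at the NODE_LOCAL (host) level

pendingTasksForRack: tasks at the RACK_LOCAL (rack) level

pendingTasksWithNoPrefs: tasks with no locality preference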

// Figure out which locality levels we have in our TaskSet, so we can do delay scheduling
var myLocalityLevels = computeValidLocalityLevels()
var localityWaits = myLocalityLevels.map(getLocalityWait) // Time to wait at each level

// Delay scheduling variables: we keep track of our current locality level and the time we
// last launched a task at that level, and move up a level when localityWaits[curLevel] expires.
// We then move down if we manage to launch a "more local" task.
var currentLocalityIndex = 0    // Index of our current locality level in validLocalityLevels

// Set of pending tasks for each executor. These collections are actually
// treated as stacks, in which new tasks are added to the end of the
// ArrayBuffer and removed from the end. This makes it faster to detect
// tasks that repeatedly fail because whenever a task failed, it is put
// back at the head of the stack. They are also only cleaned up lazily;
// when a task is launched, it remains in all the pending lists except
// the one that it was launched from, but gets removed from them later.
private val pendingTasksForExecutor = new HashMap[String, ArrayBuffer[Int]]

// Set of pending tasks for each host. Similar to pendingTasksForExecutor,
// but at host level.
private val pendingTasksForHost = new HashMap[String, ArrayBuffer[Int]]

// Set of pending tasks for each rack -- similar to the above.
private val pendingTasksForRack = new HashMap[String, ArrayBuffer[Int]]

// Set containing pending tasks with no locality preferences.
var pendingTasksWithNoPrefs = new ArrayBuffer[Int]
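
The stack behaviour and the lazy cleanup described in the comments above can be illustrated with a small stand-alone sketch. This is not the Spark implementation: the `launched` set and the `dequeue` helper are hypothetical names introduced only for this example, and the real TaskSetManager tracks more state than this.

```scala
import scala.collection.mutable.{ArrayBuffer, HashMap, HashSet}

object PendingTasksSketch {
  // Hypothetical stand-ins for the per-executor and per-host pending lists.
  val pendingForExecutor = new HashMap[String, ArrayBuffer[Int]]
  val pendingForHost = new HashMap[String, ArrayBuffer[Int]]
  // Hypothetical bookkeeping: indices of tasks that have already been launched.
  val launched = new HashSet[Int]

  // Pop entries from the end of the list (stack order). Entries for tasks that
  // were already launched from another list are stale and are simply discarded.
  def dequeue(list: ArrayBuffer[Int]): Option[Int] = {
    while (list.nonEmpty) {
      val index = list.remove(list.size - 1)
      if (!launched.contains(index)) {
        launched += index
        return Some(index)
      }
    }
    None
  }

  def main(args: Array[String]): Unit = {
    // Task 0 prefers executor "exec1" on "hostA"; task 1 only prefers "hostA".
    pendingForExecutor("exec1") = ArrayBuffer(0)
    pendingForHost("hostA") = ArrayBuffer(1, 0)

    println(dequeue(pendingForExecutor("exec1"))) // task 0 is launched from the executor list
    println(pendingForHost("hostA"))              // the host list still holds a stale entry for task 0
    println(dequeue(pendingForHost("hostA")))     // the stale entry is dropped, task 1 is returned
    println(dequeue(pendingForHost("hostA")))     // nothing is left
  }
}
```

Because a task is registered in every list it has a preference for, removing it eagerly from all of them at launch time would cost extra lookups; dropping stale entries only when they are encountered keeps the launch path cheap.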

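lastLaunchTime records when a task was last launched at the current locality level

var lastLaunchTime = clock.getTime()  // Time we last launched a task at this level

Compute the locality levels that are valid for this taskSet

/**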
 * Compute the locality levels used in this TaskSet. Assumes that all tasks have already been
 * added to queues using addPendingTask.
 *
 */
private def computeValidLocalityLevels(): Array[TaskLocality.TaskLocality] = {
  import TaskLocality.{PROCESS_LOCAL, NODE_LOCAL, NO_PREF, RACK_LOCAL, ANY}
  val levels = new ArrayBuffer[TaskLocality.TaskLocality]
  if (!pendingTasksForExecutor.isEmpty && getLocalityWait(PROCESS_LOCAL) != 0 &&
      pendingTasksForExecutor.keySet.exists(sched.isExecutorAlive(_))) {
    levels += PROCESS_LOCAL
  }
  if (!pendingTasksForHost.isEmpty && getLocalityWait(NODE_LOCAL) != 0 &&
      pendingTasksForHost.keySet.exists(sched.hasExecutorsAliveOnHost(_))) {
    levels += NODE_LOCAL
  }
  if (!pendingTasksWithNoPrefs.isEmpty) {
    levels += NO_PREF
  }
  if (!pendingTasksForRack.isEmpty && getLocalityWait(RACK_LOCAL) != 0 &&
      pendingTasksForRack.keySet.exists(sched.hasHostAliveOnRack(_))) {
    levels += RACK_LOCAL
  }
  levels += ANY
  logDebug("Valid locality levels for " + taskSet + ": " + levels.mkString(", "))
  levels.toArray
}

Get the wait time for each locality level

private def getLocalityWait(level: TaskLocality.TaskLocality): Long = {
  val defaultWait = conf.get("spark.locality.wait", "3000")
  level match {
    case TaskLocality.PROCESS_LOCAL =>
      conf.get("spark.locality.wait.process", defaultWait).toLong
    case TaskLocality.NODE_LOCAL =>
      conf.get("spark.locality.wait.node", defaultWait).toLong
    case TaskLocality.RACK_LOCAL =>
      conf.get("spark.locality.wait.rack", defaultWait).toLong
    case _ => 0L
  }
}
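
To make the fallback between the configuration keys concrete, here is a minimal sketch that repeats the same lookup over a plain `Map` so that it runs without Spark. The object name, the string-based level argument, and the sample values are assumptions made for this example only.

```scala
object LocalityWaitSketch {
  // Same lookup order as getLocalityWait above: a per-level key first,
  // then spark.locality.wait, then the built-in default of 3000.
  def localityWait(conf: Map[String, String], level: String): Long = {
    val defaultWait = conf.getOrElse("spark.locality.wait", "3000")
    level match {
      case "PROCESS_LOCAL" => conf.getOrElse("spark.locality.wait.process", defaultWait).toLong
      case "NODE_LOCAL" => conf.getOrElse("spark.locality.wait.node", defaultWait).toLong
      case "RACK_LOCAL" => conf.getOrElse("spark.locality.wait.rack", defaultWait).toLong
      case _ => 0L // NO_PREF and ANY never wait
    }
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical settings: a shorter global wait, and no waiting at node level.
    val conf = Map("spark.locality.wait" -> "1000", "spark.locality.wait.node" -> "0")
    for (level <- Seq("PROCESS_LOCAL", "NODE_LOCAL", "RACK_LOCAL", "ANY")) {
      println(s"$level -> ${localityWait(conf, level)}")
    }
  }
}
```

With these sample settings NODE_LOCAL gets a wait of 0. Since computeValidLocalityLevels only adds a level whose wait is non-zero, such a level would be left out of myLocalityLevels altogether.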

Definition of the locality levels

@DeveloperApi
object TaskLocality extends Enumeration {
  // Process local is expected to be used ONLY within TaskSetManager for now.
  val PROCESS_LOCAL, NODE_LOCAL, NO_PREF, RACK_LOCAL, ANY = Value

  type TaskLocality = Value

  def isAllowed(constraint: TaskLocality, condition: TaskLocality): Boolean = {
    condition <= constraint
  }
}
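
The comparisons used throughout the scheduler rely on the declaration order of the enumeration: a value declared earlier is "more local" and compares as smaller. The following sketch copies the enumeration under a different name (`Locality`, chosen here only to avoid clashing with Spark's own object) to show what `isAllowed` evaluates to.

```scala
// Stand-alone copy of the enumeration above, for illustration only.
object Locality extends Enumeration {
  val PROCESS_LOCAL, NODE_LOCAL, NO_PREF, RACK_LOCAL, ANY = Value

  // A task at level `condition` may run under `constraint` if it is at least as local.
  def isAllowed(constraint: Value, condition: Value): Boolean = condition <= constraint
}

object LocalityDemo {
  import Locality._

  def main(args: Array[String]): Unit = {
    println(PROCESS_LOCAL < NODE_LOCAL)           // true: ids grow with distance from the data
    println(isAllowed(NODE_LOCAL, PROCESS_LOCAL)) // true: a process-local task fits a node-local offer
    println(isAllowed(NODE_LOCAL, RACK_LOCAL))    // false: a rack-local task is too far away
  }
}
```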

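Given a locality level, return the index of a level that is valid in this taskSet. The taskSet may have no tasks at some levels, so when the requested level is absent the lookup falls through to the next looser (lower-priority) level.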

/**
 * Find the index in myLocalityLevels for a given locality. This is also designed to work with
 * localities that are not in myLocalityLevels (in case we somehow get those) by returning the
 * next-biggest level we have. Uses the fact that the last value in myLocalityLevels is ANY.
 */
def getLocalityIndex(locality: TaskLocality.TaskLocality): Int = {
  var index = 0
  while (locality > myLocalityLevels(index)) {
    index += 1
  }
  index
}
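
A small worked example may help here. The sketch below assumes a hypothetical taskSet whose valid levels are only PROCESS_LOCAL, NODE_LOCAL and ANY, and repeats the loop from getLocalityIndex on a local copy of the enumeration.

```scala
object LocalityIndexSketch {
  object Level extends Enumeration {
    val PROCESS_LOCAL, NODE_LOCAL, NO_PREF, RACK_LOCAL, ANY = Value
  }
  import Level._

  // Hypothetical taskSet with no NO_PREF and no RACK_LOCAL tasks.
  val myLocalityLevels: Array[Level.Value] = Array(PROCESS_LOCAL, NODE_LOCAL, ANY)

  // Same loop as above: advance until the stored level is at least as loose as `locality`.
  // It always terminates because the last entry is ANY, the loosest level.
  def getLocalityIndex(locality: Level.Value): Int = {
    var index = 0
    while (locality > myLocalityLevels(index)) {
      index += 1
    }
    index
  }

  def main(args: Array[String]): Unit = {
    println(getLocalityIndex(PROCESS_LOCAL)) // 0: exact match
    println(getLocalityIndex(NODE_LOCAL))    // 1: exact match
    println(getLocalityIndex(RACK_LOCAL))    // 2: RACK_LOCAL is absent, so it maps to ANY
  }
}
```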

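Get the currently allowed locality level. The time already waited since the last launch is compared with the wait time configured for each level to determine which level the taskSet is currently at.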

/**
 * Get the level we can launch tasks according to delay scheduling, based on current wait time.
 */
private def getAllowedLocalityLevel(curTime: Long): TaskLocality.TaskLocality = {
  while (curTime - lastLaunchTime >= localityWaits(currentLocalityIndex) &&
      currentLocalityIndex < myLocalityLevels.length - 1)
  {
    // Jump to the next locality level, and remove our waiting time for the current one since
    // we don't want to count it again on the next one
    lastLaunchTime += localityWaits(currentLocalityIndex)
    currentLocalityIndex += 1
  }
  myLocalityLevels(currentLocalityIndex)
}
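
How the allowed level loosens over time, and tightens again after a launch, is easiest to see with concrete numbers. The following sketch is a simplified model, not Spark code: it assumes three valid levels with waits of 3000, 3000 and 0, uses plain `Long` values in place of the clock, and adds a hypothetical `taskLaunched` helper that mirrors the two assignments resourceOffer performs after a task is found.

```scala
object DelaySchedulingSketch {
  object Level extends Enumeration {
    val PROCESS_LOCAL, NODE_LOCAL, NO_PREF, RACK_LOCAL, ANY = Value
  }
  import Level._

  // Assumed configuration for this example.
  val myLocalityLevels: Array[Level.Value] = Array(PROCESS_LOCAL, NODE_LOCAL, ANY)
  val localityWaits: Array[Long] = Array(3000L, 3000L, 0L)
  var currentLocalityIndex = 0
  var lastLaunchTime = 0L

  // Same loop as getAllowedLocalityLevel above.
  def getAllowedLocalityLevel(curTime: Long): Level.Value = {
    while (curTime - lastLaunchTime >= localityWaits(currentLocalityIndex) &&
        currentLocalityIndex < myLocalityLevels.length - 1) {
      lastLaunchTime += localityWaits(currentLocalityIndex)
      currentLocalityIndex += 1
    }
    myLocalityLevels(currentLocalityIndex)
  }

  // Hypothetical helper: what happens to the two variables once a task is launched.
  // Assumes `level` is one of myLocalityLevels.
  def taskLaunched(level: Level.Value, curTime: Long): Unit = {
    currentLocalityIndex = myLocalityLevels.indexOf(level)
    lastLaunchTime = curTime
  }

  def main(args: Array[String]): Unit = {
    println(getAllowedLocalityLevel(1000L)) // PROCESS_LOCAL: still within the first wait
    println(getAllowedLocalityLevel(3500L)) // NODE_LOCAL: the first wait has expired
    println(getAllowedLocalityLevel(6000L)) // ANY: the second wait has expired as well
    taskLaunched(PROCESS_LOCAL, 6100L)      // a process-local task gets launched
    println(getAllowedLocalityLevel(6200L)) // PROCESS_LOCAL: back at the strictest level
  }
}
```

Note that each time a level is passed, its wait is added to lastLaunchTime rather than resetting it to the current time, so the time spent at one level is not counted again at the next.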

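Recompute the valid locality levels while keeping the taskSet at its previous level. Because the valid levels depend on which executors are alive, this is called when an executor is added or lost.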

def recomputeLocality() {
  val previousLocalityLevel = myLocalityLevels(currentLocalityIndex)
  myLocalityLevels = computeValidLocalityLevels()
  localityWaits = myLocalityLevels.map(getLocalityWait)
  currentLocalityIndex = getLocalityIndex(previousLocalityLevel)
}

Find a task that satisfies the locality constraint. The search starts at the best locality level, so tasks with the best locality are always launched first.

/**
 * Dequeue a pending task for a given node and return its index and locality level.
 * Only search for tasks matching the given locality constraint.
 *
 * @return An option containing (task index within the task set, locality, is speculative?)
 */
private def findTask(execId: String, host: String, maxLocality: TaskLocality.Value)
  : Option[(Int, TaskLocality.Value, Boolean)] =
{
  for (index <- findTaskFromList(execId, getPendingTasksForExecutor(execId))) {
    return Some((index, TaskLocality.PROCESS_LOCAL, false))
  }
  // ...
  // find a speculative task if all others tasks have been scheduled
  findSpeculativeTask(execId, host, maxLocality).map {
    case (taskIndex, allowedLocality) => (taskIndex, allowedLocality, true)}
}
