spark1.1.0源码阅读-executor
1. executor上执行launchTask
def launchTask(
context: ExecutorBackend, taskId: Long, taskName: String, serializedTask: ByteBuffer) {
val tr = new TaskRunner(context, taskId, taskName, serializedTask)
runningTasks.put(taskId, tr)
threadPool.execute(tr)
}
2. executor上执行TaskRunner的run
class TaskRunner(
execBackend: ExecutorBackend, val taskId: Long, taskName: String, serializedTask: ByteBuffer)
extends Runnable { @volatile private var killed = false
@volatile var task: Task[Any] = _
@volatile var attemptedTask: Option[Task[Any]] = None def kill(interruptThread: Boolean) {
logInfo(s"Executor is trying to kill $taskName (TID $taskId)")
killed = true
if (task != null) {
task.kill(interruptThread)
}
} override def run() {
val startTime = System.currentTimeMillis()
SparkEnv.set(env)
Thread.currentThread.setContextClassLoader(replClassLoader)
val ser = SparkEnv.get.closureSerializer.newInstance()
logInfo(s"Running $taskName (TID $taskId)")
execBackend.statusUpdate(taskId, TaskState.RUNNING, EMPTY_BYTE_BUFFER)
var taskStart: Long = 0
def gcTime = ManagementFactory.getGarbageCollectorMXBeans.map(_.getCollectionTime).sum
val startGCTime = gcTime try {
SparkEnv.set(env)
Accumulators.clear()
val (taskFiles, taskJars, taskBytes) = Task.deserializeWithDependencies(serializedTask) //反序列化出 taskFiles,taskJars,taskBytes
updateDependencies(taskFiles, taskJars)
task = ser.deserialize[Task[Any]](taskBytes, Thread.currentThread.getContextClassLoader) //反序列化出task对象 // If this task has been killed before we deserialized it, let's quit now. Otherwise,
// continue executing the task.
if (killed) {
// Throw an exception rather than returning, because returning within a try{} block
// causes a NonLocalReturnControl exception to be thrown. The NonLocalReturnControl
// exception will be caught by the catch block, leading to an incorrect ExceptionFailure
// for the task.
throw new TaskKilledException
} attemptedTask = Some(task)
logDebug("Task " + taskId + "'s epoch is " + task.epoch)
env.mapOutputTracker.updateEpoch(task.epoch) 49 // Run the actual task and measure its runtime.
50 taskStart = System.currentTimeMillis()
51 val value = task.run(taskId.toInt)
52 val taskFinish = System.currentTimeMillis() // If the task has been killed, let's fail it.
if (task.killed) {
throw new TaskKilledException
}
3. task.run
private[spark] abstract class Task[T](val stageId: Int, var partitionId: Int) extends Serializable {
final def run(attemptId: Long): T = {
context = new TaskContext(stageId, partitionId, attemptId, runningLocally = false)
context.taskMetrics.hostname = Utils.localHostName()
taskThread = Thread.currentThread()
if (_killed) {
kill(interruptThread = false)
}
10 runTask(context)
}
4. task是抽象类,对于具体的类(resultTask和shuffleMapTask)会执行相应的runTask。
a. resultTask
override def runTask(context: TaskContext): U = {
2 // Deserialize the RDD and the func using the broadcast variables.
val ser = SparkEnv.get.closureSerializer.newInstance()
val (rdd, func) = ser.deserialize[(RDD[T], (TaskContext, Iterator[T]) => U)](
ByteBuffer.wrap(taskBinary.value), Thread.currentThread.getContextClassLoader)
metrics = Some(context.taskMetrics)
try {
func(context, rdd.iterator(partition, context))
} finally {
context.markTaskCompleted()
}
}
b. shuffleMapTask
override def runTask(context: TaskContext): MapStatus = {
// Deserialize the RDD using the broadcast variable.
val ser = SparkEnv.get.closureSerializer.newInstance()
val (rdd, dep) = ser.deserialize[(RDD[_], ShuffleDependency[_, _, _])](
ByteBuffer.wrap(taskBinary.value), Thread.currentThread.getContextClassLoader)
metrics = Some(context.taskMetrics)
var writer: ShuffleWriter[Any, Any] = null
try {
val manager = SparkEnv.get.shuffleManager
writer = manager.getWriter[Any, Any](dep.shuffleHandle, partitionId, context)
12 writer.write(rdd.iterator(partition, context).asInstanceOf[Iterator[_ <: Product2[Any, Any]]])
return writer.stop(success = true).get
} catch {
case e: Exception =>
if (writer != null) {
writer.stop(success = false)
}
throw e
} finally {
context.markTaskCompleted()
}
}
/** Write a bunch of records to this task's output */
override def write(records: Iterator[_ <: Product2[K, V]]): Unit = {
val iter = if (dep.aggregator.isDefined) {
if (dep.mapSideCombine) {
dep.aggregator.get.combineValuesByKey(records, context)
} else {
records
}
} else if (dep.aggregator.isEmpty && dep.mapSideCombine) {
throw new IllegalStateException("Aggregator is empty for map-side combine")
} else {
records
} for (elem <- iter) {
val bucketId = dep.partitioner.getPartition(elem._1)
shuffle.writers(bucketId).write(elem)
}
}
1 /**
2 * Get a ShuffleWriterGroup for the given map task, which will register it as complete
3 * when the writers are closed successfully
4 */
def forMapTask(shuffleId: Int, mapId: Int, numBuckets: Int, serializer: Serializer,
writeMetrics: ShuffleWriteMetrics) = {
new ShuffleWriterGroup {
shuffleStates.putIfAbsent(shuffleId, new ShuffleState(numBuckets))
private val shuffleState = shuffleStates(shuffleId)
private var fileGroup: ShuffleFileGroup = null val writers: Array[BlockObjectWriter] = if (consolidateShuffleFiles) {
fileGroup = getUnusedFileGroup()
Array.tabulate[BlockObjectWriter](numBuckets) { bucketId =>
val blockId = ShuffleBlockId(shuffleId, mapId, bucketId)
blockManager.getDiskWriter(blockId, fileGroup(bucketId), serializer, bufferSize,
writeMetrics)
}
} else {
Array.tabulate[BlockObjectWriter](numBuckets) { bucketId =>
val blockId = ShuffleBlockId(shuffleId, mapId, bucketId)
val blockFile = blockManager.diskBlockManager.getFile(blockId)
// Because of previous failures, the shuffle file may already exist on this machine.
// If so, remove it.
if (blockFile.exists) {
if (blockFile.delete()) {
logInfo(s"Removed existing shuffle file $blockFile")
} else {
logWarning(s"Failed to remove existing shuffle file $blockFile")
}
}
blockManager.getDiskWriter(blockId, blockFile, serializer, bufferSize, writeMetrics)
}
}
spark1.1.0源码阅读-executor的更多相关文章
- spark1.1.0源码阅读-taskScheduler
1. sparkContext中设置createTaskScheduler case "yarn-standalone" | "yarn-cluster" =& ...
- spark1.1.0源码阅读-dagscheduler and stage
1. rdd action ->sparkContext.runJob->dagscheduler.runJob def runJob[T, U: ClassTag]( rdd: RDD[ ...
- Yii2.0源码阅读-一次请求的完整过程
Yii2.0框架源码阅读,从请求发起,到结束的运行步骤 其实最初阅读是从yii\web\UrlManager这个类开始看起,不断的寻找这个类中方法的调用者,最终回到了yii\web\Applicati ...
- Vue2.0源码阅读笔记(四):nextTick
在阅读 nextTick 的源码之前,要先弄明白 JS 执行环境运行机制,介绍 JS 执行环境的事件循环机制的文章很多,大部分都阐述的比较笼统,甚至有些文章说的是错误的,以下为个人理解,如有错误, ...
- Vue2.0源码阅读笔记--生命周期
一.Vue2.0的生命周期 Vue2.0的整个生命周期有八个:分别是 1.beforeCreate,2.created,3.beforeMount,4.mounted,5.beforeUpdate,6 ...
- Vue2.0源码阅读笔记--双向绑定实现原理
上一篇 文章 了解了Vue.js的生命周期.这篇分析Observe Data过程,了解Vue.js的双向数据绑定实现原理. 一.实现双向绑定的做法 前端MVVM最令人激动的就是双向绑定机制了,实现双向 ...
- Yii2.0源码阅读-从路由到控制器
之前的文章弄清了一次请求的开始到结束.主要讲了Yii Applicaton实例的创建.初始化,UrlManager如何返回Yii中的路由信息,到runAction,最后将Response发送给客户端. ...
- Yii2.0源码阅读-视图(View)渲染过程
之前的文章我们根据源码的分析,弄清了Yii如何处理一次请求,以及根据解析的路由如何调用控制器中的action,那接下来好奇的可能就是,我在控制器action中执行了return $this->r ...
- Vue2.0源码阅读笔记(二):响应式原理
Vue是数据驱动的框架,在修改数据时,视图会进行更新.数据响应式系统使得状态管理变的简单直接,在开发过程中减少与DOM元素的接触.而深入学习其中的原理十分有必要,能够回避一些常见的问题,使开发变的 ...
随机推荐
- adb Logcat用法
转自: http://blog.csdn.net/tiantianshangcha/article/details/6288537 个人认为有一下几个常用命令: adb logcat -b radio ...
- Exception testing
怎样去验证代码是否抛出我们期望的异常呢?虽然在代码正常结束时候验证很重要,但是在异常的情况下确保代码如我们希望的运行也很重要.比如说: new ArrayList<Object>().ge ...
- solr + tomcat 搭建
1.准备jdk7和tomcat72.拷贝solr目录下example/webapps/solr.war,到tomcat下的webapps目录中.3.启动tomcat74.编辑tomcat7中的weba ...
- 数据库 —— mySQL的基本操作
学习资源: 0.学习教程 :MySQL 教程(runoob.com) (MySQL Tutorial)turtorialPoint 1.学习帮助手册与平台: MySQL学习平台 英文手册chm ...
- 有趣的js符号{}、[]、""、~、++等转换
(!(~+[])+{})[--[~+""][+[]]*[~+[]] + ~~!+[]]+({}+[])[[~!+[]]*~+[]] http://www.jointforce.co ...
- 安装VS2012 update3提示缺少Microsoft根证书颁发机构2010或2011的解决方法
警告提示如图: (copy的百度贴吧的童鞋的截图) 解决方法: 下载2010.10或2011.10的根证书即可 直通车:http://maxsky.ys168.com/ ——05.||浮云文件||—— ...
- motan源码分析五:cluster相关
上一章我们分析了客户端调用服务端相关的源码,但是到了cluster里面的部分我们就没有分析了,本章将深入分析cluster和它的相关支持类. 1.clustersupport的创建过程,上一章的Ref ...
- Unity3D基础学习 加载场景时隐藏物体,点击显示时显示物体
隐藏物体有两种方法,一是设置Meshrender为False,即不渲染物体. 二是设置物体为False,禁用物体,我使用的第二种. 当场景中需要隐藏的物体很多时,我们可以添加一个层来表示需要隐藏的物体 ...
- ORACLE函数之单行数字函数
1. ABS(X) 返回X的绝对值 SQL>SELECT ABS(-1) A,ABS(1) B,ABS(0) C FROM DUAL; A B ...
- php开发中将远程图片本地化的方法
检查文本内容中的远程图片,下载远程图片到本地的方法示例. /** * 下载远程图片到本地 * * @param string $txt 用户输入的文字,可能包含有图片的url * @param str ...