Spark 1.1.0 Source Code Reading: Executor

1. The executor runs launchTask

    def launchTask(
        context: ExecutorBackend, taskId: Long, taskName: String, serializedTask: ByteBuffer) {
      val tr = new TaskRunner(context, taskId, taskName, serializedTask)
      runningTasks.put(taskId, tr)
      threadPool.execute(tr)
    }
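The wrap, register, submit sequence in launchTask can be tried outside Spark. The following is only a minimal sketch: `LaunchTaskSketch`, `DemoRunner` and `launchAll` are hypothetical names invented for illustration, and the cached thread pool is an assumption, since the excerpt does not show how `threadPool` is created.

```scala
import java.util.concurrent.{ConcurrentHashMap, Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicLong

object LaunchTaskSketch {
  // Hypothetical stand-in for TaskRunner: adds its task ID to a shared sum when run.
  class DemoRunner(val taskId: Long, sum: AtomicLong) extends Runnable {
    override def run(): Unit = {
      sum.addAndGet(taskId)
    }
  }

  // Launches tasks 1..n in the same order of steps as launchTask:
  // wrap the work in a Runnable, register it by ID, hand it to the pool.
  // Returns (number of registered runners, sum of the IDs that actually ran).
  def launchAll(n: Int): (Int, Long) = {
    val threadPool = Executors.newCachedThreadPool() // assumption: pool type is not shown in the excerpt
    val runningTasks = new ConcurrentHashMap[Long, DemoRunner]()
    val sum = new AtomicLong(0L)
    for (taskId <- 1L to n.toLong) {
      val tr = new DemoRunner(taskId, sum)
      runningTasks.put(taskId, tr)
      threadPool.execute(tr)
    }
    // Wait for the submitted runners so the result is deterministic.
    threadPool.shutdown()
    threadPool.awaitTermination(10, TimeUnit.SECONDS)
    (runningTasks.size, sum.get)
  }
}
```

Registering the runner before submitting it means a lookup by task ID can find the runner as soon as it may be executing.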

2. The executor runs TaskRunner's run method

    class TaskRunner(
        execBackend: ExecutorBackend, val taskId: Long, taskName: String, serializedTask: ByteBuffer)
      extends Runnable {

      @volatile private var killed = false
      @volatile var task: Task[Any] = _
      @volatile var attemptedTask: Option[Task[Any]] = None

      def kill(interruptThread: Boolean) {
        logInfo(s"Executor is trying to kill $taskName (TID $taskId)")
        killed = true
        if (task != null) {
          task.kill(interruptThread)
        }
      }

      override def run() {
        val startTime = System.currentTimeMillis()
        SparkEnv.set(env)
        Thread.currentThread.setContextClassLoader(replClassLoader)
        val ser = SparkEnv.get.closureSerializer.newInstance()
        logInfo(s"Running $taskName (TID $taskId)")
        execBackend.statusUpdate(taskId, TaskState.RUNNING, EMPTY_BYTE_BUFFER)
        var taskStart: Long = 0
        def gcTime = ManagementFactory.getGarbageCollectorMXBeans.map(_.getCollectionTime).sum
        val startGCTime = gcTime

        try {
          SparkEnv.set(env)
          Accumulators.clear()
          // Deserialize taskFiles, taskJars and taskBytes.
          val (taskFiles, taskJars, taskBytes) = Task.deserializeWithDependencies(serializedTask)
          updateDependencies(taskFiles, taskJars)
          // Deserialize the Task object itself.
          task = ser.deserialize[Task[Any]](taskBytes, Thread.currentThread.getContextClassLoader)

          // If this task has been killed before we deserialized it, let's quit now. Otherwise,
          // continue executing the task.
          if (killed) {
            // Throw an exception rather than returning, because returning within a try{} block
            // causes a NonLocalReturnControl exception to be thrown. The NonLocalReturnControl
            // exception will be caught by the catch block, leading to an incorrect ExceptionFailure
            // for the task.
            throw new TaskKilledException
          }

          attemptedTask = Some(task)
          logDebug("Task " + taskId + "'s epoch is " + task.epoch)
          env.mapOutputTracker.updateEpoch(task.epoch)

          // Run the actual task and measure its runtime.
          taskStart = System.currentTimeMillis()
          val value = task.run(taskId.toInt)
          val taskFinish = System.currentTimeMillis()

          // If the task has been killed, let's fail it.
          if (task.killed) {
            throw new TaskKilledException
          }
          // ... (the rest of run() is omitted here)
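The kill handling above relies on a volatile flag that is checked at safe points, with an exception used to leave the try block. The following is a minimal sketch of that idea only: `KillFlagSketch`, `DemoRunner` and `DemoKilledException` are hypothetical names, and the `Either` result is an illustration device, not how Spark reports task state.

```scala
object KillFlagSketch {
  // Hypothetical exception standing in for Spark's TaskKilledException.
  class DemoKilledException extends RuntimeException("task killed")

  class DemoRunner(work: () => Int) {
    // Written by the killing thread, read by the running thread.
    @volatile private var killed = false

    def kill(): Unit = { killed = true }

    // Returns Right(result) on success, Left(reason) when killed or failed.
    def run(): Either[String, Int] = {
      try {
        // A kill may arrive before the work starts, so check first.
        if (killed) throw new DemoKilledException
        val value = work()
        // A kill may also arrive while the work was running.
        if (killed) throw new DemoKilledException
        Right(value)
      } catch {
        // The dedicated exception type keeps a kill distinguishable from a failure.
        case _: DemoKilledException => Left("KILLED")
        case e: Exception => Left("FAILED: " + e.getMessage)
      }
    }
  }
}
```

Matching the kill exception before the generic `Exception` case is what prevents a killed task from being reported as an ordinary failure.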

3. Task.run

    private[spark] abstract class Task[T](val stageId: Int, var partitionId: Int) extends Serializable {

      final def run(attemptId: Long): T = {
        context = new TaskContext(stageId, partitionId, attemptId, runningLocally = false)
        context.taskMetrics.hostname = Utils.localHostName()
        taskThread = Thread.currentThread()
        if (_killed) {
          kill(interruptThread = false)
        }
        runTask(context)
      }
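Task.run is final and delegates to runTask, which is a template-method arrangement: the base class fixes the setup, and each subclass supplies the work. A minimal sketch of that shape follows; `DemoContext`, `DemoTask`, `SumTask` and `DescribeTask` are hypothetical names, and the context is far simpler than Spark's TaskContext.

```scala
object TemplateMethodSketch {
  // Hypothetical, simplified context; it only carries the three IDs.
  final case class DemoContext(stageId: Int, partitionId: Int, attemptId: Long)

  abstract class DemoTask[T](val stageId: Int, val partitionId: Int) {
    // Fixed entry point: build the context, then delegate to the subclass.
    final def run(attemptId: Long): T =
      runTask(DemoContext(stageId, partitionId, attemptId))

    // Subclasses implement only the actual work.
    def runTask(context: DemoContext): T
  }

  // One concrete task that returns a value computed from its data.
  class SumTask(stage: Int, partition: Int, data: Seq[Int])
    extends DemoTask[Int](stage, partition) {
    override def runTask(context: DemoContext): Int = data.sum
  }

  // Another concrete task with a different result type.
  class DescribeTask(stage: Int, partition: Int)
    extends DemoTask[String](stage, partition) {
    override def runTask(context: DemoContext): String =
      s"stage ${context.stageId}, partition ${context.partitionId}, attempt ${context.attemptId}"
  }
}
```

Because `run` is final, callers get the same setup for every task type, and the result type varies only through the type parameter.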

4. Task is an abstract class; each concrete subclass (ResultTask and ShuffleMapTask) runs its own runTask.

a. ResultTask

    override def runTask(context: TaskContext): U = {
      // Deserialize the RDD and the func using the broadcast variables.
      val ser = SparkEnv.get.closureSerializer.newInstance()
      val (rdd, func) = ser.deserialize[(RDD[T], (TaskContext, Iterator[T]) => U)](
        ByteBuffer.wrap(taskBinary.value), Thread.currentThread.getContextClassLoader)

      metrics = Some(context.taskMetrics)
      try {
        func(context, rdd.iterator(partition, context))
      } finally {
        context.markTaskCompleted()
      }
    }
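The core of ResultTask.runTask is "apply func to the partition's iterator, and always mark the task completed". A minimal sketch of just that try/finally shape follows; `ResultTaskSketch` and `DemoContext` are hypothetical names, and deserialization is left out entirely.

```scala
object ResultTaskSketch {
  // Hypothetical context that only records whether completion was signalled.
  class DemoContext {
    @volatile var completed = false
    def markTaskCompleted(): Unit = { completed = true }
  }

  // Applies func to the partition's records; the finally block runs
  // whether func returns normally or throws.
  def runTask[T, U](context: DemoContext, partition: Iterator[T])(
      func: (DemoContext, Iterator[T]) => U): U = {
    try {
      func(context, partition)
    } finally {
      context.markTaskCompleted()
    }
  }
}
```

For example, `ResultTaskSketch.runTask(new ResultTaskSketch.DemoContext, Iterator(1, 2, 3))((_, it) => it.sum)` evaluates the sum over the iterator and leaves the context marked as completed.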

b. ShuffleMapTask

    override def runTask(context: TaskContext): MapStatus = {
      // Deserialize the RDD using the broadcast variable.
      val ser = SparkEnv.get.closureSerializer.newInstance()
      val (rdd, dep) = ser.deserialize[(RDD[_], ShuffleDependency[_, _, _])](
        ByteBuffer.wrap(taskBinary.value), Thread.currentThread.getContextClassLoader)

      metrics = Some(context.taskMetrics)
      var writer: ShuffleWriter[Any, Any] = null
      try {
        val manager = SparkEnv.get.shuffleManager
        writer = manager.getWriter[Any, Any](dep.shuffleHandle, partitionId, context)
        writer.write(rdd.iterator(partition, context).asInstanceOf[Iterator[_ <: Product2[Any, Any]]])
        return writer.stop(success = true).get
      } catch {
        case e: Exception =>
          if (writer != null) {
            writer.stop(success = false)
          }
          throw e
      } finally {
        context.markTaskCompleted()
      }
    }

With the hash-based shuffle manager (the default in Spark 1.1.0), writer.write above is HashShuffleWriter.write:

    /** Write a bunch of records to this task's output */
    override def write(records: Iterator[_ <: Product2[K, V]]): Unit = {
      val iter = if (dep.aggregator.isDefined) {
        if (dep.mapSideCombine) {
          dep.aggregator.get.combineValuesByKey(records, context)
        } else {
          records
        }
      } else if (dep.aggregator.isEmpty && dep.mapSideCombine) {
        throw new IllegalStateException("Aggregator is empty for map-side combine")
      } else {
        records
      }

      for (elem <- iter) {
        val bucketId = dep.partitioner.getPartition(elem._1)
        shuffle.writers(bucketId).write(elem)
      }
    }
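The write method does two things: it optionally combines values per key on the map side, then routes each record to a bucket chosen by the partitioner. The sketch below imitates both steps in memory. `BucketWriteSketch`, `bucketFor` and its `write` are hypothetical names; summing stands in for the aggregator, a non-negative hash modulo stands in for `dep.partitioner.getPartition`, and in-memory vectors stand in for the disk writers.

```scala
import scala.collection.mutable

object BucketWriteSketch {
  // Assumed partitioning rule: non-negative modulo of the key's hashCode.
  def bucketFor(key: Any, numBuckets: Int): Int = {
    val raw = key.hashCode % numBuckets
    if (raw < 0) raw + numBuckets else raw
  }

  // Optionally sums values per key (a stand-in for map-side combine),
  // then appends each record to the bucket selected by its key.
  def write(
      records: Iterator[(String, Int)],
      numBuckets: Int,
      mapSideCombine: Boolean): Vector[Vector[(String, Int)]] = {
    val iter: Iterator[(String, Int)] =
      if (mapSideCombine) {
        // LinkedHashMap keeps first-seen key order, so output is deterministic.
        val combined = mutable.LinkedHashMap.empty[String, Int]
        for ((k, v) <- records) {
          combined(k) = combined.getOrElse(k, 0) + v
        }
        combined.iterator
      } else {
        records
      }

    val buckets = Array.fill(numBuckets)(Vector.newBuilder[(String, Int)])
    for (elem <- iter) {
      val bucketId = bucketFor(elem._1, numBuckets)
      buckets(bucketId) += elem
    }
    buckets.map(_.result()).toVector
  }
}
```

With map-side combine enabled, each key reaches its bucket at most once per map task, which is the reason for combining before writing.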

The shuffle.writers array used above comes from the ShuffleWriterGroup built by ShuffleBlockManager.forMapTask:

    /**
     * Get a ShuffleWriterGroup for the given map task, which will register it as complete
     * when the writers are closed successfully
     */
    def forMapTask(shuffleId: Int, mapId: Int, numBuckets: Int, serializer: Serializer,
        writeMetrics: ShuffleWriteMetrics) = {
      new ShuffleWriterGroup {
        shuffleStates.putIfAbsent(shuffleId, new ShuffleState(numBuckets))
        private val shuffleState = shuffleStates(shuffleId)
        private var fileGroup: ShuffleFileGroup = null

        val writers: Array[BlockObjectWriter] = if (consolidateShuffleFiles) {
          fileGroup = getUnusedFileGroup()
          Array.tabulate[BlockObjectWriter](numBuckets) { bucketId =>
            val blockId = ShuffleBlockId(shuffleId, mapId, bucketId)
            blockManager.getDiskWriter(blockId, fileGroup(bucketId), serializer, bufferSize,
              writeMetrics)
          }
        } else {
          Array.tabulate[BlockObjectWriter](numBuckets) { bucketId =>
            val blockId = ShuffleBlockId(shuffleId, mapId, bucketId)
            val blockFile = blockManager.diskBlockManager.getFile(blockId)
            // Because of previous failures, the shuffle file may already exist on this machine.
            // If so, remove it.
            if (blockFile.exists) {
              if (blockFile.delete()) {
                logInfo(s"Removed existing shuffle file $blockFile")
              } else {
                logWarning(s"Failed to remove existing shuffle file $blockFile")
              }
            }
            blockManager.getDiskWriter(blockId, blockFile, serializer, bufferSize, writeMetrics)
          }
        }
        // ... (the rest of forMapTask is omitted here)
