Spark Source Code Study: withScope
The older Spark UI only showed the execution status of each stage, so there was no way to see the concrete details of how one RDD led to the next. To surface this extra information on the Spark UI, every RDD-creating method is wrapped in withScope, and an RDDOperationScope records each RDD's operation history and its relationships to other RDDs. Below is the DAG visualization of a WordCount job on the Spark UI.
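Before reading the scope-tracking code itself, it helps to see how the RDD APIs hook into it. As a point of reference, this is roughly how RDD.map is defined in the Spark 1.x-era sources — an abridged sketch from memory, not a verbatim copy:

// RDD.scala (abridged sketch): the whole body runs inside withScope, so the
// MapPartitionsRDD created here is tagged with the enclosing "map" scope.
def map[U: ClassTag](f: T => U): RDD[U] = withScope {
  val cleanF = sc.clean(f)
  new MapPartitionsRDD[U, T](this, (context, pid, iter) => iter.map(cleanF))
}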
The RDDOperationScope class that records these relationships is defined as follows:
package org.apache.spark.rdd

import java.util.concurrent.atomic.AtomicInteger

import com.fasterxml.jackson.annotation.{JsonIgnore, JsonInclude, JsonPropertyOrder}
import com.fasterxml.jackson.annotation.JsonInclude.Include
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule
import com.google.common.base.Objects

import org.apache.spark.{Logging, SparkContext} // Spark 1.x; org.apache.spark.internal.Logging in 2.x

/**
 * A general, named code block representing an operation that instantiates RDDs.
 *
 * All RDDs instantiated in the corresponding code block will store a pointer to this object.
 * Examples include, but will not be limited to, existing RDD operations, such as textFile,
 * reduceByKey, and treeAggregate.
 *
 * An operation scope may be nested in other scopes. For instance, a SQL query may enclose
 * scopes associated with the public RDD APIs it uses under the hood.
 *
 * There is no particular relationship between an operation scope and a stage or a job.
 * A scope may live inside one stage (e.g. map) or span across multiple jobs (e.g. take).
 */
@JsonInclude(Include.NON_NULL)
@JsonPropertyOrder(Array("id", "name", "parent"))
private[spark] class RDDOperationScope(
    val name: String,
    val parent: Option[RDDOperationScope] = None,
    val id: String = RDDOperationScope.nextScopeId().toString) {

  def toJson: String = {
    RDDOperationScope.jsonMapper.writeValueAsString(this)
  }

  /**
   * Return a list of scopes that this scope is a part of, including this scope itself.
   * The result is ordered from the outermost scope (eldest ancestor) to this scope.
   */
  @JsonIgnore
  def getAllScopes: Seq[RDDOperationScope] = {
    parent.map(_.getAllScopes).getOrElse(Seq.empty) ++ Seq(this)
  }

  override def equals(other: Any): Boolean = {
    other match {
      case s: RDDOperationScope =>
        id == s.id && name == s.name && parent == s.parent
      case _ => false
    }
  }

  override def hashCode(): Int = Objects.hashCode(id, name, parent)

  override def toString: String = toJson
}
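As a quick sanity check of toJson and getAllScopes, here is a hypothetical sketch (the id values assume a fresh counter, and since the class is private[spark] this only compiles inside an org.apache.spark package):

val parent = new RDDOperationScope("textFile")          // e.g. id = "0"
val child  = new RDDOperationScope("map", Some(parent)) // e.g. id = "1"

child.getAllScopes.map(_.name) // Seq("textFile", "map") -- eldest ancestor first
child.toJson // roughly {"id":"1","name":"map","parent":{"id":"0","name":"textFile"}}

The companion object supplies the JSON mapper, the global scope counter, and the withScope helpers: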
/**
 * A collection of utility methods to construct a hierarchical representation of RDD scopes.
 * An RDD scope tracks the series of operations that created a given RDD.
 */
private[spark] object RDDOperationScope extends Logging {
  private val jsonMapper = new ObjectMapper().registerModule(DefaultScalaModule)
  private val scopeCounter = new AtomicInteger()

  def fromJson(s: String): RDDOperationScope = {
    jsonMapper.readValue(s, classOf[RDDOperationScope])
  }

  /** Return a globally unique operation scope ID. */
  def nextScopeId(): Int = scopeCounter.getAndIncrement

  /**
   * Execute the given body such that all RDDs created in this body will have the same scope.
   * The name of the scope will be the first method name in the stack trace that is not the
   * same as this method's.
   *
   * Note: Return statements are NOT allowed in body.
   */
  private[spark] def withScope[T](
      sc: SparkContext,
      allowNesting: Boolean = false)(body: => T): T = {
    val ourMethodName = "withScope"
    // The stack looks like [..., withScope, withScope, textFile, ...]: dropWhile
    // skips down to the first withScope frame, and find then picks the first frame
    // that is NOT withScope -- i.e. the real caller, e.g. "textFile".
    val callerMethodName = Thread.currentThread.getStackTrace()
      .dropWhile(_.getMethodName != ourMethodName)
      .find(_.getMethodName != ourMethodName)
      .map(_.getMethodName)
      .getOrElse {
        // Log a warning just in case, but this should almost certainly never happen
        logWarning("No valid method name for this RDD operation scope!")
        "N/A"
      }
    withScope[T](sc, callerMethodName, allowNesting, ignoreParent = false)(body)
  }

  /**
   * Execute the given body such that all RDDs created in this body will have the same scope.
   *
   * If nesting is allowed, any subsequent calls to this method in the given body will instantiate
   * child scopes that are nested within our scope. Otherwise, these calls will take no effect.
   *
   * Additionally, the caller of this method may optionally ignore the configurations and scopes
   * set by the higher level caller. In this case, this method will ignore the parent caller's
   * intention to disallow nesting, and the new scope instantiated will not have a parent. This
   * is useful for scoping physical operations in Spark SQL, for instance.
   *
   * Note: Return statements are NOT allowed in body.
   */
  private[spark] def withScope[T](
      sc: SparkContext,
      name: String,
      allowNesting: Boolean,
      ignoreParent: Boolean)(body: => T): T = {
    // Save the old scope to restore it later
    val scopeKey = SparkContext.RDD_SCOPE_KEY
    val noOverrideKey = SparkContext.RDD_SCOPE_NO_OVERRIDE_KEY
    val oldScopeJson = sc.getLocalProperty(scopeKey)
    val oldScope = Option(oldScopeJson).map(RDDOperationScope.fromJson)
    val oldNoOverride = sc.getLocalProperty(noOverrideKey)
    try {
      if (ignoreParent) {
        // Ignore all parent settings and scopes and start afresh with our own root scope
        sc.setLocalProperty(scopeKey, new RDDOperationScope(name).toJson)
      } else if (sc.getLocalProperty(noOverrideKey) == null) {
        // Otherwise, set the scope only if the higher level caller allows us to do so
        sc.setLocalProperty(scopeKey, new RDDOperationScope(name, oldScope).toJson)
      }
      // Optionally disallow the child body to override our scope
      if (!allowNesting) {
        sc.setLocalProperty(noOverrideKey, "true")
        // Debug logging added for this blog's experiment (not in the upstream source),
        // to confirm when a call such as sc.textFile reaches this branch:
        log.info("this is textFile1")
        log.info("this is textFile2")
        //println("this is textFile3")
        log.error("this is textFile4err")
        log.warn("this is textFile5WARN")
        log.debug("this is textFile6debug")
      }
      body
    } finally {
      // Remember to restore any state that was modified before exiting
      sc.setLocalProperty(scopeKey, oldScopeJson)
      sc.setLocalProperty(noOverrideKey, oldNoOverride)
    }
  }
}
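To see the nesting semantics in action, here is a minimal, hypothetical driver-side sketch (the scope name "load" and the file path are made up, and because withScope is private[spark] this has to live in a package under org.apache.spark):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDDOperationScope

val sc = new SparkContext(new SparkConf().setAppName("scope-demo").setMaster("local[*]"))

// allowNesting = false sets the no-override local property, so the withScope
// call inside sc.textFile cannot replace our scope: the resulting RDD is
// grouped under "load" in the UI's DAG visualization instead of "textFile".
val rdd = RDDOperationScope.withScope(sc, "load", allowNesting = false, ignoreParent = false) {
  sc.textFile("data.txt")
}

This is the mechanism behind the grouped boxes in the WordCount DAG visualization above: each box corresponds to one RDDOperationScope, and nested scopes render as nested boxes.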