创建Spark执行环境SparkEnv
SparkDriver 用于提交用户的应用程序,
一、SparkConf
负责SparkContext的配置参数加载, 主要通过ConcurrentHashMap来维护各种`spark.*`的配置属性
class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Serializable {
import SparkConf._
/** Create a SparkConf that loads defaults from system properties and the classpath */
def this() = this(true)
/**
* 维护一个ConcurrentHashMap 来存储spark配置
*/
private val settings = new ConcurrentHashMap[String, String]()
@transient private lazy val reader: ConfigReader = {
val _reader = new ConfigReader(new SparkConfigProvider(settings))
_reader.bindEnv(new ConfigProvider {
override def get(key: String): Option[String] = Option(getenv(key))
})
_reader
}
if (loadDefaults) {
loadFromSystemProperties(false)
}
/**
* 加载spark.*的配置
* @param silent
* @return
*/
private[spark] def loadFromSystemProperties(silent: Boolean): SparkConf = {
// Load any spark.* system properties, 只加载spark.*的配置
for ((key, value) <- Utils.getSystemProperties if key.startsWith("spark.")) {
set(key, value, silent)
}
this
}
}
二、SparkContext
2.1、创建Spark执行环境SparkEnv
SparkEnv是Spark的执行环境对象, 其中包括众多与Executor执行相关的对象。
创建, 主要通过SparkEnv.createSparkEnv, SparkContext初始化,只创建SparkEnv
def isLocal: Boolean = Utils.isLocalMaster(_conf) // An asynchronous listener bus for Spark events
//采用监听器模式维护各类事件的处理
private[spark] val listenerBus = new LiveListenerBus(this) // This function allows components created by SparkEnv to be mocked in unit tests:
private[spark] def createSparkEnv(
conf: SparkConf,
isLocal: Boolean,
listenerBus: LiveListenerBus): SparkEnv = {
//创建DriverEnv
SparkEnv.createDriverEnv(conf, isLocal, listenerBus, SparkContext.numDriverCores(master))
}
继续进入createDriverEnv, 发现调用的是create方法, 该方法是为Driver或Executor创建SparkEnv
点击createExecutorEnv发现是CoarseGrainedExecutorBackend调用
下面具体看看create()中做了什么操作
2.1.1、创建SecurityManager
//创建SecurityManager
val securityManager = new SecurityManager(conf, ioEncryptionKey)
ioEncryptionKey.foreach { _ =>
if (!securityManager.isSaslEncryptionEnabled()) {
logWarning("I/O encryption enabled without RPC encryption: keys will be visible on the " +
"wire.")
}
}
2.1.2、创建RpcEnv
val systemName = if (isDriver) driverSystemName else executorSystemName
val rpcEnv = RpcEnv.create(systemName, bindAddress, advertiseAddress, port, conf,
securityManager, clientMode = !isDriver)
2.1.3、通过反射创建序列化器, 此处默认创建JavaSerializer
// Create an instance of the class with the given name, possibly initializing it with our conf
def instantiateClass[T](className: String): T = {
val cls = Utils.classForName(className)
// Look for a constructor taking a SparkConf and a boolean isDriver, then one taking just
// SparkConf, then one taking no arguments
try {
cls.getConstructor(classOf[SparkConf], java.lang.Boolean.TYPE)
.newInstance(conf, new java.lang.Boolean(isDriver))
.asInstanceOf[T]
} catch {
case _: NoSuchMethodException =>
try {
cls.getConstructor(classOf[SparkConf]).newInstance(conf).asInstanceOf[T]
} catch {
case _: NoSuchMethodException =>
cls.getConstructor().newInstance().asInstanceOf[T]
}
}
} // Create an instance of the class named by the given SparkConf property, or defaultClassName
// if the property is not set, possibly initializing it with our conf
def instantiateClassFromConf[T](propertyName: String, defaultClassName: String): T = {
instantiateClass[T](conf.get(propertyName, defaultClassName))
} val serializer = instantiateClassFromConf[Serializer](
"spark.serializer", "org.apache.spark.serializer.JavaSerializer")
logDebug(s"Using serializer: ${serializer.getClass}")
2.1.3、创建SerializeManager
val serializerManager = new SerializerManager(serializer, conf, ioEncryptionKey)
val closureSerializer = new JavaSerializer(conf)
2.1.4、创建BroadcastManager
val broadcastManager = new BroadcastManager(isDriver, conf, securityManager)
2.1.5、创建MapOutputTracker
def registerOrLookupEndpoint(
name: String, endpointCreator: => RpcEndpoint):
RpcEndpointRef = {
if (isDriver) {
logInfo("Registering " + name)
rpcEnv.setupEndpoint(name, endpointCreator)
} else {
RpcUtils.makeDriverRef(name, conf, rpcEnv)
}
} val broadcastManager = new BroadcastManager(isDriver, conf, securityManager) //创建MapOutputTracker 区分Driver, Executor
val mapOutputTracker = if (isDriver) {
//Driver需要BroadcastManager
new MapOutputTrackerMaster(conf, broadcastManager, isLocal)
} else {
new MapOutputTrackerWorker(conf)
} // Have to assign trackerEndpoint after initialization as MapOutputTrackerEndpoint
// requires the MapOutputTracker itself
mapOutputTracker.trackerEndpoint = registerOrLookupEndpoint(MapOutputTracker.ENDPOINT_NAME,
new MapOutputTrackerMasterEndpoint(
rpcEnv, mapOutputTracker.asInstanceOf[MapOutputTrackerMaster], conf))
2.1.6、创建ShuffleManager
// Let the user specify short names for shuffle managers
val shortShuffleMgrNames = Map(
"sort" -> classOf[org.apache.spark.shuffle.sort.SortShuffleManager].getName,
"tungsten-sort" -> classOf[org.apache.spark.shuffle.sort.SortShuffleManager].getName)
val shuffleMgrName = conf.get("spark.shuffle.manager", "sort")
val shuffleMgrClass = shortShuffleMgrNames.getOrElse(shuffleMgrName.toLowerCase, shuffleMgrName)
val shuffleManager = instantiateClass[ShuffleManager](shuffleMgrClass)
2.1.7、创建 BlockManager
val useLegacyMemoryManager = conf.getBoolean("spark.memory.useLegacyMode", false)
val memoryManager: MemoryManager =
if (useLegacyMemoryManager) {
new StaticMemoryManager(conf, numUsableCores)
} else {
UnifiedMemoryManager(conf, numUsableCores)
}
val blockManagerPort = if (isDriver) {
conf.get(DRIVER_BLOCK_MANAGER_PORT)
} else {
conf.get(BLOCK_MANAGER_PORT)
}
val blockTransferService =
new NettyBlockTransferService(conf, securityManager, bindAddress, advertiseAddress,
blockManagerPort, numUsableCores)
val blockManagerMaster = new BlockManagerMaster(registerOrLookupEndpoint(
BlockManagerMaster.DRIVER_ENDPOINT_NAME,
new BlockManagerMasterEndpoint(rpcEnv, isLocal, conf, listenerBus)),
conf, isDriver)
// NB: blockManager is not valid until initialize() is called later.
val blockManager = new BlockManager(executorId, rpcEnv, blockManagerMaster,
serializerManager, conf, memoryManager, mapOutputTracker, shuffleManager,
blockTransferService, securityManager, numUsableCores)
2.1.8、创建MetricsSystem
val metricsSystem = if (isDriver) {
// Don't start metrics system right now for Driver.
// We need to wait for the task scheduler to give us an app ID.
// Then we can start the metrics system.
MetricsSystem.createMetricsSystem("driver", conf, securityManager)
} else {
// We need to set the executor ID before the MetricsSystem is created because sources and
// sinks specified in the metrics configuration file will want to incorporate this executor's
// ID into the metrics they report.
conf.set("spark.executor.id", executorId)
val ms = MetricsSystem.createMetricsSystem("executor", conf, securityManager)
ms.start()
ms
}
2.1.9、创建SparkEnv实例
val envInstance = new SparkEnv(
executorId,
rpcEnv,
serializer,
closureSerializer,
serializerManager,
mapOutputTracker,
shuffleManager,
broadcastManager,
blockManager,
securityManager,
metricsSystem,
memoryManager,
outputCommitCoordinator,
conf)
2.1.10、创建临时文件
// Add a reference to tmp dir created by driver, we will delete this tmp dir when stop() is
// called, and we only need to do it for driver. Because driver may run as a service, and if we
// don't delete this tmp dir when sc is stopped, then will create too many tmp dirs.
if (isDriver) {
val sparkFilesDir = Utils.createTempDir(Utils.getLocalDir(conf), "userFiles").getAbsolutePath
envInstance.driverTmpDir = Some(sparkFilesDir)
}
创建Spark执行环境SparkEnv的更多相关文章
- Spark源码剖析 - SparkContext的初始化(二)_创建执行环境SparkEnv
2. 创建执行环境SparkEnv SparkEnv是Spark的执行环境对象,其中包括众多与Executor执行相关的对象.由于在local模式下Driver会创建Executor,local-cl ...
- Spark 核心篇-SparkEnv
本章内容: 1.功能概述 SparkEnv是Spark的执行环境对象,其中包括与众多Executor执行相关的对象.Spark 对任务的计算都依托于 Executor 的能力,所有的 Executor ...
- javaScript执行环境、作用域链与闭包
一.执行环境 执行环境定义了变量和函数有权访问的其他数据,决定了他们各自的行为:每个执行环境都有一个与之关联的变量对象,环境中定义的所有变量和函数都保存在这个对象中.虽然我们编写的代码无法访问这个对象 ...
- VO、AO、执行环境和作用域链
1.变量对象(variable object) 原文:Every execution context has associated with it a variable object. Variabl ...
- 理解JS的执行环境
执行环境(Execution context,EC)或执行上下文,是JS中一个极为重要的概念 EC的组成 当JavaScript代码执行的时候,会进入不同的执行上下文,这些执行上下文会构成了一个执行上 ...
- Javascript高级编程学习笔记(9)—— 执行环境
今天主要讲一下,JS底层的一些东西,这些东西不太好举例(应该是我水平不够) 望大家多多海涵,比心心 执行环境 执行环境(执行上下文,全文使用执行环境 )是JS中最为重要的一个概念,执行环境决定了,变量 ...
- (O)JS:执行环境、变量对象、活动对象和作用域链(原创)
var a=1; function b(x){ var c=2; console.log(x); } b(3); ·执行环境(execution context),也称为环境.执行上下文.上下文环境. ...
- Javascript 函数及其执行环境和作用域
函数在javascript中可以说是一等公民,也是最有意思的事情,javascript函数其实也是一个对象,是Function类型的实例.因此声明一个函数首先可以使用 Function构造函数: va ...
- js的闭包中关于执行环境和作用链的理解
首先讲一讲执行环境: 执行环境按照字面上来理解就是指目前代码执行所在的环境. 当JavaScript代码执行的时候,会进入不同的执行上下文,这些执行上下文会构成了一个执行上下文栈(Execution ...
随机推荐
- P3225 [HNOI2012]矿场搭建 tarjan割点
这个题需要发现一点规律,就是先按割点求块,然后求每个联通块中有几个割点,假如没有割点,则需要建两个出口,如果一个割点,则需要建一个出口,2个以上不用建. 题干: 题目描述 煤矿工地可以看成是由隧道连接 ...
- 洛谷P2303 [SDOi2012]Longge的问题
题目背景 SDOi2012 题目描述 Longge的数学成绩非常好,并且他非常乐于挑战高难度的数学问题.现在问题来了:给定一个整数N,你需要求出∑gcd(i, N)(1<=i <=N). ...
- Notepad++ - 通过语言格式设置自定义语法高亮颜色
http://blog.csdn.net/onceing/article/details/51554399 Global Styles Indent guideline style 缩进参考线的颜色 ...
- [Swift通天遁地]七、数据与安全-(20)快速实现MD5/Poly1305/Aes/BlowFish/Chacha/Rabbit
★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★➤微信公众号:山青咏芝(shanqingyongzhi)➤博客园地址:山青咏芝(https://www.cnblogs. ...
- jeecg中列表查询数据关联其他表的显示
1.A表字段:id,name;B表字段:id,name,fid(A表外键),现查询A表和B表的所有数据并且查询条件A,B都有,在前台页面list显示 2.后台方法: @RequestMapping(p ...
- vuejs {{}},v-text 和 v-html的区别
<div id="app"> <p>{{message}}</p> <!-- 输出:<span>通过双括号绑定</spa ...
- android 二维码 扫描,生成,竖屏
最近公司有用到二维码,生成,扫描,所以学习了一下,和大家分享: demo 见下面链接,已经改成竖屏: http://download.csdn.net/detail/q610098308/868101 ...
- [转]在 Linux 下使用 RAID
转自:http://www.linuxidc.com/Linux/2015-08/122191.htm RAID 的意思是廉价磁盘冗余阵列(Redundant Array of Inexpensive ...
- 10.Nodes and Bindings
节点数据绑定 节点是构成Ventuz场景的基本元素.每个节点既属于图层.也属于层级或内容.既可以在图层编辑器,也可以在层级编辑器或内容编辑器中编辑. 内容节点包括资产描述(如材质.xml文件等).数字 ...
- [ SCOI 2007 ] Perm
\(\\\) \(Description\) 给出只包括多个\(0\text~ 9\)的数字集,求有多少个本质不同的全排列,使得组成的数字能够整除\(M\). \(|S|\in [1,10]\),\( ...