创建Spark执行环境SparkEnv
SparkDriver 用于提交用户的应用程序,
一、SparkConf
负责SparkContext的配置参数加载, 主要通过ConcurrentHashMap来维护各种`spark.*`的配置属性
class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Serializable {
import SparkConf._
/** Create a SparkConf that loads defaults from system properties and the classpath */
def this() = this(true)
/**
* 维护一个ConcurrentHashMap 来存储spark配置
*/
private val settings = new ConcurrentHashMap[String, String]()
@transient private lazy val reader: ConfigReader = {
val _reader = new ConfigReader(new SparkConfigProvider(settings))
_reader.bindEnv(new ConfigProvider {
override def get(key: String): Option[String] = Option(getenv(key))
})
_reader
}
if (loadDefaults) {
loadFromSystemProperties(false)
}
/**
* 加载spark.*的配置
* @param silent
* @return
*/
private[spark] def loadFromSystemProperties(silent: Boolean): SparkConf = {
// Load any spark.* system properties, 只加载spark.*的配置
for ((key, value) <- Utils.getSystemProperties if key.startsWith("spark.")) {
set(key, value, silent)
}
this
}
}
二、SparkContext
2.1、创建Spark执行环境SparkEnv
SparkEnv是Spark的执行环境对象, 其中包括众多与Executor执行相关的对象。
创建, 主要通过SparkEnv.createSparkEnv, SparkContext初始化,只创建SparkEnv
def isLocal: Boolean = Utils.isLocalMaster(_conf) // An asynchronous listener bus for Spark events
//采用监听器模式维护各类事件的处理
private[spark] val listenerBus = new LiveListenerBus(this) // This function allows components created by SparkEnv to be mocked in unit tests:
private[spark] def createSparkEnv(
conf: SparkConf,
isLocal: Boolean,
listenerBus: LiveListenerBus): SparkEnv = {
//创建DriverEnv
SparkEnv.createDriverEnv(conf, isLocal, listenerBus, SparkContext.numDriverCores(master))
}
继续进入createDriverEnv, 发现调用的是create方法, 该方法是为Driver或Executor创建SparkEnv
点击createExecutorEnv发现是CoarseGrainedExecutorBackend调用
下面具体看看create()中做了什么操作
2.1.1、创建SecurityManager
//创建SecurityManager
val securityManager = new SecurityManager(conf, ioEncryptionKey)
ioEncryptionKey.foreach { _ =>
if (!securityManager.isSaslEncryptionEnabled()) {
logWarning("I/O encryption enabled without RPC encryption: keys will be visible on the " +
"wire.")
}
}
2.1.2、创建RpcEnv
val systemName = if (isDriver) driverSystemName else executorSystemName
val rpcEnv = RpcEnv.create(systemName, bindAddress, advertiseAddress, port, conf,
securityManager, clientMode = !isDriver)
2.1.3、通过反射创建序列化器, 此处默认创建JavaSerializer
// Create an instance of the class with the given name, possibly initializing it with our conf
def instantiateClass[T](className: String): T = {
val cls = Utils.classForName(className)
// Look for a constructor taking a SparkConf and a boolean isDriver, then one taking just
// SparkConf, then one taking no arguments
try {
cls.getConstructor(classOf[SparkConf], java.lang.Boolean.TYPE)
.newInstance(conf, new java.lang.Boolean(isDriver))
.asInstanceOf[T]
} catch {
case _: NoSuchMethodException =>
try {
cls.getConstructor(classOf[SparkConf]).newInstance(conf).asInstanceOf[T]
} catch {
case _: NoSuchMethodException =>
cls.getConstructor().newInstance().asInstanceOf[T]
}
}
} // Create an instance of the class named by the given SparkConf property, or defaultClassName
// if the property is not set, possibly initializing it with our conf
def instantiateClassFromConf[T](propertyName: String, defaultClassName: String): T = {
instantiateClass[T](conf.get(propertyName, defaultClassName))
} val serializer = instantiateClassFromConf[Serializer](
"spark.serializer", "org.apache.spark.serializer.JavaSerializer")
logDebug(s"Using serializer: ${serializer.getClass}")
2.1.3、创建SerializeManager
val serializerManager = new SerializerManager(serializer, conf, ioEncryptionKey)
val closureSerializer = new JavaSerializer(conf)
2.1.4、创建BroadcastManager
val broadcastManager = new BroadcastManager(isDriver, conf, securityManager)
2.1.5、创建MapOutputTracker
def registerOrLookupEndpoint(
name: String, endpointCreator: => RpcEndpoint):
RpcEndpointRef = {
if (isDriver) {
logInfo("Registering " + name)
rpcEnv.setupEndpoint(name, endpointCreator)
} else {
RpcUtils.makeDriverRef(name, conf, rpcEnv)
}
} val broadcastManager = new BroadcastManager(isDriver, conf, securityManager) //创建MapOutputTracker 区分Driver, Executor
val mapOutputTracker = if (isDriver) {
//Driver需要BroadcastManager
new MapOutputTrackerMaster(conf, broadcastManager, isLocal)
} else {
new MapOutputTrackerWorker(conf)
} // Have to assign trackerEndpoint after initialization as MapOutputTrackerEndpoint
// requires the MapOutputTracker itself
mapOutputTracker.trackerEndpoint = registerOrLookupEndpoint(MapOutputTracker.ENDPOINT_NAME,
new MapOutputTrackerMasterEndpoint(
rpcEnv, mapOutputTracker.asInstanceOf[MapOutputTrackerMaster], conf))
2.1.6、创建ShuffleManager
// Let the user specify short names for shuffle managers
val shortShuffleMgrNames = Map(
"sort" -> classOf[org.apache.spark.shuffle.sort.SortShuffleManager].getName,
"tungsten-sort" -> classOf[org.apache.spark.shuffle.sort.SortShuffleManager].getName)
val shuffleMgrName = conf.get("spark.shuffle.manager", "sort")
val shuffleMgrClass = shortShuffleMgrNames.getOrElse(shuffleMgrName.toLowerCase, shuffleMgrName)
val shuffleManager = instantiateClass[ShuffleManager](shuffleMgrClass)
2.1.7、创建 BlockManager
val useLegacyMemoryManager = conf.getBoolean("spark.memory.useLegacyMode", false)
val memoryManager: MemoryManager =
if (useLegacyMemoryManager) {
new StaticMemoryManager(conf, numUsableCores)
} else {
UnifiedMemoryManager(conf, numUsableCores)
}
val blockManagerPort = if (isDriver) {
conf.get(DRIVER_BLOCK_MANAGER_PORT)
} else {
conf.get(BLOCK_MANAGER_PORT)
}
val blockTransferService =
new NettyBlockTransferService(conf, securityManager, bindAddress, advertiseAddress,
blockManagerPort, numUsableCores)
val blockManagerMaster = new BlockManagerMaster(registerOrLookupEndpoint(
BlockManagerMaster.DRIVER_ENDPOINT_NAME,
new BlockManagerMasterEndpoint(rpcEnv, isLocal, conf, listenerBus)),
conf, isDriver)
// NB: blockManager is not valid until initialize() is called later.
val blockManager = new BlockManager(executorId, rpcEnv, blockManagerMaster,
serializerManager, conf, memoryManager, mapOutputTracker, shuffleManager,
blockTransferService, securityManager, numUsableCores)
2.1.8、创建MetricsSystem
val metricsSystem = if (isDriver) {
// Don't start metrics system right now for Driver.
// We need to wait for the task scheduler to give us an app ID.
// Then we can start the metrics system.
MetricsSystem.createMetricsSystem("driver", conf, securityManager)
} else {
// We need to set the executor ID before the MetricsSystem is created because sources and
// sinks specified in the metrics configuration file will want to incorporate this executor's
// ID into the metrics they report.
conf.set("spark.executor.id", executorId)
val ms = MetricsSystem.createMetricsSystem("executor", conf, securityManager)
ms.start()
ms
}
2.1.9、创建SparkEnv实例
val envInstance = new SparkEnv(
executorId,
rpcEnv,
serializer,
closureSerializer,
serializerManager,
mapOutputTracker,
shuffleManager,
broadcastManager,
blockManager,
securityManager,
metricsSystem,
memoryManager,
outputCommitCoordinator,
conf)
2.1.10、创建临时文件
// Add a reference to tmp dir created by driver, we will delete this tmp dir when stop() is
// called, and we only need to do it for driver. Because driver may run as a service, and if we
// don't delete this tmp dir when sc is stopped, then will create too many tmp dirs.
if (isDriver) {
val sparkFilesDir = Utils.createTempDir(Utils.getLocalDir(conf), "userFiles").getAbsolutePath
envInstance.driverTmpDir = Some(sparkFilesDir)
}
创建Spark执行环境SparkEnv的更多相关文章
- Spark源码剖析 - SparkContext的初始化(二)_创建执行环境SparkEnv
2. 创建执行环境SparkEnv SparkEnv是Spark的执行环境对象,其中包括众多与Executor执行相关的对象.由于在local模式下Driver会创建Executor,local-cl ...
- Spark 核心篇-SparkEnv
本章内容: 1.功能概述 SparkEnv是Spark的执行环境对象,其中包括与众多Executor执行相关的对象.Spark 对任务的计算都依托于 Executor 的能力,所有的 Executor ...
- javaScript执行环境、作用域链与闭包
一.执行环境 执行环境定义了变量和函数有权访问的其他数据,决定了他们各自的行为:每个执行环境都有一个与之关联的变量对象,环境中定义的所有变量和函数都保存在这个对象中.虽然我们编写的代码无法访问这个对象 ...
- VO、AO、执行环境和作用域链
1.变量对象(variable object) 原文:Every execution context has associated with it a variable object. Variabl ...
- 理解JS的执行环境
执行环境(Execution context,EC)或执行上下文,是JS中一个极为重要的概念 EC的组成 当JavaScript代码执行的时候,会进入不同的执行上下文,这些执行上下文会构成了一个执行上 ...
- Javascript高级编程学习笔记(9)—— 执行环境
今天主要讲一下,JS底层的一些东西,这些东西不太好举例(应该是我水平不够) 望大家多多海涵,比心心 执行环境 执行环境(执行上下文,全文使用执行环境 )是JS中最为重要的一个概念,执行环境决定了,变量 ...
- (O)JS:执行环境、变量对象、活动对象和作用域链(原创)
var a=1; function b(x){ var c=2; console.log(x); } b(3); ·执行环境(execution context),也称为环境.执行上下文.上下文环境. ...
- Javascript 函数及其执行环境和作用域
函数在javascript中可以说是一等公民,也是最有意思的事情,javascript函数其实也是一个对象,是Function类型的实例.因此声明一个函数首先可以使用 Function构造函数: va ...
- js的闭包中关于执行环境和作用链的理解
首先讲一讲执行环境: 执行环境按照字面上来理解就是指目前代码执行所在的环境. 当JavaScript代码执行的时候,会进入不同的执行上下文,这些执行上下文会构成了一个执行上下文栈(Execution ...
随机推荐
- java SWing事件调用的两种机制
Java(91) /** * java swing中事件调用的两种机制: * (一)响应机制 * (二)回调机制 */ package test; import java.awt.*; impo ...
- bzoj4821
线段树 这题真是无聊 把式子拆开,然后可知维护xi,yi,xi^2,xi*yi,重点在于标记下传,当我们进行2号操作时,直接累加进答案和标记即可,进行3号操作时,update时先把自己这层赋值成要改变 ...
- bzoj1833
http://www.lydsy.com/JudgeOnline/problem.php?id=1833 2.5个小时就花在这上面了... 水到200题了...然并卵,天天做水题有什么前途... #i ...
- [Apple开发者帐户帮助]八、管理档案(3)创建App Store配置文件
您可以创建自己的App Store配置文件,以便在将应用程序上载到App Store Connect时使用. 有关完整的App Store工作流程,请转到通过 Xcode帮助中的App Store分发 ...
- 马拉车算法(Manacher's Algorithm)
这是悦乐书的第343次更新,第367篇原创 Manacher's Algorithm,中文名叫马拉车算法,是一位名叫Manacher的人在1975年提出的一种算法,解决的问题是求最长回文子串,神奇之处 ...
- flask中路由系统
flask中的路由我们并不陌生,从一开始到现在都一直在应用 @app.route("/",methods=["GET","POST"]) 1 ...
- ACM_堆箱子咯(栈)
堆箱子咯 Time Limit: 2000/1000ms (Java/Others) Problem Description: 双十一大家都在买买买,可忙坏了快递小哥了.zl和皮卡鸡在大伙在剁手的时候 ...
- 为什么使用HttpServlet?http协议特点、servlet
因为只有HttpServlet是基于http协议,实现Servlet接口,而http协议是短连接协议,能够实现客户端访问服务端后,数据交互后 连接自动断开.同时http协议基于tcp.ip协议,封装了 ...
- Zookeeper概念学习系列之zookeeper的节点
znode有两种类型: 临时节点(ephemeral node) 和 持久节点(persistent node). znode的类型在创建时确定并且之后不能再修改. 短暂znode的客户端会话结束 ...
- [转]自适应网页设计(Responsive Web Design)
本文转自:http://www.ruanyifeng.com/blog/2012/05/responsive_web_design.html 作者: 阮一峰 日期: 2012年5月 1日 随着3G的普 ...