I. Cluster startup: starting the Master

$SPARK_HOME/sbin/start-master.sh

Key line in the start-master.sh script:

spark-daemon.sh start org.apache.spark.deploy.master.Master 1 --ip $SPARK_MASTER_IP --port $SPARK_MASTER_PORT --webui-port $SPARK_MASTER_WEBUI_PORT

Log location: $SPARK_HOME/logs/

// :: INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkMaster@hadoop000:7077]
// :: INFO master.Master: Starting Spark master at spark://hadoop000:7077
// :: INFO server.Server: jetty-.y.z-SNAPSHOT
// :: INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:
// :: INFO ui.MasterWebUI: Started MasterWebUI at http://hadoop000:8080
// :: INFO master.Master: I have been elected leader! New state: ALIVE

II. Cluster startup: starting the Workers

$SPARK_HOME/sbin/start-slaves.sh

Key line in the start-slaves.sh script:

spark-daemon.sh start org.apache.spark.deploy.worker.Worker master-spark-URL

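When a Worker starts, it must register with the specified master URL, which here is spark://hadoop000:7077.

After starting, a Worker does two main things:
  1) registers itself with the Master (RegisterWorker);
  2) periodically sends heartbeats to the Master.

The Worker sends its registration to the Master: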

Worker.scala
    ==>preStart
      ==>registerWithMaster
        ==>tryRegisterAllMasters
          ==> actor ! RegisterWorker(workerId, host, port, cores, memory, webUi.boundPort, publicAddress)

The Master receives the RegisterWorker message:

Master.scala
  ==>case RegisterWorker(id, workerHost, workerPort, cores, memory, workerUiPort, publicAddress) => {
       val worker = new WorkerInfo(id, workerHost, workerPort, cores, memory,
         sender, workerUiPort, publicAddress)
       if (registerWorker(worker)) {
         persistenceEngine.addWorker(worker)
         sender ! RegisteredWorker(masterUrl, masterWebUiUrl) // on success, acknowledge the registration to the Worker
         schedule()
       }
     }
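
The register-then-acknowledge shape of this handler can be sketched without Akka. This is a minimal illustration, not Spark's code: the names `RegistrationSketch`, `ToyMaster`, and `ToyWorkerInfo` are hypothetical, messages are modelled as plain return values, and rejecting a duplicate worker ID is an assumption of the sketch rather than something shown in the excerpt above.

```scala
import scala.collection.mutable

object RegistrationSketch {
  // Hypothetical, simplified stand-in for Spark's WorkerInfo.
  final case class ToyWorkerInfo(id: String, host: String, port: Int, cores: Int, memoryMb: Int)

  // Replies the toy master can give; modelled as return values, not actor messages.
  sealed trait RegisterReply
  final case class RegisteredWorker(masterUrl: String) extends RegisterReply
  final case class RegisterWorkerFailed(reason: String) extends RegisterReply

  // A toy master that only tracks registered workers by ID.
  final class ToyMaster(masterUrl: String) {
    private val idToWorker = mutable.HashMap.empty[String, ToyWorkerInfo]

    // Record the worker and acknowledge it, unless the ID is already known
    // (assumption of this sketch).
    def register(worker: ToyWorkerInfo): RegisterReply =
      if (idToWorker.contains(worker.id)) {
        RegisterWorkerFailed("Duplicate worker ID")
      } else {
        idToWorker(worker.id) = worker
        RegisteredWorker(masterUrl)
      }

    def workerCount: Int = idToWorker.size
  }
}
```

Registering the same `ToyWorkerInfo` twice returns `RegisteredWorker` the first time and `RegisterWorkerFailed` the second, leaving one worker recorded.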

After receiving the Master's registration acknowledgement, the Worker periodically sends heartbeats to the Master:

Worker.scala
  ==>case SendHeartbeat =>
       masterLock.synchronized {
         if (connected) { master ! Heartbeat(workerId) }
       }

On receiving a Worker's heartbeat, the Master updates that Worker's last-heartbeat time:

Master.scala
  ==>case Heartbeat(workerId) => {
       idToWorker.get(workerId) match {
         case Some(workerInfo) =>
           workerInfo.lastHeartbeat = System.currentTimeMillis()
       }
     }

The Master periodically removes Workers that have not sent a heartbeat within the timeout:

Master.scala
  ==>preStart
    ==>CheckForWorkerTimeOut
      ==>case CheckForWorkerTimeOut => {timeOutDeadWorkers()} //Check for, and remove, any timed-out workers
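
The heartbeat bookkeeping and the periodic timeout check can be sketched as two pure functions over an immutable map. This is only an illustration under stated assumptions: `HeartbeatSketch` and `WorkerState` are hypothetical names, time is passed in explicitly instead of read from `System.currentTimeMillis()`, and the real `timeOutDeadWorkers()` does more than drop entries from a map.

```scala
object HeartbeatSketch {
  // Hypothetical, simplified record: a worker ID and its last heartbeat time.
  final case class WorkerState(id: String, lastHeartbeatMs: Long)

  // Record a heartbeat: refresh the timestamp of a known worker and
  // leave the map unchanged for an unknown one.
  def heartbeat(workers: Map[String, WorkerState], id: String, nowMs: Long): Map[String, WorkerState] =
    workers.get(id) match {
      case Some(w) => workers.updated(id, w.copy(lastHeartbeatMs = nowMs))
      case None    => workers
    }

  // Keep only the workers whose last heartbeat is at most timeoutMs old.
  def timeOutDeadWorkers(workers: Map[String, WorkerState], nowMs: Long, timeoutMs: Long): Map[String, WorkerState] =
    workers.filter { case (_, w) => nowMs - w.lastHeartbeatMs <= timeoutMs }
}
```

For example, with workers "a" and "b" both last seen at time 0, a heartbeat from "a" at 50000 ms followed by a check at 70000 ms with an illustrative 60000 ms timeout keeps "a" and drops "b".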

Log location: $SPARK_HOME/logs/

Excerpt from the Master log:

14/07/22 13:41:36 INFO master.Master: Registering worker hadoop000:48343 with 1 cores, 2.0 GB RAM

Excerpt from the Worker log:

14/07/22 13:41:35 INFO Worker: Starting Spark worker hadoop000:48343 with 1 cores, 2.0 GB RAM
14/07/22 13:41:35 INFO Worker: Spark home: /home/spark/app/spark-1.0.1-bin-2.3.0-cdh5.0.0
14/07/22 13:41:35 INFO WorkerWebUI: Started WorkerWebUI at http://hadoop000:8081
14/07/22 13:41:35 INFO Worker: Connecting to master spark://hadoop000:7077...
14/07/22 13:41:36 INFO Worker: Successfully registered with master spark://hadoop000:7077

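III. Application submission

A. Submitting the Application

Run spark-shell: $SPARK_HOME/bin/spark-shell --master spark://hadoop000:7077

Log location: $SPARK_HOME/work

spark-shell is itself an application. When its SparkContext starts, createTaskScheduler creates a SparkDeploySchedulerBackend, and in the process an AppClient is created and started: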

client = new AppClient(sc.env.actorSystem, masters, appDesc, this, conf)
client.start()

The AppClient then sends a RegisterApplication request to the Master:

AppClient.scala
  ==>preStart
    ==>registerWithMaster
      ==>tryRegisterAllMasters
        ==>actor ! RegisterApplication(appDescription)

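B. The Master handles the RegisterApplication request

On the Master side this is handled by the RegisterApplication case. After receiving the request, the Master runs its scheduler: if any Workers have already registered, it sends a LaunchExecutor command to the chosen Workers.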

Master.scala
  ==>case RegisterApplication(description) => {
       logInfo("Registering app " + description.name)
       val app = createApplication(description, sender)
       registerApplication(app)
       logInfo("Registered app " + description.name + " with ID " + app.id)
       persistenceEngine.addApplication(app)
       sender ! RegisteredApplication(app.id, masterUrl)
       schedule()
     }
  ==>schedule
    ==>launchExecutor(worker, exec)
      ==>worker.addExecutor(exec)
         worker.actor ! LaunchExecutor(masterUrl, exec.application.id, exec.id, exec.application.desc, exec.cores, exec.memory)
         exec.application.driver ! ExecutorAdded(exec.id, worker.id, worker.hostPort, exec.cores, exec.memory)
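
To make the scheduling step more concrete, here is a rough sketch of one way a master could spread an application's requested cores over registered workers before sending LaunchExecutor to each. It is not a reproduction of Spark's `schedule()`: the round-robin policy, the `ScheduleSketch` object, and the `ToyWorker` type are assumptions made for illustration.

```scala
object ScheduleSketch {
  // Hypothetical, simplified view of a registered worker.
  final case class ToyWorker(id: String, freeCores: Int)

  // Assign up to coresWanted cores, one core at a time, cycling over the
  // workers in order and skipping those with no free cores left.
  // Returns workerId -> number of cores assigned (workers with 0 are omitted).
  def spreadOut(workers: Seq[ToyWorker], coresWanted: Int): Map[String, Int] = {
    val free = workers.map(_.freeCores).toArray
    val assigned = Array.fill(workers.length)(0)
    // Never try to hand out more cores than the cluster has free.
    var remaining = math.min(coresWanted, free.sum)
    var pos = 0
    while (remaining > 0) {
      if (free(pos) > 0) {
        free(pos) -= 1
        assigned(pos) += 1
        remaining -= 1
      }
      pos = (pos + 1) % workers.length
    }
    workers.map(_.id).zip(assigned.toSeq).filter(_._2 > 0).toMap
  }
}
```

With workers A (2 free cores) and B (1 free core), a request for 3 cores yields `Map("A" -> 2, "B" -> 1)`; each entry would correspond to one LaunchExecutor message in the flow above.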

C. Launching the Executor

After receiving the LaunchExecutor command, the Worker starts an Executor process:

Worker.scala
  ==>case LaunchExecutor(masterUrl, appId, execId, appDesc, cores_, memory_) =>
       logInfo("Asked to launch executor %s/%d for %s".format(appId, execId, appDesc.name))
       val manager = new ExecutorRunner(appId, execId, appDesc, cores_, memory_,
         self, workerId, host,
         appDesc.sparkHome.map(userSparkHome => new File(userSparkHome)).getOrElse(sparkHome),
         workDir, akkaUrl, ExecutorState.RUNNING)
       executors(appId + "/" + execId) = manager
       manager.start()
       coresUsed += cores_
       memoryUsed += memory_
       masterLock.synchronized {
         master ! ExecutorStateChanged(appId, execId, manager.state, None, None)
       }

D. Registering the Executor

The launched Executor process uses its startup arguments to register itself with the SchedulerBackend in the Driver:

SparkDeploySchedulerBackend.scala
  ==>preStart (CoarseGrainedSchedulerBackend)
    ==>case RegisterExecutor(executorId, hostPort, cores) =>
         logInfo("Registered executor: " + sender + " with ID " + executorId)
         sender ! RegisteredExecutor(sparkProperties)
         executorActor(executorId) = sender
         executorHost(executorId) = Utils.parseHostPort(hostPort)._1
         totalCores(executorId) = cores
         freeCores(executorId) = cores
         executorAddress(executorId) = sender.path.address
         addressToExecutorId(sender.path.address) = executorId
         totalCoreCount.addAndGet(cores)
         makeOffers()

CoarseGrainedExecutorBackend.scala
  ==>case RegisteredExecutor(sparkProperties) =>
       logInfo("Successfully registered with driver")
       executor = new Executor(executorId, Utils.parseHostPort(hostPort)._1, sparkProperties, false)

Executor log location: console / $SPARK_HOME/logs

E. Running Tasks

Example code:

sc.textFile("hdfs://hadoop000:8020/hello.txt").flatMap(_.split('\t')).map((_,1)).reduceByKey(_+_).collect
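
For readers without a cluster at hand, the same word-count pipeline can be tried on an in-memory collection. This is only a local sketch: `WordCountSketch` is a hypothetical name, a `Seq[String]` stands in for the HDFS file, and `groupBy` plus a per-group sum replaces `reduceByKey`, which does not exist on plain Scala collections.

```scala
object WordCountSketch {
  // Same steps as the RDD one-liner, on a local Seq instead of an RDD:
  // split each line on tabs, pair every word with 1, then sum per word.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split('\t'))
      .map((_, 1))
      .groupBy(_._1)
      .map { case (word, pairs) => word -> pairs.map(_._2).sum }
}
```

Calling `WordCountSketch.wordCount(Seq("a\tb", "a"))` returns `Map("a" -> 2, "b" -> 1)`.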

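After the SchedulerBackend receives an Executor's registration message, it breaks the submitted Spark job down into concrete Tasks and uses LaunchTask commands to distribute them to the Executors, where they actually run: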

CoarseGrainedSchedulerBackend.scala
  def makeOffers() {
    launchTasks(scheduler.resourceOffers(
      executorHost.toArray.map { case (id, host) => new WorkerOffer(id, host, freeCores(id)) }))
  }
  ==>executorActor(task.executorId) ! LaunchTask(new SerializableBuffer(serializedTask))

CoarseGrainedExecutorBackend.scala
  ==>case LaunchTask(data) =>
       if (executor == null) {
         logError("Received LaunchTask command but executor was null")
         System.exit(1)
       } else {
         val ser = SparkEnv.get.closureSerializer.newInstance()
         val taskDesc = ser.deserialize[TaskDescription](data.value)
         logInfo("Got assigned task " + taskDesc.taskId)
         executor.launchTask(this, taskDesc.taskId, taskDesc.serializedTask)
       }
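
The idea behind makeOffers, offering each executor's free cores to the scheduler and launching tasks on whatever was offered, can be sketched as follows. The sketch makes strong simplifying assumptions: every task needs exactly one core, offers are consumed in order, and data locality is ignored. `OfferSketch`, `ToyOffer`, and `assign` are hypothetical names, and the real `resourceOffers` is considerably more involved.

```scala
object OfferSketch {
  // Hypothetical, simplified counterpart of WorkerOffer: free cores on one executor.
  final case class ToyOffer(executorId: String, host: String, cores: Int)

  // Pair pending task IDs with executor slots, one core per task.
  // Tasks beyond the offered capacity are left unassigned.
  // Returns (taskId, executorId) pairs for the tasks that would be launched.
  def assign(offers: Seq[ToyOffer], pendingTasks: Seq[Long]): Seq[(Long, String)] = {
    // Expand each offer into one slot per free core.
    val slots = offers.flatMap(o => Seq.fill(o.cores)(o.executorId))
    // zip stops at the shorter side, so extra tasks or extra slots are dropped.
    pendingTasks.zip(slots)
  }
}
```

With executor "0" offering 2 cores and executor "1" offering 1 core, four pending tasks 10 to 13 produce `Seq((10L, "0"), (11L, "0"), (12L, "1"))`; task 13 waits for a later round of offers.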

Excerpt from the Master log:

14/07/22 15:25:27 INFO master.Master: Registering app Spark shell
14/07/22 15:25:27 INFO master.Master: Registered app Spark shell with ID app-20140722152527-0001
14/07/22 15:25:27 INFO master.Master: Launching executor app-20140722152527-0001/0 on worker worker-20140722134135-hadoop000-48343

Excerpt from the Worker log:

Spark assembly has been built with Hive, including Datanucleus jars on classpath
14/07/22 15:25:27 INFO Worker: Asked to launch executor app-20140722152527-0001/0 for Spark shell
Spark assembly has been built with Hive, including Datanucleus jars on classpath
14/07/22 15:25:28 INFO ExecutorRunner: Launch command: "java" "-cp" "::/home/spark/app/spark-1.0.1-bin-2.3.0-cdh5.0.0/conf:/home/spark/app/spark-1.0.1-bin-2.3.0-cdh5.0.0/lib/spark-assembly-1.0.1-hadoop2.3.0-cdh5.0.0.jar:/home/spark/app/spark-1.0.1-bin-2.3.0-cdh5.0.0/lib/datanucleus-rdbms-3.2.1.jar:/home/spark/app/spark-1.0.1-bin-2.3.0-cdh5.0.0/lib/datanucleus-core-3.2.2.jar:/home/spark/app/spark-1.0.1-bin-2.3.0-cdh5.0.0/lib/datanucleus-api-jdo-3.2.1.jar" "-XX:MaxPermSize=128m" "-Xms1024M" "-Xmx1024M" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "akka.tcp://spark@hadoop000:50515/user/CoarseGrainedScheduler" "0" "hadoop000" "1" "akka.tcp://sparkWorker@hadoop000:48343/user/Worker" "app-20140722152527-0001"

Excerpt from the console log:

14/07/22 15:25:31 INFO cluster.SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@hadoop000:45150/user/Executor#-791712793] with ID 0
14/07/22 15:25:31 INFO CoarseGrainedExecutorBackend: Successfully registered with driver

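Whenever a new application registers with the Master, the Master calls schedule() to assign the application to suitable Workers. Each chosen Worker starts an ExecutorBackend, and the Tasks ultimately run inside that ExecutorBackend.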
