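I've been tinkering with this, so let's start with the manual approach. The automatic approach needs passwordless SSH login set up first, so that comes later.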

I. Connecting multiple machines to the master manually

We already connected to the master by hand in the previous post.

There are two machines here:

10.60.215.41 runs the master, worker1 and the application (spark shell)

10.0.2.15 runs worker2

The steps are as follows:

1. On 10.60.215.41:

$SPARK_HOME $ ./sbin/start-master.sh
$SPARK_HOME $ ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://qpzhangdeMac-mini.local:7077

2. On 10.0.2.15:

$SPARK_HOME $ ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://qpzhangdeMac-mini.local:7077

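Note: Spark apparently uses the Akka library, and the master URL has to use the hostname (I tried changing it to the IP in the config file, but that had no effect). Otherwise you get this error: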

15/03/20 17:14:05 ERROR EndpointWriter: dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://sparkMaster@10.60.215.41:7077/]] arriving at [akka.tcp://sparkMaster@10.60.215.41:7077] inbound addresses are [akka.tcp://sparkMaster@qpzhangdeMac-mini.local:7077]

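The end of the message shows why: the master is bound to akka.tcp://sparkMaster@qpzhangdeMac-mini.local:7077, and Akka drops messages addressed to any other form of that address, even an IP that points at the same machine. So on 10.0.2.15, add a hosts entry mapping qpzhangdeMac-mini.local to the master's IP 10.60.215.41; otherwise the worker can't resolve the hostname to an IP and can't connect.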
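A minimal sketch of that hosts-file fix, assuming a Linux worker with a standard /etc/hosts and root access. The helper names has_host_entry and add_host_entry are made up for illustration; the script only appends a line when the hostname isn't mapped yet and does not correct an existing wrong mapping.

```bash
#!/usr/bin/env bash
# Sketch: idempotently map a hostname to an IP in a hosts file.
# Hypothetical helpers, not part of Spark.

# Succeeds if some non-comment line in the file already lists the hostname.
has_host_entry() {
  local host=$1 file=$2
  awk -v h="$host" '
    $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
    END { exit !found }
  ' "$file"
}

# Append "IP<TAB>hostname" unless the hostname is already mapped.
add_host_entry() {
  local ip=$1 host=$2 file=${3:-/etc/hosts}
  if has_host_entry "$host" "$file"; then
    echo "$host already present in $file"
  else
    printf '%s\t%s\n' "$ip" "$host" >> "$file"
    echo "added: $ip $host"
  fi
}
```

On 10.0.2.15 this would be run as root with `add_host_entry 10.60.215.41 qpzhangdeMac-mini.local /etc/hosts`; running it a second time changes nothing.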

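At first I assumed that if I killed the master, worker1 or worker2 would automatically take over as master, but that didn't happen. The workers just keep trying to reconnect: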

15/03/20 17:41:05 INFO Worker: Retrying connection to master (attempt # 2)
15/03/20 17:41:05 WARN EndpointWriter: AssociationError [akka.tcp://sparkWorker@10.60.215.41:53899] -> [akka.tcp://sparkMaster@qpzhangdeMac-mini.local:7077]: Error [Invalid address: akka.tcp://sparkMaster@qpzhangdeMac-mini.local:7077] [
akka.remote.InvalidAssociation: Invalid address: akka.tcp://sparkMaster@qpzhangdeMac-mini.local:7077
Caused by: akka.remote.transport.Transport$InvalidAssociationException: Connection refused: qpzhangdeMac-mini.local/10.60.215.41:7077

After restarting the master, the reconnect succeeds:

15/03/20 18:27:41 INFO Worker: Retrying connection to master (attempt # 10)
15/03/20 18:27:41 INFO Worker: Successfully registered with master spark://qpzhangdeMac-mini.local:7077

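That leaves a few open questions for now:

1) So slaves are just workers? A worker is never promoted to master; there's no election here.

2) If the master dies and is restarted, are the jobs lost?

3) Can a single worker register with more than one master?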

3. On 10.60.215.41:

Start spark shell and submit jobs.
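For the tasks to reach worker2, spark-shell has to be started as a client of the standalone master via its --master option; without it (and without spark.master in the config) it runs in local mode. A small sketch, where master_url and start_shell are hypothetical helpers and SPARK_HOME, MASTER_HOST and MASTER_PORT are assumptions to adjust.

```bash
#!/usr/bin/env bash
# Sketch: run spark-shell against the standalone master instead of local mode.
# MASTER_HOST/MASTER_PORT default to the values used in this post.
MASTER_HOST=${MASTER_HOST:-qpzhangdeMac-mini.local}
MASTER_PORT=${MASTER_PORT:-7077}

# Build a standalone master URL; the host must match what the master bound to.
master_url() {
  echo "spark://$1:$2"
}

# Start the shell against the master (SPARK_HOME must be set).
start_shell() {
  "${SPARK_HOME:?set SPARK_HOME first}/bin/spark-shell" --master "$(master_url "$MASTER_HOST" "$MASTER_PORT")"
}
```

In other words, this boils down to `./bin/spark-shell --master spark://qpzhangdeMac-mini.local:7077` from $SPARK_HOME.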

scala> val textFile = sc.textFile("/var/spark/README.md")
15/03/20 17:55:41 INFO MemoryStore: ensureFreeSpace(73391) called with curMem=186365, maxMem=555755765
15/03/20 17:55:41 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 71.7 KB, free 529.8 MB)
15/03/20 17:55:41 INFO MemoryStore: ensureFreeSpace(31262) called with curMem=259756, maxMem=555755765
15/03/20 17:55:41 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 30.5 KB, free 529.7 MB)
15/03/20 17:55:41 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 10.60.215.41:53983 (size: 30.5 KB, free: 530.0 MB)
15/03/20 17:55:41 INFO BlockManagerMaster: Updated info of block broadcast_2_piece0
15/03/20 17:55:41 INFO SparkContext: Created broadcast 2 from textFile at <console>:21
textFile: org.apache.spark.rdd.RDD[String] = /var/spark/README.md MapPartitionsRDD[3] at textFile at <console>:21

scala> textFile.count()
15/03/20 17:55:45 INFO FileInputFormat: Total input paths to process : 1
15/03/20 17:55:45 INFO SparkContext: Starting job: count at <console>:24
15/03/20 17:55:45 INFO DAGScheduler: Got job 1 (count at <console>:24) with 2 output partitions (allowLocal=false)
15/03/20 17:55:45 INFO DAGScheduler: Final stage: Stage 1(count at <console>:24)
15/03/20 17:55:45 INFO DAGScheduler: Parents of final stage: List()
15/03/20 17:55:45 INFO DAGScheduler: Missing parents: List()
15/03/20 17:55:45 INFO DAGScheduler: Submitting Stage 1 (/var/spark/README.md MapPartitionsRDD[3] at textFile at <console>:21), which has no missing parents
15/03/20 17:55:45 INFO MemoryStore: ensureFreeSpace(2640) called with curMem=291018, maxMem=555755765
15/03/20 17:55:45 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 2.6 KB, free 529.7 MB)
15/03/20 17:55:45 INFO MemoryStore: ensureFreeSpace(1931) called with curMem=293658, maxMem=555755765
15/03/20 17:55:45 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 1931.0 B, free 529.7 MB)
15/03/20 17:55:45 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on 10.60.215.41:53983 (size: 1931.0 B, free: 530.0 MB)
15/03/20 17:55:45 INFO BlockManagerMaster: Updated info of block broadcast_3_piece0
15/03/20 17:55:45 INFO SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:839
15/03/20 17:55:45 INFO DAGScheduler: Submitting 2 missing tasks from Stage 1 (/var/spark/README.md MapPartitionsRDD[3] at textFile at <console>:21)
15/03/20 17:55:45 INFO TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
15/03/20 17:55:45 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 3, 10.60.215.41, PROCESS_LOCAL, 1289 bytes)
15/03/20 17:55:45 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 4, 10.0.2.15, PROCESS_LOCAL, 1289 bytes)
15/03/20 17:55:45 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on 10.60.215.41:53990 (size: 1931.0 B, free: 265.1 MB)
15/03/20 17:55:45 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 10.60.215.41:53990 (size: 30.5 KB, free: 265.1 MB)
15/03/20 17:55:45 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on 10.0.2.15:53284 (size: 1931.0 B, free: 267.2 MB)
15/03/20 17:55:45 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 10.0.2.15:53284 (size: 30.5 KB, free: 267.2 MB)
15/03/20 17:55:45 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 3) in 127 ms on 10.60.215.41 (1/2)
15/03/20 17:55:46 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 4) in 470 ms on 10.0.2.15 (2/2)
15/03/20 17:55:46 INFO DAGScheduler: Stage 1 (count at <console>:24) finished in 0.471 s
15/03/20 17:55:46 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
15/03/20 17:55:46 INFO DAGScheduler: Job 1 finished: count at <console>:24, took 0.487544 s
res2: Long = 98

scala> val linesWithSpark = textFile.filter(line => line.contains("Spark"))
linesWithSpark: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[4] at filter at <console>:23

scala> linesWithSpark.count()
15/03/20 17:56:53 INFO SparkContext: Starting job: count at <console>:26
15/03/20 17:56:53 INFO DAGScheduler: Got job 2 (count at <console>:26) with 2 output partitions (allowLocal=false)
15/03/20 17:56:53 INFO DAGScheduler: Final stage: Stage 2(count at <console>:26)
15/03/20 17:56:53 INFO DAGScheduler: Parents of final stage: List()
15/03/20 17:56:53 INFO DAGScheduler: Missing parents: List()
15/03/20 17:56:53 INFO DAGScheduler: Submitting Stage 2 (MapPartitionsRDD[4] at filter at <console>:23), which has no missing parents
15/03/20 17:56:53 INFO MemoryStore: ensureFreeSpace(2848) called with curMem=295589, maxMem=555755765
15/03/20 17:56:53 INFO MemoryStore: Block broadcast_4 stored as values in memory (estimated size 2.8 KB, free 529.7 MB)
15/03/20 17:56:53 INFO MemoryStore: ensureFreeSpace(2034) called with curMem=298437, maxMem=555755765
15/03/20 17:56:53 INFO MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 2034.0 B, free 529.7 MB)
15/03/20 17:56:53 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on 10.60.215.41:53983 (size: 2034.0 B, free: 530.0 MB)
15/03/20 17:56:53 INFO BlockManagerMaster: Updated info of block broadcast_4_piece0
15/03/20 17:56:53 INFO SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:839
15/03/20 17:56:53 INFO DAGScheduler: Submitting 2 missing tasks from Stage 2 (MapPartitionsRDD[4] at filter at <console>:23)
15/03/20 17:56:53 INFO TaskSchedulerImpl: Adding task set 2.0 with 2 tasks
15/03/20 17:56:53 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID 5, 10.0.2.15, PROCESS_LOCAL, 1289 bytes)
15/03/20 17:56:53 INFO TaskSetManager: Starting task 1.0 in stage 2.0 (TID 6, 10.60.215.41, PROCESS_LOCAL, 1289 bytes)
15/03/20 17:56:53 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on 10.60.215.41:53990 (size: 2034.0 B, free: 265.1 MB)
15/03/20 17:56:53 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on 10.0.2.15:53284 (size: 2034.0 B, free: 267.2 MB)
15/03/20 17:56:53 INFO TaskSetManager: Finished task 1.0 in stage 2.0 (TID 6) in 113 ms on 10.60.215.41 (1/2)
15/03/20 17:56:53 INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID 5) in 122 ms on 10.0.2.15 (2/2)
15/03/20 17:56:53 INFO DAGScheduler: Stage 2 (count at <console>:26) finished in 0.122 s
15/03/20 17:56:53 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool
15/03/20 17:56:53 INFO DAGScheduler: Job 2 finished: count at <console>:26, took 0.137589 s
res3: Long = 19
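A quick way to sanity-check res2 and res3 outside Spark, assuming the same /var/spark/README.md: count() on textFile is a line count, and the filter counts lines containing the literal, case-sensitive substring "Spark". The function names are illustrative.

```bash
#!/usr/bin/env bash
# Sketch: reproduce the two REPL counts with standard tools.

# Number of lines; awk also counts a final line that lacks a trailing newline.
count_lines() {
  awk 'END { print NR }' "$1"
}

# Number of lines containing the fixed string $2 (case-sensitive).
count_matching() {
  grep -cF -- "$2" "$1"
}
```

Against the same file, `count_lines /var/spark/README.md` and `count_matching /var/spark/README.md Spark` should agree with res2 and res3; if the copies on the two machines differ, the Spark results may not match either copy, which is the local-path caveat discussed below.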

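The log shows that each job was split into 2 tasks, one sent to each of the 2 workers.

That naturally raises a couple of questions:

1) How does the master hand out the work? For a local file, does it pass the path to the different workers, or read the content and send that over?

2) If only the path is passed, does every worker read the file? The whole file, or starting from an offset?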

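After running the commands a few more times and looking at the worker logs, I found that what gets sent is the path plus a file split (a byte range of the file). The splits don't seem to be tied to particular workers: across runs, each worker received different split ranges. Strictly speaking, it is the driver (here, the SparkContext inside spark shell), not the master, that cuts the job into tasks and ships them to executors; the master only allocates executors on the workers.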
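The path-plus-split behaviour matches how Hadoop's old-API FileInputFormat, which sc.textFile relies on, computes splits. The sketch below re-implements that arithmetic in simplified form for a single file: splitSize = max(minSize, min(totalSize / numSplits, blockSize)), cutting full splits while more than 1.1 splits' worth of bytes remains. textFile asks for min(defaultParallelism, 2) partitions by default, which would explain the 2 tasks in the log. The function name is made up, and the 32 MiB default block size (Hadoop's local-filesystem default) is an assumption.

```bash
#!/usr/bin/env bash
# Simplified re-implementation of the split arithmetic in Hadoop's old-API
# FileInputFormat.getSplits() for one file. Prints "start length" per split.
#   $1 = file size in bytes
#   $2 = requested number of splits (textFile's minPartitions)
#   $3 = block size in bytes; the 32 MiB default is an assumption
compute_splits() {
  local total=$1 num=$2 block=${3:-33554432} min_size=1
  local goal=$(( total / (num > 0 ? num : 1) ))
  # splitSize = max(minSize, min(goalSize, blockSize))
  local split=$(( goal < block ? goal : block ))
  if (( split < min_size )); then split=$min_size; fi
  local remaining=$total
  # Cut full-size splits while remaining/split > 1.1 (SPLIT_SLOP), in integer form.
  while (( remaining * 10 > split * 11 )); do
    echo "$(( total - remaining )) $split"
    remaining=$(( remaining - split ))
  done
  # Whatever is left (at most 1.1 * splitSize) becomes the last split.
  if (( remaining > 0 )); then echo "$(( total - remaining )) $remaining"; fi
}
```

For example, `compute_splits 101 2` prints `0 50` and `50 51`. Split boundaries ignore line breaks: the line reader for a split that doesn't start at offset 0 skips ahead past the first newline, and every reader reads past its end to finish its last line, so each line is read exactly once. Each worker therefore reads only its own byte range, not the whole file.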

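One more problem: when the file is read from HDFS, the same path is readable by workers on every machine. With a local path, though, tough luck: you have to distribute the file to each worker machine yourself, at the same path.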
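For the local-path case, a hedged sketch of pushing the input file to the same path on every worker with scp. It assumes passwordless ssh (see part II), a slaves-style host list, and that the target directory already exists on each host; distribute_file and the DRY_RUN switch are hypothetical. localhost entries are skipped so the file is never copied onto itself.

```bash
#!/usr/bin/env bash
# Sketch: copy one input file to the same absolute path on each worker host,
# so sc.textFile("<local path>") reads identical data wherever a task runs.
# Hosts come from a slaves-style file; DRY_RUN=1 only prints the commands.
distribute_file() {
  local file=$1 hosts_file=$2 host
  while read -r host _ || [ -n "$host" ]; do
    case $host in
      ''|'#'*|localhost|127.0.0.1) continue ;;  # blanks, comments, this machine
    esac
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "scp $file $host:$file"
    else
      # </dev/null keeps scp/ssh from swallowing the rest of the host list.
      scp "$file" "$host:$file" < /dev/null
    fi
  done < "$hosts_file"
}
```

Usage would be something like `distribute_file /var/spark/README.md conf/slaves`, run from the machine that holds the reference copy.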

At http://10.60.215.41:4040/executors/ you can see the list of workers (executors) currently running tasks, along with the status of those tasks.

II. Automatic deployment

==========

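The idea is simple: it's just a shell script that takes the configured slave machines, logs into each one over ssh, changes to the corresponding directory, and starts a slave there.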
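A simplified sketch of that loop, modelled on the description above rather than on Spark's actual sbin scripts: read the slaves file, drop comments and blank lines, and for each host ssh in, cd into the Spark directory and start a worker in the background. The function names, the /var/spark default, the worker.log file and DRY_RUN are assumptions for illustration.

```bash
#!/usr/bin/env bash
# Simplified sketch (not Spark's real sbin/ scripts): start a worker on every
# host listed in a slaves file over ssh. Assumes passwordless ssh and the same
# Spark directory on every host. DRY_RUN=1 prints the commands instead.

# Print the host entries, dropping comments and blank lines.
list_slaves() {
  sed 's/#.*$//; s/[[:space:]]*$//; /^[[:space:]]*$/d' "$1"
}

start_workers() {
  local slaves_file=$1 master_url=$2 spark_home=${3:-/var/spark} host cmd
  # nohup + redirects + & let the remote worker keep running after ssh returns.
  cmd="cd $spark_home && nohup ./bin/spark-class org.apache.spark.deploy.worker.Worker $master_url > worker.log 2>&1 < /dev/null &"
  for host in $(list_slaves "$slaves_file"); do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "$host: ssh $host '$cmd'"
    else
      # Start all hosts in parallel and tag each output line with its host.
      ssh "$host" "$cmd" 2>&1 | sed "s/^/$host: /" &
    fi
  done
  wait
}
```

Tagging each line with the host name mirrors the `root@qpzhangdeMac-mini.local:` and `localhost:` prefixes in the start-slaves.sh output further down.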

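Compared with starting slaves by hand, this one-click start only has to be run on the master machine.

The prerequisite is that passwordless ssh login is already set up; see here for reference.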
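A minimal sketch of the usual OpenSSH steps: create a key pair without a passphrase on the master, then install the public key on every slave with ssh-copy-id. KEY, DRY_RUN and the function names are assumptions; on a host without ssh-copy-id, the .pub file can be appended to ~/.ssh/authorized_keys by hand. Since conf/slaves lists localhost too, the master needs its own key authorized as well.

```bash
#!/usr/bin/env bash
# Sketch: passwordless ssh from this (master) machine to each slave host,
# using the standard OpenSSH tools. DRY_RUN=1 prints the commands only.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

setup_passwordless_ssh() {
  local key=${KEY:-$HOME/.ssh/id_rsa} host
  # Generate an RSA key pair with an empty passphrase, unless one exists.
  [ -f "$key" ] || run ssh-keygen -t rsa -N "" -f "$key"
  for host in "$@"; do
    # Appends the public key to the host's ~/.ssh/authorized_keys;
    # this asks for that host's password one last time.
    run ssh-copy-id -i "$key.pub" "$host"
  done
}
```

For the slaves file shown next, that would be `setup_passwordless_ssh localhost root@qpzhangdeMac-mini.local`, run once on the master.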

Once that's configured, edit the slaves list in the conf directory:

root@qp-zhang:/var/spark# cat conf/slaves
# A Spark Worker will be started on each of the machines listed below.
localhost
root@qpzhangdeMac-mini.local

Then start them with the corresponding slaves script:

root@qp-zhang:/var/spark# ./sbin/start-slaves.sh
root@qpzhangdeMac-mini.local: starting org.apache.spark.deploy.worker.Worker, logging to /private/var/spark/sbin/../logs/spark-root-org.apache.spark.deploy.worker.Worker-1-qpzhangdeMac-mini.local.out
localhost: starting org.apache.spark.deploy.worker.Worker, logging to /var/spark/sbin/../logs/spark-root-org.apache.spark.deploy.worker.Worker-1-qp-zhang.out

Now you can see the master's current slaves (or workers, if you prefer) at http://localhost:8080/.
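The same information can be checked from a script. This sketch assumes the standalone master also serves its status page as JSON at http://<master>:8080/json, with a "state" field per worker, as Spark 1.x does as far as I know; check the field names against your version. count_alive_workers is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: count ALIVE workers in the master's JSON status (read from stdin).
# Assumption: each worker object carries "state": "ALIVE" | "DEAD" | ...,
# while the master's own status uses a different key ("status").
count_alive_workers() {
  # Tolerant of spaces around the colon; counts every match, one per line.
  grep -oE '"state"[[:space:]]*:[[:space:]]*"ALIVE"' | wc -l | tr -d ' '
}
```

Usage: `curl -s http://localhost:8080/json | count_alive_workers` should print 2 once both the localhost and the Mac worker have registered.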

===================================

Please credit the source when reposting: http://www.cnblogs.com/zhangqingping/p/4354383.html
