1.通读http://spark.incubator.apache.org/docs/latest/spark-standalone.html

2.在每台机器上将spark安装到/opt/spark

3.在第一台机器上启动spark master.

[root@jfp3-1 latest]# ./sbin/start-master.sh

在logs目录查看日志:

[root@jfp3-1 latest]# tail -100f logs/spark-root-org.apache.spark.deploy.master.Master-1-jfp3-1.out
Spark Command: /usr/java/default/bin/java -cp :/opt/spark/spark-0.9.0-incubating-bin-hadoop2/conf:/opt/spark/spark-0.9.0-incubating-bin-hadoop2/assembly/target/scala-2.10/spark-assembly_2.10-0.9.0-incubating-hadoop2.2.0.jar -Dspark.akka.logLifecycleEvents=true -Djava.library.path= -Xms512m -Xmx512m org.apache.spark.deploy.master.Master --ip jfp3-1 --port 7077 --webui-port 8080
========================================

log4j:WARN No appenders could be found for logger (akka.event.slf4j.Slf4jLogger).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
14/02/21 04:59:50 INFO Master: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
14/02/21 04:59:50 INFO Master: Starting Spark master at spark://jfp3-1:7077
14/02/21 04:59:51 INFO MasterWebUI: Started Master web UI at http://jfp3-1:8080
14/02/21 04:59:51 INFO Master: I have been elected leader! New state: ALIVE

启动http://jfp3-1:8080上看集群的状况

4.在第2,3,4太机器上启动spark worker

[root@jfp3-2 latest]# ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://192.168.0.71:7077
log4j:WARN No appenders could be found for logger (akka.event.slf4j.Slf4jLogger).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
14/02/21 05:05:09 INFO Worker: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
14/02/21 05:05:09 INFO Worker: Starting Spark worker jfp3-2:53344 with 32 cores, 61.9 GB RAM
14/02/21 05:05:09 INFO Worker: Spark home: /opt/spark/latest
14/02/21 05:05:09 INFO WorkerWebUI: Started Worker web UI at http://jfp3-2:8081
14/02/21 05:05:09 INFO Worker: Connecting to master spark://192.168.0.71:7077...
14/02/21 05:05:30 INFO Worker: Connecting to master spark://192.168.0.71:7077...
14/02/21 05:05:50 INFO Worker: Connecting to master spark://192.168.0.71:7077...
14/02/21 05:06:10 ERROR Worker: All masters are unresponsive! Giving up.

同时在master的日志中也发现错误日志:

14/02/21 05:06:23 ERROR EndpointWriter: AssociationError [akka.tcp://sparkMaster@jfp3-1:7077] -> [akka.tcp://sparkWorker@jfp3-3:53721]: Error [Association failed with [akka.tcp://sparkWorker@jfp3-3:53721]] [
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkWorker@jfp3-3:53721]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: jfp3-3/192.168.0.73:53721
]
14/02/21 05:06:23 INFO Master: akka.tcp://sparkWorker@jfp3-3:53721 got disassociated, removing it.
14/02/21 05:06:23 ERROR EndpointWriter: AssociationError [akka.tcp://sparkMaster@jfp3-1:7077] -> [akka.tcp://sparkWorker@jfp3-3:53721]: Error [Association failed with [akka.tcp://sparkWorker@jfp3-3:53721]] [
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkWorker@jfp3-3:53721]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: jfp3-3/192.168.0.73:53721
]

用IP连spark master出现问题改用hostname:

[root@jfp3-2 latest]# ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://jfp3-1:7077
log4j:WARN No appenders could be found for logger (akka.event.slf4j.Slf4jLogger).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
14/02/21 05:08:41 INFO Worker: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
14/02/21 05:08:41 INFO Worker: Starting Spark worker jfp3-2:60198 with 32 cores, 61.9 GB RAM
14/02/21 05:08:41 INFO Worker: Spark home: /opt/spark/latest
14/02/21 05:08:41 INFO WorkerWebUI: Started Worker web UI at http://jfp3-2:8081
14/02/21 05:08:41 INFO Worker: Connecting to master spark://jfp3-1:7077...
14/02/21 05:08:41 INFO Worker: Successfully registered with master spark://jfp3-1:7077

5.在spark master界面上查看集群状态,发现多了3个worker

6. 启动HDFS集群

7.进入spark-shell界面:

[root@jfp3-1 latest]# MASTER=spark://jfp3-1:7077 ./bin/spark-shell

计算HDFS上的一个文件包含2144这个字符的行数

scala> val textFile = sc.textFile("hdfs://192.168.0.71/user/shaochen/apsh/20111201/20111201/44-ABIS-APSH-1G-20111201")
14/02/21 10:16:18 INFO MemoryStore: ensureFreeSpace(146579) called with curMem=0, maxMem=308713881
14/02/21 10:16:18 INFO MemoryStore: Block broadcast_0 stored as values to memory (estimated size 143.1 KB, free 294.3 MB)
textFile: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at <console>:12

scala> val targetRows = textFile.filter(line => line.contains("2144"))
targetRows: org.apache.spark.rdd.RDD[String] = FilteredRDD[2] at filter at <console>:14

scala> targetRows.count()
14/02/21 10:18:27 INFO FileInputFormat: Total input paths to process : 1
14/02/21 10:18:27 INFO SparkContext: Starting job: count at <console>:17
14/02/21 10:18:27 INFO DAGScheduler: Got job 0 (count at <console>:17) with 11 output partitions (allowLocal=false)
14/02/21 10:18:27 INFO DAGScheduler: Final stage: Stage 0 (count at <console>:17)
14/02/21 10:18:27 INFO DAGScheduler: Parents of final stage: List()
14/02/21 10:18:27 INFO DAGScheduler: Missing parents: List()
14/02/21 10:18:27 INFO DAGScheduler: Submitting Stage 0 (FilteredRDD[2] at filter at <console>:14), which has no missing parents
14/02/21 10:18:27 INFO DAGScheduler: Submitting 11 missing tasks from Stage 0 (FilteredRDD[2] at filter at <console>:14)
14/02/21 10:18:27 INFO TaskSchedulerImpl: Adding task set 0.0 with 11 tasks
14/02/21 10:18:27 INFO TaskSetManager: Starting task 0.0:0 as TID 0 on executor 2: jfp3-3 (NODE_LOCAL)
14/02/21 10:18:27 INFO TaskSetManager: Serialized task 0.0:0 as 1716 bytes in 5 ms
14/02/21 10:18:27 INFO TaskSetManager: Starting task 0.0:1 as TID 1 on executor 1: jfp3-2 (NODE_LOCAL)
14/02/21 10:18:27 INFO TaskSetManager: Serialized task 0.0:1 as 1716 bytes in 1 ms
14/02/21 10:18:27 INFO TaskSetManager: Starting task 0.0:2 as TID 2 on executor 0: jfp3-4 (NODE_LOCAL)
14/02/21 10:18:27 INFO TaskSetManager: Serialized task 0.0:2 as 1716 bytes in 0 ms
14/02/21 10:18:27 INFO TaskSetManager: Starting task 0.0:3 as TID 3 on executor 2: jfp3-3 (NODE_LOCAL)
14/02/21 10:18:27 INFO TaskSetManager: Serialized task 0.0:3 as 1716 bytes in 0 ms
14/02/21 10:18:27 INFO TaskSetManager: Starting task 0.0:4 as TID 4 on executor 1: jfp3-2 (NODE_LOCAL)
14/02/21 10:18:27 INFO TaskSetManager: Serialized task 0.0:4 as 1716 bytes in 1 ms
14/02/21 10:18:27 INFO TaskSetManager: Starting task 0.0:5 as TID 5 on executor 0: jfp3-4 (NODE_LOCAL)
14/02/21 10:18:27 INFO TaskSetManager: Serialized task 0.0:5 as 1716 bytes in 0 ms
14/02/21 10:18:27 INFO TaskSetManager: Starting task 0.0:6 as TID 6 on executor 2: jfp3-3 (NODE_LOCAL)
14/02/21 10:18:27 INFO TaskSetManager: Serialized task 0.0:6 as 1716 bytes in 0 ms
14/02/21 10:18:27 INFO TaskSetManager: Starting task 0.0:7 as TID 7 on executor 1: jfp3-2 (NODE_LOCAL)
14/02/21 10:18:27 INFO TaskSetManager: Serialized task 0.0:7 as 1716 bytes in 0 ms
14/02/21 10:18:27 INFO TaskSetManager: Starting task 0.0:8 as TID 8 on executor 0: jfp3-4 (NODE_LOCAL)
14/02/21 10:18:27 INFO TaskSetManager: Serialized task 0.0:8 as 1716 bytes in 0 ms
14/02/21 10:18:27 INFO TaskSetManager: Starting task 0.0:9 as TID 9 on executor 2: jfp3-3 (NODE_LOCAL)
14/02/21 10:18:27 INFO TaskSetManager: Serialized task 0.0:9 as 1716 bytes in 1 ms
14/02/21 10:18:27 INFO TaskSetManager: Starting task 0.0:10 as TID 10 on executor 1: jfp3-2 (NODE_LOCAL)
14/02/21 10:18:27 INFO TaskSetManager: Serialized task 0.0:10 as 1716 bytes in 1 ms
14/02/21 10:18:30 INFO TaskSetManager: Finished TID 10 in 2850 ms on jfp3-2 (progress: 0/11)
14/02/21 10:18:30 INFO DAGScheduler: Completed ResultTask(0, 10)
14/02/21 10:18:31 INFO TaskSetManager: Finished TID 5 in 3188 ms on jfp3-4 (progress: 1/11)
14/02/21 10:18:31 INFO DAGScheduler: Completed ResultTask(0, 5)
14/02/21 10:18:31 INFO TaskSetManager: Finished TID 8 in 3188 ms on jfp3-4 (progress: 2/11)
14/02/21 10:18:31 INFO DAGScheduler: Completed ResultTask(0, 8)
14/02/21 10:18:31 INFO TaskSetManager: Finished TID 1 in 3237 ms on jfp3-2 (progress: 3/11)
14/02/21 10:18:31 INFO DAGScheduler: Completed ResultTask(0, 1)
14/02/21 10:18:31 INFO TaskSetManager: Finished TID 7 in 3234 ms on jfp3-2 (progress: 4/11)
14/02/21 10:18:31 INFO DAGScheduler: Completed ResultTask(0, 7)
14/02/21 10:18:31 INFO TaskSetManager: Finished TID 2 in 3269 ms on jfp3-4 (progress: 5/11)
14/02/21 10:18:31 INFO DAGScheduler: Completed ResultTask(0, 2)
14/02/21 10:18:31 INFO TaskSetManager: Finished TID 9 in 3300 ms on jfp3-3 (progress: 6/11)
14/02/21 10:18:31 INFO DAGScheduler: Completed ResultTask(0, 9)
14/02/21 10:18:31 INFO TaskSetManager: Finished TID 4 in 3362 ms on jfp3-2 (progress: 7/11)
14/02/21 10:18:31 INFO DAGScheduler: Completed ResultTask(0, 4)
14/02/21 10:18:31 INFO TaskSetManager: Finished TID 3 in 3423 ms on jfp3-3 (progress: 8/11)
14/02/21 10:18:31 INFO DAGScheduler: Completed ResultTask(0, 3)
14/02/21 10:18:31 INFO TaskSetManager: Finished TID 6 in 3439 ms on jfp3-3 (progress: 9/11)
14/02/21 10:18:31 INFO DAGScheduler: Completed ResultTask(0, 6)
14/02/21 10:18:31 INFO TaskSetManager: Finished TID 0 in 3458 ms on jfp3-3 (progress: 10/11)
14/02/21 10:18:31 INFO DAGScheduler: Completed ResultTask(0, 0)
14/02/21 10:18:31 INFO TaskSchedulerImpl: Remove TaskSet 0.0 from pool
14/02/21 10:18:31 INFO DAGScheduler: Stage 0 (count at <console>:17) finished in 3.466 s
14/02/21 10:18:31 INFO SparkContext: Job finished: count at <console>:17, took 3.593541623 s
res0: Long = 12129

附录:

命令脚本集合:

启动master:

/opt/spark/latest/sbin/start-master.sh

启动worker:

/opt/spark/latest/bin/spark-class org.apache.spark.deploy.worker.Worker spark://jfp3-1:7077

在standalone模式下运行yarn 0.9.0对HDFS上的数据进行计算的更多相关文章

  1. OLE DB访问接口“MICROSOFT.JET.OLEDB.4.0”配置为在单线程单位模式下运行,所以该访问接口无法用于分布式

    OLE DB访问接口"MICROSOFT.JET.OLEDB.4.0"配置为在单线程单位模式下运行,所以该访问接口无法用于分布式 数据库操作excel时遇到的以上问题的解决方法 解 ...

  2. MySQL-Front 出现“程序注册时间到期 程序将被限制模式下运行”解决方式

    MySQL-Front 出现“程序注册时间到期 程序将被限制模式下运行”解决方式 在用mysql-front的时候遇到显示:程序注册时间到期程序将被限制模式下运行.可以在“帮助”菜单下的点“登记”-- ...

  3. [Selenium]Grid模式下运行时打印出当前Case在哪台node机器上运行

    当Case在本地运行成功,在Grid模式下运行失败时,我们需要在Grid模式下进行调试,同时登录远程的node去查看运行的情况. Hub是随机将case分配到某台node上运行的,怎样知道当前的cas ...

  4. 非GUI模式下运行JMeter和远程启动JMeter

    JMeter是一款非常不错的免费开源压力测试工具,越来越多的公司在使用.不过,在使用过程中可能会存在一些问题,比如:GUI模式非常消耗资源,单个客户端测试无法达到目标压力.而使用非 GUI 模式,即命 ...

  5. 教你50招提升ASP.NET性能(十一):避免在调试模式下运行网站

    (17)Avoid running sites in debug mode 招数17: 避免在调试模式下运行网站 When it comes to ASP.NET, one of the most c ...

  6. 关于spark standalone模式下的executor问题

    1.spark standalone模式下,worker与executor是一一对应的. 2.如果想要多个worker,那么需要修改spark-env的SPARK_WORKER_INSTANCES为2 ...

  7. C++程序在debug模式下遇到Run-Time Check Failure #0 - The value of ESP was not properly saved across a function call问题。

    今天遇到一个Access Violation的crash,只看crash call stack没有找到更多的线索,于是在debug模式下又跑了一遍,遇到了如下的一个debug的错误提示框: 这个是什么 ...

  8. Standalone模式下,通过Systemd管理Flink1.11.1的启停及异常退出

    Flink以Standalone模式运行时,可能会发生jobmanager(以下简称jm)或taskmanager(以下简称tm)异常退出的情况,我们可以使用Linux自带的Systemd方式管理jm ...

  9. 在debug模式下运行不报错,换到release模式下报找不到某某库或文件的错。。解决办法

    我遇到的问题是:把edit secheme调到debug模式运行没有问题,然后调到release模式的时候报目录下没有libTuyoo.a 解决办法 把断开真机设备,用IOS device下relea ...

随机推荐

  1. 最近在学习bootstrap的时候用bootstrap的视频教程2.0的引用bootstrap3.0突然发现很多不同,总结了一下

    bootstrap 2.3版与3.0版重要类的改变对比 Bootstrap 2.x Bootstrap 3.0 .container-fluid .container .row-fluid .row ...

  2. 报表控件NCReport教程:集成NCReport到Qt应用程序中

    NCReport是一款轻量级.快速.多平台.简单易用的基于Qt toolkit的C++编写的报表解决方案,目前主要包括报表渲染库和报表设计器GUI应用程序. 但是好多使用NCReport控件的朋友都不 ...

  3. ubuntu 在mac 的 Parallels 的分辨率问题

    安装 ubuntu系统,刚开始安装成功的时候分辨率只有800*600. 设置里面只有800*600一个选项. http://linuxbsdos.com/2014/10/31/solutions-fo ...

  4. java 将list 按长度分割

    public static <T> List<List<T>> splitList(List<T> list, int pageSize) {      ...

  5. Worker Thread

    http://www.codeproject.com/Articles/552/Using-Worker-Threads Introduction Worker threads are an eleg ...

  6. Oracle求连续的年份

    SELECT ('2013') + ROWNUM year FROM dualCONNECT BY ROWNUM <= 2;

  7. eclipse https git

    open preferences via application menu Window => Preferences (or on OSX Eclipse => Settings). N ...

  8. 来自苹果的编程语言——Swift简介转载】

    关于 这篇文章简要介绍了苹果于WWDC 2014发布的编程语言——Swift. 原文作者: Lucida Blog 新浪微博 豆瓣 转载前请保留出处链接,谢谢. 前言 在这里我认为有必要提一下Brec ...

  9. LTE Module User Documentation(翻译9)——Using the EPC with emulation mode

    LTE用户文档 (如有不当的地方,欢迎指正!) 15 Using the EPC with emulation mode(使用仿真方式的 EPC)     在上一节中,我们使用点对点链路连接基站和服务 ...

  10. Hibernate <二级缓存>

    二级缓存: 定义: 1.二级缓存被称为进程级缓存或者sessionFactory级缓存,二级缓存可以被所有session共享 2.二级缓存的生命周期和sessionFactory生命周期一样(sess ...