Spark-1.5.1 on CDH-5.4.7
1. Copy spark-env.sh.template under /root/spark-1.5.1-bin-hadoop2.6/conf to spark-env.sh and add the HADOOP_CONF_DIR setting:
# Options read when launching programs locally with
# ./bin/run-example or ./bin/spark-submit
# - HADOOP_CONF_DIR, to point Spark towards Hadoop configuration files
# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program
# - SPARK_CLASSPATH, default classpath entries to append
export HADOOP_CONF_DIR=/etc/hadoop/conf
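For reference, a minimal sketch of this step as shell commands, assuming the paths used in this post:
cd /root/spark-1.5.1-bin-hadoop2.6/conf
cp spark-env.sh.template spark-env.sh
# Point Spark at the CDH Hadoop/YARN client configuration
echo 'export HADOOP_CONF_DIR=/etc/hadoop/conf' >> spark-env.sh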
2. Run a test job by submitting the SparkPi example to YARN:
./bin/spark-submit --class org.apache.spark.examples.SparkPi \
--master yarn-cluster \
--num-executors \
--driver-memory 4g \
--executor-memory 2g \
--executor-cores \
--queue thequeue \
lib/spark-examples*.jar \
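The values for --num-executors, --executor-cores, and the final argument were lost from the command above. The version below fills them in with illustrative placeholders (3 executors, 1 core each, 10 slices) in the style of the stock SparkPi example; they are not the settings from the original run:
./bin/spark-submit --class org.apache.spark.examples.SparkPi \
  --master yarn-cluster \
  --num-executors 3 \
  --driver-memory 4g \
  --executor-memory 2g \
  --executor-cores 1 \
  --queue thequeue \
  lib/spark-examples*.jar \
  10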
At runtime it turned out that the root user has no write permission on the HDFS directory /user, so the job failed:
Exception in thread "main" org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.mkdirs(AuthorizationProviderProxyClientProtocol.java:)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:)
at org.apache.hadoop.ipc.Server$Handler$.run(Server.java:)
at org.apache.hadoop.ipc.Server$Handler$.run(Server.java:)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:)
Changing the permissions on the /user directory fixes this:
[root@node1 spark-1.5.1-bin-hadoop2.6]# sudo -u hdfs hdfs dfs -chmod /user
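The chmod mode was lost from the command above; opening the directory up (for example with mode 777) works, but a narrower fix is to give root its own HDFS home directory instead:
# Create /user/root and hand it over to root, so Spark can write its staging files there
sudo -u hdfs hdfs dfs -mkdir -p /user/root
sudo -u hdfs hdfs dfs -chown root:root /user/root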
Re-run the job:
[root@node1 spark-1.5.1-bin-hadoop2.6]# ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors --driver-memory 4g --executor-memory 2g --executor-cores --queue thequeue lib/spark-examples*.jar
// :: INFO client.RMProxy: Connecting to ResourceManager at node1/192.168.0.81:
// :: INFO yarn.Client: Requesting a new application from cluster with NodeManagers
// :: INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster ( MB per container)
// :: INFO yarn.Client: Will allocate AM container, with MB memory including MB overhead
// :: INFO yarn.Client: Setting up container launch context for our AM
// :: INFO yarn.Client: Setting up the launch environment for our AM container
// :: INFO yarn.Client: Preparing resources for our AM container
// :: WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
// :: INFO yarn.Client: Uploading resource file:/root/spark-1.5.1-bin-hadoop2.6/lib/spark-assembly-1.5.1-hadoop2.6.0.jar -> hdfs://node1:8020/user/root/.sparkStaging/application_1446368481906_0008/spark-assembly-1.5.1-hadoop2.6.0.jar
// :: INFO yarn.Client: Uploading resource file:/root/spark-1.5.1-bin-hadoop2.6/lib/spark-examples-1.5.1-hadoop2.6.0.jar -> hdfs://node1:8020/user/root/.sparkStaging/application_1446368481906_0008/spark-examples-1.5.1-hadoop2.6.0.jar
// :: INFO yarn.Client: Uploading resource file:/tmp/spark-72a1a44a-029c--acd1-6fbd44f5709a/__spark_conf__2902795872463320162.zip -> hdfs://node1:8020/user/root/.sparkStaging/application_1446368481906_0008/__spark_conf__2902795872463320162.zip
// :: INFO spark.SecurityManager: Changing view acls to: root
// :: INFO spark.SecurityManager: Changing modify acls to: root
// :: INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
// :: INFO yarn.Client: Submitting application to ResourceManager
// :: INFO impl.YarnClientImpl: Submitted application application_1446368481906_0008
// :: INFO yarn.Client: Application report for application_1446368481906_0008 (state: ACCEPTED)
// :: INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -
queue: root.thequeue
start time:
final status: UNDEFINED
tracking URL: http://node1:8088/proxy/application_1446368481906_0008/
user: root
// :: INFO yarn.Client: Application report for application_1446368481906_0008 (state: ACCEPTED)
// :: INFO yarn.Client: Application report for application_1446368481906_0008 (state: ACCEPTED)
// :: INFO yarn.Client: Application report for application_1446368481906_0008 (state: ACCEPTED)
// :: INFO yarn.Client: Application report for application_1446368481906_0008 (state: ACCEPTED)
// :: INFO yarn.Client: Application report for application_1446368481906_0008 (state: RUNNING)
// :: INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 192.168.0.83
ApplicationMaster RPC port:
queue: root.thequeue
start time:
final status: UNDEFINED
tracking URL: http://node1:8088/proxy/application_1446368481906_0008/
user: root
// :: INFO yarn.Client: Application report for application_1446368481906_0008 (state: RUNNING)
// :: INFO yarn.Client: Application report for application_1446368481906_0008 (state: RUNNING)
// :: INFO yarn.Client: Application report for application_1446368481906_0008 (state: RUNNING)
// :: INFO yarn.Client: Application report for application_1446368481906_0008 (state: RUNNING)
// :: INFO yarn.Client: Application report for application_1446368481906_0008 (state: RUNNING)
// :: INFO yarn.Client: Application report for application_1446368481906_0008 (state: FINISHED)
// :: INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 192.168.0.83
ApplicationMaster RPC port:
queue: root.thequeue
start time:
final status: SUCCEEDED
tracking URL: http://node1:8088/proxy/application_1446368481906_0008/
user: root
// :: INFO util.ShutdownHookManager: Shutdown hook called
// :: INFO util.ShutdownHookManager: Deleting directory /tmp/spark-72a1a44a-029c--acd1-6fbd44f5709a
[root@node1 spark-1.5.1-bin-hadoop2.6]#
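In yarn-cluster mode the driver runs inside the ApplicationMaster, so the "Pi is roughly ..." output does not show up in the client console above. With YARN log aggregation enabled it can be pulled from the container logs, for example:
yarn logs -applicationId application_1446368481906_0008 | grep "Pi is roughly"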
3. Using spark-sql
Copy /etc/hive/conf/hive-site.xml into /root/spark-1.5.1-bin-hadoop2.6/conf and start spark-sql:
[root@node1 spark-1.5.1-bin-hadoop2.6]# cp /etc/hive/conf/hive-site.xml conf/
[root@node1 spark-1.5.1-bin-hadoop2.6]# ./bin/spark-sql
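Once spark-sql starts with this hive-site.xml, the Hive metastore tables can be queried directly; for example (the table name here is hypothetical):
spark-sql> show databases;
spark-sql> select count(*) from some_table;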