1. Copy spark-env.sh.template under /root/spark-1.5.1-bin-hadoop2.6/conf to spark-env.sh and add the HADOOP_CONF_DIR setting:

# Options read when launching programs locally with
# ./bin/run-example or ./bin/spark-submit
# - HADOOP_CONF_DIR, to point Spark towards Hadoop configuration files
# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program
# - SPARK_CLASSPATH, default classpath entries to append
export HADOOP_CONF_DIR=/etc/hadoop/conf
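
The same step can be done from the shell; a minimal sketch, assuming Spark is unpacked under /root/spark-1.5.1-bin-hadoop2.6 and the CDH client configuration lives in /etc/hadoop/conf:

cd /root/spark-1.5.1-bin-hadoop2.6/conf
cp spark-env.sh.template spark-env.sh
echo 'export HADOOP_CONF_DIR=/etc/hadoop/conf' >> spark-env.sh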

2. Run the test program

./bin/spark-submit --class org.apache.spark.examples.SparkPi \
--master yarn-cluster \
--num-executors 3 \
--driver-memory 4g \
--executor-memory 2g \
--executor-cores 1 \
--queue thequeue \
lib/spark-examples*.jar
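
The same example can also be submitted in yarn-client mode, which keeps the driver on the local machine so the "Pi is roughly ..." line is printed straight to the console; a minimal sketch (the executor count and core values are illustrative):

./bin/spark-submit --class org.apache.spark.examples.SparkPi \
--master yarn-client \
--num-executors 3 \
--executor-memory 2g \
--executor-cores 1 \
lib/spark-examples*.jar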

When the job was submitted, it turned out that the root user has no write permission on the HDFS directory /user, so the job failed:

Exception in thread "main" org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.mkdirs(AuthorizationProviderProxyClientProtocol.java:)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:)
at org.apache.hadoop.ipc.Server$Handler$.run(Server.java:)
at org.apache.hadoop.ipc.Server$Handler$.run(Server.java:)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:)

This is fixed by changing the permissions on the /user directory:

[root@node1 spark-1.5.1-bin-hadoop2.6]# sudo -u hdfs hdfs dfs -chmod 777 /user
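
The write access is needed because the YARN client stages the job's jars and configuration under /user/root/.sparkStaging (visible in the upload paths in the log below). A narrower alternative, sketched here rather than what was actually done, is to create a proper HDFS home directory for the submitting user instead of opening up /user:

sudo -u hdfs hdfs dfs -mkdir /user/root
sudo -u hdfs hdfs dfs -chown root:root /user/root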

Re-run:

[root@node1 spark-1.5.1-bin-hadoop2.6]# ./bin/spark-submit --class org.apache.spark.examples.SparkPi     --master yarn-cluster     --num-executors 3     --driver-memory 4g     --executor-memory 2g     --executor-cores 1     --queue thequeue     lib/spark-examples*.jar
// :: INFO client.RMProxy: Connecting to ResourceManager at node1/192.168.0.81:
// :: INFO yarn.Client: Requesting a new application from cluster with NodeManagers
// :: INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster ( MB per container)
// :: INFO yarn.Client: Will allocate AM container, with MB memory including MB overhead
// :: INFO yarn.Client: Setting up container launch context for our AM
// :: INFO yarn.Client: Setting up the launch environment for our AM container
// :: INFO yarn.Client: Preparing resources for our AM container
// :: WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
// :: INFO yarn.Client: Uploading resource file:/root/spark-1.5.1-bin-hadoop2.6/lib/spark-assembly-1.5.1-hadoop2.6.0.jar -> hdfs://node1:8020/user/root/.sparkStaging/application_1446368481906_0008/spark-assembly-1.5.1-hadoop2.6.0.jar
// :: INFO yarn.Client: Uploading resource file:/root/spark-1.5.1-bin-hadoop2.6/lib/spark-examples-1.5.1-hadoop2.6.0.jar -> hdfs://node1:8020/user/root/.sparkStaging/application_1446368481906_0008/spark-examples-1.5.1-hadoop2.6.0.jar
// :: INFO yarn.Client: Uploading resource file:/tmp/spark-72a1a44a-029c--acd1-6fbd44f5709a/__spark_conf__2902795872463320162.zip -> hdfs://node1:8020/user/root/.sparkStaging/application_1446368481906_0008/__spark_conf__2902795872463320162.zip
// :: INFO spark.SecurityManager: Changing view acls to: root
// :: INFO spark.SecurityManager: Changing modify acls to: root
// :: INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
// :: INFO yarn.Client: Submitting application to ResourceManager
// :: INFO impl.YarnClientImpl: Submitted application application_1446368481906_0008
// :: INFO yarn.Client: Application report for application_1446368481906_0008 (state: ACCEPTED)
// :: INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -
queue: root.thequeue
start time:
final status: UNDEFINED
tracking URL: http://node1:8088/proxy/application_1446368481906_0008/
user: root
// :: INFO yarn.Client: Application report for application_1446368481906_0008 (state: ACCEPTED)
// :: INFO yarn.Client: Application report for application_1446368481906_0008 (state: ACCEPTED)
// :: INFO yarn.Client: Application report for application_1446368481906_0008 (state: ACCEPTED)
// :: INFO yarn.Client: Application report for application_1446368481906_0008 (state: ACCEPTED)
// :: INFO yarn.Client: Application report for application_1446368481906_0008 (state: RUNNING)
// :: INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 192.168.0.83
ApplicationMaster RPC port:
queue: root.thequeue
start time:
final status: UNDEFINED
tracking URL: http://node1:8088/proxy/application_1446368481906_0008/
user: root
// :: INFO yarn.Client: Application report for application_1446368481906_0008 (state: RUNNING)
// :: INFO yarn.Client: Application report for application_1446368481906_0008 (state: RUNNING)
// :: INFO yarn.Client: Application report for application_1446368481906_0008 (state: RUNNING)
// :: INFO yarn.Client: Application report for application_1446368481906_0008 (state: RUNNING)
// :: INFO yarn.Client: Application report for application_1446368481906_0008 (state: RUNNING)
// :: INFO yarn.Client: Application report for application_1446368481906_0008 (state: FINISHED)
// :: INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 192.168.0.83
ApplicationMaster RPC port:
queue: root.thequeue
start time:
final status: SUCCEEDED
tracking URL: http://node1:8088/proxy/application_1446368481906_0008/
user: root
// :: INFO util.ShutdownHookManager: Shutdown hook called
// :: INFO util.ShutdownHookManager: Deleting directory /tmp/spark-72a1a44a-029c--acd1-6fbd44f5709a
[root@node1 spark-1.5.1-bin-hadoop2.6]#
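
In yarn-cluster mode the driver runs inside the ApplicationMaster, so the "Pi is roughly ..." result does not appear in the client output above. Assuming YARN log aggregation is enabled on the cluster, it can be pulled from the aggregated container logs using the application ID reported by the client:

yarn logs -applicationId application_1446368481906_0008 | grep "Pi is roughly"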

3. Using spark-sql

Copy /etc/hive/conf/hive-site.xml to /root/spark-1.5.1-bin-hadoop2.6/conf, then start spark-sql:

[root@node1 spark-1.5.1-bin-hadoop2.6]# cp /etc/hive/conf/hive-site.xml conf/
[root@node1 spark-1.5.1-bin-hadoop2.6]# ./bin/spark-sql
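
With hive-site.xml in place, the CLI talks to the existing Hive metastore, so Hive tables can be queried directly. A quick non-interactive smoke test is also possible; spark-sql accepts the same --master options as spark-submit plus a -e flag for a one-off statement (the query below is just an example):

./bin/spark-sql --master yarn-client -e "SHOW DATABASES"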
