1.修改拷贝/root/spark-1.5.1-bin-hadoop2.6/conf下面spark-env.sh.template到spark-env.sh,并添加设置HADOOP_CONF_DIR:

# Options read when launching programs locally with
# ./bin/run-example or ./bin/spark-submit
# - HADOOP_CONF_DIR, to point Spark towards Hadoop configuration files
# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program
# - SPARK_CLASSPATH, default classpath entries to append
export HADOOP_CONF_DIR=/etc/hadoop/conf

2.运行测试程序

./bin/spark-submit --class org.apache.spark.examples.SparkPi \
--master yarn-cluster \
--num-executors \
--driver-memory 4g \
--executor-memory 2g \
--executor-cores \
--queue thequeue \
lib/spark-examples*.jar \

在运行时发现root用户没有hdfs目录/user/的写权限,导致任务失败:

Exception in thread "main" org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.mkdirs(AuthorizationProviderProxyClientProtocol.java:)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:)
at org.apache.hadoop.ipc.Server$Handler$.run(Server.java:)
at org.apache.hadoop.ipc.Server$Handler$.run(Server.java:)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:)

修改/user目录的权限即可:

[root@node1 spark-1.5.-bin-hadoop2.]# sudo -u hdfs hdfs dfs -chmod  /user

重新运行:

[root@node1 spark-1.5.-bin-hadoop2.]# ./bin/spark-submit --class org.apache.spark.examples.SparkPi     --master yarn-cluster     --num-executors      --driver-memory 4g     --executor-memory 2g     --executor-cores      --queue thequeue     lib/spark-examples*.jar
// :: INFO client.RMProxy: Connecting to ResourceManager at node1/192.168.0.81:
// :: INFO yarn.Client: Requesting a new application from cluster with NodeManagers
// :: INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster ( MB per container)
// :: INFO yarn.Client: Will allocate AM container, with MB memory including MB overhead
// :: INFO yarn.Client: Setting up container launch context for our AM
// :: INFO yarn.Client: Setting up the launch environment for our AM container
// :: INFO yarn.Client: Preparing resources for our AM container
// :: WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
// :: INFO yarn.Client: Uploading resource file:/root/spark-1.5.-bin-hadoop2./lib/spark-assembly-1.5.-hadoop2.6.0.jar -> hdfs://node1:8020/user/root/.sparkStaging/application_1446368481906_0008/spark-assembly-1.5.1-hadoop2.6.0.jar
// :: INFO yarn.Client: Uploading resource file:/root/spark-1.5.-bin-hadoop2./lib/spark-examples-1.5.-hadoop2.6.0.jar -> hdfs://node1:8020/user/root/.sparkStaging/application_1446368481906_0008/spark-examples-1.5.1-hadoop2.6.0.jar
// :: INFO yarn.Client: Uploading resource file:/tmp/spark-72a1a44a-029c--acd1-6fbd44f5709a/__spark_conf__2902795872463320162.zip -> hdfs://node1:8020/user/root/.sparkStaging/application_1446368481906_0008/__spark_conf__2902795872463320162.zip
// :: INFO spark.SecurityManager: Changing view acls to: root
// :: INFO spark.SecurityManager: Changing modify acls to: root
// :: INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
// :: INFO yarn.Client: Submitting application to ResourceManager
// :: INFO impl.YarnClientImpl: Submitted application application_1446368481906_0008
// :: INFO yarn.Client: Application report for application_1446368481906_0008 (state: ACCEPTED)
// :: INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -
queue: root.thequeue
start time:
final status: UNDEFINED
tracking URL: http://node1:8088/proxy/application_1446368481906_0008/
user: root
// :: INFO yarn.Client: Application report for application_1446368481906_0008 (state: ACCEPTED)
// :: INFO yarn.Client: Application report for application_1446368481906_0008 (state: ACCEPTED)
// :: INFO yarn.Client: Application report for application_1446368481906_0008 (state: ACCEPTED)
// :: INFO yarn.Client: Application report for application_1446368481906_0008 (state: ACCEPTED)
// :: INFO yarn.Client: Application report for application_1446368481906_0008 (state: RUNNING)
// :: INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 192.168.0.83
ApplicationMaster RPC port:
queue: root.thequeue
start time:
final status: UNDEFINED
tracking URL: http://node1:8088/proxy/application_1446368481906_0008/
user: root
// :: INFO yarn.Client: Application report for application_1446368481906_0008 (state: RUNNING)
// :: INFO yarn.Client: Application report for application_1446368481906_0008 (state: RUNNING)
// :: INFO yarn.Client: Application report for application_1446368481906_0008 (state: RUNNING)
// :: INFO yarn.Client: Application report for application_1446368481906_0008 (state: RUNNING)
// :: INFO yarn.Client: Application report for application_1446368481906_0008 (state: RUNNING)
// :: INFO yarn.Client: Application report for application_1446368481906_0008 (state: FINISHED)
// :: INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 192.168.0.83
ApplicationMaster RPC port:
queue: root.thequeue
start time:
final status: SUCCEEDED
tracking URL: http://node1:8088/proxy/application_1446368481906_0008/A
user: root
// :: INFO util.ShutdownHookManager: Shutdown hook called
// :: INFO util.ShutdownHookManager: Deleting directory /tmp/spark-72a1a44a-029c--acd1-6fbd44f5709a
[root@node1 spark-1.5.-bin-hadoop2.]#

3.使用spark-sql

将/etc/hive/conf/hive-site.xml拷贝到/root/spark-1.5.1-bin-hadoop2.6/conf下,启动spark-sql即可

[root@node1 spark-1.5.-bin-hadoop2.]# cp /etc/hive/conf/hive-site.xml conf/
[root@node1 spark-1.5.-bin-hadoop2.]# ./bin/spark-sql

Spark-1.5.1 on CDH-5.4.7的更多相关文章

  1. spark on yarn 资源调度(cdh为例)

    一.CPU配置: ApplicationMaster 虚拟 CPU内核 yarn.app.mapreduce.am.resource.cpu-vcores ApplicationMaster占用的cp ...

  2. 【Spark】必须要用CDH版本的Spark?那你是不是需要重新编译?

    目录 为什么要重新编译? 步骤 一.下载Spark的源码 二.准备linux环境,安装必须软件 三.解压spark源码,修改配置,准备编译 四.开始编译 为什么要重新编译? 由于我们所有的环境统一使用 ...

  3. 1、Spark 2.1 源码编译支持CDH

    目前CDH支持的spark版本都是1.x, 如果想要使用spark 2x的版本, 只能编译spark源码生成支持CDH的版本. 一.准备工作 找一台Linux主机, 由于spark源码编译会下载很多的 ...

  4. Why Apache Spark is a Crossover Hit for Data Scientists [FWD]

    Spark is a compelling multi-purpose platform for use cases that span investigative, as well as opera ...

  5. 转:Sharethrough使用Spark Streaming优化实时竞价

    文章来自于:http://www.infoq.com/cn/news/2014/04/spark-streaming-bidding 来自于Sharethrough的数据基础设施工程师Russell ...

  6. CDH集群安装&测试总结

    0.绪论 之前完全没有接触过大数据相关的东西,都是书上啊,媒体上各种吹嘘啊,我对大数据,集群啊,分布式计算等等概念真是高山仰止,充满了仰望之情,觉得这些东西是这样的: 当我搭建的过程中,发现这些东西是 ...

  7. hive on spark

    hive on spark 的配置及设置CDH都已配置好,直接使用就行,但是我在用的时候报错,如下: 具体操作如下时报的错:      在hive 里执行以下命令:     set hive.exec ...

  8. hive使用spark引擎的几种情况

    使用spark引擎查询hive有以下几种方式:1>使用spark-sql(spark sql cli)2>使用spark-thrift提交查询sql3>使用hive on spark ...

  9. CDH集群spark-shell执行过程分析

    目的 刚入门spark,安装的是CDH的版本,版本号spark-core_2.11-2.4.0-cdh6.2.1,部署了cdh客户端(非集群节点),本文主要以spark-shell为例子,对在cdh客 ...

  10. 部署开启了Kerberos身份验证的大数据平台集群外客户端

    转载请注明出处 :http://www.cnblogs.com/xiaodf/ 本文档主要用于说明,如何在集群外节点上,部署大数据平台的客户端,此大数据平台已经开启了Kerberos身份验证.通过客户 ...

随机推荐

  1. moq 的常用使用方法

    测试方法                             Console.WriteLine(mock.Object.GetCountThing()); 匹配参数   mock.Setup(x ...

  2. Eclipse 快捷键 (应用中自己总结)

    调试快捷键: 1: resume(F8) 调试中用来直接跳到下一个断点 2:  用来结束JVM 3:step into (F5)跳入函数 4: step over (F6)单步执行 5:step re ...

  3. mac下安装 xampp 无法启动apache (转,留用)

    1.查看端口是否被占用 sudo lsof -i -n   2.用终端运行xampp,查看具体的错误 sudo su /Applications/XAMPP/xamppfiles/xampp star ...

  4. 详解 $_SERVER 函数中QUERY_STRING和REQUEST_URI区别(转)

    对于php$_SERVER这个全局变量 ,里面有很多的参数,慢慢的熟悉 1,http://localhost/aaa/ (打开aaa中的index.php)结果:$_SERVER['QUERY_STR ...

  5. codeigniter nginx配置

    转载:http://www.nginx.cn/1134.html server{ listen 80; server_name www.ci.oa.com; access_log /usr/local ...

  6. [地图SkyLine二次开发]关于IE内存限制问题(1G)

    相信很多人也遇到过同样的问题,地图加载中,IE占用的内存一直增加,到了1G多一些的时候,IE就崩溃了. 在网上查阅了一番,有很多结果,下面归纳一下: a).64bit的IE最多可达到4G的内存,但Sk ...

  7. js获取当前日期

    var myDate = new Date();myDate.getYear();        //获取当前年份(2位)myDate.getFullYear();    //获取完整的年份(4位,1 ...

  8. 老王讲自制RPC框架.(三.ZOOKEEPER)

    (#)定义Zookeeper 分布式服务框架是 Apache Hadoop 的一个子项目,它主要是用来解决分布式应用中经常遇到的一些数据管理问题,如:统一命名服务.状态同步服务.集群管理.分布式应用配 ...

  9. Redis教程(三) list类型

     一.概述: redis的list类型其实就是一个每个子元素都是string类型的双向链表.所以[lr]push和[lr]pop命令的算法时间复杂度都是O(1) 另外list会记录链表的长度.所以ll ...

  10. instanceof, isinstance,isAssignableFrom的区别

    instanceof运算符 只被用于对象引用变量,检查左边的被测试对象 是不是 右边类或接口的 实例化.如果被测对象是null值,则测试结果总是false. 形象地:自身实例或子类实例 instanc ...