Using Zeppelin with Spark in yarn-client mode
Zeppelin version: 0.6.2
1. Export SPARK_HOME
In conf/zeppelin-env.sh, export the SPARK_HOME environment variable with your Spark installation path.
You can optionally export HADOOP_CONF_DIR and SPARK_SUBMIT_OPTIONS as well.
export SPARK_HOME=/usr/crh/4.9.2.5-/spark
export HADOOP_CONF_DIR=/etc/hadoop/conf
export JAVA_HOME=/opt/jdk1.7.0_79
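The documentation above also mentions SPARK_SUBMIT_OPTIONS; if extra spark-submit options are needed, they can be exported in the same file. The flags and values below are only illustrative and not part of the original setup:
export SPARK_SUBMIT_OPTIONS="--driver-memory 2g --executor-memory 2g --num-executors 4"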
Note: even though SPARK_HOME is exported here, the Spark jars still could not be found when the interpreter started (see FAQ item 1 below).
2. Set master in Interpreter menu
After starting Zeppelin, go to the Interpreter menu and edit the master property in your Spark interpreter setting. The value may vary depending on your Spark cluster deployment type.
Here the Spark interpreter is set to yarn-client mode (master = yarn-client).
FAQ
1.
ERROR [2016-07-26 16:46:15,999] ({pool-2-thread-2} Job.java[run]:189) - Job failed
java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaMirrors$JavaMirror;
at org.apache.spark.repl.SparkILoop.<init>(SparkILoop.scala:936)
at org.apache.spark.repl.SparkILoop.<init>(SparkILoop.scala:70)
at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:765)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:341)
at org.apache.zeppelin.scheduler.Job.run(Job.java:176)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Solution
Copy all the jars under SPARK_HOME/lib into Zeppelin's lib directory.
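For example, assuming ZEPPELIN_HOME points at the Zeppelin installation directory (a sketch only; adjust the paths to your environment and restart Zeppelin afterwards so the interpreter picks up the jars):
cp $SPARK_HOME/lib/*.jar $ZEPPELIN_HOME/lib/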
2.
%spark.sql
show tables
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=root, access=WRITE, inode="/user/root/.sparkStaging/application_1481857320971_0028":hdfs:hdfs:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:292)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:213)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1771)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1755)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1738)
at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:71)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3905)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1048)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:622)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145)
at org.apache.hadoop.ipc.Client.call(Client.java:1427)
at org.apache.hadoop.ipc.Client.call(Client.java:1358)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy24.mkdirs(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:558)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
at com.sun.proxy.$Proxy25.mkdirs(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:3018)
at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2988)
at org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1057)
at org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1053)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:1053)
at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:1046)
at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1877)
at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:598)
at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:281)
at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:634)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:123)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:523)
at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:339)
at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:145)
at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:465)
at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:74)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:92)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:300)
at org.apache.zeppelin.scheduler.Job.run(Job.java:169)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:134)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Solution
hadoop fs -chown root:hdfs /user/root
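Changing HDFS ownership typically has to be done by the HDFS superuser, and /user/root must already exist. A sketch of the full sequence, assuming the superuser is the default hdfs account:
sudo -u hdfs hadoop fs -mkdir -p /user/root
sudo -u hdfs hadoop fs -chown root:hdfs /user/root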
3.
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SQLContext}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.ml.feature.RFormula
import org.apache.spark.ml.regression.LinearRegression
conf: org.apache.spark.SparkConf = org.apache.spark.SparkConf@6a79f5df
sc: org.apache.spark.SparkContext = org.apache.spark.SparkContext@59b2aabc
spark: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@129d0b9b
org.apache.spark.sql.AnalysisException: Specifying database name or other qualifiers are not allowed for temporary tables. If the table name has dots (.) in it, please quote the table name with backticks (`).;
at org.apache.spark.sql.catalyst.analysis.Catalog$class.checkTableIdentifier(Catalog.scala:)
at org.apache.spark.sql.catalyst.analysis.SimpleCatalog.checkTableIdentifier(Catalog.scala:)
at org.apache.spark.sql.catalyst.analysis.SimpleCatalog.lookupRelation(Catalog.scala:)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.getTable(Analyzer.scala:)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$.applyOrElse(Analyzer.scala:)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$.applyOrElse(Analyzer.scala:)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$.apply(LogicalPlan.scala:)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$.apply(LogicalPlan.scala:)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$.apply(LogicalPlan.scala:)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$.apply(LogicalPlan.scala:)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$.apply(TreeNode.scala:)
val dataset = spark.sql("select knife_dish_power,penetration,knife_dish_torque,total_propulsion,knife_dish_speed_readings,propulsion_speed1 from `tbm.tbm_test` where knife_dish_power!=0 and penetration!=0")
As suggested by the error, the database and table name in the SQL above were wrapped in backticks.
That change then produced the following error:
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SQLContext}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.ml.feature.RFormula
import org.apache.spark.ml.regression.LinearRegression
conf: org.apache.spark.SparkConf = org.apache.spark.SparkConf@4dd69db0
sc: org.apache.spark.SparkContext = org.apache.spark.SparkContext@4072dd9
spark: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@238ac654
java.lang.RuntimeException: Table Not Found: tbm.tbm_test
at scala.sys.package$.error(package.scala:27)
at org.apache.spark.sql.catalyst.analysis.SimpleCatalog.lookupRelation(Catalog.scala:139)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.getTable(Analyzer.scala:257)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$7.applyOrElse(Analyzer.scala:268)
Cause: the query was issued through spark, an org.apache.spark.sql.SQLContext object. Querying tables stored in Hive requires an org.apache.spark.sql.hive.HiveContext, i.e. the sqlContext (or sqlc) object that Zeppelin's Spark interpreter already provides.
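A minimal sketch of the fix inside a Zeppelin paragraph (it reuses the interpreter-provided sc; the table and column names are taken from the query above):
// sc is provided by the Spark interpreter; HiveContext can resolve Hive tables such as tbm.tbm_test
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
val dataset = hiveContext.sql("select knife_dish_power, penetration from tbm.tbm_test where knife_dish_power != 0 and penetration != 0")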
Example:
As a side note, here is how to use HiveContext from spark-shell.
Cluster environment: HDP 2.3.4.0
Spark version: 1.5.2
spark-shell
scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
scala> hiveContext.sql("show tables").collect().foreach(println)
[gps_p1,false]
scala> hiveContext.sql("select * from g").collect().foreach(println)
[1,li]
[1,li]
[1,li]
[1,li]
[1,li]
4.
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SQLContext}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.ml.feature.RFormula
import org.apache.spark.ml.regression.LinearRegression
conf: org.apache.spark.SparkConf = org.apache.spark.SparkConf@4d66e4f8
org.apache.spark.SparkException: Only one SparkContext may be running in this JVM (see SPARK-2243). To ignore this error, set spark.driver.allowMultipleContexts = true. The currently running SparkContext was created at:
org.apache.spark.SparkContext.<init>(SparkContext.scala:82)
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:46)
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:51)
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:53)
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:55)
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:57)
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:59)
$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:61)
$iwC$$iwC$$iwC$$iwC.<init>(<console>:63)
$iwC$$iwC$$iwC.<init>(<console>:65)
$iwC$$iwC.<init>(<console>:67)
$iwC.<init>(<console>:69)
<init>(<console>:71)
.<init>(<console>:75)
.<clinit>(<console>)
.<init>(<console>:7)
.<clinit>(<console>)
$print(<console>)
Solution:
val conf = new SparkConf().setAppName("test").set("spark.driver.allowMultipleContexts", "true")
val sc = new SparkContext(conf)
val spark = new SQLContext(sc)
That is, add set("spark.driver.allowMultipleContexts", "true") when building the SparkConf, as in the code above.
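Note that allowMultipleContexts is only a workaround for the error above. In a Zeppelin note the Spark interpreter has already created a SparkContext, so the cleaner option (a sketch, assuming the interpreter-provided sc binding) is to reuse it rather than construct a second one:
// sc is injected by Zeppelin's Spark interpreter; no new SparkContext(conf) is needed
val rdd = sc.parallelize(1 to 100)
println(rdd.sum())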