Zeppelin 0.6.2使用Spark的yarn-client模式
Zeppelin版本0.6.2
1. Export SPARK_HOME
In conf/zeppelin-env.sh, export SPARK_HOME environment variable with your Spark installation path.
You can optionally export HADOOP_CONF_DIR and SPARK_SUBMIT_OPTIONS
export SPARK_HOME=/usr/crh/4.9.2.5-/spark
export HADOOP_CONF_DIR=/etc/hadoop/conf
export JAVA_HOME=/opt/jdk1..0_79
这儿虽然添加了SPARK_HOME但是后面使用的时候还是找不到包。
2. Set master in Interpreter menu
After start Zeppelin, go to Interpreter menu and edit master property in your Spark interpreter setting. The value may vary depending on your Spark cluster deployment type.
spark解释器设置为yarn-client模式
FAQ
1.
ERROR [2016-07-26 16:46:15,999] ({pool-2-thread-2} Job.java[run]:189) - Job failed
java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaMirrors$JavaMirror;
at org.apache.spark.repl.SparkILoop.<init>(SparkILoop.scala:936)
at org.apache.spark.repl.SparkILoop.<init>(SparkILoop.scala:70)
at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:765)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:341)
at org.apache.zeppelin.scheduler.Job.run(Job.java:176)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Solution
把SPARK_HOME/lib目录下的所有jar包都拷到zeppelin的lib下。
2.
%spark.sql
show tables
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=root, access=WRITE, inode="/user/root/.sparkStaging/application_1481857320971_0028":hdfs:hdfs:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:292)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:213)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1771)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1755)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1738)
at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:71)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3905)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1048)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:622)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145) at org.apache.hadoop.ipc.Client.call(Client.java:1427)
at org.apache.hadoop.ipc.Client.call(Client.java:1358)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy24.mkdirs(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:558)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
at com.sun.proxy.$Proxy25.mkdirs(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:3018)
at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2988)
at org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1057)
at org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1053)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:1053)
at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:1046)
at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1877)
at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:598)
at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:281)
at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:634)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:123)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:523)
at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:339)
at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:145)
at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:465)
at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:74)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:92)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:300)
at org.apache.zeppelin.scheduler.Job.run(Job.java:169)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:134)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Solution
hadoop fs -chown root:hdfs /user/root
3.
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SQLContext}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.ml.feature.RFormula
import org.apache.spark.ml.regression.LinearRegression
conf: org.apache.spark.SparkConf = org.apache.spark.SparkConf@6a79f5df
sc: org.apache.spark.SparkContext = org.apache.spark.SparkContext@59b2aabc
spark: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@129d0b9b
org.apache.spark.sql.AnalysisException: Specifying database name or other qualifiers are not allowed for temporary tables. If the table name has dots (.) in it, please quote the table name with backticks (`).;
at org.apache.spark.sql.catalyst.analysis.Catalog$class.checkTableIdentifier(Catalog.scala:)
at org.apache.spark.sql.catalyst.analysis.SimpleCatalog.checkTableIdentifier(Catalog.scala:)
at org.apache.spark.sql.catalyst.analysis.SimpleCatalog.lookupRelation(Catalog.scala:)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.getTable(Analyzer.scala:)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$.applyOrElse(Analyzer.scala:)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$.applyOrElse(Analyzer.scala:)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$.apply(LogicalPlan.scala:)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$.apply(LogicalPlan.scala:)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$.apply(LogicalPlan.scala:)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$.apply(LogicalPlan.scala:)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$.apply(TreeNode.scala:)
val dataset = spark.sql("select knife_dish_power,penetration,knife_dish_torque,total_propulsion,knife_dish_speed_readings,propulsion_speed1 from `tbm.tbm_test` where knife_dish_power!=0 and penetration!=0")
如上sql中给表名和库名添加``。
然后又报如下错:
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SQLContext}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.ml.feature.RFormula
import org.apache.spark.ml.regression.LinearRegression
conf: org.apache.spark.SparkConf = org.apache.spark.SparkConf@4dd69db0
sc: org.apache.spark.SparkContext = org.apache.spark.SparkContext@4072dd9
spark: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@238ac654
java.lang.RuntimeException: Table Not Found: tbm.tbm_test
at scala.sys.package$.error(package.scala:27)
at org.apache.spark.sql.catalyst.analysis.SimpleCatalog.lookupRelation(Catalog.scala:139)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.getTable(Analyzer.scala:257)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$7.applyOrElse(Analyzer.scala:268)
原因:我用的是org.apache.spark.sql.SQLContext对象spark查询hive中的数据,查询hive的数据需要org.apache.spark.sql.hive.HiveContext对象sqlContext或sqlc。
实例:


顺便记录一下spark-shell使用HiveContext:
集群环境是HDP2.3.4.0
spark版本是1.5.2
spark-shell
scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
scala> hiveContext.sql("show tables").collect().foreach(println)
[gps_p1,false]
scala> hiveContext.sql("select * from g").collect().foreach(println)
[1,li]
[1,li]
[1,li]
[1,li]
[1,li]
4.
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SQLContext}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.ml.feature.RFormula
import org.apache.spark.ml.regression.LinearRegression
conf: org.apache.spark.SparkConf = org.apache.spark.SparkConf@4d66e4f8
org.apache.spark.SparkException: Only one SparkContext may be running in this JVM (see SPARK-2243). To ignore this error, set spark.driver.allowMultipleContexts = true. The currently running SparkContext was created at:
org.apache.spark.SparkContext.<init>(SparkContext.scala:82)
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:46)
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:51)
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:53)
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:55)
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:57)
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:59)
$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:61)
$iwC$$iwC$$iwC$$iwC.<init>(<console>:63)
$iwC$$iwC$$iwC.<init>(<console>:65)
$iwC$$iwC.<init>(<console>:67)
$iwC.<init>(<console>:69)
<init>(<console>:71)
.<init>(<console>:75)
.<clinit>(<console>)
.<init>(<console>:7)
.<clinit>(<console>)
$print(<console>)
Solution:
val conf = new SparkConf().setAppName("test").set("spark.driver.allowMultipleContexts", "true")
val sc = new SparkContext(conf)
val spark = new SQLContext(sc)
在上面添加set("spark.driver.allowMultipleContexts", "true")。
Zeppelin 0.6.2使用Spark的yarn-client模式的更多相关文章
- Spark on YARN运行模式(图文详解)
不多说,直接上干货! 请移步 Spark on YARN简介与运行wordcount(master.slave1和slave2)(博主推荐) Spark on YARN模式的安装(spark-1.6. ...
- Spark之Yarn提交模式
一.Client模式 提交命令: ./spark-submit --master yarn --class org.apache.examples.SparkPi ../lib/spark-examp ...
- 大话Spark(2)-Spark on Yarn运行模式
Spark On Yarn 有两种运行模式: Yarn - Cluster Yarn - Client 他们的主要区别是: Cluster: Spark的Driver在App Master主进程内运行 ...
- 解决Spark On Yarn yarn-cluster模式下的No Suitable Driver问题
Spark版本:2.2.0_2.11 我们在项目中通过Spark SQL JDBC连接MySQL,在启动Driver/Executor执行的时候都碰到了这个问题.网上解决方案我们全部都试过了,奉上我们 ...
- yarn cluster和yarn client模式区别——yarn-cluster适用于生产环境,结果存HDFS;而yarn-client适用于交互和调试,也就是希望快速地看到application的输出
Yarn-cluster VS Yarn-client 从广义上讲,yarn-cluster适用于生产环境:而yarn-client适用于交互和调试,也就是希望快速地看到application的输出. ...
- Spark(四十九):Spark On YARN启动流程源码分析(一)
引导: 该篇章主要讲解执行spark-submit.sh提交到将任务提交给Yarn阶段代码分析. spark-submit的入口函数 一般提交一个spark作业的方式采用spark-submit来提交 ...
- Spark On YARN启动流程源码分析(一)
本文主要参考: a. https://www.cnblogs.com/yy3b2007com/p/10934090.html 0. 说明 a. 关于spark源码会不定期的更新与补充 b. 对于spa ...
- Spark 1.0.0 横空出世 Spark on Yarn 部署(Hadoop 2.4)
就在昨天,北京时间5月30日20点多.Spark 1.0.0最终公布了:Spark 1.0.0 released 依据官网描写叙述,Spark 1.0.0支持SQL编写:Spark SQL Progr ...
- Spark on YARN模式的安装(spark-1.6.1-bin-hadoop2.6.tgz + hadoop-2.6.0.tar.gz)(master、slave1和slave2)(博主推荐)
说白了 Spark on YARN模式的安装,它是非常的简单,只需要下载编译好Spark安装包,在一台带有Hadoop YARN客户端的的机器上运行即可. Spark on YARN简介与运行wor ...
随机推荐
- 在Visual Studio 2017中找不到.NET Framework 4.6.2
原文 https://blogs.msdn.microsoft.com/benjaminperkins/2017/03/23/net-framwork-4-6-2-not-in-visual-stud ...
- Delphi 屏幕抓图技术的实现
摘 要:本文以Delphi7.0作为开发平台,给出了网络监控软件中的两种屏幕抓图技术的设计方法和步骤.介绍了教师在计算机机房内教学时,如何监控学生计算机显示器上的画面,以保证教学的质量和效果. 引言 ...
- 编译 Qt 5.6(使QtWebEngine支持XP)
说明 qt 5.6的编译进行了数十遍,才得出本文的可行方案,之所以花了这么多的时间,主要是qt引入了QtWebEngine模块后,导致编译难度直线上升,而且又有一些中国特色的问题(如360安全卫士)导 ...
- UILabel范例实现代码如下
#import "TWO_ViewController.h" #define SCREEN_Width [[UIScreen mainScreen] bounds].siz ...
- 基于 HTML5 Canvas 的元素周期表展示
前言 之前在网上看到别人写的有关元素周期表的文章,深深的勾起了一波回忆,记忆里初中时期背的“氢氦锂铍硼,碳氮氧氟氖,钠镁铝硅磷,硫氯氩钾钙”.“养(氧)龟(硅)铝铁盖(钙),哪(钠)家(钾)没(镁)青 ...
- Hyperledger Fabric1.4环境搭建过程
简单记录一下fabric版本1.4的环境搭建,运行环境为Ubuntu18.04,其中一些内容是根据官方文档整理的,如有错误欢迎批评指正. 本文只介绍最简单的环境搭建方法,具体的环境搭建解析在这里深入解 ...
- MCtalk对话抱抱星英语:从Diss在线英语教学乱象到回归教育本原
在百度搜索输入“在线英语”四字,清一色的线上英语培训品牌琳琅满目,“免费英语学习”.“外教口语一对一培训”.“四级听力”.“专属外教”,竞争之激烈可见一斑,创业公司绞尽脑汁挖掘细分市场,试图在一片红海 ...
- 如何为linux服务器配置DNS解析?
本文建立在已经搭建好DNS服务器时,为linux机器配置DNS服务器的三种方式. IP地址是网络上标识站点的数字地址,为了方便记忆,采用域名来代替IP地址标识站点地址.DNS(域名解析)就是域名到IP ...
- 系统学习 Java IO ---- 目录,概览
Java IO 类的系统教程,原创.主要参考自英文教程 Java IO Tutorial 和 Java Doc. http://tutorials.jenkov.com/java-io/index.h ...
- 以实现MongoDB副本集状态的监控为例,看Telegraf系统中Exec输入插件如何编写部署
既有的Telegraf 关于MongoDB的输入插件很难实现对副本集节点状态的监控,副本集节点状态有 PRIMARY.SECONDARY.RECOVERYING.ARBITER 等.现在我们尝试通过 ...