Zeppelin使用Spark的yarn-client模式
Zeppelin版本0.6.2
1. Export SPARK_HOME
In conf/zeppelin-env.sh, export SPARK_HOME environment variable with your Spark installation path.
You can optionally export HADOOP_CONF_DIR and SPARK_SUBMIT_OPTIONS
export SPARK_HOME=/usr/crh/4.9.2.5-/spark
export HADOOP_CONF_DIR=/etc/hadoop/conf
export JAVA_HOME=/opt/jdk1..0_79
这儿虽然添加了SPARK_HOME但是后面使用的时候还是找不到包。
2. Set master in Interpreter menu
After start Zeppelin, go to Interpreter menu and edit master property in your Spark interpreter setting. The value may vary depending on your Spark cluster deployment type.
spark解释器设置为yarn-client模式
FAQ
1.
ERROR [2016-07-26 16:46:15,999] ({pool-2-thread-2} Job.java[run]:189) - Job failed
java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaMirrors$JavaMirror;
at org.apache.spark.repl.SparkILoop.<init>(SparkILoop.scala:936)
at org.apache.spark.repl.SparkILoop.<init>(SparkILoop.scala:70)
at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:765)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:341)
at org.apache.zeppelin.scheduler.Job.run(Job.java:176)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Solution
把SPARK_HOME/lib目录下的所有jar包都拷到zeppelin的lib下。
2.
%spark.sql
show tables
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=root, access=WRITE, inode="/user/root/.sparkStaging/application_1481857320971_0028":hdfs:hdfs:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:292)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:213)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1771)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1755)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1738)
at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:71)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3905)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1048)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:622)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145) at org.apache.hadoop.ipc.Client.call(Client.java:1427)
at org.apache.hadoop.ipc.Client.call(Client.java:1358)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy24.mkdirs(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:558)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
at com.sun.proxy.$Proxy25.mkdirs(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:3018)
at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2988)
at org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1057)
at org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1053)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:1053)
at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:1046)
at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1877)
at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:598)
at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:281)
at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:634)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:123)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:523)
at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:339)
at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:145)
at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:465)
at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:74)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:92)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:300)
at org.apache.zeppelin.scheduler.Job.run(Job.java:169)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:134)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Solution
hadoop fs -chown root:hdfs /user/root
3.
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SQLContext}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.ml.feature.RFormula
import org.apache.spark.ml.regression.LinearRegression
conf: org.apache.spark.SparkConf = org.apache.spark.SparkConf@6a79f5df
sc: org.apache.spark.SparkContext = org.apache.spark.SparkContext@59b2aabc
spark: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@129d0b9b
org.apache.spark.sql.AnalysisException: Specifying database name or other qualifiers are not allowed for temporary tables. If the table name has dots (.) in it, please quote the table name with backticks (`).;
at org.apache.spark.sql.catalyst.analysis.Catalog$class.checkTableIdentifier(Catalog.scala:)
at org.apache.spark.sql.catalyst.analysis.SimpleCatalog.checkTableIdentifier(Catalog.scala:)
at org.apache.spark.sql.catalyst.analysis.SimpleCatalog.lookupRelation(Catalog.scala:)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.getTable(Analyzer.scala:)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$.applyOrElse(Analyzer.scala:)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$.applyOrElse(Analyzer.scala:)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$.apply(LogicalPlan.scala:)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$.apply(LogicalPlan.scala:)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$.apply(LogicalPlan.scala:)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$.apply(LogicalPlan.scala:)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$.apply(TreeNode.scala:)
val dataset = spark.sql("select knife_dish_power,penetration,knife_dish_torque,total_propulsion,knife_dish_speed_readings,propulsion_speed1 from `tbm.tbm_test` where knife_dish_power!=0 and penetration!=0")
如上sql中给表名和库名添加``。
然后又报如下错:
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SQLContext}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.ml.feature.RFormula
import org.apache.spark.ml.regression.LinearRegression
conf: org.apache.spark.SparkConf = org.apache.spark.SparkConf@4dd69db0
sc: org.apache.spark.SparkContext = org.apache.spark.SparkContext@4072dd9
spark: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@238ac654
java.lang.RuntimeException: Table Not Found: tbm.tbm_test
at scala.sys.package$.error(package.scala:27)
at org.apache.spark.sql.catalyst.analysis.SimpleCatalog.lookupRelation(Catalog.scala:139)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.getTable(Analyzer.scala:257)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$7.applyOrElse(Analyzer.scala:268)
原因:我用的是org.apache.spark.sql.SQLContext对象spark查询hive中的数据,查询hive的数据需要org.apache.spark.sql.hive.HiveContext对象sqlContext或sqlc。
实例:


顺便记录一下spark-shell使用HiveContext:
集群环境是HDP2.3.4.0
spark版本是1.5.2
spark-shell
scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
scala> hiveContext.sql("show tables").collect().foreach(println)
[gps_p1,false]
scala> hiveContext.sql("select * from g").collect().foreach(println)
[1,li]
[1,li]
[1,li]
[1,li]
[1,li]
4.
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SQLContext}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.ml.feature.RFormula
import org.apache.spark.ml.regression.LinearRegression
conf: org.apache.spark.SparkConf = org.apache.spark.SparkConf@4d66e4f8
org.apache.spark.SparkException: Only one SparkContext may be running in this JVM (see SPARK-2243). To ignore this error, set spark.driver.allowMultipleContexts = true. The currently running SparkContext was created at:
org.apache.spark.SparkContext.<init>(SparkContext.scala:82)
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:46)
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:51)
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:53)
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:55)
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:57)
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:59)
$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:61)
$iwC$$iwC$$iwC$$iwC.<init>(<console>:63)
$iwC$$iwC$$iwC.<init>(<console>:65)
$iwC$$iwC.<init>(<console>:67)
$iwC.<init>(<console>:69)
<init>(<console>:71)
.<init>(<console>:75)
.<clinit>(<console>)
.<init>(<console>:7)
.<clinit>(<console>)
$print(<console>)
Solution:
val conf = new SparkConf().setAppName("test").set("spark.driver.allowMultipleContexts", "true")
val sc = new SparkContext(conf)
val spark = new SQLContext(sc)
在上面添加set("spark.driver.allowMultipleContexts", "true")。
Zeppelin使用Spark的yarn-client模式的更多相关文章
- Spark之Yarn提交模式
一.Client模式 提交命令: ./spark-submit --master yarn --class org.apache.examples.SparkPi ../lib/spark-examp ...
- 大话Spark(2)-Spark on Yarn运行模式
Spark On Yarn 有两种运行模式: Yarn - Cluster Yarn - Client 他们的主要区别是: Cluster: Spark的Driver在App Master主进程内运行 ...
- Spark on YARN运行模式(图文详解)
不多说,直接上干货! 请移步 Spark on YARN简介与运行wordcount(master.slave1和slave2)(博主推荐) Spark on YARN模式的安装(spark-1.6. ...
- yarn cluster和yarn client模式区别——yarn-cluster适用于生产环境,结果存HDFS;而yarn-client适用于交互和调试,也就是希望快速地看到application的输出
Yarn-cluster VS Yarn-client 从广义上讲,yarn-cluster适用于生产环境:而yarn-client适用于交互和调试,也就是希望快速地看到application的输出. ...
- 解决Spark On Yarn yarn-cluster模式下的No Suitable Driver问题
Spark版本:2.2.0_2.11 我们在项目中通过Spark SQL JDBC连接MySQL,在启动Driver/Executor执行的时候都碰到了这个问题.网上解决方案我们全部都试过了,奉上我们 ...
- Spark(四十九):Spark On YARN启动流程源码分析(一)
引导: 该篇章主要讲解执行spark-submit.sh提交到将任务提交给Yarn阶段代码分析. spark-submit的入口函数 一般提交一个spark作业的方式采用spark-submit来提交 ...
- Spark On YARN启动流程源码分析(一)
本文主要参考: a. https://www.cnblogs.com/yy3b2007com/p/10934090.html 0. 说明 a. 关于spark源码会不定期的更新与补充 b. 对于spa ...
- spark跑YARN模式或Client模式提交任务不成功(application state: ACCEPTED)
不多说,直接上干货! 问题详情 电脑8G,目前搭建3节点的spark集群,采用YARN模式. master分配2G,slave1分配1G,slave2分配1G.(在安装虚拟机时) export SPA ...
- 理解Spark运行模式(一)(Yarn Client)
Spark运行模式有Local,STANDALONE,YARN,MESOS,KUBERNETES这5种,其中最为常见的是YARN运行模式,它又可分为Client模式和Cluster模式.这里以Spar ...
随机推荐
- 解决android studiogradle 错误 找不到程序包 符号
问题:android studio gradle 错误 找不到程序包 符号解决:原因是library工程的build.gradle含有release,只要把其注释即可 buildTypes { rel ...
- data格式加载图片
html img 使用data格式加载图片 背景 这久闲来无事给一位客户测试一款体检软件,是winform结构的,其中有一个功能是需要把生成的体检报告导出为html格式.测试导出后直接在谷歌浏览器 ...
- ShardedJedis实现学习
ShardedJedis实现学习-我们到底能走多远系列(33) 我们到底能走多远系列(31) 扯淡: 工作是容易的赚钱是困难的 恋爱是容易的成家是困难的 相爱是容易的相处是困难的 决定是容易的可是等待 ...
- iOS多线程的初步研究
iOS多线程的初步研究(四) 理解run loop后,才能彻底理解NSTimer的实现原理,也就是说NSTimer实际上依赖run loop实现的. 先看看NSTimer的两个常用方法: + (NST ...
- dotfiles for linux/unix users automatically! (python Vim IDE)
Here is a brief introduction and package of dotfiles for linux/unix user. I think there are enough i ...
- 菜单工具栏wxPython菜单与工具栏基础示例
这两天一直在学习菜单工具栏之类的问题,上午正好有机会和大家讨论一下. 1.基本的api介绍 Package wx :: Class Menu Type Menu Method Summary Menu ...
- PHP curl之爬虫初步
php的curl可以实现模拟http的各种请求,这也是php做网络爬虫的基础,也多用于接口api的调用. 这个时候有人就要发问了:为什么你特么不用file_get_contents? curl的性能比 ...
- OC之类与对象
1.面向过程与面向对象. 1). 完成需求1 将大象放进冰箱. a. 把冰箱门打开. b. 把大象放进去. c. 把冰箱门关上. 这是面向过程的思路. 找1个冰箱,要求这个冰箱可以自己开门,自己把大象 ...
- AngularJS 实现页面滚动到底自动加载数据的功能
要实现这个功能,可以通过https://github.com/sroze/ngInfiniteScroll这个第三方控件来实现.步骤如下: 1. 下载ng-infinite-scroll.js程序ht ...
- Java Web学习笔记--JSP for循环
JSP for循环 <%@ page language="java" contentType="text/html; charset=UTF-8" %&g ...