Zeppelin version 0.6.2

1. Export SPARK_HOME

In conf/zeppelin-env.sh, export the SPARK_HOME environment variable with your Spark installation path.

You can optionally export HADOOP_CONF_DIR and SPARK_SUBMIT_OPTIONS as well:

export SPARK_HOME=/usr/crh/4.9.2.5-/spark
export HADOOP_CONF_DIR=/etc/hadoop/conf
export JAVA_HOME=/opt/jdk1.7.0_79

Note: even though SPARK_HOME is set here, the Spark jars still could not be found later on (see FAQ 1 below).

2. Set master in Interpreter menu

After starting Zeppelin, go to the Interpreter menu and edit the master property in your Spark interpreter settings. The value may vary depending on your Spark cluster deployment type.

Here the Spark interpreter's master is set to yarn-client mode.
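
To confirm the setting took effect, a quick check in a notebook paragraph helps (a minimal sketch; sc is the SparkContext that Zeppelin injects into the %spark interpreter):

%spark
// sc is pre-created by the Zeppelin Spark interpreter
println(sc.master)   // should print "yarn-client"
println(sc.version)  // the Spark version the interpreter is bound to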

FAQ

1.

ERROR [2016-07-26 16:46:15,999] ({pool-2-thread-2} Job.java[run]:189) - Job failed
java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaMirrors$JavaMirror;
at org.apache.spark.repl.SparkILoop.<init>(SparkILoop.scala:936)
at org.apache.spark.repl.SparkILoop.<init>(SparkILoop.scala:70)
at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:765)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:341)
at org.apache.zeppelin.scheduler.Job.run(Job.java:176)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Solution

Copy all of the jars under SPARK_HOME/lib into Zeppelin's lib directory. (This NoSuchMethodError typically means the Scala/Spark jars visible to Zeppelin do not match the Scala version Zeppelin was built against.)

2.

%spark.sql
show tables

org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=root, access=WRITE, inode="/user/root/.sparkStaging/application_1481857320971_0028":hdfs:hdfs:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:292)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:213)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1771)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1755)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1738)
at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:71)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3905)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1048)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:622)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145)
at org.apache.hadoop.ipc.Client.call(Client.java:1427)
at org.apache.hadoop.ipc.Client.call(Client.java:1358)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy24.mkdirs(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:558)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
at com.sun.proxy.$Proxy25.mkdirs(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:3018)
at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2988)
at org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1057)
at org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1053)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:1053)
at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:1046)
at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1877)
at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:598)
at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:281)
at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:634)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:123)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:523)
at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:339)
at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:145)
at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:465)
at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:74)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:92)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:300)
at org.apache.zeppelin.scheduler.Job.run(Job.java:169)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:134)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Solution

Give the submitting user ownership of its HDFS home directory (run the command as the hdfs superuser):

hadoop fs -chown root:hdfs /user/root

3.

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SQLContext}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.ml.feature.RFormula
import org.apache.spark.ml.regression.LinearRegression
conf: org.apache.spark.SparkConf = org.apache.spark.SparkConf@6a79f5df
sc: org.apache.spark.SparkContext = org.apache.spark.SparkContext@59b2aabc
spark: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@129d0b9b
org.apache.spark.sql.AnalysisException: Specifying database name or other qualifiers are not allowed for temporary tables. If the table name has dots (.) in it, please quote the table name with backticks (`).;
at org.apache.spark.sql.catalyst.analysis.Catalog$class.checkTableIdentifier(Catalog.scala:)
at org.apache.spark.sql.catalyst.analysis.SimpleCatalog.checkTableIdentifier(Catalog.scala:)
at org.apache.spark.sql.catalyst.analysis.SimpleCatalog.lookupRelation(Catalog.scala:)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.getTable(Analyzer.scala:)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$.applyOrElse(Analyzer.scala:)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$.applyOrElse(Analyzer.scala:)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$.apply(LogicalPlan.scala:)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$.apply(LogicalPlan.scala:)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$.apply(LogicalPlan.scala:)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$.apply(LogicalPlan.scala:)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$.apply(TreeNode.scala:)
val dataset = spark.sql("select knife_dish_power,penetration,knife_dish_torque,total_propulsion,knife_dish_speed_readings,propulsion_speed1 from `tbm.tbm_test` where knife_dish_power!=0 and penetration!=0")

As the error message suggests, wrap the database and table name in backticks in the SQL above.

After that, the following error appeared instead:

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SQLContext}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.ml.feature.RFormula
import org.apache.spark.ml.regression.LinearRegression
conf: org.apache.spark.SparkConf = org.apache.spark.SparkConf@4dd69db0
sc: org.apache.spark.SparkContext = org.apache.spark.SparkContext@4072dd9
spark: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@238ac654
java.lang.RuntimeException: Table Not Found: tbm.tbm_test
at scala.sys.package$.error(package.scala:27)
at org.apache.spark.sql.catalyst.analysis.SimpleCatalog.lookupRelation(Catalog.scala:139)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.getTable(Analyzer.scala:257)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$7.applyOrElse(Analyzer.scala:268)

Cause: I was querying Hive data through spark, a plain org.apache.spark.sql.SQLContext instance. Querying Hive tables requires an org.apache.spark.sql.hive.HiveContext, such as the sqlContext (or sqlc) object that Zeppelin provides.

Example

As a side note, here is how to use a HiveContext from spark-shell. The cluster is HDP 2.3.4.0 and the Spark version is 1.5.2:

spark-shell
scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
scala> hiveContext.sql("show tables").collect().foreach(println)
[gps_p1,false]
scala> hiveContext.sql("select * from g").collect().foreach(println)
[1,li]
[1,li]
[1,li]
[1,li]
[1,li]
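
The same idea applies inside Zeppelin: rather than constructing a plain SQLContext, either use the sqlContext that the %spark interpreter already provides (a HiveContext when Zeppelin and Spark are built with Hive support), or build a HiveContext from the existing sc. A minimal sketch reusing the query from above (hiveCtx is just an illustrative name):

%spark
import org.apache.spark.sql.hive.HiveContext

// Reuse the sc injected by Zeppelin; do not create a new SparkContext
val hiveCtx = new HiveContext(sc)
val dataset = hiveCtx.sql(
  "select knife_dish_power, penetration from tbm.tbm_test " +
  "where knife_dish_power != 0 and penetration != 0")
dataset.show()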

4.

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SQLContext}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.ml.feature.RFormula
import org.apache.spark.ml.regression.LinearRegression
conf: org.apache.spark.SparkConf = org.apache.spark.SparkConf@4d66e4f8
org.apache.spark.SparkException: Only one SparkContext may be running in this JVM (see SPARK-2243). To ignore this error, set spark.driver.allowMultipleContexts = true. The currently running SparkContext was created at:
org.apache.spark.SparkContext.<init>(SparkContext.scala:82)
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:46)
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:51)
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:53)
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:55)
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:57)
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:59)
$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:61)
$iwC$$iwC$$iwC$$iwC.<init>(<console>:63)
$iwC$$iwC$$iwC.<init>(<console>:65)
$iwC$$iwC.<init>(<console>:67)
$iwC.<init>(<console>:69)
<init>(<console>:71)
.<init>(<console>:75)
.<clinit>(<console>)
.<init>(<console>:7)
.<clinit>(<console>)
$print(<console>)

Solution

val conf = new SparkConf().setAppName("test").set("spark.driver.allowMultipleContexts", "true")
val sc = new SparkContext(conf)
val spark = new SQLContext(sc)

That is, add .set("spark.driver.allowMultipleContexts", "true") to the SparkConf, as shown above.
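
Note that spark.driver.allowMultipleContexts only silences the check and leaves two SparkContexts running in the interpreter JVM. A cleaner approach (a sketch, not taken from the original notebook) is to skip creating a SparkConf/SparkContext in the paragraph and reuse the sc and sqlContext that the %spark interpreter already provides:

%spark
// No "new SparkContext" here; reuse the interpreter's sc and sqlContext
println(sc.appName)
sqlContext.sql("show tables").show()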
