Error when submitting a Spark on YARN job
Application ID is application_1481285758114_422243, trackingURL: http://***:4040
Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://mycluster-tj/user/engine_arch/data/mllib/sample_libsvm_data.txt
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1994)
at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:1025)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.reduce(RDD.scala:1007)
at org.apache.spark.mllib.util.MLUtils$.loadLibSVMFile(MLUtils.scala:105)
at org.apache.spark.mllib.util.MLUtils$.loadLibSVMFile(MLUtils.scala:134)
at org.apache.spark.ml.source.libsvm.LibSVMRelation.buildScan(LibSVMRelation.scala:49)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy$.apply(DataSourceStrategy.scala:135)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
at org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:336)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:47)
at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:45)
at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:52)
at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:52)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
at org.apache.spark.sql.DataFrame.rdd$lzycompute(DataFrame.scala:1638)
at org.apache.spark.sql.DataFrame.rdd(DataFrame.scala:1635)
at org.apache.spark.sql.DataFrame.map(DataFrame.scala:1411)
at org.apache.spark.ml.feature.StandardScaler.fit(StandardScaler.scala:90)
at com.xiaoju.arch.engine.spark.SparkDD.buildStandardScaler(SparkDD.java:149)
at com.xiaoju.arch.engine.spark.StandardScalerDemo.main(StandardScalerDemo.java:13)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Cause: when a Spark job runs on YARN, an input path without an explicit scheme is resolved against the cluster's default filesystem (fs.defaultFS, here hdfs://mycluster-tj), and a relative path is further resolved under the submitting user's HDFS home directory (/user/engine_arch). A file that only exists on the local disk of the machine doing the submit therefore cannot be found, which is exactly the InvalidInputException above. In local mode the default filesystem is the local one, so the same path works. The fix is to upload the data file to HDFS (e.g. with hdfs dfs -put) before submitting, or to reference a file:// URI that is readable on every node.
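Below is a minimal sketch of the repaired flow. It is a hypothetical reconstruction, not the original SparkDD/StandardScalerDemo code, and it assumes the Spark 1.6 Java API that the stack trace points at plus the HDFS path from the error message. The file is uploaded first with hdfs dfs -put sample_libsvm_data.txt /user/engine_arch/data/mllib/, then loaded through the libsvm data source and fed to StandardScaler:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.ml.feature.StandardScaler;
import org.apache.spark.ml.feature.StandardScalerModel;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class StandardScalerHdfsDemo {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("StandardScalerHdfsDemo");
        JavaSparkContext jsc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(jsc);

        // On YARN the path must exist on the cluster's default filesystem (HDFS).
        // Upload it first:  hdfs dfs -put sample_libsvm_data.txt /user/engine_arch/data/mllib/
        DataFrame data = sqlContext.read()
                .format("libsvm")
                .load("hdfs://mycluster-tj/user/engine_arch/data/mllib/sample_libsvm_data.txt");

        // The same StandardScaler step that failed in the stack trace above.
        StandardScaler scaler = new StandardScaler()
                .setInputCol("features")
                .setOutputCol("scaledFeatures")
                .setWithStd(true)
                .setWithMean(false);

        StandardScalerModel model = scaler.fit(data);
        model.transform(data).show(5);

        jsc.stop();
    }
}

Submitted the usual way (e.g. spark-submit --master yarn --class StandardScalerHdfsDemo app.jar), once the file is actually present under /user/engine_arch/data/mllib/, FileInputFormat.getSplits finds the input and the InvalidInputException disappears.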