spark on yarn 提交任务出错
Application ID is application_1481285758114_422243, trackingURL: http://***:4040
Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://mycluster-tj/user/engine_arch/data/mllib/sample_libsvm_data.txt
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1994)
at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:1025)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.reduce(RDD.scala:1007)
at org.apache.spark.mllib.util.MLUtils$.loadLibSVMFile(MLUtils.scala:105)
at org.apache.spark.mllib.util.MLUtils$.loadLibSVMFile(MLUtils.scala:134)
at org.apache.spark.ml.source.libsvm.LibSVMRelation.buildScan(LibSVMRelation.scala:49)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy$.apply(DataSourceStrategy.scala:135)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
at org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:336)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:47)
at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:45)
at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:52)
at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:52)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
at org.apache.spark.sql.DataFrame.rdd$lzycompute(DataFrame.scala:1638)
at org.apache.spark.sql.DataFrame.rdd(DataFrame.scala:1635)
at org.apache.spark.sql.DataFrame.map(DataFrame.scala:1411)
at org.apache.spark.ml.feature.StandardScaler.fit(StandardScaler.scala:90)
at com.xiaoju.arch.engine.spark.SparkDD.buildStandardScaler(SparkDD.java:149)
at com.xiaoju.arch.engine.spark.StandardScalerDemo.main(StandardScalerDemo.java:13)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
spark中读取文件,在yarn模式下,必须将文件存入hdfs中,不然就会报错。单机模式下无此问题。
spark on yarn 提交任务出错的更多相关文章
- 【原创】大叔经验分享(19)spark on yarn提交任务之后执行进度总是10%
spark 2.1.1 系统中希望监控spark on yarn任务的执行进度,但是监控过程发现提交任务之后执行进度总是10%,直到执行成功或者失败,进度会突然变为100%,很神奇, 下面看spark ...
- spark利用yarn提交任务报:YARN application has exited unexpectedly with state UNDEFINED
spark用yarn提交任务会报ERROR cluster.YarnClientSchedulerBackend: YARN application has exited unexpectedly w ...
- 【原创】大叔经验分享(14)spark on yarn提交任务到集群后spark-submit进程一直等待
spark on yarn通过--deploy-mode cluster提交任务之后,应用已经在yarn上执行了,但是spark-submit提交进程还在,直到应用执行结束,提交进程才会退出,有时这会 ...
- Spark通过YARN提交任务不成功(包含YARN cluster和YARN client)
无论用YARN cluster和YARN client来跑,均会出现如下问题. [spark@master spark-1.6.1-bin-hadoop2.6]$ jps 2049 NameNode ...
- spark on yarn提交任务时报ClosedChannelException解决方案
spark2.1出来了,想玩玩就搭了个原生的apache集群,但在standalone模式下没有任何问题,基于apache hadoop 2.7.3使用spark on yarn一直报这个错.(Jav ...
- Spark之Yarn提交模式
一.Client模式 提交命令: ./spark-submit --master yarn --class org.apache.examples.SparkPi ../lib/spark-examp ...
- spark跑YARN模式或Client模式提交任务不成功(application state: ACCEPTED)
不多说,直接上干货! 问题详情 电脑8G,目前搭建3节点的spark集群,采用YARN模式. master分配2G,slave1分配1G,slave2分配1G.(在安装虚拟机时) export SPA ...
- spark跑YARN模式或Client模式提交任务不成功(application state: ACCEPTED)(转)
不多说,直接上干货! 问题详情 电脑8G,目前搭建3节点的spark集群,采用YARN模式. master分配2G,slave1分配1G,slave2分配1G.(在安装虚拟机时) export SPA ...
- Spark on Yarn:任务提交参数配置
当在YARN上运行Spark作业,每个Spark executor作为一个YARN容器运行.Spark可以使得多个Tasks在同一个容器里面运行. 以下参数配置为例子: spark-submit -- ...
随机推荐
- LINQ to SQL语句(9)之Top/Bottom和Paging和SqlMethods
适用场景:适量的取出自己想要的数据,不是全部取出,这样性能有所加强. Take 说明:获取集合的前n个元素:延迟.即只返回限定数量的结果集. var q = ( from e in db.Employ ...
- C# MVC绑定 List<DapperRow>到bootstrap-table列表
1.Dapper返回List<dynamic>对象 /// <summary> /// 获取候选人推荐的分页数据 /// </summary> /// <pa ...
- ASP.NET MVC+EF框架+EasyUI实现权限管理系列(24)-权限组的设计和实现(附源码)(终结)
ASP.NET MVC+EF框架+EasyUI实现权限管系列 (开篇) (1):框架搭建 (2):数据库访问层的设计Demo (3):面向接口编程 (4 ):业务逻辑层的封装 ...
- 使用Object.create 克隆对象以及实现单继承
var Plane = function () { this.blood = 100; this.attack = 1; this.defense = 1; }; var plane = new Pl ...
- 浅谈html5 响应式布局
一.什么是响应式布局? 响应式布局是Ethan Marcotte在2010年5月份提出的一个概念,简而言之,就是一个网站能够兼容多个终端——而不是为每个终端做一个特定的版本. 这个概念是为解决移动互联 ...
- 你真的了解UITabBarController吗?
一:首先查看一下关于UITabBarController的定义 NS_CLASS_AVAILABLE_IOS(2_0) @interface UITabBarController : UIViewCo ...
- python之路径导入
问题: 最近在学习import的时候,发现不像import xxx,或者from xxx import ooo 这样简单.比如,看下面这个图: 要导入才能在te.py调用pre.tab.py?? 直接 ...
- linux版基金看板
程序员的吊丝们,还在害怕上班时偷偷看基金被老板发现吗?今天你们的福利来了,专属程序员吊丝一族的礼物,linux版基金看板. 优点: 1.自定义设置关注基金 2.linux系统,让别人可以以为你一直都在 ...
- 将ASP.NET Core应用程序部署至生产环境中(CentOS7)
这段时间在使用Rabbit RPC重构公司的一套系统(微信相关),而最近相关检验(逻辑测试.压力测试)已经完成,接近部署至线上生产环境从而捣鼓了ASP.NET Core应用程序在CentOS上的部署方 ...
- SQL Server 2012安装错误案例:Error while enabling Windows feature: NetFx3, Error Code: -2146498298
案例环境: 服务器环境 : Windows Server 2012 R2 Standard 数据库版本 : SQL Server 2012 SP1 案例介绍: 在Windows Ser ...