问题一:

18/03/15 07:59:23 INFO yarn.Client:
client token: N/A
diagnostics: Application application_1521099425266_0002 failed 2 times due to AM Container for appattempt_1521099425266_0002_000002 exited with exitCode: 1
For more detailed output, check application tracking page:http://spark1:8088/proxy/application_1521099425266_0002/Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1521099425266_0002_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

  

此问题一般和内存有关,调大内存

再把虚拟和物理监控线程关闭

	<property>
<name>yarn.nodemanager.pmem-check-enabled</name>
<value>false</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>

  

问题二:

Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: root.kfk
start time: 1521115132862
final status: FAILED
tracking URL: http://spark1:8088/cluster/app/application_1521099425266_0002
user: kfk
Exception in thread "main" org.apache.spark.SparkException: Application application_1521099425266_0002 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1104)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1150)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
18/03/15 07:59:23 INFO util.ShutdownHookManager: Shutdown hook called
18/03/15 07:59:23 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-edf48e42-1bda-41b6-8a1b-7f9e176da728

  

此问题一般是由于集群配置原因,检查jdk ,yarn 的配置文件

问题三:

diagnostics: Application application_1521099425266_0004 failed 2 times due to Error launching appattempt_1521099425266_0004_000002. Got exception: org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
This token is expired. current time is 1521213771615 found 1521138303131
Note: System times on machines may be out of sync. Check system time and time zones.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)
at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:123)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:254)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

  

同步集群的时间即可,本人集群其实一直都是时钟同步的,但是不知道为什么第三个节点会突然时钟错乱,jdk版本也错乱了

问题问题四:

Container exited with a non-zero exit code 15
Failing this attempt. Failing the application.
2018-03-16 11:59:29,345 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1521214648009_0003 State change from FINAL_SAVING to FAILED
2018-03-16 11:59:29,346 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=kfk OPERATION=Application Finished - Failed TARGET=RMAppManager RESULT=FAILURE DESCRIPTION=App failed with state: FAILED PERMISSIONS=Application application_1521214648009_0003 failed 2 times due to AM Container for appattempt_1521214648009_0003_000002 exited with exitCode: 15
For more detailed output, check application tracking page:http://spark2:8088/proxy/application_1521214648009_0003/Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1521214648009_0003_02_000001
Exit code: 15
Stack trace: ExitCodeException exitCode=15:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748) Container exited with a non-zero exit code 15
Failing this attempt. Failing the application. APPID=application_1521214648009_0003
2018-03-16 11:59:29,346 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary: appId=application_1521214648009_0003,name=com.spark.test.MyScalaWordCout,user=kfk,queue=root.kfk,state=FAILED,trackingUrl=http://spark2:8088/cluster/app/application_1521214648009_0003,appMasterHost=N/A,startTime=1521215923660,finishTime=1521215968592,finalStatus=FAILED
2018-03-16 11:59:30,164 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed...
2018-03-16 12:00:15,892 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 6667ms for sessionid 0x3622d0b65080001, closing socket connection and attempting reconnect
2018-03-16 12:00:15,996 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Watcher event type: None with state:Disconnected for path:null for Service org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore in state org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: STARTED
2018-03-16 12:00:15,996 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session disconnected
2018-03-16 12:00:16,123 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server spark1/192.168.208.151:2181. Will not attempt to authenticate using SASL (unknown error)
2018-03-16 12:00:17,199 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 6670ms for sessionid 0x1622882ae9c0001, closing socket connection and attempting reconnect
2018-03-16 12:00:17,301 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session disconnected. Entering neutral mode...
2018-03-16 12:00:17,838 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server spark3/192.168.208.153:2181. Will not attempt to authenticate using SASL (unknown error)
2018-03-16 12:00:18,838 INFO org.apache.zookeeper.ClientCnxn: Socket connection established, initiating session, client: /192.168.208.152:35089, server: spark3/192.168.208.153:2181
2018-03-16 12:00:18,843 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server spark3/192.168.208.153:2181, sessionid = 0x1622882ae9c0001, negotiated timeout = 10000
2018-03-16 12:00:18,844 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session connected.
2018-03-16 12:00:18,858 INFO org.apache.hadoop.ha.ActiveStandbyElector: Checking for any old active which needs to be fenced...
2018-03-16 12:00:18,862 INFO org.apache.hadoop.ha.ActiveStandbyElector: Old node exists: 0a0272731203726d32
2018-03-16 12:00:18,862 INFO org.apache.hadoop.ha.ActiveStandbyElector: But old node has our own data, so don't need to fence it.
2018-03-16 12:00:18,862 INFO org.apache.hadoop.ha.ActiveStandbyElector: Writing znode /yarn-leader-election/rs/ActiveBreadCrumb to indicate that the local node is the most recent active...
2018-03-16 12:00:19,127 INFO org.apache.zookeeper.ClientCnxn: Socket connection established, initiating session, client: /192.168.208.152:50168, server: spark1/192.168.208.151:2181
2018-03-16 12:00:21,384 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server spark1/192.168.208.151:2181, sessionid = 0x3622d0b65080001, negotiated timeout = 10000
2018-03-16 12:00:21,386 INFO org.apache.hadoop.conf.Configuration: found resource yarn-site.xml at file:/opt/modules/hadoop-2.6.0-cdh5.4.5/etc/hadoop/yarn-site.xml
2018-03-16 12:00:21,387 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Watcher event type: None with state:SyncConnected for path:null for Service org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore in state org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: STARTED
2018-03-16 12:00:21,387 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session connected
2018-03-16 12:00:21,387 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session restored
2018-03-16 12:00:21,406 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=kfk OPERATION=refreshAdminAcls TARGET=AdminService RESULT=SUCCESS
2018-03-16 12:00:21,407 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Already in active state
2018-03-16 12:00:21,407 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=kfk OPERATION=refreshQueues TARGET=AdminService RESULT=SUCCESS
2018-03-16 12:00:21,408 INFO org.apache.hadoop.conf.Configuration: found resource yarn-site.xml at file:/opt/modules/hadoop-2.6.0-cdh5.4.5/etc/hadoop/yarn-site.xml
2018-03-16 12:00:21,426 INFO org.apache.hadoop.util.HostsFileReader: Setting the includes file to
2018-03-16 12:00:21,426 INFO org.apache.hadoop.util.HostsFileReader: Setting the excludes file to
2018-03-16 12:00:21,426 INFO org.apache.hadoop.util.HostsFileReader: Refreshing hosts (include/exclude) list
2018-03-16 12:00:21,431 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=kfk OPERATION=refreshNodes TARGET=AdminService RESULT=SUCCESS
2018-03-16 12:00:21,432 INFO org.apache.hadoop.conf.Configuration: found resource core-site.xml at file:/opt/modules/hadoop-2.6.0-cdh5.4.5/etc/hadoop/core-site.xml
2018-03-16 12:00:21,432 INFO org.apache.hadoop.conf.Configuration: found resource yarn-site.xml at file:/opt/modules/hadoop-2.6.0-cdh5.4.5/etc/hadoop/yarn-site.xml
2018-03-16 12:00:21,450 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=kfk OPERATION=refreshSuperUserGroupsConfiguration TARGET=AdminService RESULT=SUCCESS
2018-03-16 12:00:21,450 INFO org.apache.hadoop.conf.Configuration: found resource core-site.xml at file:/opt/modules/hadoop-2.6.0-cdh5.4.5/etc/hadoop/core-site.xml
2018-03-16 12:00:21,451 INFO org.apache.hadoop.security.Groups: clearing userToGroupsMap cache
2018-03-16 12:00:21,451 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=kfk OPERATION=refreshUserToGroupsMappings TARGET=AdminService RESULT=SUCCESS
2018-03-16 12:00:21,451 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=kfk OPERATION=transitionToActive TARGET=RMHAProtocolService RESULT=SUCCESS

  这些问题看表面一般看不出来,在yarn的日志里面可以查看具体日志

问题五:

Exception in thread "main" org.apache.spark.SparkException: Application application_1521293577934_0006 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1104)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1150)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

  这只是个表面错误,实际错误找到资源调度列表中的错误任务,点击进去发现实际错误

Diagnostics:	User class threw exception: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://ns/opt/datas/stu2.txt

  

spark on yarn 运行问题记录的更多相关文章

  1. Spark on Yarn运行时加载的jar包

    spark on yarn运行时会加载的jar包有如下: spark-submit中指定的--jars $SPARK_HOME/jars下的jar包 yarn提供的jar包 spark-submit通 ...

  2. Spark on Yarn运行错误:Yarn application has already ended! It might have been killed or unable to launch application master

    Spark on Yarn模式运行错误: bin/spark-shell --master yarn --deploy-mode client #报错 ​ 查看8088页面上的工作日志 错误原因:在执 ...

  3. 大话Spark(2)-Spark on Yarn运行模式

    Spark On Yarn 有两种运行模式: Yarn - Cluster Yarn - Client 他们的主要区别是: Cluster: Spark的Driver在App Master主进程内运行 ...

  4. Spark on YARN运行模式(图文详解)

    不多说,直接上干货! 请移步 Spark on YARN简介与运行wordcount(master.slave1和slave2)(博主推荐) Spark on YARN模式的安装(spark-1.6. ...

  5. spark on yarn运行产生jar包冲突问题

    1.1 问题描述 Spark Streaming程序解析protobuf序列化的数据时,--jars 来添加依赖的protobuf-java-3.0.0.jar包,使用local模式程序正常,使用ya ...

  6. Spark On Yarn搭建及各运行模式说明

    之前记录Yarn:Hadoop2.0之YARN组件,这次使用Docker搭建Spark On  Yarn 一.各运行模式 1.单机模式 该模式被称为Local[N]模式,是用单机的多个线程来模拟Spa ...

  7. Spark on YARN简介与运行wordcount(master、slave1和slave2)(博主推荐)

    前期博客 Spark on YARN模式的安装(spark-1.6.1-bin-hadoop2.6.tgz +hadoop-2.6.0.tar.gz)(master.slave1和slave2)(博主 ...

  8. 【转载】Spark系列之运行原理和架构

    参考 http://www.cnblogs.com/shishanyuan/p/4721326.html 1. Spark运行架构 1.1 术语定义 lApplication:Spark Applic ...

  9. 六、yarn运行模式

    简介 spark的yarn运行模式根据Driver在集群中的位置分成两种: 1)yarn-client 客户端模式 2)yarn-cluster 集群模式 yarn模式和standalone模式不同, ...

随机推荐

  1. Mac 中配置Apache

    使用的mac版本是10.10.1,mac自带的Apache环境 分为两部分: 1.启动Apache 2.设置虚拟主机 启动Apache 打开终端, >>sudo apachectl -v, ...

  2. php 数组对象之间的转换

    在之前我写过php返回json数据简单实例 从5.2版本开始,PHP原生提供json_encode()和json_decode()函数,前者用于编码,后者用于解码. 一.json_encode() 1 ...

  3. javascript实现图片延迟加载方法汇总(三种方法)

    看到一些大型网站,页面如果有很多图片的时候,当你滚动到相应的行时,当前行的图片才即时加载的,这样子的话页面在打开只加可视区域的图片,而其它隐藏的图片则不加载,一定程序上加快了页面加载的速度,跟着小编一 ...

  4. Java进阶篇(二)——抽象类、内部类

    之前在类和对象中我们说到了类的普通特性,本篇将介绍类的一些高级特性. 一.抽象类 抽象类:抽象类是只声明方法的存在而不去具体实现它的类.抽象类不能被实例化,也就是不能创建其对象.使用abstract关 ...

  5. 洛谷 P2590 [ZJOI2008]树的统计(树链剖分)

    题目描述一棵树上有n个节点,编号分别为1到n,每个节点都有一个权值w. 我们将以下面的形式来要求你对这棵树完成一些操作: I. CHANGE u t : 把结点u的权值改为t II. QMAX u v ...

  6. BeautifulSoup 用法

    一.标签选择器 1.子节点contents ,child(迭代器), 2.子孙节点 descendants(迭代器) 3.父节点 parent 4.祖节点  parents 5.兄弟节点 next_s ...

  7. [LeetCode] The Maze II 迷宫之二

    There is a ball in a maze with empty spaces and walls. The ball can go through empty spaces by rolli ...

  8. 基于webpack的React项目搭建(三)

    前言 搭建好前文的开发环境,已经可以进行开发.然而实际的项目中,不同环境有着不同的构建需求.这里就将开发环境和生产环境的配置单独提取出来,并做一些简单的优化. 分离不同环境公有配置 不同环境虽然有不同 ...

  9. 因数(factor)

    一个最基本的算数法则就是大于1的整数都能用1个或多个素数相乘的形式表示出来.当然,有多种质因子排列方案 如: 10=2×5=5×2    20=5×2×2=2×5×2=2×2×5 用f(k)表示k的质 ...

  10. ●BZOJ 3926 [Zjoi2015]诸神眷顾的幻想乡

    题链: http://www.lydsy.com/JudgeOnline/problem.php?id=3926题解&&代码: 后缀自动机,Trie树 如果以每个叶子为根,所有的子串一 ...