Hadoop常见异常及其解决方式
1、Shell$ExitCodeException
现象:执行hadoop job时出现例如以下异常:
14/07/09 14:42:50 INFO mapreduce.Job: Task Id : attempt_1404886826875_0007_m_000000_1, Status : FAILED
Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException:
org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
at org.apache.hadoop.util.Shell.run(Shell.java:418)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Container exited with a non-zero exit code 1
原因及解决的方法:原因未知。
重新启动可恢复正常
2、libhadoop.so.1.0.0 which might have disabled stack guard
现象:Hadoop 2.2.0 - warning: You have loaded library /home/hadoop/2.2.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard.
export PATH=$PATH:/home/jediael/hadoop-2.4.1/bin:/home/jediael/hadoop-2.4.1/sbin
export HADOOP_HOME=/home/jediael/hadoop-2.4.1
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
3、Retrying connect to server: master166/10.252.48.166:9000. Already tried 0 time(s)
在datanode上运行hdfs相关命令时,出现下面错误:
[jediael@slave156 ~]$ hadoop fs -ls /
14/08/31 15:00:37 INFO ipc.Client: Retrying connect to server: master166/10.252.48.166:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/08/31 15:00:38 INFO ipc.Client: Retrying connect to server: master166/10.252.48.166:9000. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/08/31 15:00:39 INFO ipc.Client: Retrying connect to server: master166/10.252.48.166:9000. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/08/31 15:00:40 INFO ipc.Client: Retrying connect to server: master166/10.252.48.166:9000. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/08/31 15:00:41 INFO ipc.Client: Retrying connect to server: master166/10.252.48.166:9000. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/08/31 15:00:42 INFO ipc.Client: Retrying connect to server: master166/10.252.48.166:9000. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/08/31 15:00:43 INFO ipc.Client: Retrying connect to server: master166/10.252.48.166:9000. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/08/31 15:00:44 INFO ipc.Client: Retrying connect to server: master166/10.252.48.166:9000. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/08/31 15:00:45 INFO ipc.Client: Retrying connect to server: master166/10.252.48.166:9000. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/08/31 15:00:46 INFO ipc.Client: Retrying connect to server: master166/10.252.48.166:9000. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
ls: Call to master166/10.252.48.166:9000 failed on connection exception: java.net.ConnectException: Connection refused
出现以上错误,通常都是因为datanode无法连接到namenode所致,下面是一种情况:
/etc/hosts中存在127.0.0.1 *****的配置,如
127.0.0.1 localhost
将这些配置去掉,然后又一次格式化namenode。并重新启动hadoop进程就可以解决。
或者是下面原因:
hadoop安装完毕后,必需要用haddop namenode format格式化后,才干使用,假设重新启动机器
在启动hadoop后,用hadoop fs -ls命令老是报 10/09/25 18:35:29 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 0 time(s).的错误,
用jps命令。也看不不到namenode的进程。 必须再用命令hadoop namenode format格式化后。才干再使用
原因是:hadoop默认配置是把一些tmp文件放在/tmp文件夹下,重新启动系统后,tmp文件夹下的东西被清除。所以报错
解决方法:在conf/core-site.xml 中添加下面内容
<property>
<name>hadoop.tmp.dir</name>
<value>/var/log/hadoop/tmp</value>
<description>A base for other temporary directories</description>
</property>
重新启动hadoop后,格式化namenode就可以
4、Permission denied: user=liaoliuqing, access=WRITE, inode="":jediael:supergroup:rwxr-xr-x
原由于用户权限不足,能能訪写HDFS中的文件。
解决方式:
关闭hadoop权限。在hdfs-site.xml文件里加入
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
5、Incompatible namespaceIDs
2015-02-02 15:10:57,526 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2015-02-02 15:10:57,543 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2015-02-02 15:10:57,543 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2015-02-02 15:10:57,544 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2015-02-02 15:10:57,699 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2015-02-02 15:10:58,090 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /mnt/tmphadoop/dfs/data: namenode namespaceID = 2017454015; datanode namespaceID = 1238467850
at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:232)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:147)
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:414)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:321)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1712)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1651)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1669)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1795)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1812)
问题原因:
每次namenode format会又一次创建一个namenodeId,而${hadoop.tmp.dir}/dfs/data下包括了上次format下的id,当又一次运行namenode format时清空了namenode下的数据,可是没有清空datanode下的数据,所以造成namenode节点上的namespaceID与 datanode节点上的namespaceID不一致,从而导致从现上述异常,启动失败。
解决的方法:
(1)停止hadoop
stop-all.sh
(2)在各个slave中删除dfs.data.dir中的内容。
若此属性未改动。则其默认值为
<property>
<name>${dfs.data.dir}</name>
<value>${hadoop.tmp.dir}/dfs/data</value>
<description>Determines where on the local filesystem an DFS data node
should store its blocks. If this is a comma-delimited
list of directories, then data will be stored in all named
directories, typically on different devices.
Directories that do not exist are ignored.
</description>
</property>
(3)又一次格式化namenode
hadoop namenode -format
然后start-all.sh启动hadoop
以上解决的方法须要将原有数据删除,若数据不能删除。则使用下面方法之中的一个:
(1)改动${dfs.data.dir}/current/VERSION文件。将datanode中的id改成与namenode中的id一致。
(2)改动${dfs.data.dir}
Hadoop常见异常及其解决方式的更多相关文章
- dubbo常见异常及解决方式
1.出现RpcException: Failed to invoke the method post in the service com.xxx.xxx.xxx异常怎么办?表示调用失败1.检查网络是 ...
- Android 常见异常及解决办法
Ø 前言 本文主要记录 Android 的常见异常及解决办法,以备以后遇到相同问题时可以快速解决. 1. java.lang.NullPointerException: Attempt to i ...
- 连接db2数据库时NumberFormatException异常的解决方式
连接db2数据库时报异常:java.lang.NumberFormatException: For input string: "A" from a DB2 JDBC(JCC) j ...
- IIS 常见异常及解决办法
Ø 简介 IIS 是我们平常接触比较多的服务端软件,用于站点发布等,本文主要记录 IIS 常见的异常及解决办法.主要包括: 1. Visual Studio 启动 Web 项目提示"无 ...
- 【Android开发经验】Cannot generate texture from bitmap异常的解决方式
异常现象: 今天在处理用户头像的过程中,由于头像的处理比較复杂,由于,没有使用afinal自带的自己主动载入.而是自己依据头像的下载路径.手动进行下载和使用.可是在手动回收bitmap对象的过程中,会 ...
- Maven常见异常及解决方法(本篇停更至16-4-12)
本篇文章记录了老猫在学习整合Maven和SSH过程中遇到的问题,有的问题可以解决.有的问题还不能解决. 方法不一定适合全部的环境.但绝对是本人常遇到的常见异常.在这里做一个笔记和记录,也分享给大家,希 ...
- Android之——常见Bug及其解决方式
转载请注明出处:http://blog.csdn.net/l1028386804/article/details/46942139 1.android.view.WindowManager$BadTo ...
- Maven常见异常及解决方法
异常1: [ERROR] Failed to execute goal on project biz_zhuhai: Could not resolve dependencies for projec ...
- Hadoop常见异常及其解决方案
1.Shell$ExitCodeException 现象:运行hadoop job时出现如下异常: 14/07/09 14:42:50 INFO mapreduce.Job: Task Id : at ...
随机推荐
- class.getDeclaredFields()与class.getFields()
* getFields()与getDeclaredFields()区别:getFields()只能访问类中声明为公有的字段,私有的字段它无法访问.getDeclaredFields()能访问类中所有的 ...
- qemu相关命令使用
qemu-ga qemu-guest-agent-2.5.0-3.el7.x86_64 qemu-img qemu-img-1.5.3-105.el7_2.4.x86_64 qemu-io qemu- ...
- Codeforces Gym100812 L. Knights without Fear and Reproach-扩展欧几里得(exgcd)
补一篇以前的扩展欧几里得的题,发现以前写错了竟然也过了,可能数据水??? 这个题还是很有意思的,和队友吵了两天,一边吵一边发现问题??? L. Knights without Fear and Rep ...
- 对动态规划(Dynamic Programming)的理解:从穷举开始(转)
转自:http://janfan.cn/chinese/2015/01/21/dynamic-programming.html 动态规划(Dynamic Programming,以下简称dp)是算法设 ...
- luogu P1854 花店橱窗布置
题目描述 某花店现有F束花,每一束花的品种都不一样,同时至少有同样数量的花瓶,被按顺序摆成一行,花瓶的位置是固定的,从左到右按1到V顺序编号,V是花瓶的数目.花束可以移动,并且每束花用1到F的整数标识 ...
- 某考试 T1 fair (18.5.1版)
转化一下模型:每天可以选1也可以选0,但是任意前i天(i<=n)1的个数都必须>=0的个数,求总方案数/2^n. 然后可以发现这是一个经典题,随便推一下公式发现等于 C(n,n/2)/2 ...
- dedecms跳转标签
我们在使用织梦dedecms制作网站的时候,有时会遇到利用arclist和list标签调用redirecturl属性.但是,dedecms的arclist和list标签不支持redirecturl.很 ...
- 使用charles远程调试iOS移动应用
做iOS移动应用很多开发者会喜欢抓网络发包.回包来联调服务端借口以及定位其他网络问 题.如果在Windows系统可以使用fiddler来做iOS的远程代理,只要fiddler所在系统与iOS设备同时连 ...
- zfighting 的问题
1.对每个mesh 在脚本里加bias 由美术勾 {a. vertex shader b. depth bias slop depth bias rasterizateState} 2.inverse ...
- bbed初体验
bbed能够直接查看或改动数据文件.听起来非常强大,以下体验一下,安装方法网上一搜一大把,我的环境是centos+10G的 bbed參考文档:http://pan.baidu.com/s/1hqCC6 ...