1. Shell$ExitCodeException

Symptom: the following exception appears when running a Hadoop job:

14/07/09 14:42:50 INFO mapreduce.Job: Task Id : attempt_1404886826875_0007_m_000000_1, Status : FAILED
Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException:
org.apache.hadoop.util.Shell$ExitCodeException:
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
        at org.apache.hadoop.util.Shell.run(Shell.java:418)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)

Container exited with a non-zero exit code 1

Cause and solution: the root cause was not identified; a restart restores normal operation.
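Before restarting, it can help to pull the failed container's own logs, which usually name the real error. A minimal diagnostic sketch, assuming YARN log aggregation is enabled; the application ID below is derived from the attempt ID in the log above:

# Fetch the aggregated logs for the failed application.
yarn logs -applicationId application_1404886826875_0007 | less

# Without log aggregation, look at the container logs on the NodeManager host
# (the exact location depends on yarn.nodemanager.log-dirs; this is a common default):
ls $HADOOP_HOME/logs/userlogs/application_1404886826875_0007/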

2. libhadoop.so.1.0.0 which might have disabled stack guard

Symptom: Hadoop 2.2.0 - warning: You have loaded library /home/hadoop/2.2.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard.

Cause and solution: add the following to /etc/profile:

#hadoop configuration
export PATH=$PATH:/home/jediael/hadoop-2.4.1/bin:/home/jediael/hadoop-2.4.1/sbin
export HADOOP_HOME=/home/jediael/hadoop-2.4.1
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
The warning appears when the last two entries (HADOOP_COMMON_LIB_NATIVE_DIR and HADOOP_OPTS) are missing.
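After editing /etc/profile, reload it and verify that the native libraries now load; a quick check (hadoop checknative is available in recent 2.x releases such as the 2.4.1 used here):

source /etc/profile
# List which native libraries Hadoop can load; "hadoop: true" means
# libhadoop.so was found and the stack-guard warning should be gone.
hadoop checknative -a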

3. Retrying connect to server: master166/10.252.48.166:9000. Already tried 0 time(s)

Symptom: running HDFS commands on a datanode fails with the following error:

[jediael@slave156 ~]$ hadoop fs -ls /
14/08/31 15:00:37 INFO ipc.Client: Retrying connect to server: master166/10.252.48.166:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/08/31 15:00:38 INFO ipc.Client: Retrying connect to server: master166/10.252.48.166:9000. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/08/31 15:00:39 INFO ipc.Client: Retrying connect to server: master166/10.252.48.166:9000. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/08/31 15:00:40 INFO ipc.Client: Retrying connect to server: master166/10.252.48.166:9000. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/08/31 15:00:41 INFO ipc.Client: Retrying connect to server: master166/10.252.48.166:9000. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/08/31 15:00:42 INFO ipc.Client: Retrying connect to server: master166/10.252.48.166:9000. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/08/31 15:00:43 INFO ipc.Client: Retrying connect to server: master166/10.252.48.166:9000. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/08/31 15:00:44 INFO ipc.Client: Retrying connect to server: master166/10.252.48.166:9000. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/08/31 15:00:45 INFO ipc.Client: Retrying connect to server: master166/10.252.48.166:9000. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/08/31 15:00:46 INFO ipc.Client: Retrying connect to server: master166/10.252.48.166:9000. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
ls: Call to master166/10.252.48.166:9000 failed on connection exception: java.net.ConnectException: Connection refused

This error usually means the datanode cannot connect to the namenode. One common cause is a 127.0.0.1 ***** entry in /etc/hosts, such as:

127.0.0.1 localhost

Remove such entries, reformat the namenode, and restart the Hadoop processes.
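To confirm this is indeed a connectivity problem, a quick check (host and port taken from the log above):

# On the datanode: test whether the namenode's RPC port is reachable at all.
telnet master166 9000
# On master166: see which address port 9000 is actually bound to. A line showing
# 127.0.0.1:9000 instead of 10.252.48.166:9000 points to the bad /etc/hosts
# mapping described above.
netstat -tlnp | grep 9000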

Another possible cause: after installation, Hadoop can only be used once it has been formatted with hadoop namenode -format. After the machine reboots and Hadoop is started again, hadoop fs -ls keeps failing with errors like

10/09/25 18:35:29 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 0 time(s).

and jps shows no namenode process; the namenode has to be formatted again before HDFS can be used. The reason is that by default Hadoop keeps some of its files under /tmp, and a system reboot wipes /tmp, hence the error.

Solution: add the following to conf/core-site.xml:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/var/log/hadoop/tmp</value>
  <description>A base for other temporary directories</description>
</property>

Then reformat the namenode and restart Hadoop.
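A minimal recovery sketch, assuming the stock start/stop scripts used elsewhere in this post and the new hadoop.tmp.dir above:

stop-all.sh
# Reformat now that hadoop.tmp.dir points somewhere that survives reboots.
# WARNING: formatting erases existing HDFS metadata.
hadoop namenode -format
start-all.sh
jps    # the NameNode process should now be listed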

4. Permission denied: user=liaoliuqing, access=WRITE, inode="":jediael:supergroup:rwxr-xr-x

Cause: the user has insufficient permissions and cannot write to the file in HDFS.

Solution: disable HDFS permission checking by adding the following to hdfs-site.xml:

<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
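Disabling permission checking cluster-wide is a blunt instrument. A gentler alternative, sketched under the assumption that jediael owns the target directory (the path here is hypothetical), is to grant access explicitly:

# Run as the owner (jediael) or the HDFS superuser.
hadoop fs -chown liaoliuqing /user/output        # hand the directory over, or
hadoop fs -chmod -R 775 /user/output             # widen its permissions
# On simple-auth clusters, jobs can also be run under the owning user:
export HADOOP_USER_NAME=jediael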

5. Incompatible namespaceIDs

Symptom: the datanode fails to start, with the following in its log:

2015-02-02 15:10:57,526 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2015-02-02 15:10:57,543 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2015-02-02 15:10:57,543 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2015-02-02 15:10:57,544 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2015-02-02 15:10:57,699 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2015-02-02 15:10:58,090 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /mnt/tmphadoop/dfs/data: namenode namespaceID = 2017454015; datanode namespaceID = 1238467850
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:232)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:147)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:414)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:321)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1712)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1651)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1669)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1795)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1812)



Cause: every namenode format generates a new namespaceID, but ${hadoop.tmp.dir}/dfs/data still holds the ID from the previous format. Re-running the format clears the namenode's data without clearing the datanodes', so the namespaceID on the namenode no longer matches the one on the datanodes, the exception above is thrown, and startup fails.

Solution:

(1) Stop Hadoop:

stop-all.sh

(2) On every slave, delete the contents of dfs.data.dir. If this property has not been overridden, its default value is:

<property>
  <name>dfs.data.dir</name>
  <value>${hadoop.tmp.dir}/dfs/data</value>
  <description>Determines where on the local filesystem a DFS data node
  should store its blocks. If this is a comma-delimited list of directories,
  then data will be stored in all named directories, typically on different
  devices. Directories that do not exist are ignored.</description>
</property>

(3) Reformat the namenode:

hadoop namenode -format

Then start Hadoop with start-all.sh. A combined sketch of these three steps follows.
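A minimal end-to-end sketch, assuming the default dfs.data.dir under hadoop.tmp.dir (/mnt/tmphadoop, as in the log above):

stop-all.sh
# On every slave: clear the stale datanode storage (this destroys the HDFS block data).
rm -rf /mnt/tmphadoop/dfs/data/*
# On the master: rebuild the namenode metadata, then restart.
hadoop namenode -format
start-all.sh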



The fix above deletes the existing data; if the data cannot be deleted, use one of the following methods instead:

(1) Edit the ${dfs.data.dir}/current/VERSION file and change the datanode's namespaceID to match the namenode's, as sketched below.

(2) Modify ${dfs.data.dir}
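A hedged sketch of option (1), reusing the two IDs from the log above; the namenode metadata path is an assumption based on hadoop.tmp.dir=/mnt/tmphadoop and the default ${hadoop.tmp.dir}/dfs/name layout:

# On the namenode: confirm the current namespaceID (path is an assumption).
grep namespaceID /mnt/tmphadoop/dfs/name/current/VERSION
# On each datanode: replace the stale ID (1238467850) with the namenode's
# (2017454015), then restart the datanode.
sed -i 's/^namespaceID=.*/namespaceID=2017454015/' /mnt/tmphadoop/dfs/data/current/VERSION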
