1、Shell$ExitCodeException

现象:执行hadoop job时出现例如以下异常:

14/07/09 14:42:50 INFO mapreduce.Job: Task Id : attempt_1404886826875_0007_m_000000_1, Status : FAILED

Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException: 

org.apache.hadoop.util.Shell$ExitCodeException: 

        at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)

        at org.apache.hadoop.util.Shell.run(Shell.java:418)

        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)

        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)

        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)

        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)

        at java.util.concurrent.FutureTask.run(FutureTask.java:262)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

        at java.lang.Thread.run(Thread.java:744)



Container exited with a non-zero exit code 1

原因及解决的方法:原因未知。

重新启动可恢复正常

2、libhadoop.so.1.0.0 which might have disabled stack guard

现象:Hadoop 2.2.0 - warning: You have loaded library /home/hadoop/2.2.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard.

原因及解决方法:
在/etc/profile中加入:
#hadoop configuration

export PATH=$PATH:/home/jediael/hadoop-2.4.1/bin:/home/jediael/hadoop-2.4.1/sbin

export HADOOP_HOME=/home/jediael/hadoop-2.4.1

export HADOOP_COMMON_HOME=$HADOOP_HOME

export HADOOP_HDFS_HOME=$HADOOP_HOME

export HADOOP_MAPRED_HOME=$HADOOP_HOME

export HADOOP_YARN_HOME=$HADOOP_HOME

export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native

export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
此警告出现的原因是最后2项未加入。

3、Retrying connect to server: master166/10.252.48.166:9000. Already tried 0 time(s)

在datanode上运行hdfs相关命令时,出现下面错误:

[jediael@slave156 ~]$ hadoop fs -ls /

14/08/31 15:00:37 INFO ipc.Client: Retrying connect to server: master166/10.252.48.166:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)

14/08/31 15:00:38 INFO ipc.Client: Retrying connect to server: master166/10.252.48.166:9000. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)

14/08/31 15:00:39 INFO ipc.Client: Retrying connect to server: master166/10.252.48.166:9000. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)

14/08/31 15:00:40 INFO ipc.Client: Retrying connect to server: master166/10.252.48.166:9000. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)

14/08/31 15:00:41 INFO ipc.Client: Retrying connect to server: master166/10.252.48.166:9000. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)

14/08/31 15:00:42 INFO ipc.Client: Retrying connect to server: master166/10.252.48.166:9000. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)

14/08/31 15:00:43 INFO ipc.Client: Retrying connect to server: master166/10.252.48.166:9000. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)

14/08/31 15:00:44 INFO ipc.Client: Retrying connect to server: master166/10.252.48.166:9000. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)

14/08/31 15:00:45 INFO ipc.Client: Retrying connect to server: master166/10.252.48.166:9000. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)

14/08/31 15:00:46 INFO ipc.Client: Retrying connect to server: master166/10.252.48.166:9000. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)

ls: Call to master166/10.252.48.166:9000 failed on connection exception: java.net.ConnectException: Connection refused

出现以上错误,通常都是因为datanode无法连接到namenode所致,下面是一种情况:

/etc/hosts中存在127.0.0.1 *****的配置,如

127.0.0.1 localhost

将这些配置去掉,然后又一次格式化namenode。并重新启动hadoop进程就可以解决。

或者是下面原因:

hadoop安装完毕后,必需要用haddop namenode format格式化后,才干使用,假设重新启动机器

在启动hadoop后,用hadoop fs -ls命令老是报 10/09/25 18:35:29 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 0 time(s).的错误,

用jps命令。也看不不到namenode的进程。 必须再用命令hadoop namenode format格式化后。才干再使用

原因是:hadoop默认配置是把一些tmp文件放在/tmp文件夹下,重新启动系统后,tmp文件夹下的东西被清除。所以报错

解决方法:在conf/core-site.xml 中添加下面内容

<property>

<name>hadoop.tmp.dir</name>

<value>/var/log/hadoop/tmp</value>

<description>A base for other temporary directories</description>

</property>

重新启动hadoop后,格式化namenode就可以

4、Permission denied: user=liaoliuqing, access=WRITE, inode="":jediael:supergroup:rwxr-xr-x

原由于用户权限不足,能能訪写HDFS中的文件。

解决方式:

关闭hadoop权限。在hdfs-site.xml文件里加入

<property>

<name>dfs.permissions</name>

<value>false</value>

</property>

5、Incompatible namespaceIDs

2015-02-02 15:10:57,526 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties

2015-02-02 15:10:57,543 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.

2015-02-02 15:10:57,543 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).

2015-02-02 15:10:57,544 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started

2015-02-02 15:10:57,699 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.

2015-02-02 15:10:58,090 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /mnt/tmphadoop/dfs/data: namenode namespaceID = 2017454015; datanode namespaceID = 1238467850

        at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:232)

        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:147)

        at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:414)

        at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:321)

        at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1712)

        at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1651)

        at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1669)

        at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1795)

        at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1812)



问题原因:

每次namenode format会又一次创建一个namenodeId,而${hadoop.tmp.dir}/dfs/data下包括了上次format下的id,当又一次运行namenode format时清空了namenode下的数据,可是没有清空datanode下的数据,所以造成namenode节点上的namespaceID与 datanode节点上的namespaceID不一致,从而导致从现上述异常,启动失败。





解决的方法:

(1)停止hadoop

 stop-all.sh

(2)在各个slave中删除dfs.data.dir中的内容。

若此属性未改动。则其默认值为

<property>

  <name>${dfs.data.dir}</name>

  <value>${hadoop.tmp.dir}/dfs/data</value>

  <description>Determines where on the local filesystem an DFS data node

  should store its blocks.  If this is a comma-delimited

  list of directories, then data will be stored in all named

  directories, typically on different devices.

  Directories that do not exist are ignored.

  </description>

</property>

(3)又一次格式化namenode

hadoop namenode -format

然后start-all.sh启动hadoop



以上解决的方法须要将原有数据删除,若数据不能删除。则使用下面方法之中的一个:

(1)改动${dfs.data.dir}/current/VERSION文件。将datanode中的id改成与namenode中的id一致。

(2)改动${dfs.data.dir}

Hadoop常见异常及其解决方式的更多相关文章

  1. dubbo常见异常及解决方式

    1.出现RpcException: Failed to invoke the method post in the service com.xxx.xxx.xxx异常怎么办?表示调用失败1.检查网络是 ...

  2. Android 常见异常及解决办法

    Ø  前言 本文主要记录 Android 的常见异常及解决办法,以备以后遇到相同问题时可以快速解决. 1.   java.lang.NullPointerException: Attempt to i ...

  3. 连接db2数据库时NumberFormatException异常的解决方式

    连接db2数据库时报异常:java.lang.NumberFormatException: For input string: "A" from a DB2 JDBC(JCC) j ...

  4. IIS 常见异常及解决办法

    Ø  简介 IIS 是我们平常接触比较多的服务端软件,用于站点发布等,本文主要记录 IIS 常见的异常及解决办法.主要包括: 1.   Visual Studio 启动 Web 项目提示"无 ...

  5. 【Android开发经验】Cannot generate texture from bitmap异常的解决方式

    异常现象: 今天在处理用户头像的过程中,由于头像的处理比較复杂,由于,没有使用afinal自带的自己主动载入.而是自己依据头像的下载路径.手动进行下载和使用.可是在手动回收bitmap对象的过程中,会 ...

  6. Maven常见异常及解决方法(本篇停更至16-4-12)

    本篇文章记录了老猫在学习整合Maven和SSH过程中遇到的问题,有的问题可以解决.有的问题还不能解决. 方法不一定适合全部的环境.但绝对是本人常遇到的常见异常.在这里做一个笔记和记录,也分享给大家,希 ...

  7. Android之——常见Bug及其解决方式

    转载请注明出处:http://blog.csdn.net/l1028386804/article/details/46942139 1.android.view.WindowManager$BadTo ...

  8. Maven常见异常及解决方法

    异常1: [ERROR] Failed to execute goal on project biz_zhuhai: Could not resolve dependencies for projec ...

  9. Hadoop常见异常及其解决方案

    1.Shell$ExitCodeException 现象:运行hadoop job时出现如下异常: 14/07/09 14:42:50 INFO mapreduce.Job: Task Id : at ...

随机推荐

  1. 一个.java文件定义多个类的情况

    一个.java文件中定义多个类: 注意一下几点: (1) public权限类只能有一个(也可以一个都没有,但最多只有一个): (2)这个.java文件名只能是public 权限的类的类名: (3)倘若 ...

  2. Android修改包名的方法,简单粗暴。

    几分钟之内,简单粗暴的修改包名! 序:Android的新手玩家可能对修改包名这件事情很是烦恼,我这里给出一个最快的修改包名的方法,简单粗暴,喜欢的可以收藏一下. 开始修改 第一步:修改自己app mo ...

  3. Codeforces 899 C.Dividing the numbers-规律

      C. Dividing the numbers   time limit per test 1 second memory limit per test 256 megabytes input s ...

  4. 洛谷—— P2884 [USACO07MAR]每月的费用Monthly Expense

    https://www.luogu.org/problemnew/show/P2884 题目描述 Farmer John is an astounding accounting wizard and ...

  5. Light oj 1125 - Divisible Group Sums (dp)

    题目链接:http://www.lightoj.com/volume_showproblem.php?problem=1125 题意: 给你n个数,q次询问,每次询问问你取其中m个数是d的整数倍的方案 ...

  6. POJ 2486 Apple Tree [树状DP]

    题目:一棵树,每个结点上都有一些苹果,且相邻两个结点间的距离为1.一个人从根节点(编号为1)开始走,一共可以走k步,问最多可以吃多少苹果. 思路:这里给出数组的定义: dp[0][x][j] 为从结点 ...

  7. jersey上传文件解决办法

    这两天在使用jersey 构建的jersey JAX-RS REST服务器,在通过POST方法上传文件的时候,如果根据example来操作的话会引发如下异常: SEVERE: Missing depe ...

  8. 如何部署和运行Scut服务器及游戏:Windows篇

    概述 Scut游戏引擎是一个永久免费的全脚本游戏服务器框架,采用MVC框架设计,简化数据库设计和编码工作:降低对开发人员的开发难度:同时提供了丰富的类库和API接口. 一.    安装环境 必须安装的 ...

  9. Error Code: 1055 incompatible with sql_mode=only_full_group_by

    OperationalError at / (1055, "Expression #1 of ORDER BY clause is not in GROUP BY clause and co ...

  10. WEB API 返回类型设置为JSON 【转】

    http://blog.sina.com.cn/s/blog_60ba16ed0102uzc7.html web api写api接口时默认返回的是把你的对象序列化后以XML形式返回,那么怎样才能让其返 ...