Preface

Hadoop can feel like one big trap, marching around under the banner of "the best big-data solution" and snaring innocent users everywhere. I remember reading an article that said if your data is under 1TB you shouldn't bother with Hadoop at all: it shows no real advantage and can even become a burden. Accordingly, Hadoop tends to be used in two kinds of places: companies of a certain scale whose data flows run to terabytes (and there actually aren't many of those), and university labs, for research. Unluckily, I have ended up on this road too, purely for research. Worse, my requirement is not the usual one of developing applications on Hadoop: I have finished C++ programs that need to be tested on the Hadoop platform. Hadoop is a Java-based data-processing platform and naturally supports Java best; to run C++ programs there are three options:

  • Use JNI/JNA/JNR. These three Java foreign-function-interface technologies all solve the problem of calling C++ functions from Java code, so you can develop a Java program for Hadoop that invokes C++ functions, effectively running C++ inside a Java Hadoop application. Early on (around 2011) Alibaba used this approach to successfully deploy a word-segmentation tool written in C to Hadoop; see reference 1 for details. Anyone who has used JNI knows how painful it is, which is why two other Java foreign-function interfaces, JNA and JNR, were later developed; see references 2 and 3 for introductions and usage examples.
  • Use Hadoop Streaming. This technology lets many languages besides Java, such as C/C++/Python/C# and even shell scripts, run on Hadoop. A program only has to read records from standard input and write results to standard output in a fixed format, so an existing single-machine program needs only minor changes to run distributed on Hadoop (a minimal sketch of this stdin/stdout contract follows this list).
  • Use Hadoop Pipes. This technology focuses solely on running C++ on Hadoop and only allows MapReduce programs to be written in C++. Its main approach is to run the application's C++ logic in a separate process and let the Java side talk to the C++ side over a socket. In large part it resembles Hadoop Streaming; the difference is the communication channel: standard input/output for one, sockets for the other.
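
To make the Streaming contract in the second option concrete, below is a minimal sketch of my own (not taken from the references) of a wordcount mapper that uses nothing but standard input and output, which is all Hadoop Streaming asks of a program:

// streaming-mapper.cc -- minimal Hadoop Streaming mapper sketch (wordcount).
// Streaming feeds input lines on stdin; the mapper emits "key<TAB>value"
// lines on stdout. A reducer would read the sorted pairs back the same way.
#include <iostream>
#include <sstream>
#include <string>

int main() {
    std::string line;
    while (std::getline(std::cin, line)) {
        std::istringstream words(line);
        std::string word;
        while (words >> word) {
            std::cout << word << "\t" << 1 << "\n";  // emit (word, 1)
        }
    }
    return 0;
}

Such a binary would be submitted with the hadoop-streaming jar rather than with hadoop pipes.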

I investigated and compared all three. Method 1 was ruled out first, because I didn't want to write Java code just to call into C++; it felt cumbersome and is very awkward to debug. That left Hadoop Streaming versus Hadoop Pipes. Each has its pros and cons; see reference 4 (not entirely accurate, for reference only). To choose properly I needed to try both and see which one fit my needs before making a final decision.

I started with Hadoop Pipes, the C++-only option, and thus began the long series of events below……

Installing and Configuring Hadoop 2.3

I had configured a Hadoop environment before (see reference 5), but that was version 1.1.2, an old release with a pile of bugs. To avoid being haunted by legacy bugs I chose the latest release, 2.3, which meant configuring from scratch (mainly following reference 6). Since 2.3 uses the new MapReduce framework YARN (see reference 8), the configuration differs somewhat from before.

Once configured, I tested whether everything ran normally with the classic built-in wordcount example (the input data was already uploaded):

hadoop jar ./hadoop-2.3.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar wordcount wc_input.txt out

This produced the following error:

-- ::, FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.ConnectException: Call From Slave1/192.168.1.152 to 0.0.0.0:8031 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:)
at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:)
Caused by: java.net.ConnectException: Call From Slave1/192.168.1.152 to 0.0.0.0: failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.GeneratedConstructorAccessor8.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:)
at java.lang.reflect.Constructor.newInstance(Constructor.java:)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:)
at org.apache.hadoop.ipc.Client.call(Client.java:)
at org.apache.hadoop.ipc.Client.call(Client.java:)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:)
at com.sun.proxy.$Proxy23.registerNodeManager(Unknown Source)
at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:)
at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:)
at java.lang.reflect.Method.invoke(Method.java:)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:)
at com.sun.proxy.$Proxy24.registerNodeManager(Unknown Source)
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:)
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:)
... more
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:)
at org.apache.hadoop.ipc.Client$Connection.access$(Client.java:)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:)
at org.apache.hadoop.ipc.Client.call(Client.java:)
... more

This is a connection-refused problem: something tried to connect to "0.0.0.0:8031", which clearly means some address option was left unconfigured and fell back to a useless default. A bit of digging showed the problem was in yarn-site.xml: port 8031 is where a NodeManager registers with the ResourceManager, and yarn.resourcemanager.resource-tracker.address had not been set, so it defaulted to 0.0.0.0. A working configuration follows (some entries may be unnecessary, but to be safe I set most of them):

<configuration>
  <!-- Ports other than 8031 use the standard YARN defaults
       (8032/8030/8033/8088); adjust if your cluster differs. -->
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>192.168.1.137:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>192.168.1.137:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>192.168.1.137:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>192.168.1.137:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>192.168.1.137:8088</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>

After that it ran successfully. I thought the environment was all set — until I tested a Hadoop Pipes program……

A Collection of Hadoop Pipes Runtime Errors

The Hadoop 2.3 release does not ship a Hadoop Pipes wordcount example, so I fetched wordcount-simple.cc from the old 1.1.2 release, fixed its header include paths (the new layout is completely different from the old one), and, following reference 7, compiled it with the makefile below (the -lssl may not be strictly necessary; a trimmed sketch of the source itself follows the makefile):

HADOOP_INSTALL=/opt/hadoop
CC = g++
CCFLAGS = -I$(HADOOP_INSTALL)/include

wordcount: wordcount-simple.cc
	$(CC) $(CCFLAGS) $< -Wall -L$(HADOOP_INSTALL)/lib/native -lhadooppipes -lhadooputils -lpthread -lcrypto -lssl -g -O2 -o $@
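
For reference, here is a trimmed sketch of what wordcount-simple.cc boils down to, reconstructed from the classic example (treat it as illustrative rather than the exact file shipped with 1.1.2); note the 2.x-style includes, since the headers now sit directly under $(HADOOP_INSTALL)/include:

// wordcount-simple.cc (sketch) -- classic Hadoop Pipes wordcount.
// The mapper splits each input line into words and emits (word, "1");
// the reducer sums up the counts for each word.
#include <string>
#include <vector>

#include "Pipes.hh"            // Hadoop 2.x: directly under include/
#include "TemplateFactory.hh"
#include "StringUtils.hh"

class WordCountMap : public HadoopPipes::Mapper {
public:
  WordCountMap(HadoopPipes::TaskContext& context) {}
  void map(HadoopPipes::MapContext& context) {
    std::vector<std::string> words =
        HadoopUtils::splitString(context.getInputValue(), " ");
    for (size_t i = 0; i < words.size(); ++i) {
      context.emit(words[i], "1");
    }
  }
};

class WordCountReduce : public HadoopPipes::Reducer {
public:
  WordCountReduce(HadoopPipes::TaskContext& context) {}
  void reduce(HadoopPipes::ReduceContext& context) {
    int sum = 0;
    while (context.nextValue()) {
      sum += HadoopUtils::toInt(context.getInputValue());
    }
    context.emit(context.getInputKey(), HadoopUtils::toString(sum));
  }
};

int main(int argc, char* argv[]) {
  // Hand control to the Pipes runtime, which talks to the Java framework
  // over a socket and drives the map/reduce callbacks from there.
  return HadoopPipes::runTask(
      HadoopPipes::TemplateFactory<WordCountMap, WordCountReduce>());
}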

After a successful compile, I uploaded the binary to HDFS and ran it with:

hadoop pipes -D hadoop.pipes.java.recordreader=true -D hadoop.pipes.java.recordwriter=true  \
-D mapred.job.name=wordcount -input /data/wc_in -output /data/wc_out2 -program /bin/wordcount

At first it sat forever at "map 0% reduce 0%"; after who knows how long, it spat out the following series of errors:

DEPRECATED: Use of this script to execute mapred command is deprecated.

Instead use the mapred command for it.
// :: INFO client.RMProxy: Connecting to ResourceManager at /192.168.1.137:
// :: INFO client.RMProxy: Connecting to ResourceManager at /192.168.1.137:
// :: WARN mapreduce.JobSubmitter: No job jar file set. User classes may not be found. See Job or Job#setJar(String).
// :: INFO mapred.FileInputFormat: Total input paths to process :
// :: INFO mapreduce.JobSubmitter: number of splits:
// :: INFO Configuration.deprecation: hadoop.pipes.java.recordreader is deprecated. Instead, use mapreduce.pipes.isjavarecordreader
// :: INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
// :: INFO Configuration.deprecation: hadoop.pipes.java.recordwriter is deprecated. Instead, use mapreduce.pipes.isjavarecordwriter
// :: INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1396578697573_0004
// :: INFO mapred.YARNRunner: Job jar is not present. Not adding any jar to the list of resources.
// :: INFO impl.YarnClientImpl: Submitted application application_1396578697573_0004
// :: INFO mapreduce.Job: The url to track the job: http://Master:8088/proxy/application_1396578697573_0004/
// :: INFO mapreduce.Job: Running job: job_1396578697573_0004
// :: INFO mapreduce.Job: Job job_1396578697573_0004 running in uber mode : false
// :: INFO mapreduce.Job: map 0% reduce 0%
14/04/04 00:10:53 INFO mapreduce.Job: map 100% reduce 0%
14/04/04 00:10:53 INFO mapreduce.Job: Task Id : attempt_1396578697573_0004_m_000001_0, Status : FAILED
AttemptID:attempt_1396578697573_0004_m_000001_0 Timed out after 600 secs
// :: INFO mapreduce.Job: Task Id : attempt_1396578697573_0004_m_000000_0, Status : FAILED
AttemptID:attempt_1396578697573_0004_m_000000_0 Timed out after 600 secs
// :: INFO mapreduce.Job: map % reduce %
// :: INFO mapreduce.Job: map % reduce %
// :: INFO mapreduce.Job: Task Id : attempt_1396578697573_0004_m_000000_1, Status : FAILED
AttemptID:attempt_1396578697573_0004_m_000000_1 Timed out after 600 secs
// :: INFO mapreduce.Job: Task Id : attempt_1396578697573_0004_m_000001_1, Status : FAILED
AttemptID:attempt_1396578697573_0004_m_000001_1 Timed out after 600 secs
// :: INFO mapreduce.Job: map % reduce %
// :: INFO mapreduce.Job: Task Id : attempt_1396578697573_0004_m_000000_2, Status : FAILED
AttemptID:attempt_1396578697573_0004_m_000000_2 Timed out after 600 secs
// :: INFO mapreduce.Job: Task Id : attempt_1396578697573_0004_m_000001_2, Status : FAILED
AttemptID:attempt_1396578697573_0004_m_000001_2 Timed out after 600 secs
// :: INFO mapreduce.Job: map % reduce %
// :: INFO mapreduce.Job: map % reduce %
// :: INFO mapreduce.Job: Job job_1396578697573_0004 failed with state FAILED due to: Task failed task_1396578697573_0004_m_000000
Job failed as tasks failed. failedMaps: failedReduces:
// :: INFO mapreduce.Job: Counters:
Job Counters
Failed map tasks=
Launched map tasks=
Other local map tasks=
Data-local map tasks=
Total time spent by all maps in occupied slots (ms)=
Total time spent by all reduces in occupied slots (ms)=
Total time spent by all map tasks (ms)=
Total vcore-seconds taken by all map tasks=
Total megabyte-seconds taken by all map tasks=
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:)
at org.apache.hadoop.mapred.pipes.Submitter.runJob(Submitter.java:)
at org.apache.hadoop.mapred.pipes.Submitter.run(Submitter.java:)
at org.apache.hadoop.mapred.pipes.Submitter.main(Submitter.java:)

I tried every fix I could find online (online, because the namenode, datanode and yarn logs contained no error message or anything else useful, which left me completely puzzled about the cause), and none of them worked. After a lot of fiddling on my own, I finally spotted the error message in a very well-hidden log, located on the datanode under the Hadoop install path, in the userlogs folder inside the logs directory. I had assumed that folder was useless; it turned out to answer every question I had. It contains one application subfolder per job, each application folder contains container subfolders, and each container folder holds the three standard output logs "stderr", "stdout" and "syslog" — that is, a structure like this (the datanode's container logs):

Hadoop install path
|-- logs
    |-- userlogs
        |-- application_XXXXXXXX
            |-- container_XXXXXXXX
                |-- stderr
                |-- stdout
                |-- syslog

Of the three, stderr is the most important file: the concrete cause of the failure is recorded there. For the failure above, for example, it held this message ("..." stands for elided, irrelevant detail):

.../application_1396607014314_0001/container_1396607014314_0001_01_000002/wordcount:
error while loading shared libraries: libcrypto.so.10: cannot open shared object file: No such file or directory

Running "locate libcrypto.so.10" on the datanode indeed found nothing, but the shared library did exist on the namenode, so I copied it from the namenode into /usr/lib on the datanode and reran the program. Same behavior: it hung and then failed with the same errors. Back to the datanode's container logs, where a new problem appeared:

/usr/lib/libstdc++.so.6: version `GLIBCXX_3.4.11' not found (required by .../wordcount)

Again, something the namenode had and the datanode lacked, so again I copied libstdc++.so.6 over from the namenode and reran. Still failing, this time because:

error while loading shared libraries: /usr/lib/libstdc++.so.6: ELF file OS ABI invalid

This time I didn't go searching for a fix, because I thought I could see the real root of the problem. Every failure had the same shape: the datanode lacked a library that the namenode had, and copying the namenode's copy over could simply yield a library incompatible with the target system. So the problem had to be a mismatch between the namenode (master) and datanode (slave) operating systems. My master runs Ubuntu 13.04 while the slave runs the positively ancient Ubuntu 9.10; both are Ubuntu, but the kernels differ and the libraries have long since been upgraded and changed. When Hadoop runs a program, master and slave must communicate, and they use the same request mechanism: if the master issues a request using, say, version libcrypto.so.10 of a library, then the slave, when it answers or issues its own request, will look for the same library on its own system. It isn't there, so Hadoop's retry machinery kicks in: time out, re-request, time out again, forever stuck, until the task exceeds its time limit and ends in "Job failed". One can imagine that a master on Ubuntu with slaves on Fedora would be quite likely to hit the same kind of error. Then why did the plain Java wordcount run without trouble earlier? That I can't say, since I don't know the internal mechanics. My guess is that Hadoop Pipes communicates over sockets and therefore needs those libraries, while the plain Java program has no such step; though I wouldn't rule out a more complex Java job involving such communication failing the same way.

From this I drew a conclusion:

The master node and all slave nodes of a Hadoop cluster should ideally have identical system environments (the same OS at the same version number); for Hadoop Pipes, they must be identical.

Since all my nodes are virtual machines, the simplest route to identical environments is to clone the VM (a linked clone is enough). I cloned, reconfigured the Hadoop 2.3 environment, and reran, thinking the problem was finally solved — only for a new one to arrive:

// :: INFO mapreduce.Job: Running job: job_1396756477966_0002

// :: INFO mapreduce.Job: Job job_1396756477966_0002 running in uber mode : false
// :: INFO mapreduce.Job: map 0% reduce 0%
14/04/04 09:59:02 INFO mapreduce.Job: Task Id : attempt_1396618478715_0002_m_000000_0, Status : FAILED
Error: java.io.IOException
at org.apache.hadoop.mapred.pipes.OutputHandler.waitForAuthentication(OutputHandler.java:186)
at org.apache.hadoop.mapred.pipes.Application.waitForAuthentication(Application.java:195)
at org.apache.hadoop.mapred.pipes.Application.<init>(Application.java:)
at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:)
at org.apache.hadoop.mapred.YarnChild$.run(YarnChild.java:)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:)
// :: INFO mapreduce.Job: Task Id : attempt_1396618478715_0002_m_000001_0, Status : FAILED
Error: java.io.IOException
at org.apache.hadoop.mapred.pipes.OutputHandler.waitForAuthentication(OutputHandler.java:)
at org.apache.hadoop.mapred.pipes.Application.waitForAuthentication(Application.java:)
at org.apache.hadoop.mapred.pipes.Application.<init>(Application.java:)
at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:)
at org.apache.hadoop.mapred.YarnChild$.run(YarnChild.java:)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:)
// :: INFO mapreduce.Job: Task Id : attempt_1396618478715_0002_m_000000_1, Status : FAILED
Error: java.io.IOException
at org.apache.hadoop.mapred.pipes.OutputHandler.waitForAuthentication(OutputHandler.java:)
at org.apache.hadoop.mapred.pipes.Application.waitForAuthentication(Application.java:)
at org.apache.hadoop.mapred.pipes.Application.<init>(Application.java:)
at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:)
at org.apache.hadoop.mapred.YarnChild$.run(YarnChild.java:)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:)
// :: INFO mapreduce.Job: Task Id : attempt_1396618478715_0002_m_000001_1, Status : FAILED
Error: java.io.IOException
at org.apache.hadoop.mapred.pipes.OutputHandler.waitForAuthentication(OutputHandler.java:)
at org.apache.hadoop.mapred.pipes.Application.waitForAuthentication(Application.java:)
at org.apache.hadoop.mapred.pipes.Application.<init>(Application.java:)
at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:)
at org.apache.hadoop.mapred.YarnChild$.run(YarnChild.java:)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:)
// :: INFO mapreduce.Job: Task Id : attempt_1396618478715_0002_m_000000_2, Status : FAILED
Error: java.io.IOException
at org.apache.hadoop.mapred.pipes.OutputHandler.waitForAuthentication(OutputHandler.java:)
at org.apache.hadoop.mapred.pipes.Application.waitForAuthentication(Application.java:)
at org.apache.hadoop.mapred.pipes.Application.<init>(Application.java:)
at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:)
at org.apache.hadoop.mapred.YarnChild$.run(YarnChild.java:)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:)
// :: INFO mapreduce.Job: Task Id : attempt_1396618478715_0002_m_000001_2, Status : FAILED
Error: java.io.IOException
at org.apache.hadoop.mapred.pipes.OutputHandler.waitForAuthentication(OutputHandler.java:)
at org.apache.hadoop.mapred.pipes.Application.waitForAuthentication(Application.java:)
at org.apache.hadoop.mapred.pipes.Application.<init>(Application.java:)
at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:)
at org.apache.hadoop.mapred.YarnChild$.run(YarnChild.java:)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:)
// :: INFO mapreduce.Job: map % reduce %
// :: INFO mapreduce.Job: Job job_1396618478715_0002 failed with state FAILED due to: Task failed task_1396618478715_0002_m_000000
Job failed as tasks failed. failedMaps: failedReduces:
// :: INFO mapreduce.Job: Counters:
Job Counters
Failed map tasks=
Killed map tasks=
Launched map tasks=
Other local map tasks=
Data-local map tasks=
Total time spent by all maps in occupied slots (ms)=
Total time spent by all reduces in occupied slots (ms)=
Total time spent by all map tasks (ms)=
Total vcore-seconds taken by all map tasks=
Total megabyte-seconds taken by all map tasks=
Map-Reduce Framework
CPU time spent (ms)=
Physical memory (bytes) snapshot=
Virtual memory (bytes) snapshot=
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:)
at org.apache.hadoop.mapred.pipes.Submitter.runJob(Submitter.java:)
at org.apache.hadoop.mapred.pipes.Submitter.run(Submitter.java:)
at org.apache.hadoop.mapred.pipes.Submitter.main(Submitter.java:)

Note that this time it did not sit at "map 0% reduce 0%" but returned the errors quickly, which means the earlier problem really was fixed; this one was clearly not a system mismatch but an IO problem. Checking the container logs again revealed the cause:

Server failed to authenticate. Exiting

Searching for a solution, the cause turned out, astonishingly, to lie in Hadoop 2.3 itself: the two static libraries the makefile links against, libhadooppipes.a and libhadooputils.a, have to be compiled by the user to match the local system. (Officially, the prebuilt libraries are 32-bit and only 64-bit users need to rebuild; my host is 64-bit but the VMs are 32-bit, and I have no idea why I still had to rebuild.) Not exactly a requirement that respects the user experience. And so I descended into the hell of recompiling the Hadoop 2.3 native libraries……

Compiling the Hadoop 2.3 Native Libraries

The first step, naturally, is to download the Hadoop 2.3 source. Unpack it and, following the bundled BUILDING.txt, build the native libraries with:

mvn package -Pdist,native -DskipTests -Dtar

All sorts of errors cropped up along the way, essentially all of them covered in reference 9. The cure is to install the build-time dependencies first: protobuf, cmake, zlib-devel, openssl-devel and friends. Once those were in place it finally reported "BUILD SUCCESS" — no small relief. The official docs do in fact list these prerequisites; I had simply been too lazy to read them at first. See reference 10.

After a successful build, the libhadooppipes.a and libhadooputils.a we need sit under HADOOP_PATH/hadoop-tools/hadoop-pipes/target/native/. Copy them into the lib/native folder on the master and on every slave (back up the bundled libraries first, in case this approach fails), re-make the wordcount program, run it with hadoop pipes again, and finally see a result like this:

// :: WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
// :: INFO client.RMProxy: Connecting to ResourceManager at /192.168.1.137:
// :: INFO client.RMProxy: Connecting to ResourceManager at /192.168.1.137:
// :: WARN mapreduce.JobSubmitter: No job jar file set. User classes may not be found. See Job or Job#setJar(String).
// :: INFO mapred.FileInputFormat: Total input paths to process :
// :: INFO mapreduce.JobSubmitter: number of splits:
// :: INFO Configuration.deprecation: hadoop.pipes.java.recordreader is deprecated. Instead, use mapreduce.pipes.isjavarecordreader
// :: INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
// :: INFO Configuration.deprecation: hadoop.pipes.java.recordwriter is deprecated. Instead, use mapreduce.pipes.isjavarecordwriter
// :: INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1396756477966_0004
// :: INFO mapred.YARNRunner: Job jar is not present. Not adding any jar to the list of resources.
// :: INFO impl.YarnClientImpl: Submitted application application_1396756477966_0004
// :: INFO mapreduce.Job: The url to track the job: http://Master:8088/proxy/application_1396756477966_0004/
// :: INFO mapreduce.Job: Running job: job_1396756477966_0004
// :: INFO mapreduce.Job: Job job_1396756477966_0004 running in uber mode : false
// :: INFO mapreduce.Job: map 0% reduce 0%
14/04/06 01:22:56 INFO mapreduce.Job: map 67% reduce 0%
14/04/06 01:22:57 INFO mapreduce.Job: map 100% reduce 0%
14/04/06 01:23:09 INFO mapreduce.Job: map 100% reduce 100%
14/04/06 01:23:11 INFO mapreduce.Job: Job job_1396756477966_0004 completed successfully
// :: INFO mapreduce.Job: Counters:
File System Counters
FILE: Number of bytes read=
FILE: Number of bytes written=
FILE: Number of read operations=
FILE: Number of large read operations=
FILE: Number of write operations=
HDFS: Number of bytes read=
HDFS: Number of bytes written=
HDFS: Number of read operations=
HDFS: Number of large read operations=
HDFS: Number of write operations=
Job Counters
Launched map tasks=
Launched reduce tasks=
Data-local map tasks=
Total time spent by all maps in occupied slots (ms)=
Total time spent by all reduces in occupied slots (ms)=
Total time spent by all map tasks (ms)=
Total time spent by all reduce tasks (ms)=
Total vcore-seconds taken by all map tasks=
Total vcore-seconds taken by all reduce tasks=
Total megabyte-seconds taken by all map tasks=
Total megabyte-seconds taken by all reduce tasks=
Map-Reduce Framework
Map input records=
Map output records=
Map output bytes=
Map output materialized bytes=
Input split bytes=
Combine input records=
Combine output records=
Reduce input groups=
Reduce shuffle bytes=
Reduce input records=
Reduce output records=
Spilled Records=
Shuffled Maps =
Failed Shuffles=
Merged Map outputs=
GC time elapsed (ms)=
CPU time spent (ms)=
Physical memory (bytes) snapshot=
Virtual memory (bytes) snapshot=
Total committed heap usage (bytes)=
Shuffle Errors
BAD_ID=
CONNECTION=
IO_ERROR=
WRONG_LENGTH=
WRONG_MAP=
WRONG_REDUCE=
WORDCOUNT
INPUT_WORDS=
OUTPUT_WORDS=
File Input Format Counters
Bytes Read=
File Output Format Counters
Bytes Written=
// :: INFO util.ExitUtil: Exiting with status

A real mix of feelings. Setting up this environment was genuinely painful, but it did train my ability to find and solve problems, and it taught me that whenever something goes wrong, the LOG is always the best route to a solution.

I write this post to record the experience of configuring a Hadoop Pipes runtime environment, and to light the way for anyone who runs into similar problems……

References

1. How to run JNI programs on a Hadoop cluster

2. A JNI alternative: accessing Java foreign function interfaces with JNA

3. Another JNI alternative: accessing Java foreign function interfaces with JNR (jnr-ffi)

4. Understanding Hadoop Streaming and Pipes

5. A step-by-step guide to installing and configuring a multi-node Hadoop cluster

6. Hadoop 2.3.0 distributed cluster setup, illustrated

7. Hadoop Tutorial 2.2 -- Running C++ Programs on Hadoop

8. The new Hadoop MapReduce framework YARN explained

9. Compiling the Hadoop 2.3 native library

10. Native Libraries Guide
