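NameNode on the master node stops with "Error: flush failed for required journal"
The NameNode on the master node fails intermittently; nothing else is wrong. The NameNode on the standby node is fine and all of the JournalNodes are still running, but the master node's NameNode process shuts down.
Check the NameNode log on the Hadoop master node: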
2016-11-21 22:36:40,908 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 19822 ms (timeout=20000 ms) for a response for sendEdits. No responses yet.
2016-11-21 22:36:41,088 FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: flush failed for required journal (JournalAndStream(mgr=QJM to [192.168.58.183:8485, 192.168.58.181:8485, 192.168.58.182:8485], stream=QuorumOutputStream starting at txid 24533))
java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.
at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
at org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:533)
at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393)
at org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:57)
at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:529)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:639)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2645)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2520)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:579)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:394)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:975)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2036)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2034)
2016-11-21 22:36:41,089 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Aborting QuorumOutputStream starting at txid 24533
2016-11-21 22:36:41,113 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2016-11-21 22:36:41,122 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Slave2/192.168.58.182:8485. Already tried 0 time(s); maxRetries=45
2016-11-21 22:36:41,123 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Slave1/192.168.58.181:8485. Already tried 0 time(s); maxRetries=45
2016-11-21 22:36:41,123 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: StandByNameNode/192.168.58.183:8485. Already tried 0 time(s); maxRetries=45
2016-11-21 22:36:41,137 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took 20050ms to send a batch of 1 edits (218 bytes) to remote journal 192.168.58.182:8485
2016-11-21 22:36:41,137 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took 20052ms to send a batch of 1 edits (218 bytes) to remote journal 192.168.58.181:8485
2016-11-21 22:36:41,137 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took 20065ms to send a batch of 1 edits (218 bytes) to remote journal 192.168.58.183:8485
2016-11-21 22:36:41,145 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at CentOSMaster/192.168.58.180
************************************************************/
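To see how often edit-log writes get close to the timeout, and whether one JournalNode is slower than the others, the "Took ...ms to send a batch" warnings can be pulled out of the log. The following is a minimal sketch, not part of the original troubleshooting; the function name `slow_sends` and the 10000 ms threshold are assumptions chosen for illustration.

```python
import re
from collections import defaultdict

# Matches the QuorumJournalManager warning format shown in the log above.
SLOW_SEND = re.compile(
    r"Took (?P<ms>\d+)ms to send a batch of \d+ edits \(\d+ bytes\) "
    r"to remote journal (?P<addr>\S+)"
)


def slow_sends(lines, threshold_ms=10000):
    """Return {journal address: [durations in ms]} for sends >= threshold_ms.

    threshold_ms is a hypothetical cut-off, not a Hadoop setting.
    """
    slow = defaultdict(list)
    for line in lines:
        match = SLOW_SEND.search(line)
        if match and int(match.group("ms")) >= threshold_ms:
            slow[match.group("addr")].append(int(match.group("ms")))
    return dict(slow)


# Two of the warnings from the log above, used as sample input.
sample_log = [
    "2016-11-21 22:36:41,137 WARN org.apache.hadoop.hdfs.qjournal.client."
    "QuorumJournalManager: Took 20050ms to send a batch of 1 edits "
    "(218 bytes) to remote journal 192.168.58.182:8485",
    "2016-11-21 22:36:41,137 WARN org.apache.hadoop.hdfs.qjournal.client."
    "QuorumJournalManager: Took 20065ms to send a batch of 1 edits "
    "(218 bytes) to remote journal 192.168.58.183:8485",
]

for addr, durations in sorted(slow_sends(sample_log).items()):
    print(addr, "slow sends:", len(durations), "worst:", max(durations), "ms")
```

In the log above all three JournalNodes took about 20 seconds for the same 218-byte batch, which suggests the delay was not specific to one JournalNode.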
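First make sure that dfs.namenode.edits.dir and dfs.journalnode.edits.dir are set. The 20000 ms limit in the log matches the default value of dfs.qjournal.write-txns.timeout.ms, so next raise the QJM timeouts in hdfs-site.xml as follows: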
<property>
<name>dfs.qjournal.start-segment.timeout.ms</name>
<value>600000000</value>
</property>
<property>
<name>dfs.qjournal.prepare-recovery.timeout.ms</name>
<value>600000000</value>
</property>
<property>
<name>dfs.qjournal.accept-recovery.timeout.ms</name>
<value>600000000</value>
</property>
<property>
<name>dfs.qjournal.finalize-segment.timeout.ms</name>
<value>600000000</value>
</property>
<property>
<name>dfs.qjournal.select-input-streams.timeout.ms</name>
<value>600000000</value>
</property>
<property>
<name>dfs.qjournal.get-journal-state.timeout.ms</name>
<value>600000000</value>
</property>
<property>
<name>dfs.qjournal.new-epoch.timeout.ms</name>
<value>600000000</value>
</property>
<property>
<name>dfs.qjournal.write-txns.timeout.ms</name>
<value>600000000</value>
</property>
This seems to have fixed it: as of this morning the problem has not come back.
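To confirm which QJM timeouts a given hdfs-site.xml actually sets, the file can be parsed and the dfs.qjournal.*.timeout.ms entries listed. This is a minimal sketch under stated assumptions: the helper name `qjournal_timeouts` is hypothetical, and the sample configuration is inlined here rather than read from a real cluster path.

```python
import xml.etree.ElementTree as ET


def qjournal_timeouts(xml_text):
    """Return {property name: timeout in ms} for dfs.qjournal.*.timeout.ms entries."""
    root = ET.fromstring(xml_text)
    timeouts = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name.startswith("dfs.qjournal.") and name.endswith(".timeout.ms"):
            timeouts[name] = int(value)
    return timeouts


# Inlined sample standing in for a real hdfs-site.xml (assumption).
sample = """<configuration>
  <property>
    <name>dfs.qjournal.write-txns.timeout.ms</name>
    <value>600000000</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>"""

for name, ms in qjournal_timeouts(sample).items():
    # Convert to hours to make very large values easier to judge.
    print(f"{name} = {ms} ms ({ms / 3_600_000:.1f} h)")
```

Converting to hours makes the scale visible: 600000000 ms is nearly a week, so with this value the timeout is effectively disabled rather than tuned. A smaller value may be enough once the cause of the slow writes is known.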