HA模式强制手动切换:IPC's epoch [X] is less than the last promised epoch [X+1]
-- ::, WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Remote journal 192.168.58.183: failed to write txns -. Will try to write to this JN again after the next log roll.
at org.apache.hadoop.hdfs.qjournal.server.Journal.checkRequest(Journal.java:)
at org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:)
at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:)
at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:)
at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:)
at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$.callBlockingMethod(QJournalProtocolProtos.java:)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:)
at org.apache.hadoop.ipc.Server$Handler$.run(Server.java:)
at org.apache.hadoop.ipc.Server$Handler$.run(Server.java:)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:)
at org.apache.hadoop.ipc.Client.call(Client.java:)
at org.apache.hadoop.ipc.Client.call(Client.java:)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:)
at com.sun.proxy.$Proxy10.journal(Unknown Source)
at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:)
at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$.call(IPCLoggerChannel.java:)
at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$.call(IPCLoggerChannel.java:)
at java.util.concurrent.FutureTask.run(FutureTask.java:)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:)
at java.lang.Thread.run(Thread.java:)
-- ::, WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Remote journal 192.168.58.181: failed to write txns -. Will try to write to this JN again after the next log roll.
org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 11 is less than the last promised epoch 12
at org.apache.hadoop.hdfs.qjournal.server.Journal.checkRequest(Journal.java:)
at org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:)
at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:)
at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:)
at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:)
at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$.callBlockingMethod(QJournalProtocolProtos.java:)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:)
at org.apache.hadoop.ipc.Server$Handler$.run(Server.java:)
at org.apache.hadoop.ipc.Server$Handler$.run(Server.java:)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:)
at org.apache.hadoop.ipc.Client.call(Client.java:)
at org.apache.hadoop.ipc.Client.call(Client.java:)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:)
at com.sun.proxy.$Proxy10.journal(Unknown Source)
at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:)
at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$.call(IPCLoggerChannel.java:)
at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$.call(IPCLoggerChannel.java:)
at java.util.concurrent.FutureTask.run(FutureTask.java:)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:)
at java.lang.Thread.run(Thread.java:)
-- ::, WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Remote journal 192.168.58.182: failed to write txns -. Will try to write to this JN again after the next log roll.
org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 11 is less than the last promised epoch 12
at org.apache.hadoop.hdfs.qjournal.server.Journal.checkRequest(Journal.java:)
at org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:)
一、错误起因
Active NameNode日志出现异常IPC's epoch [X] is less than the last promised epoch [X+1],出现短期的双Active
我配置的ha自动切换,但是发现STandByNameNode是active,我强制手动切换了三次,STandByNameNode就无法访问了,估计是这个问题。
二.内部原因
【HDFS机制】:该问题属于hdfs对于脑列的异常保护,属于正常行为,不影响业务。
1)ZKFC1对NameNode1(Active)进行健康检查,因为长时间监控不到NN1的回复,认为该NameNode1不健康,主动释放zk中的ActiveStandbyElectorLock,此时NN1还是active(因为zkfc与NameNode1连接异常,不能将其shutdown)。
zkfc log: -- ::, WARN org.apache.hadoop.ha.HealthMonitor: Transport-level exception trying to monitor health of NameNode at namenode01/172.21.248.14:: Call From namenode01/ 72.21.248.14 to namenode02: failed on socket timeout exception: java.net.SocketTimeoutException: millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[co nnected local=/172.21.248.14: remote=namenode01/172.21.248.14:]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout -- ::, WARN org.apache.hadoop.ha.FailoverController: Unable to gracefully make NameNode at namenode02/172.21.248.13: standby (unable to connect) java.net.SocketTimeoutException: Call From namenode01/172.21.248.14 to namenode02: failed on socket timeout exception: java.net.SocketTimeoutException: millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.21.248.14: remote=namenode02/172.21.248.13:]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout
2)ZKFC2在zk中竞争到ActiveStandbyElectorLock,将NameNode2(原来的Standby)变成Active,同时会更新JN中的epoch使其+1。
3)NameNode1(原先的Active)再次去操作JournalNode的editlog时发现自己的epoch比JN的epoch小1,促使自己重启,成为Standby NameNode。
NN1 log: -- ::, FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: flush failed for required journal (JournalAndStream(mgr=QJM to [10.1.1.107:, 192.10.1.208:, 192.10.1.209:], stream=QuorumOutputStream starting at txid )) org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many exceptions to achieve quorum size /. exceptions thrown: 192.10.1.208:: IPC's epoch 115 is less than the last promised epoch 116
三.解决方案
可以在core-site.xml文件中修改ha.health-monitor.rpc-timeout.ms参数值,来扩大zkfc监控检查超时时间。
<property> <name>ha.health-monitor.rpc-timeout.ms</name> <value>180000</value> </property>
四、结束语
最后设置成手动切换吧...其实可以通过zookeeper来找到那个是active,我先不这么做吧。在hdfs-site.xml。
但是设置成不自动切换的话,zkfc就取法启动,hbase必须用自己的zookeeper。
HA模式强制手动切换:IPC's epoch [X] is less than the last promised epoch [X+1]的更多相关文章
- IPC's epoch 6 is less than the last promised epoch 7
一.错误起因 Active NameNode日志出现异常IPC‘s epoch [X] is less than the last promised epoch [X+1],出现短期的双Active ...
- Hadoop- Namenode经常挂掉 IPC's epoch 9 is less than the last promised epoch 10
如题出现Namenode经常挂掉 IPC's epoch 9 is less than the last promised epoch 10, 2019-01-03 05:36:14,774 INFO ...
- Hadoop集群搭建-HA高可用(手动切换模式)(四)
步骤和集群规划 1)保存完全分布式模式配置 2)在full配置的基础上修改为高可用HA 3)第一次启动HA 4)常规启动HA 5)运行wordcount 集群规划: centos虚拟机:node-00 ...
- HA模式手动切换namenode状态
查看状态 hdfs haadmin -getServiceState nn1 有时候通过网页访问两个namenode的http-address,看到默认的主namenode状态变成了standy,这时 ...
- 大数据技术之Hadoop3.1.2版本HA模式
大数据技术之Hadoop3.1.2版本HA模式 作者:尹正杰 版权声明:原创作品,谢绝转载!否则将追究法律责任. 一.Hadoop的HA特点 1>.主备NameNode 2>.解决单点故障 ...
- MHA手动切换 原创1(主故障)
MHA提供了3种方式用于实现故障转移,分别自动故障转移,需要启用MHA监控: 在无监控的情况下的手动故障转移以及基于在线手动切换. 三种方式可以应对MySQL主从故障的任意场景.本文主要描述在无监控的 ...
- 一脸懵逼学习Hadoop分布式集群HA模式部署(七台机器跑集群)
1)集群规划:主机名 IP 安装的软件 运行的进程master 192.168.199.130 jdk.hadoop ...
- Python进阶----异步同步,阻塞非阻塞,线程池(进程池)的异步+回调机制实行并发, 线程队列(Queue, LifoQueue,PriorityQueue), 事件Event,线程的三个状态(就绪,挂起,运行) ,***协程概念,yield模拟并发(有缺陷),Greenlet模块(手动切换),Gevent(协程并发)
Python进阶----异步同步,阻塞非阻塞,线程池(进程池)的异步+回调机制实行并发, 线程队列(Queue, LifoQueue,PriorityQueue), 事件Event,线程的三个状态(就 ...
- 分布式集群HA模式部署
一:HDFS系统架构 (一)利用secondary node备份实现数据可靠性 (二)问题:NameNode的可用性不高,当NameNode节点宕机,则服务终止 二:HA架构---提高NameNode ...
随机推荐
- data:image/png;base64
大家可能注意到了,网页上有些图片的src或css背景图片的url后面跟了一大串字符,比如: data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAkAAAAJ ...
- ASP.NET MVC铵钮Click后下载文件
本篇Insus.NET练习的是FilePathResult和FileStreamResult操作.本篇也算是<如何把Json格式字符写进text文件中>http://www.cnblogs ...
- [转]ReactPHP── PHP版的Node.js
FROM : http://www.csdn.net/article/2015-10-12/2825887 摘要:ReactPHP作为Node.js的PHP版本.在实现思路,使用方法,应用场景上的确有 ...
- C++变量命名规则
转自:http://www.cnblogs.com/finallyliuyu/archive/2010/09/25/1834301.html 浅谈C++变量命名规则 不知道别的公司如何,反正我现在的公 ...
- C语言 malloc()与sizeof运算的盲点
//malloc()与sizeof运算的盲点 #include <stdio.h> #include <stdlib.h> #include <string.h> ...
- node基础06:回调函数
1.Node异步编程 Node.js 异步编程的直接体现就是回调. 异步编程依托于回调来实现,但不能说使用了回调后程序就异步化了. 回调函数在完成任务后就会被调用,Node 使用了大量的回调函数,No ...
- 打印机设置(PrintDialog)、页面设置(PageSetupDialog) 及 RDLC报表如何选择指定打印机
如果一台电脑同时连接多个打印机,而且每个打印机使用的纸张大小各不相同(比如:票据打印钱用的小票专用张,办公打印机用的是A4标准纸),在处理打印类的需求时,如果不用代码干预,用户必须每次打印时,都必须在 ...
- 别再迷信 zepto 了
希望网上公开课的老师们不要再讲移动端网页用zepto了,坑了无数鸟啊 ~~~. 1.自己/公司/项目组所写和所积累(网上下的)的js函数都是以jQuery插件的写法来写的,如果要换到zepto上的话那 ...
- MVC认知路【点点滴滴支离破碎】【一】----新建数据库
1.App_Data文件夹创建[SQL Server Compact Local Database *]数据库 2.添加链接字符串<add name="MovieDBContext&q ...
- Android四大组件之Activity详解——创建和启动Activity
前面我们已经对Activity有过简单的介绍: Android开发——初始Activity Android开发——响应用户事件 Android开发——Activity生命周期 先来看一下最终结果 项目 ...