ES6.3.2 副本失败处理

副本的失败处理对理解ES的数据副本模型很有帮助。在ES6.3.2 index操作源码流程的总结中提到:ES的写操作会先写主分片,然后主分片再将操作同步到副本分片。本文给出ES中的源码片断,分析副本执行操作失败时,ES是如何处理的。

副本执行源码:replicasProxy.performOn实现了副本操作,执行正常结束回调onResponse(),异常回调onFailure()

replicasProxy.performOn(shard, replicaRequest, globalCheckpoint, new ActionListener<ReplicaResponse>() {
@Override
public void onResponse(ReplicaResponse response) {
successfulShards.incrementAndGet();
try {
primary.updateLocalCheckpointForShard(shard.allocationId().getId(), response.localCheckpoint());//执行成功回调更新检查点
primary.updateGlobalCheckpointForShard(shard.allocationId().getId(), response.globalCheckpoint());
} catch (final AlreadyClosedException e) {
// okay, the index was deleted or this shard was never activated after a relocation; fall through and finish normally
} catch (final Exception e) {
// fail the primary but fall through and let the rest of operation processing complete
final String message = String.format(Locale.ROOT, "primary failed updating local checkpoint for replica %s", shard);
primary.failShard(message, e);
}
decPendingAndFinishIfNeeded();//不管是正常的onResponse还是异常的onFailure,都会调用这个方法,代表已经完成了一个操作,pendingActions减1
} @Override
public void onFailure(Exception replicaException) {
logger.trace(() -> new ParameterizedMessage(
"[{}] failure while performing [{}] on replica {}, request [{}]",
shard.shardId(), opType, shard, replicaRequest), replicaException);
// Only report "critical" exceptions - TODO: Reach out to the master node to get the latest shard state then report.
if (TransportActions.isShardNotAvailableException(replicaException) == false) {
RestStatus restStatus = ExceptionsHelper.status(replicaException);
shardReplicaFailures.add(new ReplicationResponse.ShardInfo.Failure(
shard.shardId(), shard.currentNodeId(), replicaException, restStatus, false));
}
String message = String.format(Locale.ROOT, "failed to perform %s on replica %s", opType, shard);
//---> failShardIfNeeded 具体执行何种操作要看 replicasProxy的真正实现类:如果是WriteActionReplicasProxy则会报告shard错误
replicasProxy.failShardIfNeeded(shard, message,
replicaException, ReplicationOperation.this::decPendingAndFinishIfNeeded,
ReplicationOperation.this::onPrimaryDemoted, throwable -> decPendingAndFinishIfNeeded());
}
});
}

执行正常结束回调onResponse()

successfulShards.incrementAndGet();,在返回的结果里面,_shards 字段里面就能看到 successful 数值。

更新 local checkpoint 和 global checkpoint:如果检查点更新失败,触发:replica shard engine 关闭。

/**
* Fails the shard and marks the shard store as corrupted if
* <code>e</code> is caused by index corruption
*
* org.elasticsearch.index.shard.IndexShard#failShard
*/
public void failShard(String reason, @Nullable Exception e) {
// fail the engine. This will cause this shard to also be removed from the node's index service.
getEngine().failEngine(reason, e);
}
fail engine due to some error. the engine will also be closed.
The underlying store is marked corrupted iff failure is caused by index corruption

关于检查点,可参考这篇文章:elasticsearch-sequence-ids-6-0

异常结束回调 onFailure()

replicasProxy.failShardIfNeeded(shard, message,
replicaException, ReplicationOperation.this::decPendingAndFinishIfNeeded,
ReplicationOperation.this::onPrimaryDemoted, throwable -> decPendingAndFinishIfNeeded());

failShardIfNeeded 可以做2件事情,具体是如何执行得看failShardIfNeeded的实现类。

  1. onPrimaryDemoted

    通知master primary stale(过时)了。index操作首先在primary shard执行成功了,然后同步给replica,但是replica发现此primary shard 的 primary term 比它知道的该索引的primary term 还小,于是replica就认为此primary shard是一个已经过时了的primary shard,因此就回调onFailure()拒绝执行,并执行onPrimaryDemoted通知master节点。

    private void onPrimaryDemoted(Exception demotionFailure) {
    String primaryFail = String.format(Locale.ROOT,
    "primary shard [%s] was demoted while failing replica shard",
    primary.routingEntry());
    // we are no longer the primary, fail ourselves and start over
    primary.failShard(primaryFail, demotionFailure);
    finishAsFailed(new RetryOnPrimaryException(primary.routingEntry().shardId(), primaryFail, demotionFailure));
    }

  2. decPendingAndFinishIfNeeded

    计数。一个请求会由ReplicationGroup中的 多个分片执行,这些分片是否都已经执行完成了?就由pendingActions计数。不管是执行正常结束onResponse还是异常结束onFailure都会调用这个方法。

    private void decPendingAndFinishIfNeeded() {
    assert pendingActions.get() > 0 : "pending action count goes below 0 for request [" + request + "]";
    if (pendingActions.decrementAndGet() == 0) {//当所有的shard都处理完这个请求,client收到ACK(里面允许一些replica执行失败), 或者是收到一个请求超时的响应
    finish();
    }
    }

    对于发起index操作的Client而言,该 index 操作会由primary shard 执行,也会由若干个replica执行。因此,pendingActions统计到底有多少个分片(既包括主分片也包括副本分片)执行完成(在某些副本分片上执行失败也算执行完成)了。正是由于不管是 onResponse() 还是 onFailure(),都会执行decPendingAndFinishIfNeeded()方法,每执行一次,意味着有一个分片返回了响应,这时if (pendingActions.decrementAndGet() == 0)就减1,直到减为0时,调用finish()方法给Client返回ACK响应。

    private void finish() {
if (finished.compareAndSet(false, true)) {
final ReplicationResponse.ShardInfo.Failure[] failuresArray;
if (shardReplicaFailures.isEmpty()) {
failuresArray = ReplicationResponse.EMPTY;
} else {
failuresArray = new ReplicationResponse.ShardInfo.Failure[shardReplicaFailures.size()];
shardReplicaFailures.toArray(failuresArray);
}
primaryResult.setShardInfo(new ReplicationResponse.ShardInfo(
totalShards.get(),
successfulShards.get(),
failuresArray
)
);
resultListener.onResponse(primaryResult);
}
}

Client要么收到一个执行成功的ACK(默认情况下,只要primary shard执行成功,若存在 replica执行失败,Client也会收到一个执行成功的ACK,只不过 返回的ACK里面 _shards参数下的 failed 不为0而已),如下:

{

"_index": "user",

"_type": "profile",

"_id": "10",

"_version": 1,

"result": "created",

"_shards": {

​ "total": 3,

​ "successful": 1,

​ "failed": 0

},

"_seq_no": 0,

"_primary_term": 1

}

另外,ES6.3.2 index操作源码流程 的总结部分,详细解释了Client收到执行成功的ACK的原因。

要么收到一个超时ACK,如下:(这篇文章提到了如何产生一个超时的ACK)

{

"statusCode": 504,

"error": "Gateway Time-out",

"message": "Client request timeout"

}

failShardIfNeeded方法一共有2个具体实现,看类图:

TransportReplicationAction.ReplicasProxy#failShardIfNeeded (默认实现)

@Override
public void failShardIfNeeded(ShardRouting replica, String message, Exception exception,Runnable onSuccess, Consumer<Exception> onPrimaryDemoted, Consumer<Exception> onIgnoredFailure) {
// This does not need to fail the shard. The idea is that this
// is a non-write operation (something like a refresh or a global
// checkpoint sync) and therefore the replica should still be
// "alive" if it were to fail.
onSuccess.run();
}

TransportResyncReplicationAction.ResyncActionReplicasProxy#failShardIfNeeded(副本resync操作的实现)

/**
* A proxy for primary-replica resync operations which are performed on replicas when a new primary is promoted.
* Replica shards fail to execute resync operations will be failed but won't be marked as stale.
* This avoids marking shards as stale during cluster restart but enforces primary-replica resync mandatory.
*/
class ResyncActionReplicasProxy extends ReplicasProxy {
@Override
public void failShardIfNeeded(ShardRouting replica, String message, Exception exception, Runnable onSuccess,
Consumer<Exception> onPrimaryDemoted, Consumer<Exception> onIgnoredFailure) {
shardStateAction.remoteShardFailed(replica.shardId(), replica.allocationId().getId(), primaryTerm, false, message, exception,
createShardActionListener(onSuccess, onPrimaryDemoted, onIgnoredFailure));
}
}

TransportWriteAction.WriteActionReplicasProxy#failShardIfNeeded(index 写操作的实现)

/**
* A proxy for <b>write</b> operations that need to be performed on the
* replicas, where a failure to execute the operation should fail
* the replica shard and/or mark the replica as stale.
*
* This extends {@code TransportReplicationAction.ReplicasProxy} to do the
* failing and stale-ing.
*/
class WriteActionReplicasProxy extends ReplicasProxy { @Override
public void failShardIfNeeded(ShardRouting replica, String message, Exception exception,Runnable onSuccess, Consumer<Exception> onPrimaryDemoted, Consumer<Exception> onIgnoredFailure) {
if (TransportActions.isShardNotAvailableException(exception) == false) {
logger.warn(new ParameterizedMessage("[{}] {}", replica.shardId(), message), exception);}
shardStateAction.remoteShardFailed(replica.shardId(), replica.allocationId().getId(), primaryTerm, true, message, exception,
createShardActionListener(onSuccess, onPrimaryDemoted, onIgnoredFailure));
}

总结

从上面代码中可看出:副本resync操作、副本上 index 写操作失败都会导致 调用 onPrimaryDemoted() 方法,通知master节点判断当前primary shard 是否已经过时(stale)。这可以说是:replica 检验 primary shard是否stale的方式。

另外,primary shard 和 各个replica之间也会通过 租约机制 进行故障检测,以判断对方是否stale,不过这不是本文要讨论的内容了。

参考文章:

原文:https://www.cnblogs.com/hapjin/p/10585555.html

ES6.3.2 副本失败处理的更多相关文章

  1. 分布式入门之2:Quorum机制

    1.  全写读1(write all, read one) 全写读1是最直观的副本控制规则.写时,只有全部副本写成功,才算是写成功.这样,读取时只需要从其中一个副本上读数据,就能保证正确性. 这种规则 ...

  2. HDFS原理讲解

    简介 本文是笔者在学习HDFS的时候的学习笔记整理, 将HDFS的核心功能的原理都整理在这里了. [广告] 如果你喜欢本博客,请点此查看本博客所有文章:http://www.cnblogs.com/x ...

  3. SQL Server Alwayson概念总结

    一.alwayson概念 “可用性组” 针对一组离散的用户数据库(称为“可用性数据库” ,它们共同实现故障转移)支持故障转移环境. 一个可用性组支持一组主数据库以及一至八组对应的辅助数据库(包括一个主 ...

  4. elasticsearch系列七:ES Java客户端-Elasticsearch Java client(ES Client 简介、Java REST Client、Java Client、Spring Data Elasticsearch)

    一.ES Client 简介 1. ES是一个服务,采用C/S结构 2. 回顾 ES的架构 3. ES支持的客户端连接方式 3.1 REST API ,端口 9200 这种连接方式对应于架构图中的RE ...

  5. GFS浅析

    1 . 简介 GFS, Big Table, Map Reduce称为Google的三驾马车,是许多基础服务的基石 GFS于2003年提出,是一个分布式的文件系统,与此前的很多分布式系统的前提假设存在 ...

  6. ES系列十五、ES常用Java Client API

    一.简介 1.先看ES的架构图 二.ES支持的客户端连接方式 1.REST API http请求,例如,浏览器请求get方法:利用Postman等工具发起REST请求:java 发起httpClien ...

  7. 面试小结之Elasticsearch篇(转)

    最近面试一些公司,被问到的关于Elasticsearch和搜索引擎相关的问题,以及自己总结的回答. Elasticsearch是如何实现Master选举的? Elasticsearch的选主是ZenD ...

  8. Data Partitioning Guidance

    在很多大规模的解决方案中,数据都是分成单独的分区,可以分别进行管理和访问的.而分割数据的策略必须仔细的斟酌才能够最大限度的提高效益,同时最大限度的减少不利影响.数据的分区可以极大的提升可扩展性,降低争 ...

  9. Elasticsearch基础知识要点QA

    前言:本文为学习整理实践他人成果的记录型博客.在此统一感谢各原作者,如果你对基础知识不甚了解,可以通过查看Elasticsearch权威指南中文版, 此处注意你的elasticsearch版本,版本不 ...

随机推荐

  1. MySQL数据库在IO性能优化方面的设置选择(硬件)

    提起MySQL数据库在硬件方面的优化无非是CPU.内存和IO.下面我们着重梳理一下关于磁盘I/O方面的优化. 1.磁盘冗余阵列RAID RAID(Redundant Array of Inexpens ...

  2. Python开发者现实版养成路线:从一无所知到无所不知

    初级开发者学Python容易陷入茫然,面对市面上种类众多的编程语言和框架,重要的是坚持自己的选择,宜精不宜杂.本文是一篇指路文,概述了从编程基础.引导.文档阅读.书籍和视频.源代码等学习和积累环节,值 ...

  3. 搭建 structs2 环境

    前言 环境: window 10 ,JDK 1.8 ,Tomcat 7 ,MyEclipse 2014 pro 搭建 SSH 环境的步骤 创建 JavaWeb 项目 导入 structs2 的jar包 ...

  4. ASP.NET学习笔记 —— 一般处理程序之图片上传

    简单图片上传功能目标:实现从本地磁盘读取图片文件,展示到浏览器页面.步骤:(1). 首先创建一个用于上传图片的HTML模板,命名为ImageUpload.html: <!DOCTYPE html ...

  5. git window安装与注册邮箱用户名

    1.git window版本下载 https://git-scm.com/downlods 下载完后点击安装包安装,一直下一步就行; 2.验证安装是否成功 在开始菜单里找到“Git”->“Git ...

  6. python中os模块和sys模块的常见用法

    OS模块的常见用法 os.remove()   删除文件 os.rename()   重命名文件 os.walk()    生成目录树下的所有文件名 os.chdir()    改变目录 os.mkd ...

  7. tmux resurrect 配置

    概述 tmux 用了很长时间了, 快捷键定制了不少, 唯一的遗憾是没法保存 session, 每次关机重开之后, 恢复不到之前的 tmux session. 虽然也能忍受, 但是每天都手动打开之前的 ...

  8. Linux:Day18(上) dns服务基础进阶

    DNS:Domain Name Service,协议(C/S,53/udp,53/tcp):应用层协议. BIND:Bekerley Internat Name Domain,ISC(www.isc. ...

  9. 三十六、fetch

    https://developer.mozilla.org/zh-CN/docs/Web/API/Fetch_API

  10. EXCEL记录

    ー.重要快捷键 Ctrl + F → 查找 Ctrl + H → 替换 Ctrl + G → 定位 Ctrl + 1 → 设置单元格格式 Ctrl + Enter → 一并输入多个单元格 Ctrl + ...