ES 6.3.2 index operation: source code flow

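The client sends the request
TransportBulkAction#doExecute(Task,BulkRequest,listener)
- Parse the request: does the index need to be created automatically? Does the request carry mapping information?
TransportBulkAction#doRun()
Get the cluster state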
/** sets the last observed state to the currently applied cluster state and returns it */
public ClusterState setAndGetObservedState() {
if (observingContext.get() != null) {
throw new ElasticsearchException("cannot set current cluster state while waiting for a cluster state change");
}
ClusterState clusterState = clusterApplierService.state();
lastObservedState.set(new StoredState(clusterState));
return clusterState;
}
cluster uuid: 5yBoKgbYQ1ibdZ5WG7bRAA
version: 7
state uuid: QVCOkCv_Q_mBGzjwTVDNJw
from_diff: true
meta data version: 5
[test/t-tC0rHESDqNm5SQFO7kPQ]: v[4]
0: p_term [1], isa_ids [UDR6UFa0Sa27ul74kRpyTQ]
1: p_term [1], isa_ids [VeuqdSp8R3ub2_a1a9zHJg]
2: p_term [1], isa_ids [0q3mCMLaSFWgOG5eQJ-EXQ]
3: p_term [1], isa_ids [maBX8A3sRRK8FPG3VzmfKA]
metadata customs:
index-graveyard: IndexGraveyard[[]]
nodes:
{node_sm0}{Xs6SXo4kRj6ylKwLE1dgkA}{bLOl8jv2SGWXt1hk7b_V7g}{127.0.0.1}{127.0.0.1:42641}, master
{node_sd3}{H4rct3ZxRvKJG2dnF0oFtg}{OwRuFVwkTLufu5LBzvFa0w}{127.0.0.1}{127.0.0.1:33747}
{node_sm2}{dUEAma7HQJG4eRFx18dRnA}{WOf3n9RoSSCEOkXa9fgWPQ}{127.0.0.1}{127.0.0.1:36963}
{node_sm1}{kSSol9RjSwyfueUowUdHnQ}{HAgo4XEHS5qWRAokNtzFow}{127.0.0.1}{127.0.0.1:34537}, local
routing_table (version 4):
-- index [[test/t-tC0rHESDqNm5SQFO7kPQ]]
----shard_id [test][0]
--------[test][0], node[H4rct3ZxRvKJG2dnF0oFtg], [P], s[STARTED], a[id=UDR6UFa0Sa27ul74kRpyTQ]
----shard_id [test][1]
--------[test][1], node[H4rct3ZxRvKJG2dnF0oFtg], [P], s[STARTED], a[id=VeuqdSp8R3ub2_a1a9zHJg]
----shard_id [test][2]
--------[test][2], node[H4rct3ZxRvKJG2dnF0oFtg], [P], s[STARTED], a[id=0q3mCMLaSFWgOG5eQJ-EXQ]
----shard_id [test][3]
--------[test][3], node[H4rct3ZxRvKJG2dnF0oFtg], [P], s[STARTED], a[id=maBX8A3sRRK8FPG3VzmfKA]
routing_nodes:
-----node_id[H4rct3ZxRvKJG2dnF0oFtg][V]
--------[test][3], node[H4rct3ZxRvKJG2dnF0oFtg], [P], s[STARTED], a[id=maBX8A3sRRK8FPG3VzmfKA]
--------[test][2], node[H4rct3ZxRvKJG2dnF0oFtg], [P], s[STARTED], a[id=0q3mCMLaSFWgOG5eQJ-EXQ]
--------[test][1], node[H4rct3ZxRvKJG2dnF0oFtg], [P], s[STARTED], a[id=VeuqdSp8R3ub2_a1a9zHJg]
--------[test][0], node[H4rct3ZxRvKJG2dnF0oFtg], [P], s[STARTED], a[id=UDR6UFa0Sa27ul74kRpyTQ]
---- unassigned
customs:
snapshots: SnapshotsInProgress[] snapshot_deletions: SnapshotDeletionsInProgress[] restore: RestoreInProgress[]
Resolve the routing information
/* resolve the routing if needed */
public void resolveRouting(MetaData metaData) {
routing(metaData.resolveIndexRouting(parent, routing, index));
}
routing_table (version 4):
-- index [[test/t-tC0rHESDqNm5SQFO7kPQ]]
----shard_id [test][0]
--------[test][0], node[H4rct3ZxRvKJG2dnF0oFtg], [P], s[STARTED], a[id=UDR6UFa0Sa27ul74kRpyTQ]
----shard_id [test][1]
--------[test][1], node[H4rct3ZxRvKJG2dnF0oFtg], [P], s[STARTED], a[id=VeuqdSp8R3ub2_a1a9zHJg]
----shard_id [test][2]
--------[test][2], node[H4rct3ZxRvKJG2dnF0oFtg], [P], s[STARTED], a[id=0q3mCMLaSFWgOG5eQJ-EXQ]
----shard_id [test][3]
--------[test][3], node[H4rct3ZxRvKJG2dnF0oFtg], [P], s[STARTED], a[id=maBX8A3sRRK8FPG3VzmfKA]
Does the request carry a doc ID? If not, one is generated automatically.
// generate id if not already provided
if (id == null) {
assert autoGeneratedTimestamp == -1 : "timestamp has already been generated!";
autoGeneratedTimestamp = Math.max(0, System.currentTimeMillis()); // extra paranoia
String uid;
if (indexCreatedVersion.onOrAfter(Version.V_6_0_0_beta1)) {
uid = UUIDs.base64UUID();
} else {
uid = UUIDs.legacyBase64UUID();
}
id(uid);
}
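The id-generation step can be sketched in isolation. This is only an illustration: the class name AutoIdSketch is hypothetical, and a random java.util.UUID stands in for whatever UUIDs.base64UUID() does internally, which the snippet above does not show.

```java
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.UUID;

/** Illustrative stand-in for auto-generating a document id; not the ES UUIDs class. */
class AutoIdSketch {

    /** Returns the given id, or a fresh URL-safe base64 id when none was supplied. */
    static String resolveId(String id) {
        if (id != null) {
            return id; // the caller supplied an id, keep it
        }
        UUID uuid = UUID.randomUUID(); // assumption: a random UUID as the source of uniqueness
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putLong(uuid.getMostSignificantBits());
        buf.putLong(uuid.getLeastSignificantBits());
        // 16 bytes encode to 22 characters once the padding is dropped
        return Base64.getUrlEncoder().withoutPadding().encodeToString(buf.array());
    }
}
```

Calling AutoIdSketch.resolveId(null) twice yields two different 22-character ids, while an explicit id is returned unchanged.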
Group the bulk request. Compute which shard each request will be sent to; requests routed to the same shard form one group.
ShardId shardId = clusterService.operationRouting().indexShards(clusterState, concreteIndex, request.id(), request.routing()).shardId();
List<BulkItemRequest> shardRequests = requestsByShard.computeIfAbsent(shardId, shard -> new ArrayList<>());
shardRequests.add(new BulkItemRequest(i, request));
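The grouping idea can be shown with plain collections. This is a minimal sketch under stated assumptions: ShardGroupingSketch and shardFor are hypothetical names, and String.hashCode stands in for the hash used by the real operationRouting(), which is not shown in this article.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ShardGroupingSketch {

    /** Hypothetical routing function: String.hashCode stands in for the real hash. */
    static int shardFor(String routingOrId, int numberOfShards) {
        return Math.floorMod(routingOrId.hashCode(), numberOfShards);
    }

    /** Groups item positions by target shard, keeping each item's position in the bulk request. */
    static Map<Integer, List<Integer>> groupByShard(List<String> docIds, int numberOfShards) {
        Map<Integer, List<Integer>> requestsByShard = new LinkedHashMap<>();
        for (int i = 0; i < docIds.size(); i++) {
            int shard = shardFor(docIds.get(i), numberOfShards);
            // same pattern as above: create the per-shard list on first use, then append
            requestsByShard.computeIfAbsent(shard, s -> new ArrayList<>()).add(i);
        }
        return requestsByShard;
    }
}
```

Keeping the item position (i) alongside each request is what later lets the responses be put back in the order of the original bulk request.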
Submit the requests to each shard. The callback checks whether every shard that was sent a request has responded.
shardBulkAction.execute(bulkShardRequest, new ActionListener<BulkShardResponse>() {
@Override
public void onResponse(BulkShardResponse bulkShardResponse) {
for (BulkItemResponse bulkItemResponse : bulkShardResponse.getResponses()) {
// we may have no response if item failed
if (bulkItemResponse.getResponse() != null) {
bulkItemResponse.getResponse().setShardInfo(bulkShardResponse.getShardInfo());
}
responses.set(bulkItemResponse.getItemId(), bulkItemResponse);
}
if (counter.decrementAndGet() == 0) {
finishHim();
}
}
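The counter-based completion check in the callback can be reduced to the following sketch. The class and method names are hypothetical, strings stand in for BulkItemResponse objects, and a boolean flag stands in for finishHim().

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReferenceArray;

class BulkResponseCollectorSketch {
    private final AtomicReferenceArray<String> responses; // one slot per bulk item
    private final AtomicInteger counter;                  // shard groups still outstanding
    private volatile boolean finished = false;

    BulkResponseCollectorSketch(int itemCount, int shardGroupCount) {
        this.responses = new AtomicReferenceArray<>(itemCount);
        this.counter = new AtomicInteger(shardGroupCount);
    }

    /** Called once per shard group with the item ids it handled and their results. */
    void onShardResponse(int[] itemIds, String[] results) {
        for (int i = 0; i < itemIds.length; i++) {
            responses.set(itemIds[i], results[i]); // store under the item's original position
        }
        if (counter.decrementAndGet() == 0) {
            finished = true; // the last shard group to answer completes the whole bulk request
        }
    }

    boolean isFinished() { return finished; }

    String responseAt(int itemId) { return responses.get(itemId); }
}
```

Because the counter is decremented atomically, exactly one callback observes zero, so the final response is assembled once no matter in which order the shards answer.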
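TransportReplicationAction#messageReceived(ConcreteShardRequest)
The request has been received. This is the entry point where the node holding the primary shard starts handling the index request. When sending the index request, the node that coordinates it first uses the routing information and the doc ID to compute which shard the request goes to, then reads the allocationId from the cluster state (an allocationId uniquely identifies a shard copy).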
request: BulkShardRequest [[test][0]] containing [index {[test][type][bogus_doc_ݑݜݢݧݧݯݼa1], source[{}]}], target allocation id: UDR6UFa0Sa27ul74kRpyTQ, primary term: 1
Create the asynchronous primary task, AsyncPrimaryAction
@Override
public void messageReceived(ConcreteShardRequest<Request> request, TransportChannel channel, Task task) {
new AsyncPrimaryAction(request.request, request.targetAllocationID, request.primaryTerm, channel, (ReplicationTask) task).run();
}
TransportReplicationAction.AsyncPrimaryAction#doRun
The asynchronous primary task runs. Operations on an IndexShard must first acquire permits.
protected void doRun() throws Exception {
acquirePrimaryShardReference(request.shardId(), targetAllocationID, primaryTerm, this, request);
}
[Figure: the state of an IndexShard object]

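TransportReplicationAction#acquirePrimaryShardReference()
Get the IndexShard object. IndexShard is an important class that encapsulates many shard operations.
Create the listener for permit acquisition; onResponse() is called back once the permit has been acquired.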
ActionListener<Releasable> onAcquired = new ActionListener<Releasable>() {
@Override
public void onResponse(Releasable releasable) {
//--->TransportReplicationAction.AsyncPrimaryAction.onResponse
onReferenceAcquired.onResponse(new PrimaryShardReference(indexShard, releasable));
}
Start acquiring the permit
indexShard.acquirePrimaryOperationPermit(onAcquired, executor, debugInfo);
IndexShardOperationPermits#acquire(listener,executor...)
Acquire the permit
synchronized (this) {
if (delayed) {
final Supplier<StoredContext> contextSupplier = threadPool.getThreadContext().newRestorableContext(false);
final ActionListener<Releasable> wrappedListener;
if (executorOnDelay != null) {
wrappedListener =
new PermitAwareThreadedActionListener(threadPool, executorOnDelay,
new ContextPreservingActionListener<>(contextSupplier, onAcquired), forceExecution);
} else {
wrappedListener = new ContextPreservingActionListener<>(contextSupplier, onAcquired);
}
delayedOperations.add(new DelayedOperation(wrappedListener, debugInfo, stackTrace));
return;
} else {
releasable = acquire(debugInfo, stackTrace);
}
}
After the permit is acquired, invoke the callback
onAcquired.onResponse(releasable);
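The delayed/immediate split in the acquire logic can be sketched with a small class. This is a simplified model, not the real IndexShardOperationPermits: all names are hypothetical, a Runnable plays the role of the Releasable, and blocking does not wait for in-flight permits to drain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class OperationPermitsSketch {
    private boolean delayed = false;
    private int activePermits = 0;
    private final List<Consumer<Runnable>> delayedOperations = new ArrayList<>();

    /** Hands a releasable permit to the listener now, or queues the listener while blocked. */
    void acquire(Consumer<Runnable> onAcquired) {
        final Runnable releasable;
        synchronized (this) {
            if (delayed) {
                delayedOperations.add(onAcquired); // replayed later by unblock()
                return;
            }
            activePermits++;
            releasable = this::release;
        }
        onAcquired.accept(releasable); // the callback runs outside the lock
    }

    private synchronized void release() { activePermits--; }

    synchronized void block() { delayed = true; }

    /** Lifts the block and replays every queued acquisition. */
    void unblock() {
        final List<Consumer<Runnable>> queued;
        synchronized (this) {
            delayed = false;
            queued = new ArrayList<>(delayedOperations);
            delayedOperations.clear();
        }
        for (Consumer<Runnable> op : queued) {
            acquire(op);
        }
    }

    synchronized int activePermits() { return activePermits; }
}
```

As in the snippet above, the listener is invoked only after the synchronized section ends, so a slow callback never holds the lock that other acquirers need.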
TransportReplicationAction.AsyncPrimaryAction#onResponse
Check whether the primary has relocated. If it has, forward the request to the relocation target; otherwise create the replication operation.
if (primaryShardReference.isRelocated()) {
final ShardRouting primary = primaryShardReference.routingEntry();
DiscoveryNode relocatingNode = clusterService.state().nodes().get(primary.relocatingNodeId());
transportService.sendRequest(relocatingNode, transportPrimaryAction,....);
}else{
setPhase(replicationTask, "primary");
final ActionListener<Response> listener = createResponseListener(primaryShardReference);
createReplicatedOperation(request,
ActionListener.wrap(result -> result.respond(listener), listener::onFailure),primaryShardReference)
.execute(); // actually start the replication operation
}
The index request
BulkShardRequest [[test][1]] containing [index {[test][type][w3KQpWkBhFoYx7tjRcg3], source[{"field":"value_0"}]}]
ReplicationOperation#execute
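Check whether the number of active shards meets the requirement; if it is lower than wait_for_active_shards, the operation is not executed. A ReplicationGroup holds three kinds of shard sets: inSyncAllocationIds (the in-sync copies, i.e. the currently active shards), trackedAllocationIds, and unavailableInSyncShards (stale replicas).
/**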
* Checks whether we can perform a write based on the required active shard count setting.
* Returns **null* if OK to proceed, or a string describing the reason to stop
*/
protected String checkActiveShardCount() {
final ShardId shardId = primary.routingEntry().shardId();
final ActiveShardCount waitForActiveShards = request.waitForActiveShards();
if (waitForActiveShards == ActiveShardCount.NONE) {
return null; // not waiting for any shards
}
final IndexShardRoutingTable shardRoutingTable = primary.getReplicationGroup().getRoutingTable();
if (waitForActiveShards.enoughShardsActive(shardRoutingTable)) {
return null;
} else {
final String resolvedShards = waitForActiveShards == ActiveShardCount.ALL ? Integer.toString(shardRoutingTable.shards().size())
: waitForActiveShards.toString();
logger.trace("[{}] not enough active copies to meet shard count of [{}] (have {}, needed {}), scheduling a retry. op [{}], " +
"request [{}]", shardId, waitForActiveShards, shardRoutingTable.activeShards().size(),
resolvedShards, opType, request);
return "Not enough active copies to meet shard count of [" + waitForActiveShards + "] (have " +
shardRoutingTable.activeShards().size() + ", needed " + resolvedShards + ").";
}
}
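Stripped of the routing-table types, the check comes down to a comparison of counts. The sketch below assumes plain integers in place of ActiveShardCount and IndexShardRoutingTable; the class name and the ALL sentinel value are hypothetical.

```java
class ActiveShardCheckSketch {
    static final int ALL = -1; // hypothetical sentinel meaning "every copy must be active"

    /** Returns null when the write may proceed, otherwise a string describing why not. */
    static String checkActiveShardCount(int waitForActiveShards, int activeCopies, int totalCopies) {
        if (waitForActiveShards == 0) {
            return null; // not waiting for any shards
        }
        int needed = waitForActiveShards == ALL ? totalCopies : waitForActiveShards;
        if (activeCopies >= needed) {
            return null;
        }
        return "Not enough active copies (have " + activeCopies + ", needed " + needed + ").";
    }
}
```

With one active copy out of two, a setting of 1 lets the write proceed, while a setting of 2 or ALL produces a reason string instead.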
Get the primary shard's routing entry and execute the primary request on the primary shard
final ShardRouting primaryRouting = primary.routingEntry();
final ShardId primaryId = primaryRouting.shardId();
totalShards.incrementAndGet();
pendingActions.incrementAndGet(); // increase by 1 until we finish all primary coordination
primaryResult = primary.perform(request); // ---> executes the primary operation
The primary shard updates its local checkpoint
primary.updateLocalCheckpointForShard(primaryRouting.allocationId().getId(), primary.localCheckpoint());
TransportReplicationAction.PrimaryShardReference#perform
Only after the request has succeeded on the primary shard is the replica request, result.replicaRequest(), created.
@Override
public PrimaryResult perform(Request request) throws Exception {
PrimaryResult result = shardOperationOnPrimary(request, indexShard);
assert result.replicaRequest() == null || result.finalFailure == null : "a replica request [" + result.replicaRequest()
+ "] with a primary failure [" + result.finalFailure + "]";
return result;
}
TransportShardBulkAction#shardOperationOnPrimary
Get the IndexMetaData and the translog location, then call executeBulkItemRequest for each item in the batch.
final IndexMetaData metaData = primary.indexSettings().getIndexMetaData();
Translog.Location location = null;
for (int requestIndex = 0; requestIndex < request.items().length; requestIndex++) {
if (isAborted(request.items()[requestIndex].getPrimaryResponse()) == false) {
location = executeBulkItemRequest(metaData, primary, request, location, requestIndex,
updateHelper, nowInMillisSupplier, mappingUpdater);
}
}
IndexShard#applyIndexOperationOnPrimary
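Sequence number generation (the primary passes SequenceNumbers.UNASSIGNED_SEQ_NO here, so the actual number is assigned further down the call chain)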
return applyIndexOperation(SequenceNumbers.UNASSIGNED_SEQ_NO, primaryTerm, version, versionType, autoGeneratedTimestamp,
isRetry, Engine.Operation.Origin.PRIMARY, sourceToParse);
IndexShard#applyIndexOperation
Verify that the primary shard's primary term is the latest (this prevents a stale primary shard from still executing operations and producing dirty data)
assert opPrimaryTerm <= this.primaryTerm : "op term [ " + opPrimaryTerm + " ] > shard term [" + this.primaryTerm + "]";
assert versionType.validateVersionForWrites(version);
Build the low-level Index operation: primary term, source text, and so on
operation = prepareIndex(docMapper(sourceToParse.type()), indexSettings.getIndexVersionCreated(), sourceToParse, seqNo,
opPrimaryTerm, version, versionType, origin,
autoGeneratedTimeStamp, isRetry);
org.elasticsearch.index.shard.IndexShard#prepareIndex
Execution by the Lucene engine
return new Engine.Index(uid, doc, seqNo, primaryTerm, version, versionType, origin, startTime, autoGeneratedIdTimestamp, isRetry);
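Because the thread stays suspended for too long while stepping through breakpoints, the underlying transport gets closed. The replication of the document to the replicas after it was written to the primary shard was therefore debugged with a second request. In a real run it is a single request, but this does not affect the overall flow of the index operation.
Next, after the operation has succeeded on the primary shard, control returns to step 9:
ReplicationOperation#execute
ReplicationOperation#performOnReplicas(ReplicaRequest,globalCheckpoint,ReplicationGroup)
- The if statement means that the operation runs on the replicas, not on the primary shard.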
for (final ShardRouting shard : replicationGroup.getReplicationTargets()) {
if (shard.isSameAllocation(primaryRouting) == false) {
performOnReplica(shard, replicaRequest, globalCheckpoint);
}
}
The current ShardRouting
[test][1], node[H4rct3ZxRvKJG2dnF0oFtg], [P], s[STARTED], a[id=VeuqdSp8R3ub2_a1a9zHJg]
The current replica request
BulkShardRequest [[test][1]] containing [index {[test][type][w3KQpWkBhFoYx7tjRcg3], source[{"field":"value_0"}]}]
ReplicationOperation#performOnReplica(ShardRouting,ReplicaRequest,globalCheckpoint)
After the operation succeeds on a replica, the callback updates the local checkpoint and the global checkpoint.
replicasProxy.performOn(shard, replicaRequest, globalCheckpoint, new ActionListener<ReplicaResponse>() {
@Override
public void onResponse(ReplicaResponse response) {
successfulShards.incrementAndGet();
try {
primary.updateLocalCheckpointForShard(shard.allocationId().getId(), response.localCheckpoint()); // on success, the callback updates the checkpoints
primary.updateGlobalCheckpointForShard(shard.allocationId().getId(), response.globalCheckpoint());
} catch (final AlreadyClosedException e) {
// okay, the index was deleted or this shard was never activated after a relocation; fall through and finish normally
} catch (final Exception e) {
// fail the primary but fall through and let the rest of operation processing complete
final String message = String.format(Locale.ROOT, "primary failed updating local checkpoint for replica %s", shard);
primary.failShard(message, e);
}
decPendingAndFinishIfNeeded();
}
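The pendingActions counter that decPendingAndFinishIfNeeded() works on can be modelled as follows. This is a sketch with hypothetical names: it keeps only the counter, and a boolean flag replaces the real completion logic.

```java
import java.util.concurrent.atomic.AtomicInteger;

class PendingActionsSketch {
    private final AtomicInteger pendingActions = new AtomicInteger();
    private volatile boolean finished = false;

    /** Holds one pending slot for the primary until all replica requests are dispatched. */
    void primaryStarted() { pendingActions.incrementAndGet(); }

    /** One more pending slot for every replica request that is sent. */
    void replicaRequestSent() { pendingActions.incrementAndGet(); }

    /** A replica answered, successfully or not. */
    void replicaResponded() { decPendingAndFinishIfNeeded(); }

    /** The primary has dispatched every replica request and gives up its slot. */
    void primaryCoordinationDone() { decPendingAndFinishIfNeeded(); }

    private void decPendingAndFinishIfNeeded() {
        if (pendingActions.decrementAndGet() == 0) {
            finished = true; // nothing outstanding: the operation can respond
        }
    }

    boolean isFinished() { return finished; }
}
```

The extra slot held by the primary is the reason the operation cannot finish early when a fast replica answers before the remaining replica requests have even been sent.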
TransportReplicationAction.ReplicasProxy#performOn
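Replica operations are carried out through a proxy class, TransportReplicationAction.ReplicasProxy.
It creates a ConcreteReplicaRequest object that carries: the global checkpoint, so that the replica can advance to the latest global checkpoint; the primary term, so that the replica can tell whether the sending primary shard is the current one (a replica rejects primaries that the master node has already marked as stale, for example a primary shard that has not noticed it is out of date because of a network failure); and the allocationId, so that the target replica shard can be found.
sendReplicaRequest then forwards it to the node holding each replica for execution.
String nodeId = replica.currentNodeId();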
final DiscoveryNode node = clusterService.state().nodes().get(nodeId);
final ConcreteReplicaRequest<ReplicaRequest> replicaRequest =
new ConcreteReplicaRequest<>(request, replica.allocationId().getId(), primaryTerm, globalCheckpoint);
sendReplicaRequest(replicaRequest, node, listener);
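How a primary term lets a replica reject a stale primary can be sketched as a simple comparison. This is an assumption-laden model with hypothetical names, not the replica-side code of ES, which this article does not show.

```java
class ReplicaTermCheckSketch {
    private long currentPrimaryTerm;

    ReplicaTermCheckSketch(long initialTerm) {
        this.currentPrimaryTerm = initialTerm;
    }

    /** Accepts the operation unless it comes from an older primary term. */
    synchronized boolean accept(long operationPrimaryTerm) {
        if (operationPrimaryTerm < currentPrimaryTerm) {
            return false; // the sender is a stale primary
        }
        currentPrimaryTerm = operationPrimaryTerm; // adopt a newer term when one is seen
        return true;
    }
}
```

A replica that has already seen term 3 refuses a request stamped with term 2, which is the situation of a primary that was demoted without noticing.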
TransportReplicationAction#sendReplicaRequest(ConcreteReplicaRequest,DiscoveryNode,ActionListener)
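- Forward the operation that has already succeeded on the primary shard to the nodes holding the replicas for execution.
End.
Summary
This article records the ES write path in detail. Debugging started from the test method org.elasticsearch.indexing.IndexActionIT#testAutoGenerateIdNoDuplicates. After cloning the ES source from GitHub and building it into an IDEA project with Gradle, you will find a number of test directories. Many of the test classes simulate an ES cluster well, so starting from these test methods makes reading the source more efficient.
ES first executes the index operation on the primary shard and, once that succeeds, "synchronizes" it to each replica. Each replica returns its result to the primary shard, and the primary shard then returns the ACK to the client. Clearly, if the primary shard fails, the index operation as a whole fails and the ACK returned to the client reports failure. If the operation succeeds on the primary shard but fails on some replicas while being synchronized, the ACK that the primary shard returns to the client still reports success. By default, as long as the primary shard is active and executes the index operation successfully, the client receives a success acknowledgment even if synchronization to every replica fails. The response does, however, contain a _shards field: total is the number of shard copies the operation should run on, and successful is the number on which it succeeded, so the client can tell how many copies (the primary and its replicas) actually applied the index operation.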
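The relation between replica outcomes and the _shards counters can be illustrated with a small calculation. This is a simplification under stated assumptions: the class name is hypothetical, the primary is assumed to have succeeded, and only replicas that were actually sent the operation are counted.

```java
class WriteAckSketch {

    /** Returns {total, successful, failed} for a write whose primary succeeded. */
    static int[] shardInfo(boolean[] replicaSucceeded) {
        int total = 1 + replicaSucceeded.length; // the primary plus every replica attempted
        int successful = 1;                      // the primary itself
        for (boolean ok : replicaSucceeded) {
            if (ok) {
                successful++;
            }
        }
        // assumption: every attempted copy that did not succeed is counted as failed
        return new int[] {total, successful, total - successful};
    }
}
```

With two replicas that both fail, the result is total 3 and successful 1: the write is still acknowledged, and the counters are the only hint that the replicas did not apply it.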
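To improve data reliability, ES has the parameter wait_for_active_shards. It defaults to 1, which is the case described above: an index operation can run as long as the primary shard is active. The parameter takes effect in step 9 of the flow above. When ReplicationOperation#execute runs in step 9, it first checks whether the number of active shard copies in the current ReplicationGroup is greater than or equal to wait_for_active_shards, and only then continues with the index operation. If wait_for_active_shards is set to 2 and only the primary shard is available in the whole cluster, the index operation cannot run and the client eventually receives a request timeout response, because one more active replica is needed to satisfy the requirement. This avoids the situation where the primary shard is the only copy that receives the data (consider what would happen if the node holding the primary shard went down).
Original: https://www.cnblogs.com/hapjin/p/10577427.html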