ES bulk source code analysis (ES 5.0)

How a bulk request is processed:

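1. Iterate over all requests and preprocess each one. This mainly means resolving the routing (if the mapping defines one) and the timestamp (the current time is used when the request carries none). If no id is given and action.bulk.action.allow_id_generation is set to true, ES generates a base64 UUID as the id and sets the request's opType to CREATE, because a document with an ES-generated id is by default created rather than updated. (Note: annoyingly, in the latest ES code I pulled from GitHub, the id auto-generation path no longer sets opType, so it appears to follow the same logic as a request with an explicit id. See https://github.com/elastic/elasticsearch/blob/master/core/src/main/java/org/elasticsearch/action/index/IndexRequest.java)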

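2. Create a shardId -> Operation map, then iterate over all requests again to determine the shardId each one should be sent to. The lookup works like this: if the request has a routing value, that value is used directly as the routing key; otherwise the id is used. The key is then hashed (Murmur3 by default, though the hash function can be changed via the index.legacy.routing.hash.type setting), and the hash decides which shard receives the request: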

    return MathUtils.mod(hash, indexMetaData.getNumberOfShards());

Note: this has changed in the latest ES code!

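That is, the shardId is the hash modulo the total number of shards. The shardId becomes the key, and the value is a list of BulkItemRequest objects, each built from the loop index and the request. (The loop index is kept because the bulk response reports the result of every individual request.) In short, all of this groups the requests by shard (for load balancing).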

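3. Iterate over the map and create one bulkShardRequest per group, carrying the consistencyLevel and timeout settings (in the 5.0 code quoted in this post, waitForActiveShards appears where consistencyLevel used to be). The primary shard is looked up in the cluster state: if the primary is on the local node the request is executed directly, otherwise it is forwarded to the node that holds that shard.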
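
To make the id handling in step 1 concrete, here is a minimal Java sketch. It is not the ES code: `BulkIdSketch`, `Item`, and `base64Uuid` are hypothetical names, and the id is built from `java.util.UUID` as a stand-in for ES's own base64 UUID generator. It follows the older behaviour described in step 1, where a generated id also switches opType to CREATE.

```java
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.UUID;

final class BulkIdSketch {
    enum OpType { INDEX, CREATE }

    /** Hypothetical stand-in for an index request; not the real IndexRequest. */
    static final class Item {
        String id;                      // null means the client sent no id
        OpType opType = OpType.INDEX;
        long timestamp = -1;            // -1 means the client sent no timestamp
    }

    /** URL-safe Base64 of a random UUID; an assumption, not ES's generator. */
    static String base64Uuid() {
        UUID uuid = UUID.randomUUID();
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putLong(uuid.getMostSignificantBits());
        buf.putLong(uuid.getLeastSignificantBits());
        return Base64.getUrlEncoder().withoutPadding().encodeToString(buf.array());
    }

    /** Fills in the timestamp and, if allowed, a generated id. */
    static void process(Item item, boolean allowIdGeneration, long now) {
        if (item.timestamp < 0) {
            item.timestamp = now;       // default to the current time
        }
        if (allowIdGeneration && item.id == null) {
            item.id = base64Uuid();
            // A freshly generated id cannot collide with an existing
            // document, so the operation is a create, not an update.
            item.opType = OpType.CREATE;
        }
    }
}
```

A request that already carries an id passes through `process` with its id and opType unchanged; only the missing timestamp is filled in.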

Source location: https://github.com/elastic/elasticsearch/blob/master/core/src/main/java/org/elasticsearch/action/bulk/TransportBulkAction.java

    void executeBulk(Task task, final BulkRequest bulkRequest, final long startTimeNanos, final ActionListener<BulkResponse> listener, final AtomicArray<BulkItemResponse> responses ) {
        final ClusterState clusterState = clusterService.state();
        // TODO use timeout to wait here if its blocked...
        clusterState.blocks().globalBlockedRaiseException(ClusterBlockLevel.WRITE);

        final ConcreteIndices concreteIndices = new ConcreteIndices(clusterState, indexNameExpressionResolver);
        MetaData metaData = clusterState.metaData();
        for (int i = 0; i < bulkRequest.requests.size(); i++) {
            DocWriteRequest docWriteRequest = bulkRequest.requests.get(i);
            //the request can only be null because we set it to null in the previous step, so it gets ignored
            if (docWriteRequest == null) {
                continue;
            }
            if (addFailureIfIndexIsUnavailable(docWriteRequest, bulkRequest, responses, i, concreteIndices, metaData)) {
                continue;
            }
            Index concreteIndex = concreteIndices.resolveIfAbsent(docWriteRequest);
            try {
                switch (docWriteRequest.opType()) {
                    case CREATE:
                    case INDEX:
                        IndexRequest indexRequest = (IndexRequest) docWriteRequest;
                        MappingMetaData mappingMd = null;
                        final IndexMetaData indexMetaData = metaData.index(concreteIndex);
                        if (indexMetaData != null) {
                            mappingMd = indexMetaData.mappingOrDefault(indexRequest.type());
                        }
                        indexRequest.resolveRouting(metaData);
                        indexRequest.process(mappingMd, allowIdGeneration, concreteIndex.getName());
                        break;
                    case UPDATE:
                        TransportUpdateAction.resolveAndValidateRouting(metaData, concreteIndex.getName(), (UpdateRequest) docWriteRequest);
                        break;
                    case DELETE:
                        TransportDeleteAction.resolveAndValidateRouting(metaData, concreteIndex.getName(), (DeleteRequest) docWriteRequest);
                        break;
                    default: throw new AssertionError("request type not supported: [" + docWriteRequest.opType() + "]");
                }
            } catch (ElasticsearchParseException | RoutingMissingException e) {
                BulkItemResponse.Failure failure = new BulkItemResponse.Failure(concreteIndex.getName(), docWriteRequest.type(), docWriteRequest.id(), e);
                BulkItemResponse bulkItemResponse = new BulkItemResponse(i, docWriteRequest.opType(), failure);
                responses.set(i, bulkItemResponse);
                // make sure the request gets never processed again
                bulkRequest.requests.set(i, null);
            }
        }

        // first, go over all the requests and create a ShardId -> Operations mapping
        Map<ShardId, List<BulkItemRequest>> requestsByShard = new HashMap<>();
        for (int i = 0; i < bulkRequest.requests.size(); i++) {
            DocWriteRequest request = bulkRequest.requests.get(i);
            if (request == null) {
                continue;
            }
            String concreteIndex = concreteIndices.getConcreteIndex(request.index()).getName();
            ShardId shardId = clusterService.operationRouting().indexShards(clusterState, concreteIndex, request.id(), request.routing()).shardId();
            List<BulkItemRequest> shardRequests = requestsByShard.computeIfAbsent(shardId, shard -> new ArrayList<>());
            shardRequests.add(new BulkItemRequest(i, request));
        }

        if (requestsByShard.isEmpty()) {
            listener.onResponse(new BulkResponse(responses.toArray(new BulkItemResponse[responses.length()]), buildTookInMillis(startTimeNanos)));
            return;
        }

        final AtomicInteger counter = new AtomicInteger(requestsByShard.size());
        String nodeId = clusterService.localNode().getId();
        for (Map.Entry<ShardId, List<BulkItemRequest>> entry : requestsByShard.entrySet()) {
            final ShardId shardId = entry.getKey();
            final List<BulkItemRequest> requests = entry.getValue();
            BulkShardRequest bulkShardRequest = new BulkShardRequest(shardId, bulkRequest.getRefreshPolicy(),
                    requests.toArray(new BulkItemRequest[requests.size()]));
            bulkShardRequest.waitForActiveShards(bulkRequest.waitForActiveShards());
            bulkShardRequest.timeout(bulkRequest.timeout());
            if (task != null) {
                bulkShardRequest.setParentTask(nodeId, task.getId());
            }
            shardBulkAction.execute(bulkShardRequest, new ActionListener<BulkShardResponse>() {
                @Override
                public void onResponse(BulkShardResponse bulkShardResponse) {
                    for (BulkItemResponse bulkItemResponse : bulkShardResponse.getResponses()) {
                        // we may have no response if item failed
                        if (bulkItemResponse.getResponse() != null) {
                            bulkItemResponse.getResponse().setShardInfo(bulkShardResponse.getShardInfo());
                        }
                        responses.set(bulkItemResponse.getItemId(), bulkItemResponse);
                    }
                    if (counter.decrementAndGet() == 0) {
                        finishHim();
                    }
                }

                // onFailure(...) and finishHim() are not shown in this excerpt
            });
        }
    }
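
The end of executeBulk fans one request out per shard and uses an AtomicInteger to detect when the last shard has answered. The sketch below isolates that pattern in plain Java, as an illustration only: `ShardFanOutSketch` and `ShardExecutor` are hypothetical names, responses are plain strings, and failure handling is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReferenceArray;
import java.util.function.Consumer;

final class ShardFanOutSketch {
    /** Hypothetical per-shard executor; reports one result per slot when done. */
    interface ShardExecutor {
        void execute(int shardId, List<Integer> slots, Consumer<Map<Integer, String>> onDone);
    }

    /**
     * Sends one call per shard and invokes onAllDone exactly once, after the
     * last shard has reported. Results land at their original slot index.
     */
    static void dispatch(Map<Integer, List<Integer>> slotsByShard, int totalSlots,
                         ShardExecutor executor, Consumer<List<String>> onAllDone) {
        AtomicReferenceArray<String> responses = new AtomicReferenceArray<>(totalSlots);
        if (slotsByShard.isEmpty()) {
            onAllDone.accept(toList(responses));   // nothing to send
            return;
        }
        AtomicInteger pending = new AtomicInteger(slotsByShard.size());
        for (Map.Entry<Integer, List<Integer>> entry : slotsByShard.entrySet()) {
            executor.execute(entry.getKey(), entry.getValue(), results -> {
                results.forEach((slot, result) -> responses.set(slot, result));
                // Only the callback that brings the counter to zero finishes.
                if (pending.decrementAndGet() == 0) {
                    onAllDone.accept(toList(responses));
                }
            });
        }
    }

    private static List<String> toList(AtomicReferenceArray<String> array) {
        List<String> out = new ArrayList<>(array.length());
        for (int i = 0; i < array.length(); i++) {
            out.add(array.get(i));
        }
        return out;
    }
}
```

Because every shard callback writes to distinct slots and the counter is decremented atomically, the callbacks can arrive on different threads and in any order without extra locking.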

Routing code:

    ShardId shardId = clusterService.operationRouting().indexShards(clusterState, concreteIndex, request.id(), request.routing()).shardId();
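
As a rough illustration of what this routing call does, per the description in step 2, the following Java sketch picks a routing key, hashes it, takes the result modulo the shard count, and groups requests by shard. It is a sketch under stated assumptions: `ShardGroupingSketch` and `Item` are hypothetical, and `String.hashCode()` stands in for the Murmur3 hash ES uses, so the shard numbers it produces will not match a real cluster.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ShardGroupingSketch {
    /** Pairs a request with its position (slot) in the original bulk request. */
    static final class Item {
        final int slot;
        final String id;
        final String routing;   // may be null

        Item(int slot, String id, String routing) {
            this.slot = slot;
            this.id = id;
            this.routing = routing;
        }
    }

    /** Stand-in hash: String.hashCode(), NOT the Murmur3 function used by ES. */
    static int hash(String key) {
        return key.hashCode();
    }

    /** Routing value wins if present, otherwise the id is the routing key. */
    static int shardFor(String id, String routing, int numberOfShards) {
        String key = routing != null ? routing : id;
        // floorMod keeps the result in [0, numberOfShards) for negative hashes.
        return Math.floorMod(hash(key), numberOfShards);
    }

    /** Groups items by target shard, preserving each item's original slot. */
    static Map<Integer, List<Item>> groupByShard(List<Item> items, int numberOfShards) {
        Map<Integer, List<Item>> byShard = new HashMap<>();
        for (Item item : items) {
            int shard = shardFor(item.id, item.routing, numberOfShards);
            byShard.computeIfAbsent(shard, s -> new ArrayList<>()).add(item);
        }
        return byShard;
    }
}
```

Two requests with different ids but the same routing value always land in the same group, which is the point of custom routing.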
