[ES version]

5.5.0

[Analysis]

A recovery moves through six stages, defined in RecoveryState.Stage:

public class RecoveryState implements ToXContent, Streamable {

    public enum Stage {
        // initial state
        INIT((byte) 0),

        /**
         * recovery of lucene files, either reusing local ones are copying new ones
         */
        INDEX((byte) 1),

        /**
         * potentially running check index
         */
        VERIFY_INDEX((byte) 2),

        /**
         * starting up the engine, replaying the translog
         */
        TRANSLOG((byte) 3),

        /**
         * performing final task after all translog ops have been done
         */
        FINALIZE((byte) 4),

        // finished
        DONE((byte) 5);

        ...
    }

    ...
}
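To make the stage ordering concrete, here is a minimal standalone sketch of an enum with the same six byte ids. It is not the Elasticsearch source: the names RecoveryStage, id(), fromId() and next() are assumptions made for illustration, and the real RecoveryState.Stage may expose different helpers.

```java
// Simplified, standalone model of the six recovery stages; not the Elasticsearch source.
enum RecoveryStage {
    INIT((byte) 0),
    INDEX((byte) 1),
    VERIFY_INDEX((byte) 2),
    TRANSLOG((byte) 3),
    FINALIZE((byte) 4),
    DONE((byte) 5);

    private final byte id;

    RecoveryStage(byte id) {
        this.id = id;
    }

    byte id() {
        return id;
    }

    // Hypothetical helper: look a stage up by the byte id it was declared with.
    static RecoveryStage fromId(byte id) {
        for (RecoveryStage stage : values()) {
            if (stage.id == id) {
                return stage;
            }
        }
        throw new IllegalArgumentException("No recovery stage for id [" + id + "]");
    }

    // Hypothetical helper: the stage that follows this one; DONE stays DONE.
    RecoveryStage next() {
        return this == DONE ? DONE : values()[ordinal() + 1];
    }
}

class RecoveryStageDemo {
    public static void main(String[] args) {
        // Walk the stages in declaration order, from INIT to DONE.
        RecoveryStage stage = RecoveryStage.INIT;
        while (stage != RecoveryStage.DONE) {
            System.out.println(stage.id() + " " + stage);
            stage = stage.next();
        }
        System.out.println(stage.id() + " " + stage);
    }
}
```

The byte id gives each stage a compact, stable value, which is presumably why the original enum declares one explicitly instead of relying on ordinal().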

A shard has four routing states:

public enum ShardRoutingState {
    /**
     * The shard is not assigned to any node.
     */
    UNASSIGNED((byte) 1),

    /**
     * The shard is initializing (probably recovering from either a peer shard
     * or gateway).
     */
    INITIALIZING((byte) 2),

    /**
     * The shard is started.
     */
    STARTED((byte) 3),

    /**
     * The shard is in the process being relocated.
     */
    RELOCATING((byte) 4);

    ...
}

One place where these states are used is PeerRecoverySourceService:

/**
 * The source recovery accepts recovery requests from other peer shards and start the recovery process from this
 * source shard to the target shard.
 */
public class PeerRecoverySourceService extends AbstractComponent implements IndexEventListener {

    public static class Actions {
        public static final String START_RECOVERY = "internal:index/shard/recovery/start_recovery";
    }

    private final TransportService transportService;
    private final IndicesService indicesService;
    private final RecoverySettings recoverySettings;
    private final ClusterService clusterService;
    private final OngoingRecoveries ongoingRecoveries = new OngoingRecoveries();

    @Inject
    public PeerRecoverySourceService(Settings settings, TransportService transportService, IndicesService indicesService,
                                     RecoverySettings recoverySettings, ClusterService clusterService) {
        super(settings);
        this.transportService = transportService;
        this.indicesService = indicesService;
        this.clusterService = clusterService;
        this.recoverySettings = recoverySettings;
        transportService.registerRequestHandler(Actions.START_RECOVERY, StartRecoveryRequest::new, ThreadPool.Names.GENERIC, new StartRecoveryTransportRequestHandler());
    }

    // before a shard is closed, cancel every recovery that is still running from it
    @Override
    public void beforeIndexShardClosed(ShardId shardId, @Nullable IndexShard indexShard,
                                       Settings indexSettings) {
        if (indexShard != null) {
            ongoingRecoveries.cancel(indexShard, "shard is closed");
        }
    }

    private RecoveryResponse recover(StartRecoveryRequest request) throws IOException {
        final IndexService indexService = indicesService.indexServiceSafe(request.shardId().getIndex());
        final IndexShard shard = indexService.getShard(request.shardId().id());

        // starting recovery from that our (the source) shard state is marking the shard to be in recovery mode as well, otherwise
        // the index operations will not be routed to it properly
        // first check that the target node is present in the cluster state known to this (source) node
        RoutingNode node = clusterService.state().getRoutingNodes().node(request.targetNode().getId());
        // if it is not, delay the recovery
        if (node == null) {
            logger.debug("delaying recovery of {} as source node {} is unknown", request.shardId(), request.targetNode());
            throw new DelayRecoveryException("source node does not have the node [" + request.targetNode() + "] in its state yet..");
        }

        ShardRouting routingEntry = shard.routingEntry();
        // delay if the request is a primary relocation and the source shard is either not marked as relocating,
        // or is relocating to a node other than the one that sent the request
        if (request.isPrimaryRelocation() && (routingEntry.relocating() == false || routingEntry.relocatingNodeId().equals(request.targetNode().getId()) == false)) {
            logger.debug("delaying recovery of {} as source shard is not marked yet as relocating to {}", request.shardId(), request.targetNode());
            throw new DelayRecoveryException("source shard is not marked yet as relocating to [" + request.targetNode() + "]");
        }

        ShardRouting targetShardRouting = node.getByShardId(request.shardId());
        // the shard is not listed on the target node
        if (targetShardRouting == null) {
            logger.debug("delaying recovery of {} as it is not listed as assigned to target node {}", request.shardId(), request.targetNode());
            throw new DelayRecoveryException("source node does not have the shard listed in its state as allocated on the node");
        }
        // the target shard is not in the initializing state
        if (!targetShardRouting.initializing()) {
            logger.debug("delaying recovery of {} as it is not listed as initializing on the target node {}. known shards state is [{}]",
                request.shardId(), request.targetNode(), targetShardRouting.state());
            throw new DelayRecoveryException("source node has the state of the target shard to be [" + targetShardRouting.state() + "], expecting to be [initializing]");
        }

        // the request carries no target allocation id, so rebuild it with the id from the routing entry
        if (request.targetAllocationId() == null) {
            // ES versions < 5.4.0 do not send targetAllocationId as part of recovery request, just assume that we have the correct id
            request = new StartRecoveryRequest(request.shardId(), targetShardRouting.allocationId().getId(), request.sourceNode(),
                request.targetNode(), request.metadataSnapshot(), request.isPrimaryRelocation(), request.recoveryId());
        }

        // the target allocation id in the request does not match the allocation id of the target shard
        if (request.targetAllocationId().equals(targetShardRouting.allocationId().getId()) == false) {
            logger.debug("delaying recovery of {} due to target allocation id mismatch (expected: [{}], but was: [{}])",
                request.shardId(), request.targetAllocationId(), targetShardRouting.allocationId().getId());
            throw new DelayRecoveryException("source node has the state of the target shard to have allocation id [" +
                targetShardRouting.allocationId().getId() + "], expecting to be [" + request.targetAllocationId() + "]");
        }

        // register a new recovery in the list of ongoing recoveries
        RecoverySourceHandler handler = ongoingRecoveries.addNewRecovery(request, shard);
        logger.trace("[{}][{}] starting recovery to {}", request.shardId().getIndex().getName(), request.shardId().id(), request.targetNode());
        try {
            return handler.recoverToTarget();
        } finally {
            ongoingRecoveries.remove(shard, handler);
        }
    }

    ...
}
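The checks at the top of recover() all follow the same pattern: if the source node's view of the cluster does not yet match what the request expects, it throws DelayRecoveryException instead of starting. A minimal standalone sketch of that decision order follows. Every type and name in it (RecoveryPreconditions, TargetView, delayReason) is hypothetical and only models the logic quoted above. It does not use the Elasticsearch API.

```java
import java.util.Optional;

// Standalone model of the precondition checks in recover(); all names are hypothetical.
final class RecoveryPreconditions {

    enum State { UNASSIGNED, INITIALIZING, STARTED, RELOCATING }

    // What the source node's cluster state says about the target copy of the shard.
    static final class TargetView {
        final boolean nodeKnown;    // is the target node in the source's cluster state?
        final State shardState;     // null if the shard is not listed on the target node
        final String allocationId;  // allocation id the source has for the target copy

        TargetView(boolean nodeKnown, State shardState, String allocationId) {
            this.nodeKnown = nodeKnown;
            this.shardState = shardState;
            this.allocationId = allocationId;
        }
    }

    // Returns a reason to delay the recovery, or empty if it may start.
    // relocatingNodeId is null when the source shard is not relocating.
    static Optional<String> delayReason(TargetView target, boolean primaryRelocation,
                                        String relocatingNodeId, String targetNodeId,
                                        String requestAllocationId) {
        if (!target.nodeKnown) {
            return Optional.of("target node not in cluster state yet");
        }
        if (primaryRelocation && !targetNodeId.equals(relocatingNodeId)) {
            return Optional.of("source shard not marked as relocating to target");
        }
        if (target.shardState == null) {
            return Optional.of("shard not listed on target node");
        }
        if (target.shardState != State.INITIALIZING) {
            return Optional.of("target shard is " + target.shardState + ", expected INITIALIZING");
        }
        // A missing id is accepted as-is, mirroring the pre-5.4.0 compatibility branch.
        if (requestAllocationId != null && !requestAllocationId.equals(target.allocationId)) {
            return Optional.of("target allocation id mismatch");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        TargetView started = new TargetView(true, State.STARTED, "a1");
        System.out.println(delayReason(started, false, null, "node-2", "a1"));
    }
}
```

Returning a reason instead of throwing keeps the sketch easy to test. The quoted source throws DelayRecoveryException at each failed check.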

The messageReceived method of StartRecoveryTransportRequestHandler receives the start-recovery request from the transport channel and runs the recover operation.
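The dispatch described here, combined with the register-then-remove pattern at the end of recover(), can be sketched in isolation as follows. This is a rough model under stated assumptions. RecoveryDispatchSketch, its Channel interface and the string-based request and response are all hypothetical stand-ins, not the Elasticsearch transport types.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone model; every type and method name here is hypothetical.
final class RecoveryDispatchSketch {

    // Stand-in for the transport channel the response is sent back on.
    interface Channel {
        void sendResponse(String response);
    }

    // Recoveries currently running, tracked so they could be cancelled if the shard closes.
    private final List<String> ongoing = new ArrayList<>();

    // Plays the role of messageReceived: run the recovery, then answer on the channel.
    void messageReceived(String shardId, Channel channel) {
        channel.sendResponse(recover(shardId));
    }

    private String recover(String shardId) {
        synchronized (ongoing) {
            ongoing.add(shardId);
        }
        try {
            // Stand-in for handler.recoverToTarget().
            return "recovered " + shardId;
        } finally {
            // Always deregister, whether the recovery succeeded or threw.
            synchronized (ongoing) {
                ongoing.remove(shardId);
            }
        }
    }

    int ongoingCount() {
        synchronized (ongoing) {
            return ongoing.size();
        }
    }

    public static void main(String[] args) {
        RecoveryDispatchSketch sketch = new RecoveryDispatchSketch();
        sketch.messageReceived("[index][0]", response -> System.out.println(response));
    }
}
```

The try/finally block is the part worth noting: the recovery is removed from the ongoing list on every exit path, so a failed recovery does not leave a stale entry behind.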

<To be continued>
