Hot-Warm Node Separation on an Elasticsearch 6.3.2 Cluster

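I tried hot-warm separation on a small 5-node ES cluster. It already holds more than 60 indices, some of which are created monthly, one index per month. As data keeps being written, historical data (only three months of data need to be kept; anything older counts as historical) takes up more and more disk space and memory, which hurts search response time. The plan is therefore to split the nodes into two types: hot nodes, with large memory and SSDs, that serve everyday user requests; and warm nodes, with spinning disks and less memory, that store rarely used historical data and serve occasional back-office queries.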

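All 5 existing nodes become hot nodes, and 2 new nodes are added as warm nodes. The setup follows the official blog post hot-warm-architecture-in-elasticsearch-5-x. Two things need care: existing index shards must not be automatically rebalanced onto the warm nodes, and newly created indices must be allocated only to hot nodes. The concrete steps follow.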

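Step 1: Disable rebalancing

This keeps the existing indices from being rebalanced onto the 2 newly added warm nodes; the historical indices should reach the warm nodes only when we move them by hand. The setting is transient, so it does not survive a full cluster restart.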

PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.cluster_concurrent_rebalance": 0
  }
}
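The setting above stays in force until it is reset. As a minimal sketch (the helper name `rebalance_settings_body` is made up for illustration, and nothing here contacts a cluster), the two request bodies for `PUT _cluster/settings` could be generated like this. Sending JSON `null` for a setting resets it to its default:

```python
import json

SETTING = "cluster.routing.allocation.cluster_concurrent_rebalance"


def rebalance_settings_body(disable: bool) -> dict:
    """Build the body for PUT _cluster/settings.

    disable=True  -> set the concurrent rebalance limit to 0 (no rebalancing)
    disable=False -> send null, which resets the setting to its default
    """
    return {"transient": {SETTING: 0 if disable else None}}


# Body used in Step 1, and the body that undoes it once the migration is finished.
print(json.dumps(rebalance_settings_body(True)))
print(json.dumps(rebalance_settings_body(False)))
```

Keeping both bodies side by side makes it harder to forget to turn rebalancing back on after the historical indices have been moved.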

Step 2: Tag the nodes with node.attr.box_type

For background on the node.attr.box_type attribute, see: enabling-awareness

On each hot node, add this line to elasticsearch.yml:

node.attr.box_type: hot

On each warm node, add this line to elasticsearch.yml:

node.attr.box_type: warm

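Step 3: Define a catch-all index template so that shards of new indices are never allocated to warm nodes

With one index per month, a newly created index is by definition a hot index, and its shards must be allocated to hot nodes, not warm nodes. For example, loginmac-201908 is a new index and should live on the hot nodes. If only three months of data are kept, loginmac-201905 is historical data and needs to be moved to the warm nodes.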

PUT /_template/hot_template
{
  "index_patterns": ["*"],
  "order": 0,
  "version": 0,
  "settings": {
    "index": {
      "routing": {
        "allocation": {
          "require": {
            "box_type": "hot"
          },
          "exclude": {
            "box_type": "warm"
          }
        }
      },
      "number_of_shards": 3,
      "number_of_replicas": 1,
      "refresh_interval": "50s"
    },
    "index.unassigned.node_left.delayed_timeout": "3d"
  }
}

For index.routing.allocation.require and index.routing.allocation.exclude, see: shard-allocation-filtering
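The three-month rule from this step can be written down as a small function. This is only a sketch: it assumes index names end in `-YYYYMM` like `loginmac-201908`, and the name `box_type_for` and the `keep_months` parameter are hypothetical, not part of Elasticsearch.

```python
import re
from datetime import date


def box_type_for(index_name: str, today: date, keep_months: int = 3) -> str:
    """Return "hot" or "warm" for an index named like <prefix>-YYYYMM."""
    match = re.search(r"-(\d{4})(\d{2})$", index_name)
    if match is None:
        raise ValueError(f"no -YYYYMM suffix in {index_name!r}")
    year, month = int(match.group(1)), int(match.group(2))
    # Age in whole months; the index for the current month has age 0.
    age = (today.year - year) * 12 + (today.month - month)
    return "hot" if age < keep_months else "warm"


# Assumed "now" of August 2019, matching the example in the text.
today = date(2019, 8, 15)
for name in ["loginmac-201908", "loginmac-201906", "loginmac-201905"]:
    print(name, box_type_for(name, today))
```

With August 2019 as the current month, the indices for 201906 through 201908 count as hot and loginmac-201905 counts as warm, which is the split described above.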

Step 4: Switch every existing index to the hot setting

PUT _all/_settings
{
  "index": {
    "routing": {
      "allocation": {
        "require": {
          "box_type": "hot"
        }
      }
    }
  }
}

This ensures that hot indices are not moved onto the warm nodes when the warm nodes join the cluster.

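Step 5: Restart every node whose elasticsearch.yml was changed to hot, and wait until all indices have finished initializing

Step 6: Start the nodes whose elasticsearch.yml is set to warm, then change the settings of the historical indices to warm

For example, for the loginmac-201905 index, change the required box_type to warm and clear the exclude rule by setting it to null. (box_type is the attribute that identifies the node type.)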

PUT loginmac-201905/_settings
{
  "index": {
    "routing": {
      "allocation": {
        "require": {
          "box_type": "warm"
        },
        "exclude": {
          "box_type": null
        }
      }
    }
  }
}
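When several historical indices have to be switched, the same request can be issued from a script. The sketch below only builds the request with the standard library and does not send it; the base URL `http://localhost:9200` and the helper name `warm_settings_request` are assumptions for illustration.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: replace with your cluster address


def warm_settings_request(index: str, base_url: str = ES_URL) -> urllib.request.Request:
    """Build (but do not send) PUT <index>/_settings that retargets the index to warm nodes."""
    body = {
        "index": {
            "routing": {
                "allocation": {
                    "require": {"box_type": "warm"},
                    # None is serialized as JSON null, which clears the exclude rule.
                    "exclude": {"box_type": None},
                }
            }
        }
    }
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = warm_settings_request("loginmac-201905")
print(req.get_method(), req.full_url)
# To actually apply the change: urllib.request.urlopen(req)
```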

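Step 7: Run the reroute command to move the indices whose box_type is warm onto the warm nodes

Once the box_type of loginmac-201905 is set to warm, its shards should relocate automatically to the nodes whose node.attr.box_type is warm. If they do not, run the reroute command below. To make the reroute more efficient, you can also temporarily set refresh_interval to -1.

Here node-248 is a hot node and node-12 is a warm node.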

POST /_cluster/reroute
{
  "commands": [
    {
      "move": {
        "index": "loginmac-201905",
        "shard": 2,
        "from_node": "node-248",
        "to_node": "node-12"
      }
    }
  ]
}
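Each move command relocates a single shard copy, so an index with several shards needs several commands in the same request. A minimal sketch that builds such a body is shown here; the function name `reroute_body` is hypothetical, and the shard placement in the example (shards 0 and 2 both on node-248) is made up for illustration.

```python
import json


def reroute_body(index: str, moves) -> dict:
    """Build the body for POST /_cluster/reroute.

    moves: iterable of (shard_number, from_node, to_node) tuples,
    one tuple per shard copy that should be relocated.
    """
    return {
        "commands": [
            {
                "move": {
                    "index": index,
                    "shard": shard,
                    "from_node": src,
                    "to_node": dst,
                }
            }
            for shard, src, dst in moves
        ]
    }


# Hypothetical placement: shards 0 and 2 still sit on the hot node node-248.
body = reroute_body(
    "loginmac-201905",
    [(0, "node-248", "node-12"), (2, "node-248", "node-12")],
)
print(json.dumps(body, indent=2))
```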

[Figure: diagram of the hot and warm nodes in the cluster]

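Some problems encountered after the change:

After adding node.attr.box_type: hot to the ES configuration file on node-02 and restarting, the attribute did not take effect, so all the shards on that node were moved to other nodes. After changing the setting, check whether it has taken effect with the following command: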

GET /_nodes/node-02

The node info should now show the box_type attribute set to hot:

      "attributes": {
        "box_type": "hot",
        "xpack.installed": "true"
      }

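So after editing elasticsearch.yml and restarting a node, it is best to run GET /_nodes/node-02 first to confirm that the setting took effect; otherwise a large number of shards may be relocated, wasting resources.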
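This check can be automated against the JSON returned by GET /_nodes. The sketch below works on an already parsed response; the function name `nodes_with_wrong_box_type`, the node ids, and the sample data are illustrative assumptions, with only the `nodes`, `name`, and `attributes` keys taken from the response layout.

```python
def nodes_with_wrong_box_type(nodes_response: dict, expected: dict) -> list:
    """Return the names of nodes whose box_type attribute differs from what is expected.

    nodes_response: parsed JSON from GET /_nodes
    expected: mapping of node name -> expected box_type ("hot" or "warm")
    """
    wrong = []
    for info in nodes_response.get("nodes", {}).values():
        name = info.get("name")
        if name in expected:
            actual = info.get("attributes", {}).get("box_type")
            if actual != expected[name]:
                wrong.append(name)
    return sorted(wrong)


# Made-up sample: node-12 was restarted but its box_type attribute is missing.
sample = {
    "nodes": {
        "id-a": {"name": "node-02", "attributes": {"box_type": "hot", "xpack.installed": "true"}},
        "id-b": {"name": "node-12", "attributes": {"xpack.installed": "true"}},
    }
}
print(nodes_with_wrong_box_type(sample, {"node-02": "hot", "node-12": "warm"}))
```

Running a check like this after every restart, before moving on to the next node, catches a missing attribute before any shards start relocating.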

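One more problem: restarting node-02 kept failing. On Ubuntu 16.04, the command sudo -u user_00 ./bin/elasticsearch -d kept reporting a "memory not lock" error, even though the file descriptor and memory limits had already been changed as the official documentation describes. It turned out that user_00 does not get sufficient privileges when ES is launched this way. The fix was to switch to the user with su user_00 first and then run ./bin/elasticsearch -d to start the ES process.

After the hot-warm separation is done, a few further optimizations are possible:

Segment merging

Check the segments of the loginmac-201905 index, then force merge them: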

GET /_cat/segments/loginmac-201905?v&h=shard,segment,size,size.memory

POST /loginmac-201905/_forcemerge?max_num_segments=10&flush=true

Closing the index

POST /loginmac-201905/_close

Original post: https://www.cnblogs.com/hapjin/p/11314492.html