Elasticsearch Notes: Handling Common Cluster Conditions (Practical Tips)

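1. Handling cluster health problems

When the cluster is in yellow or red status, the overall troubleshooting steps are as follows:

(1) First check the cluster status

curl -XGET 'localhost:9200/_cluster/health?pretty'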

{
    "cluster_name": "elasticsearch",
    "status": "yellow",
    "timed_out": false,
    "number_of_nodes": 1,
    "number_of_data_nodes": 1,
    "active_primary_shards": 278,
    "active_shards": 278,
    "relocating_shards": 0,
    "initializing_shards": 0,
    "unassigned_shards": 278,
    "delayed_unassigned_shards": 0,
    "number_of_pending_tasks": 0,
    "number_of_in_flight_fetch": 0,
    "task_max_waiting_in_queue_millis": 0,
    "active_shards_percent_as_number": 50
}

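Pay most attention to unassigned_shards: these are shards that exist in the cluster state but cannot actually be found in the cluster. The usual source of unassigned shards is unassigned replicas. For example, an index with 5 shards and 1 replica has 5 unassigned replica shards on a single-node cluster. If your cluster is red, it will also keep unassigned shards for a long time (because primary shards are missing). The other metrics are explained below:

(1) initializing_shards is the number of shards that have just been created. For example, when you create your first index, its shards are briefly in the initializing state. This is normally a transient event, and shards should not stay in initializing for long. You may also see initializing shards right after a node restarts: shards start in the initializing state while they are loaded from disk.

(2) number_of_nodes and number_of_data_nodes are self-describing.

(3) active_primary_shards is the number of primary shards in the cluster. It is an aggregate across all indices.

(4) active_shards is an aggregate of all shards across all indices, replica shards included.

(5) relocating_shards is the number of shards currently moving from one node to another. It is normally 0, but it rises when Elasticsearch decides the cluster is not well balanced, for example after a node is added or taken offline.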
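The two values to watch first, status and unassigned_shards, can be pulled out of the health response with standard tools. The following is a minimal sketch, assuming the response is pretty-printed with one field per line; the function names health_field and health_summary are illustrative and not part of Elasticsearch.

```shell
# Sketch: extract one top-level field from pretty-printed _cluster/health JSON
# using only grep and sed. Assumes one "key": value pair per line.
health_field() {
    # $1 = field name; the JSON is read from stdin
    grep -m1 "\"$1\"" | sed -E 's/^[^:]*:[[:space:]]*"?([^",]*)"?,?[[:space:]]*$/\1/'
}

# Print the cluster status and the unassigned shard count on one line.
health_summary() {
    health_json=$(cat)
    printf 'status=%s unassigned=%s\n' \
        "$(printf '%s\n' "$health_json" | health_field status)" \
        "$(printf '%s\n' "$health_json" | health_field unassigned_shards)"
}
```

A possible use is `curl -s 'localhost:9200/_cluster/health?pretty' | health_summary`. A real JSON parser is more robust; plain text matching is used here only to keep the sketch dependency-free.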

(2) Find the problem indices

curl -XGET 'localhost:9200/_cluster/health?level=indices'

{
    "cluster_name": "elasticsearch",
    "status": "yellow",
    "timed_out": false,
    "number_of_nodes": ,
    "number_of_data_nodes": ,
    "active_primary_shards": ,
    "active_shards": ,
    "relocating_shards": ,
    "initializing_shards": ,
    "unassigned_shards": ,
    "delayed_unassigned_shards": ,
    "number_of_pending_tasks": ,
    "number_of_in_flight_fetch": ,
    "task_max_waiting_in_queue_millis": ,
    "active_shards_percent_as_number": ,
    "indices": {
        "gaczrk": {
            "status": "yellow",
            "number_of_shards": ,
            "number_of_replicas": ,
            "active_primary_shards": ,
            "active_shards": ,
            "relocating_shards": ,
            "initializing_shards": ,
            "unassigned_shards":
        },
        "special-sms-extractor_zhuanche_20200204": {
            "status": "yellow",
            "number_of_shards": ,
            "number_of_replicas": ,
            "active_primary_shards": ,
            "active_shards": ,
            "relocating_shards": ,
            "initializing_shards": ,
            "unassigned_shards":
        },
        "specialhtl201905": {
            "status": "yellow",
            "number_of_shards": ,
            "number_of_replicas": ,
            "active_primary_shards": ,
            "active_shards": ,
            "relocating_shards": ,
            "initializing_shards": ,
            "unassigned_shards":
        },
        "v2": {
            "status": "red",
            "number_of_shards": 10,
            "number_of_replicas": 1,
            "active_primary_shards": 0,
            "active_shards": 0,
            "relocating_shards": 0,
            "initializing_shards": 0,
            "unassigned_shards": 20
        },
        "sms20181009": {
            "status": "yellow",
            "number_of_shards": ,
            "number_of_replicas": ,
            "active_primary_shards": ,
            "active_shards": ,
            "relocating_shards": ,
            "initializing_shards": ,
            "unassigned_shards":
        },
......

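The level=indices parameter makes the cluster-health API add a list of indices to the cluster information, with details for each index (status, number of shards, number of unassigned shards, and so on). Once we ask for index-level output, it is immediately clear which index has the problem: v2. We can also see that this index used to have 10 primary shards and one replica, and that all 20 shards are now missing. Presumably these 20 shards lived on the two nodes that disappeared from our cluster. In general, Elasticsearch allocates shards automatically, so first check whether that feature is enabled: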

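curl -XGET 'localhost:9200/_cluster/settings?pretty'

{
    "persistent": {},
    "transient": {
        "cluster": {
            "routing": {
                "allocation": {
                    "enable": "all"
                }
            }
        }
    }
}

Here cluster.routing.allocation.enable is all, so automatic allocation is enabled for every kind of shard.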
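If the setting turns out to be something other than all, it can be switched back through the cluster settings API. The sketch below only builds the request body; the setting name cluster.routing.allocation.enable and its values are the documented ones, while the function name allocation_payload and its validation are assumptions made for illustration.

```shell
# Sketch: build the request body that sets cluster.routing.allocation.enable.
# Accepts only the documented values: all, primaries, new_primaries, none.
allocation_payload() {
    case "$1" in
        all|primaries|new_primaries|none) ;;
        *) echo "unsupported value: $1" >&2; return 1 ;;
    esac
    # A transient setting is lost on a full cluster restart.
    printf '{"transient":{"cluster.routing.allocation.enable":"%s"}}\n' "$1"
}
```

It could be sent with `curl -XPUT 'localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d "$(allocation_payload all)"`; the Content-Type header is required by newer Elasticsearch versions and harmless on older ones.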

The level parameter accepts one more option:

curl -XGET 'localhost:9200/_cluster/health?level=shards'

{
    "cluster_name": "elasticsearch",
    "status": "yellow",
    "timed_out": false,
    "number_of_nodes": ,
    "number_of_data_nodes": ,
    "active_primary_shards": ,
    "active_shards": ,
    "relocating_shards": ,
    "initializing_shards": ,
    "unassigned_shards": ,
    "delayed_unassigned_shards": ,
    "number_of_pending_tasks": ,
    "number_of_in_flight_fetch": ,
    "task_max_waiting_in_queue_millis": ,
    "active_shards_percent_as_number": ,
    "indices": {
        "gaczrk": {
            "status": "yellow",
            "number_of_shards": ,
            "number_of_replicas": ,
            "active_primary_shards": ,
            "active_shards": ,
            "relocating_shards": ,
            "initializing_shards": ,
            "unassigned_shards": ,
            "shards": {
                "": {
                    "status": "yellow",
                    "primary_active": true,
                    "active_shards": ,
                    "relocating_shards": ,
                    "initializing_shards": ,
                    "unassigned_shards":
                },
                "": {
                    "status": "yellow",
                    "primary_active": true,
                    "active_shards": ,
                    "relocating_shards": ,
                    "initializing_shards": ,
                    "unassigned_shards":
                },
                "": {
                    "status": "yellow",
                    "primary_active": true,
                    "active_shards": ,
                    "relocating_shards": ,
                    "initializing_shards": ,
                    "unassigned_shards":
                },
                "": {
                    "status": "yellow",
                    "primary_active": true,
                    "active_shards": ,
                    "relocating_shards": ,
                    "initializing_shards": ,
                    "unassigned_shards":
                },
                "": {
                    "status": "yellow",
                    "primary_active": true,
                    "active_shards": ,
                    "relocating_shards": ,
                    "initializing_shards": ,
                    "unassigned_shards":
                }
            }
        },
......

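The shards option produces much more detailed output, listing the status and location of every shard in every index. This output is occasionally useful, but it is so verbose that it is hard to work with.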
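When the per-shard output is too verbose, a compact list of only the unhealthy indices is often enough. The sketch below filters the two-column output of the cat indices API requested with h=health,index; that endpoint and its columns exist in Elasticsearch, while the function name non_green_indices is an assumption made for illustration.

```shell
# Sketch: keep only red and yellow indices from "<health> <index>" lines.
# Input is expected on stdin, one index per line.
non_green_indices() {
    # sort puts "red" lines before "yellow" lines alphabetically
    awk '$1 == "red" || $1 == "yellow" { print $1, $2 }' | sort
}
```

A possible use is `curl -s 'localhost:9200/_cat/indices?h=health,index' | non_green_indices`, which lists red indices first because they need attention before yellow ones.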

(3) Manually allocate unassigned shards

List the unassigned shards and the reason each one is unassigned:

curl -XGET 'localhost:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason'

index   shard prirep state      unassigned.reason
gaczrk        p      STARTED
gaczrk        r      UNASSIGNED CLUSTER_RECOVERED
gaczrk        p      STARTED
gaczrk        r      UNASSIGNED CLUSTER_RECOVERED
gaczrk        p      STARTED

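Meaning of the unassigned reasons:

INDEX_CREATED:  unassigned as a result of an index-creation API call.

CLUSTER_RECOVERED:  unassigned as a result of a full cluster recovery.

INDEX_REOPENED:  unassigned as a result of opening a closed index.

DANGLING_INDEX_IMPORTED:  unassigned as a result of importing a dangling index.

NEW_INDEX_RESTORED:  unassigned as a result of restoring into a new index.

EXISTING_INDEX_RESTORED:  unassigned as a result of restoring into a closed index.

REPLICA_ADDED:  unassigned as a result of explicitly adding a replica shard.

ALLOCATION_FAILED:  unassigned as a result of a failed shard allocation.

NODE_LEFT:  unassigned because the node hosting the shard left the cluster.

REINITIALIZED:  unassigned because the shard moved from started back to initializing (for example, with shadow replicas).

REROUTE_CANCELLED:  unassigned as a result of an explicit cancel reroute command.

REALLOCATED_REPLICA:  a better replica location was identified, so the existing replica allocation was cancelled and the shard became unassigned.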
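On a cluster with many shards it helps to see how the unassigned shards are distributed across these reasons before choosing a fix. The following is a minimal sketch that tallies them from the cat shards output; it assumes the columns were requested as h=index,shard,prirep,state,unassigned.reason, and the function name unassigned_by_reason is illustrative.

```shell
# Sketch: count UNASSIGNED shards per reason.
# stdin: lines of "index shard prirep state unassigned.reason".
unassigned_by_reason() {
    awk '$4 == "UNASSIGNED" { count[$5]++ }
         END { for (r in count) print count[r], r }' | sort -rn
}
```

A possible use is `curl -s 'localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason' | unassigned_by_reason`. A header row, if present, is ignored because its fourth column is not UNASSIGNED.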

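Then allocate the shard manually with the reroute command. In the example, gaczrk is the index name, 4 is the shard number, and node takes the ID of another node. Note that "allow_primary": true allocates an empty primary when no copy of the shard is available, so data in that shard may be lost.

curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
    "commands": [{
        "allocate": {
            "index": "gaczrk",
            "shard": 4,
            "node": "<ID of another node>",
            "allow_primary": true
        }
    }]
}'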

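If there are many unassigned shards, the following script can assign them automatically:

#!/bin/bash
# Assign every UNASSIGNED shard to the nodes below in round-robin order.

array=( node1 node2 node3 )   # names of the target nodes
node_counter=0
length=${#array[@]}

# Split the curl output on newlines only, one shard per iteration.
IFS=$'\n'
for line in $(curl -s 'http://127.0.0.1:9200/_cat/shards' | grep -F UNASSIGNED); do
    INDEX=$(echo "$line" | awk '{print $1}')
    SHARD=$(echo "$line" | awk '{print $2}')
    NODE=${array[$node_counter]}
    echo "$NODE"
    curl -XPOST 'http://127.0.0.1:9200/_cluster/reroute' -d '{
        "commands": [
        {
            "allocate": {
                "index": "'"$INDEX"'",
                "shard": '"$SHARD"',
                "node": "'"$NODE"'",
                "allow_primary": true
            }
        }
        ]
    }'
    # Advance to the next node, wrapping around at the end of the list.
    node_counter=$(( (node_counter + 1) % length ))
done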
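The allocate command with allow_primary belongs to older Elasticsearch releases. From Elasticsearch 5.0 on it is replaced by allocate_replica, allocate_stale_primary and allocate_empty_primary. As a hedged sketch for such newer clusters, the function below builds a reroute body that assigns an unassigned replica; the function name replica_reroute_payload is an assumption made for illustration.

```shell
# Sketch for Elasticsearch 5.0 or later: build a _cluster/reroute body that
# assigns an unassigned replica shard to a given node.
replica_reroute_payload() {
    # $1 = index name, $2 = shard number, $3 = target node name
    printf '{"commands":[{"allocate_replica":{"index":"%s","shard":%d,"node":"%s"}}]}\n' \
        "$1" "$2" "$3"
}
```

It could be sent with `curl -XPOST 'localhost:9200/_cluster/reroute' -H 'Content-Type: application/json' -d "$(replica_reroute_payload gaczrk 4 node1)"`. Unlike forcing a primary, allocate_replica copies data from the existing primary, so it does not discard anything.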

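(4) Quick shard allocation

If the output of the commands above shows that all primary shards are healthy and only replica shards have problems, there is a quick way to recover: forcibly drop the replica shards and let Elasticsearch regenerate them on its own. First set the replica count of the problem index to 0:

curl -XPUT 'localhost:9200/<problem-index-name>/_settings?pretty' -d '{
    "index" : {
        "number_of_replicas" : 0
    }
}'

Then watch the cluster status, and finally restore the index's replica count with the same command:

curl -XPUT 'localhost:9200/<problem-index-name>/_settings?pretty' -d '{
    "index" : {
        "number_of_replicas" : <original replica count>
    }
}'

Once the replicas have been allocated automatically, the cluster returns to green.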
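Watching the cluster status between the two steps can be scripted as a simple polling loop. The sketch below is one way to do it, assuming the health JSON comes from whatever command is passed in; the function name wait_for_status and its argument order are illustrative.

```shell
# Sketch: poll a health source until it reports the wanted status.
# $1 = wanted status, $2 = max attempts, $3 = seconds between attempts,
# remaining arguments = command that prints the cluster health JSON.
wait_for_status() {
    want=$1; attempts=$2; delay=$3
    shift 3
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$@" | grep -q "\"status\"[[:space:]]*:[[:space:]]*\"$want\""; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1   # status not reached within the allowed attempts
}
```

A possible use is `wait_for_status green 30 10 curl -s 'localhost:9200/_cluster/health'`. The health API also accepts wait_for_status and timeout query parameters, which let the server do the waiting instead.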

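(5) Cluster shards stuck in the INITIALIZING state

curl -XGET 'localhost:9200/_cat/shards/7a_cool?v&pretty'

7a_cool   r STARTED       .4mb 10.2.4.21 pt01-pte----

7a_cool   r INITIALIZING       10.2.4.22 pt01-pte----  <== abnormal shard

Solution:

1) First stop the Elasticsearch service on the host that holds the abnormal shard:

Log in to host pt01-pte---- and run /etc/init.d/elasticsearch stop

If the shard moves to another host automatically and its state recovers, the cluster is healthy again. If it still stays in the initializing state, the problem persists: run the manual shard allocation command above. If that does not help either, set the replica count of the problem index to 0 and let the cluster rebalance its shards on its own; once the adjustment is complete, the cluster status becomes green.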
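To find shards stuck in this state across all indices rather than one index at a time, the cat shards output can be filtered by state. This is a minimal sketch assuming the columns were requested as h=index,shard,prirep,state,node and that node names contain no spaces; the function name initializing_shards is illustrative.

```shell
# Sketch: list shards currently in the INITIALIZING state.
# stdin: lines of "index shard prirep state node".
initializing_shards() {
    awk '$4 == "INITIALIZING" { print $1, $2, $3, $5 }'
}
```

A possible use is `curl -s 'localhost:9200/_cat/shards?h=index,shard,prirep,state,node' | initializing_shards`; a shard that keeps appearing across repeated runs is a candidate for the steps above.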
