Reroute Unassigned Shards: when primary shards go unassigned, the fix is to reroute them
Red Cluster!
Excerpted from: http://blog.kiyanpro.com/2016/03/06/elasticsearch/reroute-unassigned-shards/
There are 3 cluster states:
- green: All primary and replica shards are active
- yellow: All primary shards are active, but not all replica shards are active
- red: Not all primary shards are active
When cluster health is red, at least one primary shard is not active, so some of your data is unavailable until that shard is recovered, which is very bad indeed. I will share how to deal with one common situation: a cluster that is red because of unassigned shards.
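The three states above follow directly from counting active shard copies. The sketch below is a hypothetical helper (`classify_health` is not an Elasticsearch API) that applies those definitions literally. On a real cluster you would simply read the `status` field returned by `GET /_cluster/health`, which also reports counters such as `active_primary_shards` and `unassigned_shards`. The base URL passed to `fetch_status` is an assumption.

```python
import json
import urllib.request


def classify_health(total_primaries: int, active_primaries: int,
                    total_replicas: int, active_replicas: int) -> str:
    """Apply the green/yellow/red definitions to shard counts (hypothetical helper)."""
    if active_primaries < total_primaries:
        return "red"      # not all primary shards are active
    if active_replicas < total_replicas:
        return "yellow"   # all primaries are active, some replicas are not
    return "green"        # every primary and replica shard is active


def fetch_status(base_url: str) -> str:
    """Read the status Elasticsearch itself reports via GET /_cluster/health."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/_cluster/health") as resp:
        return json.load(resp)["status"]
```

For example, `fetch_status("http://your.elasticsearch.host.com:9200")` should return one of `"green"`, `"yellow"` or `"red"`.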
Steps
The general idea is pretty simple: find the shards that are unassigned, then manually assign them to a node with the reroute API. Let's go through it step by step, and then combine the steps into a simple, configurable script.
Step 1: Check Unassigned Shards
To get cluster information, we usually use the cat APIs. The GET /_cat/shards endpoint shows a detailed view of which nodes contain which shards[1].
Cat shards
```shell
# cat shards verbose
curl "http://your.elasticsearch.host.com:9200/_cat/shards?v"
# cat shards index
curl "http://your.elasticsearch.host.com:9200/_cat/shards/wiki2"
# example return
# wiki2 0 p STARTED 197 3.2mb 192.168.56.10 Stiletto
# wiki2 1 p STARTED 205 5.9mb 192.168.56.30 Frankie Raye
# wiki2 2 p STARTED 275 7.8mb 192.168.56.20 Commander Kraken
```
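If you want to process this output in a script rather than read it, note that the last column (the node name) can contain spaces, as with `Frankie Raye` above. A minimal Python sketch, assuming the verbose form (`?v`) so the first line is a header, and assuming only trailing columns can be empty (as for an unassigned shard, which has no node). `parse_cat_shards` is a hypothetical helper name, and the header in the test sample is an assumed example.

```python
from typing import Dict, List


def parse_cat_shards(text: str) -> List[Dict[str, str]]:
    """Turn verbose `_cat/shards?v` output into one dict per shard copy.

    The first non-empty line is treated as the header. Each row is split at
    most len(header) - 1 times, so a node name with spaces stays whole in the
    last column. Missing trailing columns (e.g. an unassigned shard has no
    node) are simply absent from that row's dict.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        fields = line.split(None, len(header) - 1)
        rows.append(dict(zip(header, fields)))
    return rows
```

Splitting with a bounded `maxsplit` is the whole trick: it keeps multi-word node names intact without needing fixed column widths.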
By piping cat shards to fgrep, we can get all unassigned shards.
Get unassigned shards
```shell
# cat shards with fgrep
curl "http://your.elasticsearch.host.com:9200/_cat/shards" | fgrep UNASSIGNED
# example return
# wiki1 0 r UNASSIGNED ALLOCATION_FAILED
# wiki1 1 r UNASSIGNED ALLOCATION_FAILED
# wiki1 2 r UNASSIGNED ALLOCATION_FAILED
```
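The reason column in the example above (`ALLOCATION_FAILED`) comes from requesting the columns explicitly, for example `_cat/shards?h=index,shard,prirep,state,unassigned.reason`[1]. The sketch below does the same filtering as `fgrep UNASSIGNED` but checks the state column itself and turns each hit into a typed record that a reroute step can use. It assumes exactly that column list with no header line. `find_unassigned`, `UnassignedShard`, `fetch_cat_shards` and the host name are hypothetical.

```python
import urllib.request
from typing import List, NamedTuple, Optional

CAT_URL = ("http://your.elasticsearch.host.com:9200/_cat/shards"
           "?h=index,shard,prirep,state,unassigned.reason")


class UnassignedShard(NamedTuple):
    index: str
    shard: int
    primary: bool
    reason: Optional[str]


def find_unassigned(cat_output: str) -> List[UnassignedShard]:
    """Keep only rows whose state column is UNASSIGNED."""
    result = []
    for line in cat_output.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[3] == "UNASSIGNED":
            reason = fields[4] if len(fields) > 4 else None
            result.append(UnassignedShard(fields[0], int(fields[1]),
                                          fields[2] == "p", reason))
    return result


def fetch_cat_shards(url: str = CAT_URL) -> str:
    """Download the plain-text cat output (not called in the example)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")
```

Usage would be `find_unassigned(fetch_cat_shards())`. Keeping the download separate from the parsing makes the parser easy to try on pasted output.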
If you don't want to deal with shell scripts, you can also find these unassigned shards with another endpoint, POST /_flush/synced[2]. Note that this endpoint does more than report information: it lets an administrator initiate a synced flush manually. This is particularly useful for a planned (rolling) cluster restart, where you can stop indexing and don't want to wait the default 5 minutes for idle indices to be sync-flushed automatically. It returns a JSON response. (Synced flush was deprecated in Elasticsearch 7.6 and removed in 8.0.)
_flush/synced
```shell
curl -XPOST "http://your.elasticsearch.host.com:9200/twitter/_flush/synced"
```
If the response reports failed shards, we can iterate through the failures array to find the affected shard copies.
Example response with failed shards
```json
{
  "_shards": {
    "total": 4,
    "successful": 1,
    "failed": 1
  },
  "twitter": {
    "total": 4,
    "successful": 3,
    "failed": 1,
    "failures": [
      {
        "shard": 1,
        "reason": "unexpected error",
        "routing": {
          "state": "STARTED",
          "primary": false,
          "node": "SZNr2J_ORxKTLUCydGX4zA",
          "relocating_node": null,
          "shard": 1,
          "index": "twitter"
        }
      }
    ]
  }
}
```
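Iterating through that response can be sketched in Python as follows. The sketch walks every index section except the top-level `_shards` summary and flattens each entry of its `failures` array, together with its `routing` details when present. The failure in the example above is a STARTED replica, so checking the `state` value is a sensible extra filter if you only want shards that are not running. `flush_failures` and `failures_from_text` are hypothetical helpers.

```python
import json
from typing import Any, Dict, List


def flush_failures(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten the per-index `failures` arrays of a synced flush response."""
    found = []
    for index, section in response.items():
        if index == "_shards" or not isinstance(section, dict):
            continue  # the top-level summary is not an index
        for failure in section.get("failures", []):
            routing = failure.get("routing") or {}  # routing may be absent
            found.append({
                "index": index,
                "shard": failure.get("shard"),
                "reason": failure.get("reason"),
                "primary": routing.get("primary"),
                "state": routing.get("state"),
                "node": routing.get("node"),
            })
    return found


def failures_from_text(text: str) -> List[Dict[str, Any]]:
    """Parse the raw JSON body returned by POST /<index>/_flush/synced."""
    return flush_failures(json.loads(text))
```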
Step 2: Reroute
The reroute command lets you explicitly execute cluster reroute allocation commands[3]. For example, an unassigned shard can be explicitly allocated to a specific node.
Reroute example
```shell
# The Content-Type header is required from Elasticsearch 6.0 on.
curl -XPOST 'localhost:9200/_cluster/reroute' -H 'Content-Type: application/json' -d '{
  "commands" : [
    {
      "move" : {
        "index" : "test", "shard" : 0,
        "from_node" : "node1", "to_node" : "node2"
      }
    },
    {
      "allocate" : {
        "index" : "test", "shard" : 1, "node" : "node3"
      }
    }
  ]
}'
```
There are 3 kinds of commands you can use:
- move: Move a started shard from one node to another. Accepts index and shard for the index name and shard number, from_node for the node to move the shard from, and to_node for the node to move the shard to.
- cancel: Cancel the allocation (or recovery) of a shard. Accepts index and shard for the index name and shard number, and node for the node to cancel the shard allocation on. It also accepts an allow_primary flag to explicitly allow cancelling the allocation of a primary shard. This can be used to force resynchronization of existing replicas from the primary shard by cancelling them and letting them be reinitialized through the standard reallocation process.
- allocate: Allocate an unassigned shard to a node. Accepts index and shard for the index name and shard number, and node for the node to allocate the shard to. It also accepts an allow_primary flag to explicitly allow allocating a primary shard (this might result in data loss). In Elasticsearch 5.0 and later, allocate was replaced by allocate_replica, allocate_stale_primary and allocate_empty_primary.
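The three command shapes can be built as plain dictionaries before they are serialised into the request body. A minimal sketch, assuming the Elasticsearch 2.x command format used in this post (`move`, `cancel`, `allocate` with `allow_primary`). The helper names are hypothetical, and on 5.0+ the `allocate` builder would have to emit one of the newer commands instead.

```python
import json
from typing import Any, Dict, List


def move(index: str, shard: int, from_node: str, to_node: str) -> Dict[str, Any]:
    return {"move": {"index": index, "shard": shard,
                     "from_node": from_node, "to_node": to_node}}


def cancel(index: str, shard: int, node: str,
           allow_primary: bool = False) -> Dict[str, Any]:
    body: Dict[str, Any] = {"index": index, "shard": shard, "node": node}
    if allow_primary:
        body["allow_primary"] = True  # explicitly allow cancelling a primary
    return {"cancel": body}


def allocate(index: str, shard: int, node: str,
             allow_primary: bool = False) -> Dict[str, Any]:
    body: Dict[str, Any] = {"index": index, "shard": shard, "node": node}
    if allow_primary:
        body["allow_primary"] = True  # allocating a primary might lose data
    return {"allocate": body}


def reroute_body(commands: List[Dict[str, Any]]) -> str:
    """Serialise commands into the JSON body for POST /_cluster/reroute."""
    return json.dumps({"commands": commands})
```

`reroute_body([move("test", 0, "node1", "node2"), allocate("test", 1, "node3")])` produces the same body as the reroute example above.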
Combining Step 2 with the unassigned shards found in Step 1, we can reroute all unassigned shards one by one and get the cluster out of the red state faster.
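The combination can be sketched in Python with only the standard library (this is not the author's script). It reads `_cat/shards` with explicit columns, skips anything not UNASSIGNED, and posts one ES 2.x `allocate` command per shard to `POST /_cluster/reroute`, pausing between requests. `BASE_URL`, `TARGET_NODE` and the function names are assumptions. `allow_primary` is only set for primary shards and only when you opt in, because it might result in data loss.

```python
import json
import time
import urllib.request
from typing import Dict, List

BASE_URL = "http://localhost:9200"   # assumption: adjust for your cluster
TARGET_NODE = "node3"                # assumption: name of a healthy data node


def plan(cat_output: str, node: str, allow_primary: bool = False) -> List[Dict]:
    """Build one reroute body per unassigned shard (ES 2.x `allocate` command).

    Expects `_cat/shards?h=index,shard,prirep,state` output without a header.
    """
    bodies = []
    for line in cat_output.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[3] != "UNASSIGNED":
            continue
        command = {"index": fields[0], "shard": int(fields[1]), "node": node}
        if allow_primary and fields[2] == "p":
            command["allow_primary"] = True  # might result in data loss
        bodies.append({"commands": [{"allocate": command}]})
    return bodies


def post_json(url: str, body: Dict) -> Dict:
    req = urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"}, method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def reroute_unassigned(base_url: str = BASE_URL, node: str = TARGET_NODE,
                       allow_primary: bool = False, pause: float = 5.0) -> None:
    url = base_url + "/_cat/shards?h=index,shard,prirep,state"
    with urllib.request.urlopen(url) as resp:
        cat_output = resp.read().decode("utf-8")
    for body in plan(cat_output, node, allow_primary):
        post_json(base_url + "/_cluster/reroute", body)
        time.sleep(pause)  # give the cluster time between commands
```

Sending one command per request, as in the shell loop below, makes it easy to see which shard a failed reroute refers to; the reroute API would also accept all commands in one body.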
Example Solutions
Python
Below is a Python script I wrote using POST /_flush/synced and POST /_cluster/reroute.
Shell Script
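Below is a shell script I found elsewhere in a blog post[4]:
```shell
# Allocate every unassigned shard of one index to one node, one request at a time.
# INDEX and NODE are the example values from the original script; change them.
INDEX="t37"
NODE="datanode15"
# Query only this index, so shard numbers from other indices are not mixed in.
for shard in $(curl -s -XGET "http://localhost:9200/_cat/shards/${INDEX}" | grep UNASSIGNED | awk '{print $2}'); do
  # JSON allows no comments, and variables do not expand inside single quotes,
  # so the quotes are closed around each shell variable.
  # "allocate" is the 1.x/2.x command; allow_primary might result in data loss.
  curl -XPOST 'localhost:9200/_cluster/reroute' -H 'Content-Type: application/json' -d '{
    "commands" : [ {
      "allocate" : {
        "index" : "'"${INDEX}"'",
        "shard" : '"${shard}"',
        "node" : "'"${NODE}"'",
        "allow_primary" : true
      }
    } ]
  }'
  sleep 5
done
```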
EDIT: Based on Vincent’s comment I updated the shell script:
Possible Unassigned Shard Reasons
FYI, these are the possible reasons for a shard to be in an unassigned state[1]:
Name | Comment |
---|---|
INDEX_CREATED | Unassigned as a result of an API creation of an index |
CLUSTER_RECOVERED | Unassigned as a result of a full cluster recovery |
INDEX_REOPENED | Unassigned as a result of opening a closed index |
DANGLING_INDEX_IMPORTED | Unassigned as a result of importing a dangling index |
NEW_INDEX_RESTORED | Unassigned as a result of restoring into a new index |
EXISTING_INDEX_RESTORED | Unassigned as a result of restoring into a closed index |
REPLICA_ADDED | Unassigned as a result of explicit addition of a replica |
ALLOCATION_FAILED | Unassigned as a result of a failed allocation of the shard |
NODE_LEFT | Unassigned as a result of the node hosting it leaving the cluster |
REROUTE_CANCELLED | Unassigned as a result of explicit cancel reroute command |
REINITIALIZED | When a shard moves from started back to initializing, for example, with shadow replicas |
REALLOCATED_REPLICA | A better replica location is identified and causes the existing replica allocation to be cancelled |
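To see which of these reasons dominate on your cluster, you can count them from `_cat/shards?h=index,shard,prirep,state,unassigned.reason` output. A minimal sketch: `KNOWN_REASONS` simply copies the table above, and later releases may list additional codes, so unknown values are counted and reported rather than rejected. `count_reasons` and `unknown_reasons` are hypothetical helpers.

```python
from collections import Counter
from typing import Dict, Set

# Reason codes copied from the table above.
KNOWN_REASONS = {
    "INDEX_CREATED", "CLUSTER_RECOVERED", "INDEX_REOPENED",
    "DANGLING_INDEX_IMPORTED", "NEW_INDEX_RESTORED", "EXISTING_INDEX_RESTORED",
    "REPLICA_ADDED", "ALLOCATION_FAILED", "NODE_LEFT", "REROUTE_CANCELLED",
    "REINITIALIZED", "REALLOCATED_REPLICA",
}


def count_reasons(cat_output: str) -> Dict[str, int]:
    """Count unassigned shard copies per reason (5th column of the h= output)."""
    counts: Counter = Counter()
    for line in cat_output.splitlines():
        fields = line.split()
        if len(fields) >= 5 and fields[3] == "UNASSIGNED":
            counts[fields[4]] += 1
    return dict(counts)


def unknown_reasons(counts: Dict[str, int]) -> Set[str]:
    """Reasons that are not in the table above."""
    return set(counts) - KNOWN_REASONS
```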
References
- [1] Elasticsearch documentation: cat shards
- [2] Elasticsearch documentation: Synced Flush
- [3] Elasticsearch documentation: Cluster Reroute
- [4] How to fix your elasticsearch cluster stuck in initializing shards mode?