Recovering unassigned shards on Elasticsearch 2.x: replica shards can be recovered by setting the replica count to 0 and then setting it back
Excerpted from: https://z0z0.me/recovering-unassigned-shards-on-elasticsearch/
I ran into this problem when I added a node to my Elasticsearch cluster and the new node was unable to replicate the cluster's indexes. This usually happens when there is not enough disk space, when no master is available, or when the nodes run different Elasticsearch versions. My servers had more than enough disk space and the master was available, so with help from the Elasticsearch Discuss forum I found out that the new node was running a different version than the old nodes. While installing on Debian jessie I had simply run apt-get install elasticsearch, which installed the latest available version. To install a specific version of Elasticsearch, append ={version} to the package name:
#apt-get install elasticsearch={version}
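A version mismatch like the one described above can be spotted before touching any shards. The following is a minimal sketch, assuming the cluster answers on http://localhost:9200 as in the rest of this article; the helper name has_mixed_versions is hypothetical and only used for illustration.

```shell
#!/bin/sh
# Hypothetical helper, not part of the original article.
# Reads one version string per line on stdin and succeeds (exit 0)
# if more than one distinct version is present.
has_mixed_versions() {
  [ "$(awk '{ print $1 }' | sort -u | grep -c .)" -gt 1 ]
}

# Usage against a live cluster (not executed here):
#   curl -s 'http://localhost:9200/_cat/nodes?h=version' | has_mixed_versions \
#     && echo "nodes run different Elasticsearch versions"
```

The `_cat/nodes` endpoint prints one line per node, so piping its `version` column through the helper gives a quick yes/no answer.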
Having identified the reason for the unallocated shards, I downgraded Elasticsearch to the required version with the command above. After I started the node, however, the cluster was still in the red state with unassigned shards all over the place:
#curl http://localhost:9200/_cluster/health?pretty
 {
   "cluster_name" : "z0z0",
   "status" : "red",
   "timed_out" : false,
   "number_of_nodes" : 3,
   "number_of_data_nodes" : 3,
   "active_primary_shards" : 6,
   "active_shards" : 12,
   "relocating_shards" : 0,
   "initializing_shards" : 0,
   "unassigned_shards" : 8,
   "delayed_unassigned_shards" : 0,
   "number_of_pending_tasks" : 0,
   "number_of_in_flight_fetch" : 0,
   "task_max_waiting_in_queue_millis" : 0,
   "active_shards_percent_as_number" : 60.0
 }
#curl http://localhost:9200/_cat/shards
site-id      4 p UNASSIGNED
site-id      4 r UNASSIGNED
site-id      1 p UNASSIGNED
site-id      1 r UNASSIGNED
site-id      3 p STARTED    0 159b 10.0.0.6 node-2
site-id      3 r STARTED    0 159b 10.0.0.7 node-3
site-id      2 r STARTED    0 159b 10.0.0.6 node-2
site-id      2 p STARTED    0 159b 10.0.0.7 node-3
site-id      0 r STARTED    0 159b 10.0.0.6 node-2
site-id      0 p STARTED    0 159b 10.0.0.7 node-3
subscription 4 p UNASSIGNED
subscription 4 r UNASSIGNED
subscription 1 p UNASSIGNED
subscription 1 r UNASSIGNED
subscription 3 p STARTED    0 159b 10.0.0.6 node-2
subscription 3 r STARTED    0 159b 10.0.0.7 node-3
subscription 2 r STARTED    0 159b 10.0.0.6 node-2
subscription 2 p STARTED    0 159b 10.0.0.7 node-3
subscription 0 p STARTED    0 159b 10.0.0.6 node-2
subscription 0 r STARTED    0 159b 10.0.0.7 node-3
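With more indexes than in the listing above, the raw `_cat/shards` output gets hard to read. As a rough sketch, the listing can be condensed into a count of unassigned shards per index; the helper name unassigned_per_index is hypothetical, and the column positions are assumed to match the output shown above (index, shard, p/r, state).

```shell
#!/bin/sh
# Hypothetical helper, not part of the original article.
# Reads `_cat/shards` output on stdin and prints "index count"
# for every index that has at least one UNASSIGNED shard.
unassigned_per_index() {
  awk '$4 == "UNASSIGNED" { n[$1]++ } END { for (i in n) print i, n[i] }' | sort
}

# Usage against a live cluster (not executed here):
#   curl -s http://localhost:9200/_cat/shards | unassigned_per_index
```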
At this point I was pretty desperate: whatever I tried either did nothing or ended in some kind of failure. So I set number_of_replicas to 0 by running the following query:
#curl -XPUT http://localhost:9200/_settings?pretty -d '
{
  "index" : {
    "number_of_replicas" : 0
  }
}'
and then stopped the nodes one by one until only one live node was left.
At this point I decided to try rerouting the unassigned shards; if that did not work, I would just rebuild the cluster from scratch. So I ran the following:
#curl -XPOST -d '
{
  "commands" : [ {
    "allocate" : {
      "index" : "site-id",
      "shard" : 1,
      "node" : "node-3",
      "allow_primary" : true
    }
  } ]
}' http://localhost:9200/_cluster/reroute?pretty
I saw the rerouted shard become initialized and then started, so I ran the same command for the rest of the unassigned shards.
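Repeating the reroute command by hand for every shard is tedious, so the step can be scripted. This is only a sketch under several assumptions: the helper names (unassigned_primaries, reroute_body, reroute_all) are hypothetical, the `_cat/shards` columns are as shown earlier, and the request body is the same 2.x `allocate` command used above. Note that `allow_primary: true` forces a primary onto the target node, so data may be lost if that node holds no copy of the shard.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the original article.

# Read `_cat/shards` output on stdin; print "index shard" for each
# unassigned primary (column 3 is p/r, column 4 is the state).
unassigned_primaries() {
  awk '$3 == "p" && $4 == "UNASSIGNED" { print $1, $2 }'
}

# Print the reroute request body for one shard.
# Arguments: index name, shard number, target node name.
reroute_body() {
  printf '{"commands":[{"allocate":{"index":"%s","shard":%s,"node":"%s","allow_primary":true}}]}\n' "$1" "$2" "$3"
}

# Allocate every unassigned primary to one node.
# Arguments: Elasticsearch base URL, target node name.
reroute_all() {
  curl -s "$1/_cat/shards" | unassigned_primaries |
  while read -r index shard; do
    reroute_body "$index" "$shard" "$2" |
      curl -s -XPOST "$1/_cluster/reroute?pretty" -d @-
  done
}

# Usage against a live cluster (not executed here):
#   reroute_all http://localhost:9200 node-3
```

Keeping the parsing and the body generation in separate functions makes it possible to inspect what would be sent before any request reaches the cluster.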
Running curl http://localhost:9200/_cluster/health?pretty confirmed that I was on the right track to fixing the cluster:
#curl http://localhost:9200/_cluster/health?pretty
{
  "cluster_name" : "z0z0",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 10,
  "active_shards" : 20,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
So the cluster was green again, but it was running on only one node. It was time to bring the other nodes back up one by one. When all the nodes were up, I set number_of_replicas to 1 by running the following:
#curl -XPUT http://localhost:9200/_settings -d '
{
  "index" : {
    "number_of_replicas" : 1
  }
}'
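When bringing nodes back one at a time, or after raising number_of_replicas again, it can help to wait until the cluster reports green before taking the next step. The sketch below assumes the same http://localhost:9200 endpoint; the helper names health_status and wait_for_green are hypothetical, and the 60s timeout is an arbitrary illustrative value.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the original article.

# Read a `_cluster/health` JSON response on stdin and print the
# value of its "status" field (green, yellow or red).
health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p'
}

# Block until the cluster is green or the timeout expires.
# Argument: Elasticsearch base URL. Succeeds only if the status is green.
wait_for_green() {
  status=$(curl -s "$1/_cluster/health?wait_for_status=green&timeout=60s" | health_status)
  [ "$status" = "green" ]
}

# Usage against a live cluster (not executed here):
#   wait_for_green http://localhost:9200 && echo "safe to start the next node"
```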
So my Elasticsearch cluster is back to running on 3 nodes and is still in the green state. After a lot of googling and wasted time, I decided to write this article so that anyone who comes across this issue has a working example of how to fix it.
