Elasticsearch Series: Index Management in Production Clusters

Overview

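Indices are the part of Elasticsearch we work with most, and nearly every day-to-day operation involves them. This article looks at index operations from an operator's point of view.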

Basic Operations

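From an ops engineer's perspective, routine index work goes beyond CRUD: it also covers opening and closing indices, shrinking them, and re-pointing aliases. Let's go through them one by one.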

Creating an index

[esuser@elasticsearch02 ~]$ curl -XPUT 'http://elasticsearch02:9200/music?pretty' -H 'Content-Type: application/json' -d '
{
    "settings" : {
        "index" : {
            "number_of_shards" : 3,
            "number_of_replicas" : 2
        }
    },
    "mappings" : {
        "type1" : {
            "properties" : {
                "name" : { "type" : "text" }
            }
        }
    }
}'

{
    "acknowledged": true,
    "shards_acknowledged": true
}

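By default, the create-index request returns a response like the one above once the primary copy of each shard has started, or once the request times out. How many shard copies to wait for is controlled by the wait_for_active_shards parameter, which defaults to 1 (the primary only).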

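acknowledged indicates whether the index was successfully created in the cluster, and shards_acknowledged indicates whether the required number of shard copies was started for each shard before the timeout.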

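Either flag can come back false even though the index is created successfully. They only report whether these steps completed before the request timed out: the Elasticsearch server may well have received and executed the request, but not finished before the response was sent, in which case false is returned.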
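Because both flags only describe what finished before the timeout, a provisioning script should treat false as "not yet confirmed" rather than "failed". Below is a minimal sketch of such a check, assuming a POSIX shell and the response shape shown above; ack_status is a hypothetical helper of our own, and it pattern-matches the whitespace-stripped body instead of using a real JSON parser.

```shell
# Hypothetical helper (not an Elasticsearch tool): classify a create-index response.
# Assumes the body looks like the response above (pretty-printed or not) and that
# "acknowledged" comes before "shards_acknowledged", as Elasticsearch returns them.
ack_status() {
  # Remove all whitespace so '"acknowledged" : true' and '"acknowledged":true' look the same
  body=$(printf '%s' "$1" | tr -d ' \t\r\n')
  case "$body" in
    *'"acknowledged":true'*'"shards_acknowledged":true'*)
      echo "ok" ;;                      # index created and required shard copies started
    *'"acknowledged":true'*)
      echo "created-shards-pending" ;;  # index exists, shard copies not started before the timeout
    *)
      echo "unconfirmed" ;;             # may still have been created: verify with GET /<index>
  esac
}
```

For example, ack_status "$(curl -s -XPUT 'http://elasticsearch02:9200/music')" prints ok in the normal case; on unconfirmed, a GET on the index shows whether it was created after all.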

Deleting an index

curl -XDELETE 'http://elasticsearch02:9200/music?pretty'

Viewing index information (settings, mappings and aliases)

curl -XGET 'http://elasticsearch02:9200/music?pretty'

Opening/closing an index

curl -XPOST 'http://elasticsearch02:9200/music/_close?pretty'

curl -XPOST 'http://elasticsearch02:9200/music/_open?pretty'

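Once an index is closed, it imposes almost no overhead on the cluster: only its metadata is kept (its data still occupies disk space), and all reads and writes against it fail. A closed index can be reopened later, which triggers shard recovery.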

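If the cluster is backed up regularly with snapshots, an existing index must be closed before it can be restored from a snapshot; otherwise the restore fails.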

Shrinking an index

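An index's number of primary shards is fixed when the index is created and cannot be changed afterwards. But suppose the estimate turns out too high: number_of_shards was set to 8, and after going to production the data volume is much smaller than expected. How can we reduce the number of primary shards of this index?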

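That is exactly what the shrink command does, with one restriction: the target number of primary shards must be a factor of the source number, i.e. the source shard count must be divisible by the target count. An index with 8 primary shards, for example, can only be shrunk to 4, 2, or 1 primary shards.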
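The factor rule is easy to check before sending a request. A minimal sketch, assuming a POSIX shell; shrink_targets is a hypothetical helper, not part of Elasticsearch, that lists every valid target primary-shard count for a given source count.

```shell
# Hypothetical helper: list the primary-shard counts an index with $1 primaries
# can be shrunk to, i.e. every proper factor of the source count.
shrink_targets() {
  src=$1
  out=""
  n=1
  while [ "$n" -lt "$src" ]; do
    # n is a valid target only if it divides the source count evenly
    if [ $((src % n)) -eq 0 ]; then
      out="$out $n"
    fi
    n=$((n + 1))
  done
  # Drop the leading space before printing
  echo "${out# }"
}
```

shrink_targets 8 prints 1 2 4, matching the example above.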

How the shrink command works:

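Create a target index with the same definition as the source index, except that the number of primary shards is the specified one.
Hard-link the source index's segment files into the target index. If the operating system does not support hard links, the segment files are copied into the target index's data directory instead, which is slow; hard-linking is fast.
Run shard recovery on the target index.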

Demo

First, create an index named music8 with number_of_shards set to 8:

curl -XPUT 'http://elasticsearch02:9200/music8?pretty' -H 'Content-Type: application/json' -d '
{
    "settings" : {
        "index" : {
            "number_of_shards" : 8,
            "number_of_replicas" : 2
        }
    },
    "mappings" : {
        "children" : {
            "properties" : {
                "name" : { "type" : "text" }
            }
        }
    }
}'

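Index some data into it.
Then relocate a copy of every shard onto a single node, node-1 here, and block writes to the index; shrinking requires both: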

curl -XPUT 'http://elasticsearch02:9200/music8/_settings?pretty' -H 'Content-Type: application/json' -d '
{
  "settings": {
    "index.routing.allocation.require._name": "node-1",
    "index.blocks.write": true
  }
}'

This step is called shard relocation. Its progress can be checked with:

curl -XGET 'http://elasticsearch02:9200/_cat/recovery?v'

Run the shrink command, with music9 as the new index name:

curl -XPOST 'http://elasticsearch02:9200/music8/_shrink/music9?pretty' -H 'Content-Type: application/json' -d '
{
  "settings": {
    "index.number_of_shards": 2,
    "index.number_of_replicas": 1,
    "index.codec": "best_compression"
  }
}'

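When it finishes, music9 has the new shard count and contains all of music8's data. Note that music9 inherits music8's index.blocks.write and index.routing.allocation.require._name settings; to make music9 writable and let its shards spread across the cluster, reset both to null, either in the shrink request itself or afterwards through the _settings API.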

Finally, point the alias at the new music9 index; clients that go through the alias notice nothing.

Rolling over an index

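The classic case is logging: a new date-stamped index is created every day, yet clients keep writing through the same alias. The rollover command covers this: when the given conditions are met, it creates a new index and moves the alias onto it.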

Assuming the log_write alias already exists and points to the current log index, an example command:

curl -XPOST 'http://elasticsearch02:9200/log_write/_rollover/log-20120122?pretty' -H 'Content-Type: application/json' -d '
{
  "conditions": {
    "max_age": "1d"
  }
}'

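Schedule it with crontab to run once a day and parameterize the date part in a shell script: every day gets a new date-stamped index, while clients keep writing through the log_write alias. This is very practical for logging systems.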
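A minimal sketch of such a script, assuming a POSIX shell, the log_write alias, and the log-YYYYMMDD naming from the example above; the ES_HOST variable and the function names are hypothetical.

```shell
# Hypothetical daily rollover script; host, alias and naming scheme follow the example above.
ES_HOST=${ES_HOST:-http://elasticsearch02:9200}

# Build the rollover URL for a given day ($1 in YYYYMMDD form, today if omitted)
rollover_url() {
  day=${1:-$(date +%Y%m%d)}
  echo "$ES_HOST/log_write/_rollover/log-$day?pretty"
}

# Send the rollover request; Elasticsearch only rolls over if the conditions are met
do_rollover() {
  curl -XPOST "$(rollover_url "$1")" \
       -H 'Content-Type: application/json' \
       -d '{ "conditions": { "max_age": "1d" } }'
}
```

Calling do_rollover at the end of the script and scheduling it with a crontab entry such as 5 0 * * * /path/to/rollover.sh (placeholder path) gives one rollover attempt per day. Keeping date +%Y%m%d inside the script also avoids cron's rule that an unescaped % in a crontab command is treated as a newline. Since max_age is measured from the index's creation time, a job that fires at the same minute every day may reach the check slightly before 24 hours are up; a somewhat smaller max_age (for example 23h) is one way to avoid skipping a day.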

Index mapping management

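Managing mappings is a basic task: a mapping can be defined when the index is created, and new fields can be added after the index exists (the type of an existing field, however, cannot be changed in place).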

Some common examples:

Viewing an index's mapping

curl -XGET 'http://elasticsearch02:9200/music/_mapping/children?pretty'

Viewing the mapping of a specific field

curl -XGET 'http://elasticsearch02:9200/music/_mapping/children/field/content?pretty'

Creating an index with a mapping

# Most fields are omitted to save space

curl -XPUT 'http://elasticsearch02:9200/music?pretty' -H 'Content-Type: application/json' -d '
{
  "mappings": {
    "children": {
      "properties": {
        "content": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}'

Add a field called name, of type text, to the index

curl -XPUT 'http://elasticsearch02:9200/music/_mapping/children?pretty' -H 'Content-Type: application/json' -d '
{
  "properties": {
    "name": {
      "type": "text"
    }
  }
}'

Index aliases

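By convention, clients do not access Elasticsearch indices by their real names but through aliases. An alias hides the actual index behind it, which is what makes operations such as the rollover above and index rebuilds transparent to clients.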

Let's look at the basic alias operations:

# Create an alias

curl -XPOST 'http://elasticsearch02:9200/_aliases?pretty' -H 'Content-Type: application/json' -d '
{
    "actions" : [
        { "add" : { "index" : "music", "alias" : "music_prd" } }
    ]
}'

# Remove an alias

curl -XPOST 'http://elasticsearch02:9200/_aliases?pretty' -H 'Content-Type: application/json' -d '
{
    "actions" : [
        { "remove" : { "index" : "music", "alias" : "music_prd" } }
    ]
}'

# Move an alias to another index: remove and add in one request (the actions are applied atomically)

curl -XPOST 'http://elasticsearch02:9200/_aliases?pretty' -H 'Content-Type: application/json' -d '
{
    "actions" : [
        { "remove" : { "index" : "music", "alias" : "music_prd" } },
        { "add" : { "index" : "music2", "alias" : "music_prd" } }
    ]
}'
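Since both actions go in a single _aliases request, clients never observe a moment when the alias points to no index. The same body is what the shrink demo needs to move an alias from music8 to music9. A minimal sketch, assuming a POSIX shell; alias_swap_body is a hypothetical helper that prints this body for any old index, new index and alias.

```shell
# Hypothetical helper: print an _aliases body that moves alias $3 from index $1 to index $2.
# Both actions travel in one request, so Elasticsearch applies them together.
alias_swap_body() {
  printf '{"actions":[{"remove":{"index":"%s","alias":"%s"}},{"add":{"index":"%s","alias":"%s"}}]}\n' \
    "$1" "$3" "$2" "$3"
}
```

For example: curl -XPOST 'http://elasticsearch02:9200/_aliases?pretty' -H 'Content-Type: application/json' -d "$(alias_swap_body music8 music9 music_prd)"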

# Bind one alias to multiple indices

curl -XPOST 'http://elasticsearch02:9200/_aliases?pretty' -H 'Content-Type: application/json' -d '
{
    "actions" : [
        { "add" : { "indices" : ["music1", "music2"], "alias" : "music_prd" } }
    ]
}'

Modifying index settings

View the index settings:

curl -XGET 'http://elasticsearch02:9200/music/_settings?pretty'

Modify the settings:

curl -XPUT 'http://elasticsearch02:9200/music/_settings?pretty' -H 'Content-Type: application/json' -d '
{
    "index" : {
        "number_of_replicas" : 1
    }
}'

The setting changed most often is the number of replicas; other settings are changed far less frequently.

Index templates

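Suppose we are designing the index layout for a logging system. The log volume is large, so a new index might be created every day, with the date in the index name but one shared alias. Index templates fit this scenario well: a template's settings, mappings and aliases are applied automatically to every new index whose name matches its pattern.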

As an example, first create an index template:

curl -XPUT 'http://elasticsearch02:9200/_template/template_access_log?pretty' -H 'Content-Type: application/json' -d '
{
  "index_patterns": ["access-log-*"],
  "settings": {
    "number_of_shards": 2
  },
  "mappings": {
    "log": {
      "_source": {
        "enabled": false
      },
      "properties": {
        "host_name": {
          "type": "keyword"
        },
        "thread_name": {
          "type": "keyword"
        },
        "created_at": {
          "type": "date",
          "format": "YYYY-MM-dd HH:mm:ss"
        }
      }
    }
  },
  "aliases" : {
    "access-log" : {}
  }
}'

Indices whose names match "access-log-*" will use this template. Create one:

curl -XPUT 'http://elasticsearch02:9200/access-log-01?pretty'

Inspect the index:

curl -XGET 'http://elasticsearch02:9200/access-log-01?pretty'

The output looks like this:

[esuser@elasticsearch02 bin]$ curl -XGET 'http://elasticsearch02:9200/access-log-01?pretty'
{
  "access-log-01" : {
    "aliases" : {
      "access-log" : { }
    },
    "mappings" : {
      "log" : {
        "_source" : {
          "enabled" : false
        },
        "properties" : {
          "created_at" : {
            "type" : "date",
            "format" : "YYYY-MM-dd HH:mm:ss"
          },
          "host_name" : {
            "type" : "keyword"
          },
          "thread_name" : {
            "type" : "keyword"
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1581373546223",
        "number_of_shards" : "2",
        "number_of_replicas" : "1",
        "uuid" : "N8AHh3wITg-Zh4T6umCS2Q",
        "version" : {
          "created" : "6030199"
        },
        "provided_name" : "access-log-01"
      }
    }
  }
}

This shows that the template's settings, mappings and alias were applied.
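Only indices whose names match the pattern pick up the template, so it can help to check candidate names up front. A rough local sketch, assuming a POSIX shell and assuming that Elasticsearch treats * in a template pattern as "any sequence of characters", which is how a shell glob behaves for ordinary index names; uses_access_log_template is a hypothetical helper.

```shell
# Hypothetical helper: would an index with name $1 match the "access-log-*" pattern?
# Approximates the template's wildcard with a shell glob (* = any sequence of characters).
uses_access_log_template() {
  case "$1" in
    access-log-*) echo yes ;;
    *) echo no ;;
  esac
}
```

uses_access_log_template access-log-01 prints yes, while access-log (the alias name) and my-access-log-01 print no.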

There are also commands to view and delete a template:

curl -XGET 'http://elasticsearch02:9200/_template/template_access_log?pretty'

curl -XDELETE 'http://elasticsearch02:9200/_template/template_access_log?pretty'

Common index queries

Index operation statistics

Elasticsearch keeps detailed statistics on every CRUD operation performed on an index. Use this command:

curl -XGET 'http://elasticsearch02:9200/music/_stats?pretty'

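The output is very detailed, several hundred lines long, covering everything from document counts and disk usage in bytes to low-level get, search, merge and translog statistics.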

Segment information

To query the segments of an index, use:

curl -XGET 'http://elasticsearch02:9200/music/_segments?pretty'

The output is also fairly long; here is the key part as an example:

"segments" : {
  "_1" : {
    "generation" : 1,
    "num_docs" : 1,
    "deleted_docs" : 0,
    "size_in_bytes" : 7013,
    "memory_in_bytes" : 3823,
    "committed" : true,
    "search" : true,
    "version" : "7.3.1",
    "compound" : true,
    "attributes" : {
      "Lucene50StoredFieldsFormat.mode" : "BEST_SPEED"
    }
  }
}

This fragment describes the segment named _1. The fields are:

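_1: the name of the segment
generation: the segment's generation number, which increases with each new segment
num_docs: the number of non-deleted documents in the segment
deleted_docs: the number of deleted documents in the segment
size_in_bytes: the disk space used by the segment
memory_in_bytes: segments keep some data in memory for efficient searching; this is the amount of memory the segment uses
committed: whether the segment has been synced (committed) to disk
search: whether the segment is searchable; false if the segment has been synced to disk but no refresh has happened yet
version: the Lucene version used to write the segment
compound: true means Lucene has merged all of the segment's files into a single file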
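Together, num_docs and deleted_docs show how much of a segment is dead weight: deleted documents are only physically removed when segments are merged, so a high ratio is one signal that the force merge covered later may reclaim space. A minimal sketch, assuming a POSIX shell; deleted_pct is a hypothetical helper that computes the percentage with integer arithmetic.

```shell
# Hypothetical helper: integer percentage of deleted docs in a segment.
# $1 = num_docs, $2 = deleted_docs (values read from the _segments output)
deleted_pct() {
  total=$(( $1 + $2 ))
  if [ "$total" -eq 0 ]; then
    echo 0        # empty segment: avoid dividing by zero
  else
    echo $(( $2 * 100 / total ))
  fi
}
```

For the _1 segment above, deleted_pct 1 0 prints 0.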

Shard store information

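This command shows the store information of each shard copy in the index, including which node holds it, and comes in quite handy. By default only shards with at least one unassigned copy are listed; status=green lists shards whose copies are all assigned: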

curl -XGET 'http://elasticsearch02:9200/music/_shard_stores?status=green&pretty'

Here is an excerpt, where "3" is the shard ID:

"3" : {
  "stores" : [
    {
      "A1s1uus7TpuDSiT4xFLOoQ" : {
        "name" : "node-2",
        "ephemeral_id" : "Q3uoxLeJRnWQrw3E2nOq-Q",
        "transport_address" : "192.168.17.137:9300",
        "attributes" : {
          "ml.machine_memory" : "3954196480",
          "rack" : "r1",
          "xpack.installed" : "true",
          "ml.max_open_jobs" : "20",
          "ml.enabled" : "true"
        }
      },
      "allocation_id" : "o-t-AwGZRrWTflYLP030jA",
      "allocation" : "primary"
    },
    {
      "RGw1IXzZR4CeZh9FUrGHDw" : {
        "name" : "node-1",
        "ephemeral_id" : "B1pv6c4TRuu1vQNvL40iPg",
        "transport_address" : "192.168.17.138:9300",
        "attributes" : {
          "ml.machine_memory" : "3954184192",
          "rack" : "r1",
          "ml.max_open_jobs" : "20",
          "xpack.installed" : "true",
          "ml.enabled" : "true"
        }
      },
      "allocation_id" : "SaXqL8igRUmLAoBBQyQNqw",
      "allocation" : "replica"
    }
  ]
},

A few more operations

Clearing the index caches

curl -XPOST 'http://elasticsearch02:9200/music/_cache/clear?pretty'

Forcing a flush

Forces the data in the OS cache to be fsynced to disk and clears the translog entries that are now persisted:

curl -XPOST 'http://elasticsearch02:9200/music/_flush?pretty'

Refresh

Explicitly refreshes the index, making all operations performed since the last automatic refresh visible to search:

curl -XPOST 'http://elasticsearch02:9200/music/_refresh?pretty'

force merge

Forces segment files to be merged, reducing the number of segments. It is best run only on indices that no longer receive writes:

curl -XPOST 'http://elasticsearch02:9200/music/_forcemerge?pretty'

Elasticsearch normally performs the four operations above on its own; except in special cases, they need no manual intervention.

Summary

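This article briefly covered everyday index operations and management from an operations perspective; mastering them makes working with indices considerably more efficient.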

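Focused on Java high concurrency and distributed architecture. For more technical articles and insights, follow the WeChat official account: Java架构社区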
