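ELK Performance (3): Running High-Performance, Fault-Tolerant Elasticsearch Clusters on Docker

Introduction

Running high-performance, fault-tolerant Elasticsearch clusters on Docker

Content

The familiar development pipeline usually looks like this:

Development (Dev) -> Test -> Quality Assurance (QA) -> Production

The problems we typically run into are:

Resources are not fully utilized
The number of servers needed is overestimated
Dev ≠ Test ≠ QA ≠ Production

The solution is to use container technology: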

Amazon(AWS)
Kubernetes
Docker
rkt
spoon.net

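Containers versus virtual machines

Running the official Elasticsearch container

Default run

$ docker run -d elasticsearch

(equivalent to: $ docker run -d elasticsearch:latest)

Specifying a version

$ docker run -d elasticsearch:1.7

Naming the instance and setting its hostname

$ docker run --name es_1 -h es_master_1 elasticsearch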

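Setting container parameters

Limit the container's memory to 2 GB

$ docker run -d -m 2G elasticsearch

Disable memory swapping

$ docker run -d -m 2G --memory-swappiness=0 elasticsearch

Pin the container to specific CPU cores

$ docker run -d --cpuset-cpus="1,3" elasticsearch

Set the CPU period and quota (both in microseconds; a quota of 25000 per 50000 period allows up to 50% of one CPU)

$ docker run -d --cpu-period=50000 --cpu-quota=25000 elasticsearch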
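The relationship between the two CPU flags can be checked with a little arithmetic. The sketch below is only an illustration; the helper name cpu_percent is made up for this example and is not part of Docker.

```shell
# Hypothetical helper: percentage of a single CPU core granted by a
# period/quota pair. Both arguments are in microseconds, matching the
# values passed to --cpu-period and --cpu-quota.
cpu_percent() {
  period=$1
  quota=$2
  echo $(( quota * 100 / period ))
}

cpu_percent 50000 25000    # prints 50
```

A quota larger than the period (for example 200000 per 100000) would correspond to more than one full core.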

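Building a custom image

Dockerfile:

FROM elasticsearch

ADD ./elasticsearch.yml /usr/share/elasticsearch/config/

Build the image (the trailing dot is the build context, the directory that contains the Dockerfile)

$ docker build -t devops/example .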

Network control

Publishing ports

$ docker run -d -p 9200:9200 -p 9300:9300 elasticsearch

Setting the publish host

$ docker run -d elasticsearch -Dnetwork.publish_host=192.168.1.1

$ docker run -d -p 9200:9200 -p 9300:9300 elasticsearch -Dnetwork.publish_host=192.168.1.1

$ docker run -d -p 9200:9200 -p 9300:9300 elasticsearch -Dnetwork.publish_host=0.0.0.0

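Network control best practices

Use a dedicated network for the Elasticsearch cluster

Use consistent hostnames for the containers

  $ docker run -d -h es_node_1 elasticsearch

Expose ports 9200 and 9300 only on client nodes
Point Elasticsearch data and client nodes only at the master nodes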

Handling storage

The default data path is

/usr/share/elasticsearch/data

By default, the data is not persisted

Redirect it to a directory on the host

$ docker run -d -v /opt/elasticsearch/data:/usr/share/elasticsearch/data elasticsearch

Use a data-only container

Docker data volumes

Bypass the Union File System

Can be shared among multiple containers

Data is kept even if the container itself is deleted

$ docker create -v /mnt/es/data:/usr/share/elasticsearch/data \
    --name esdata elasticsearch

High-availability cluster

Master Only	Data Only	Data Only	Client Only

minimum_master_nodes = N/2 + 1 (N is the number of master-eligible nodes; the division rounds down)
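As a quick check of the quorum formula, the following sketch computes the value for a few cluster sizes. The function name min_master_nodes is chosen for this example only.

```shell
# Quorum of master-eligible nodes: floor(N / 2) + 1.
# Shell arithmetic uses integer division, which gives the floor.
min_master_nodes() {
  echo $(( $1 / 2 + 1 ))
}

for n in 1 2 3 5; do
  echo "master-eligible=$n minimum_master_nodes=$(min_master_nodes "$n")"
done
```

With 3 master-eligible nodes the result is 2, so the cluster can lose one of them and still elect a master.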

Some relevant settings:

gateway.recover_after_nodes

gateway.expected_nodes

cluster.routing.allocation.node_concurrent_recoveries

index.unassigned.node_left.delayed_timeout

index.priority

Docker master node

  $ docker run -d elasticsearch \
      -Dnode.master=true \
      -Dnode.data=false \
      -Dnode.client=false

Docker client node

  $ docker run -d elasticsearch \
      -Dnode.master=false \
      -Dnode.data=false \
      -Dnode.client=true

Docker data node

  $ docker run -d elasticsearch \
      -Dnode.master=false \
      -Dnode.data=true \
      -Dnode.client=false

Scaling the cluster

curl -XPUT 'http://localhost:9200/devops/' -d '{
  "settings" : {
    "index" : {
      "number_of_shards" : 4,
      "number_of_replicas" : 0
    }
  }
}'

curl -XPUT 'http://localhost:9200/devops/_settings' -d '{
  "index.number_of_replicas" : 1
}'

curl -XPUT 'http://localhost:9200/devops/_settings' -d '{
  "index.number_of_replicas" : 2
}'

curl -XPUT 'http://localhost:9200/devops/_settings' -d '{
  "index.number_of_replicas" : 1
}'
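To see what each replica setting means for the cluster, it helps to count the shard copies that must be placed on nodes. A minimal sketch, using a helper name (total_shards) invented for this example:

```shell
# Total shard copies for an index: primaries * (1 + replicas).
total_shards() {
  primaries=$1
  replicas=$2
  echo $(( primaries * (1 + replicas) ))
}

total_shards 4 0    # prints 4
total_shards 4 1    # prints 8
total_shards 4 2    # prints 12
```

Since Elasticsearch does not place a replica on the same node as its primary, replicas only get assigned when there are enough nodes to hold them.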

If 2 nodes are removed

If only 1 node remains

RAM buffer

indices.memory.index_buffer_size: 10%

indices.memory.min_index_buffer_size: 48mb

indices.memory.max_index_buffer_size (unbounded)

indices.memory.min_shard_index_buffer_size: 4mb

The smaller the buffer, the lower the indexing throughput; the larger the buffer, the higher the indexing throughput.
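As a rough illustration of how the percentage and the minimum interact, the sketch below applies a percentage to a heap size and enforces the floor. The helper name index_buffer_mb and the heap sizes are assumptions made for this example; the real calculation happens inside Elasticsearch and works on the JVM heap, not on the container memory limit.

```shell
# Hypothetical helper: index buffer in MB for a given heap size.
# Arguments: heap size in MB, buffer percentage, minimum buffer in MB.
index_buffer_mb() {
  size=$(( $1 * $2 / 100 ))
  if [ "$size" -lt "$3" ]; then
    size=$3          # never go below the configured minimum
  fi
  echo "$size"
}

index_buffer_mb 2048 10 48    # prints 204
index_buffer_mb 256 10 48     # prints 48
```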

Time-based data

Search using only the today and week aliases

curl -XPOST 'http://localhost:9200/_aliases' -d '{
  "actions" : [
    { "add" : {"index":"2015-11-23","alias":"today"} },
    { "add" : {"index":"2015-11-23","alias":"week"} }
  ]
}'
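With daily indices, the request body for the aliases has to be rebuilt every day. The following sketch generates it from a list of index names; the function name alias_actions and the extra index names are made up for this example.

```shell
# Print an _aliases request body that points "today" at the first index
# given and "week" at every index given.
alias_actions() {
  today=$1
  printf '{ "actions": [\n'
  printf '  { "add": { "index": "%s", "alias": "today" } }' "$today"
  for idx in "$@"; do
    printf ',\n  { "add": { "index": "%s", "alias": "week" } }' "$idx"
  done
  printf '\n] }\n'
}

alias_actions 2015-11-23 2015-11-22 2015-11-21
```

The output could be passed to curl with -d. A real daily rotation would probably also need "remove" actions so that yesterday's index no longer carries the today alias.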

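Inside Elasticsearch, Lucene uses a mechanism called segment merging. The more data an index holds, the more segments Lucene creates, and merging many small segments into larger ones improves search performance. At its core, a segment merge moves and copies data, so it consumes extra I/O and CPU, and Elasticsearch performance drops while a merge is running.

Tiered architecture

Another way to handle time-based data is to tag nodes as hot or cold.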

node.tag=hot	node.tag=cold	node.tag=cold

curl -XPUT 'localhost:9200/data_2015-11-23' -d '{
  "settings": {
    "index.routing.allocation.include.tag" : "hot"
  }
}'

node.tag=hot	node.tag=cold	node.tag=cold

curl -XPUT 'localhost:9200/data_2015-11-23/_settings' -d '{
  "settings": {
    "index.routing.allocation.exclude.tag" : "hot",
    "index.routing.allocation.include.tag" : "cold"
  }
}'

However, this approach does not work in a multi-tenant setup.

Routing

Indexing

Without routing

With routing

Querying

Without routing

With routing

Query performance with routing versus without routing
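The idea behind routing is that a routing value is hashed and mapped onto one shard, so documents with the same value land on the same shard and a routed query only needs to visit that shard. The sketch below illustrates the mapping; the function name shard_for and the key user_1 are hypothetical, and cksum is only a stand-in for Elasticsearch's internal hash function, so the shard numbers will not match a real cluster.

```shell
# Map a routing key to a shard number: hash(key) mod number_of_shards.
# cksum (a POSIX CRC) replaces Elasticsearch's own hash for illustration.
shard_for() {
  key=$1
  shards=$2
  hash=$(printf '%s' "$key" | cksum | cut -d ' ' -f 1)
  echo $(( hash % shards ))
}

# The same key always maps to the same shard of a 4-shard index.
shard_for user_1 4
shard_for user_1 4
```

Without a routing value, a query has to be sent to every shard of the index and the results merged, which is what makes the routed case cheaper.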

Performance monitoring

https://github.com/sematext/spm-agent-docker

References

Sources:

2016.1 Rafał Kuć - Running High Performance And Fault Tolerant Elasticsearch Clusters On Docker

2015.11 Presentation: Running High Performance and Fault Tolerant Elasticsearch Clusters on Docker

End