Support in the Wild: My Biggest Elasticsearch Problem at Scale

Java Heap Pressure

Elasticsearch has so many wildly different use cases that I could not write a reasonably short blog post describing what can and cannot consume memory. However, there is one thing that constantly stands out above all of the other concerns that you might have while running an Elasticsearch cluster at scale.

For the users that I help, fielddata is the problem that is the most likely to cause their cluster's instability. Fielddata is the bane of my existence and it's the most frequent cause of the highest severity issues that I handle with our customers.

Understanding Fielddata

The inverted index is the magic that makes Elasticsearch queries so fast. This data structure holds a sorted list of all the unique terms that appear in a field, and each term points to the list of documents that contain that term:

Term:    Docs: 1  2  3  4  5
----------------------------
brown X X X
fox X X
quick X X
----------------------------

Search asks the question: What documents contain term brown in the foo field? The inverted index is the perfect structure to answer this question: look up the term of interest in the sorted list and you immediately know which documents match your query.

Sorting or aggregations, however, need to be able to answer this question: What terms does Document 1 contain in the foo field? To answer this, we need a data structure that is the opposite of the inverted index:

Docs:   Terms:
----------------------------
1 [ brown ]
2 [ quick ]
3 [ brown ]
4 [ brown, fox, quick ]
5 [ fox ]
----------------------------

This is the purpose of fielddata. Fielddata can be generated at query time by reading the inverted index, inverting the term <-> doc data structure, and storing the results in memory.(就是doc->terms的正向索引啊,不过它是在查询阶段通过读取倒排索引得到的?如果真是这样,那么如何能够比doc values更快?) The two major downsides of this approach should be obvious:

  1. Loading fielddata can be slow, especially with big segments.
  2. It consumes a lot of valuable heap space.

Because loading fielddata is costly, we try to do it as seldom as possible. Once loaded, we keep it in memory for as long as possible.

By default, fielddata is loaded on demand, which means that you will not see it until you are using it. Also, by being loaded per segment, it means that new segments that get created will slowly add to your overall memory usage until the field's fielddata is evicted from memory. Eviction happens in only a few ways:

  1. Deleting the index or indices that contains it.
  2. Closing the index or indices that contains it.
  3. Segment fielddata is removed when segments are removed (e.g., background merging).
    • This usually just means that the problem is moving rather than going away.
  4. Restarting the node containing the fielddata.
  5. Clearing the relevant fielddata cache.
  6. Automatically evicting the fielddata to make room for other fielddata.
    • This will not happen with default settings.

While the first two ways will cause the memory to be evicted, they're not useful in terms of solving the problem because they make the index unusable. Segment merging is happening in the background and it is not a way to clear fielddata. The fourth and fifth ways are unlikely to be a long term solution because they do not prevent fielddata from being reloaded.

The sixth option, evicting fielddata when the cache is full, leads to different issues: one request triggers fielddata loading for one field and the next request triggers loading for another, causing the first field to be evicted. This causes memory thrashing and slow garbage collections, and your users suffer from very slow queries while they wait for their fielddata to be loaded.

Simply put, once fielddata becomes a problem, then it stays a problem.

Why Fielddata is Bad

At small scales, you can generally get away with fielddata usage without even realizing that you are using it. In highly controlled environments, you may even enjoy that specific fields are being loaded into memory for theoretically faster access.

However, almost without fail, you are bound to run into a problem with it eventually. Whether it's because someone ran a test request on the production system without thinking that it would be a problem (it's just one query, right?), your queries changed to match new data, or you just finally reached a scale where it no longer works: you will eventually run into memory pressure that does not go away.

As I noted earlier, fielddata does not go away on its own. In Elasticsearch 1.3 and later, we allow up to 60% of your Java heap's memory to be consumed by fielddata per node. We control this via the Fielddata Circuit Breaker, which checks incoming requests for potential fielddata usage and then blocks them if they require more memory than is currently available. Any circuit breaker's purpose is to prevent any bad requests, which means that it never gets the chance to cause a problem (e.g., allocate even more fielddata), but it's important to note that it will not clear any existing fielddata.

For example, if a node has 10 GBs of Java heap, then 60% of that is going to be 6 GBs. If a new request requires 1 GB of fielddata to be loaded for that node that is already using 4 GBs of the heap for fielddata, then it will allow it because 4 GB, plus 1 GB, is less than 6 GB. However, if the next request needed 2 GB for yet another field's fielddata, then the entire request would be rejected because the fielddata is exhausted (5 GB + 2 GB = 7 GB, which is clearly greater than 6 GB).

Note: for versions prior to Elasticsearch 1.3, we allowed an unlimited amount of your Java heap to be consumed by fielddata.

Finding Your Fielddata

Fortunately, it's not all bad news. Not only do we have a solution to the problem, but we also provide a way to find and understand your problem with it.

$ curl -XGET 'localhost:9200/_cat/fielddata?v&fields=*'

This will provide a list of each node with its fielddata usage. For instance, at startup, my local node is using absolutely no fielddata:

id                     host            ip          node  total
iExRFXn1Qw23iRzhwor-Wg Chriss-MBP.home 192.168.1.2 WallE 0b

To see it change, it's as simple as sorting, scripting, or aggregating any field. So let's do all three!

$ curl -XGET localhost:9200/test-index/_search -d '{
"query": {
"filtered": {
"filter": {
"script": {
"script": "doc[\"percentage\"].value > 0.5"
}
}
}
},
"aggs": {
"max_number": {
"max": {
"field": "number"
}
}
},
"sort": [
{
"@timestamp": {
"order": "desc"
}
}
]
}'

Although order is irrelevant for this, the first field that will be impacted will be the percentage field that is accessed inside of the scripted filter. The second field used will be the number field from the aggregation. Finally, the last field is the @timestamp field used to sort the filtered results. Taking another look at the _cat/fielddata command above confirms this:

id                     host            ip          node   total number @timestamp percentage
iExRFXn1Qw23iRzhwor-Wg Chriss-MBP.home 192.168.1.2 WallE 49.9kb 24.8kb 24.8kb 192b

Use Doc Values

The solution to this fielddata problem is to avoid it altogether. Fortunately, you can avoid the use of fielddata bymanually mapping all of your fields to use doc values. Without repeating too much from the guide, doc values offload this burden by writing the fielddata to disk at index time, thereby allowing Elasticsearch to load the values outside of your Java heap as they are needed.

By taking the burden out of your heap, you get fast access to the on-disk fielddata through the file system cache, which gives in-memory performance without the cost of garbage collections coming into play. This also frees up a lot of headroom for the Elasticsearch heap so that more operations (e.g., bulk indexing and concurrent searches) can use the heap without placing the node under memory pressure, which leads to garbage collection that will slow it down.

ES doc_values的来源,field data——就是doc->terms的正向索引啊,不过它是在查询阶段通过读取倒排索引loading segments放在内存而得到的?的更多相关文章

  1. ES系列八、正排索Doc Values和Field Data

    1.Doc Values 聚合使用一个叫Doc Values的数据结构.Doc Values使聚合更快.更高效且内存友好. Doc Values的存在是因为倒排索引只对某些操作是高效的.倒排索引的优势 ...

  2. ES doc_values介绍——本质是field value的列存储,做聚合分析用,ES默认开启,会占用存储空间(列存储压缩技巧,除公共除数或者同时减去最小数,字符串压缩的话,直接去重后用数字ID压缩)

    doc_values Doc values are the on-disk data structure, built at document index time, which makes this ...

  3. ES doc_values介绍2——本质是field value的列存储,做聚合分析用,ES默认开启,会占用存储空间

    一.doc_values介绍 doc values是一个我们再三重复的重要话题了,你是否意识到一些东西呢? 搜索时,我们需要一个“词”到“文档”列表的映射 排序时,我们需要一个“文档”到“词“列表的映 ...

  4. 2017 ES GZ Meetup分享:Data Warehouse with ElasticSearch in Datastory

    以下是我在2017 ES 广州 meetup的分享 ppt:https://elasticsearch.cn/slides/11#page=22 摘要 ES最多使用的场景是搜索和日志分析,然而ES强大 ...

  5. Elasticsearch由浅入深(十)搜索引擎:相关度评分 TF&IDF算法、doc value正排索引、解密query、fetch phrase原理、Bouncing Results问题、基于scoll技术滚动搜索大量数据

    相关度评分 TF&IDF算法 Elasticsearch的相关度评分(relevance score)算法采用的是term frequency/inverse document frequen ...

  6. es修改指定的field(partial update)

    PUT /index/type/id 创建文档&替换文档,就是一样的语法一般对应到应用程序中,每次的执行流程基本是这样的:1.应用程序发起一个get请求,获取到document,展示到前台界面 ...

  7. doc_values VS stored field

    doc_values 按列存储,按docId排序,在query阶段使用,直接根据docId获取具体field的value,用来排序,聚合等. stored field按文档存储,按docId排序,一条 ...

  8. Jasper_filter data_pass field data from main to sub to filter some data

    main report: 1 add variable <variable name="Variable_rule" class="java.lang.String ...

  9. FNDLOAD Commands to Download Different Seed Data Types. (DOC ID 274667.1)

    In this Document Goal Solution References Applies to: Oracle Application Object Library - Version 11 ...

随机推荐

  1. PHP 学习内容

    第一阶段: (PHP+MySQL核心编程) 面向对象编程 MySQL数据库, MySQL的优化细节. HTTP协议,http也是我们web开发的基石.对我们了解PHP底层机制有很大帮助,做到知其然,还 ...

  2. 图像处理之基础---傅立叶c实现

    http://blog.csdn.net/lzhq28/article/details/7847047 http://blog.csdn.net/lishizelibin/article/detail ...

  3. Spring Cloud 微服务六:调用链跟踪Spring cloud sleuth +zipkin

    前言:随着微服务系统的增加,服务之间的调用关系变得会非常复杂,这给运维以及排查问题带来了很大的麻烦,这时服务调用监控就显得非常重要了.spring cloud sleuth实现了对分布式服务的监控解决 ...

  4. PHP如何进阶,提升自己

    2017年6月15日14:32:51 今天看今日头条,刷到了一个话题?是:整天增删改查调接口,PHP程序员,如何突破职业瓶颈晋级? 晋级包括:职位晋级:技术能力晋级.当你的技术能力晋级了,职位晋级也就 ...

  5. USB设备驱动程序(二)

    首先我们来看USB设备描述符的结构: 在USB总线识别设备阶段就将USB描述符发送给了USB总线驱动程序,设备的数据传输对象是端点,端点0是特殊端点,在USB总线驱动程序识别阶段, 会分配一个地址给U ...

  6. 3123: [Sdoi2013]森林

    3123: [Sdoi2013]森林 Time Limit: 20 Sec  Memory Limit: 512 MBSubmit: 3336  Solved: 978[Submit][Status] ...

  7. Java中线程和线程池

    Java中开启多线程的三种方式 1.通过继承Thread实现 public class ThreadDemo extends Thread{ public void run(){ System.out ...

  8. nginx学习之详细安装篇(二)

    1. 选择稳定版还是主线版 主线版:包含最新的功能和bug修复,但该版本可能会含有一些属于实验性的模块,同时可能会有新的bug,所以如果只是做测试使用,可以使用主线版. 稳定版:不包含最新的功能,但修 ...

  9. 我的Android进阶之旅------>FastJson的简介

    在最近的工作中,在客户端和服务器通信中,需要采用JSON的方式进行数据传输.简单的参数可以通过手动拼接JSON字符串,但如果请求的参数过多,采用手动拼接JSON字符串,出错率就非常大了.并且工作效率也 ...

  10. linux 创建账户

    linux下创建用户 linux下创建用户(一) Linux 系统是一个多用户多任务的分时操作系统,不论什么一个要使用系统资源的用户,都必须首先向系统管理员申请一个账号,然后以这个账号的身份进入系统. ...