Chinese documentation for Elasticsearch: http://es.xiaoleilu.com/010_Intro/05_What_is_it.html

https://www.elastic.co/blog/index-vs-type

Who has never wondered whether new data should be put into a new type of an existing index, or into a new index? This is a recurring question for new users, and it can’t be answered without understanding how both are implemented.

In the past we tried to make Elasticsearch easier to understand by building an analogy with relational databases: indices would be like a database, and types like a table in a database. This was a mistake: the way data is stored is so different that the comparison hardly makes sense, and it ultimately led to an overuse of types in cases where they were more harmful than helpful.

What is an index?

An index is stored in a set of shards, which are themselves Lucene indices. This already gives you a glimpse of the limits of using a new index all the time: Lucene indices have a small yet fixed overhead in terms of disk space, memory usage and file descriptors used. For that reason, a single large index is more efficient than several small indices: the fixed cost of the Lucene index is better amortized across many documents.

Another important factor is how you plan to search your data. While each shard is searched independently, Elasticsearch eventually needs to merge results from all the searched shards. For instance, if you search across 10 indices that have 5 shards each, the node that coordinates the execution of the search request will need to merge 5x10=50 shard results. Here again you need to be careful: if there are too many shard results to merge, or if you run a heavy request that produces large shard responses (which can easily happen with aggregations), the task of merging all these shard results can become very resource-intensive, in terms of both CPU and memory. Again, this argues for having fewer indices.
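The merge count in the example above is simple arithmetic. Here is a minimal sketch of it; `shard_results_to_merge` is a hypothetical helper name used for illustration, not part of any Elasticsearch API.

```python
def shard_results_to_merge(shards_per_index):
    """Return how many shard results the coordinating node must merge.

    `shards_per_index` lists the number of shards of each searched index;
    every searched shard contributes one result to the merge.
    """
    return sum(shards_per_index)


# 10 indices with 5 shards each, as in the example above.
many_shards = shard_results_to_merge([5] * 10)
# The same 10 indices if each had a single shard.
few_shards = shard_results_to_merge([1] * 10)

print(many_shards)  # 50
print(few_shards)   # 10
```

The per-result cost is not modeled here: a heavy aggregation makes each of those results larger, which is why the count alone understates the problem.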

What is a type?

This is where types help: types are a convenient way to store several kinds of data in the same index, in order to keep the total number of indices low for the reasons explained above. In terms of implementation, this works by adding a “_type” field to every document, which is automatically used for filtering when searching on a specific type. One nice property of types is that searching across several types of the same index comes with no overhead compared to searching a single type: it does not change how many shard results need to be merged.
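To make the "_type as a filter" idea concrete, here is a toy in-memory model. It only illustrates the concept: the documents and the `search` function are made up for this sketch, and Lucene does not execute filters this way internally.

```python
# Every document carries a "_type" field next to its own fields.
docs = [
    {"_type": "tweet", "text": "hello"},
    {"_type": "user", "name": "alice"},
    {"_type": "tweet", "text": "world"},
]


def search(documents, doc_type=None):
    """Toy search: return all documents, or only those of one type."""
    if doc_type is None:
        return list(documents)
    # Searching a specific type is just an extra filter on "_type".
    return [d for d in documents if d["_type"] == doc_type]


tweets = search(docs, doc_type="tweet")
everything = search(docs)
print(len(tweets), len(everything))  # 2 3
```

Both calls scan the same collection, which mirrors the point above: the number of shards involved does not depend on how many types are searched.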

However this comes with limitations as well:

  • Fields need to be consistent across types. For instance if two fields have the same name in different types of the same index, they need to be of the same field type (string, date, etc.) and have the same configuration.
  • Fields that exist in one type will also consume resources for documents of types where this field does not exist. This is a general issue with Lucene indices: they don’t like sparsity. Sparse postings lists can’t be compressed efficiently because of high deltas between consecutive matches. The issue is even worse with doc values: for speed reasons, doc values often reserve a fixed amount of disk space for every document, so that values can be addressed efficiently. This means that if Lucene establishes that it needs one byte to store all values of a given numeric field, it will also consume one byte for documents that don’t have a value for this field. Future versions of Elasticsearch will have improvements in this area, but I would still advise you to model your data in a way that limits sparsity as much as possible.
  • Scores use index-wide statistics, so scores of documents in one type can be impacted by documents from other types.
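The doc values point can be put into numbers. The sketch below assumes the fixed-width case described in the list (one byte per document); the function names and the document counts are hypothetical, chosen only to illustrate the arithmetic.

```python
def fixed_width_doc_values_bytes(total_docs, bytes_per_value=1):
    """Disk space reserved when every document gets a fixed-size slot."""
    return total_docs * bytes_per_value


def bytes_spent_on_missing_values(total_docs, docs_with_field, bytes_per_value=1):
    """Part of that space used by documents that have no value at all."""
    return (total_docs - docs_with_field) * bytes_per_value


# Hypothetical index: 1,000,000 documents, but the field only exists
# in a type that accounts for 10,000 of them.
total_docs = 1_000_000
docs_with_field = 10_000

print(fixed_width_doc_values_bytes(total_docs))                    # 1000000
print(bytes_spent_on_missing_values(total_docs, docs_with_field))  # 990000
```

In this made-up case 99% of the space reserved for the field belongs to documents that never set it, and the same cost repeats for every sparse field.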

This means types can be helpful, but only if all types from a given index have mappings that are similar. Otherwise, the fact that fields also consume resources in documents where they don’t exist could make things worse than if the data had been stored in separate indices.
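One way to check whether candidate types are "similar" in the sense above is to compare their field definitions before folding them into one index. This is a rough sketch over plain dictionaries: `conflicting_fields` and the sample mappings are hypothetical, and real mappings carry more configuration than a single type name.

```python
def conflicting_fields(mappings):
    """Return field names that are declared with different field types.

    `mappings` maps a type name to a {field name: field type} dict.
    """
    seen = {}
    conflicts = set()
    for fields in mappings.values():
        for field, field_type in fields.items():
            if field in seen and seen[field] != field_type:
                conflicts.add(field)
            seen.setdefault(field, field_type)
    return sorted(conflicts)


# Hypothetical types: "created" is a date in one and a string in the other.
mappings = {
    "tweet": {"created": "date", "text": "string"},
    "user": {"created": "string", "name": "string"},
}
print(conflicting_fields(mappings))  # ['created']
```

A non-empty result means the types cannot share an index as they stand. An empty result does not settle the question, since fields that exist in only one type still add sparsity.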

Which one should I use?

This is a tough question, and the answer will depend on your hardware, data and use-case. First it is important to realize that types are useful because they can help reduce the number of Lucene indices that Elasticsearch needs to manage. But there is another way that you can reduce this number: creating indices that have fewer shards. For instance, instead of folding 5 types into the same index, you could create 5 indices with 1 primary shard each.
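As a sketch of the alternative just described, the snippet below builds index creation bodies for five single-shard indices. `number_of_shards` is the standard index setting; the index names are hypothetical, and the requests are only printed, not sent to a cluster.

```python
import json


def index_settings(primary_shards):
    """Build the body of an index creation request with a given shard count."""
    return {"settings": {"number_of_shards": primary_shards}}


# Hypothetical data sets that might otherwise be 5 types in one index.
index_names = ["orders", "customers", "products", "invoices", "reviews"]
requests = {name: index_settings(1) for name in index_names}

for name, body in requests.items():
    print(f"PUT /{name}")
    print(json.dumps(body))

# Total primary shards, i.e. Lucene indices, created by this layout.
total_primary_shards = sum(
    body["settings"]["number_of_shards"] for body in requests.values()
)
print(total_primary_shards)  # 5
```

Five indices with one primary shard each add up to the same number of primary shards as a single index with the default of 5, while keeping each data set's mapping separate.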

I will try to summarize the questions you should ask yourself to make a decision:

  • Are you using parent/child? If yes, this can only be done with two types in the same index.
  • Do your documents have similar mappings? If not, use different indices.
  • If you have many documents for each type, then the overhead of Lucene indices will be easily amortized, so you can safely use separate indices, with fewer shards than the default of 5 if necessary.
  • Otherwise, you can consider putting documents in different types of the same index, or even in the same type.
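The questions above can be read as a decision procedure applied in order. The following is one possible encoding of it, not an official rule set; `recommend` and its parameters are names invented for this sketch.

```python
def recommend(uses_parent_child, similar_mappings, many_docs_per_type):
    """Apply the checklist in order and return a suggested layout."""
    if uses_parent_child:
        # Parent/child requires both types to live in the same index.
        return "types in the same index"
    if not similar_mappings:
        return "separate indices"
    if many_docs_per_type:
        # The fixed per-index overhead is amortized over many documents.
        return "separate indices, with fewer shards if necessary"
    return "types in the same index, or a single type"


print(recommend(False, False, False))  # separate indices
print(recommend(False, True, False))   # types in the same index, or a single type
```

The answers still depend on hardware, data and use case, so the output is a starting point rather than a verdict.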

In conclusion, you may be surprised that there are not as many use cases for types as you expected. And this is right: there are actually few use cases for having several types in the same index for the reasons that we mentioned above. Don’t hesitate to allocate different indices for data that would have different mappings, but still keep in mind that you should keep a reasonable number of shards in your cluster, which can be achieved by reducing the number of shards for indices that don’t require a high write throughput and/or will store low numbers of documents.
