Elasticsearch reference material

_source and store

http://stackoverflow.com/questions/18833899/in-elasticsearch-what-happens-if-i-set-store-to-yes-on-a-few-fields-but-sou

http://stackoverflow.com/questions/17103047/why-do-i-need-storeyes-in-elasticsearch

You usually send a field to Elasticsearch because you want to either search on it or retrieve it. But if you don't store the field explicitly and you don't disable the source, you can still retrieve the field through _source. This means that in some cases it can make sense to have a field that is neither indexed nor stored.

When you store a field, it is stored in the underlying Lucene index. Lucene is built around an inverted index, which allows fast full-text search and returns document ids for text queries. Beyond the inverted index, Lucene also has storage where field values can be kept so they can be retrieved by document id. You usually store in Lucene the fields that you want to return in search results. Elasticsearch doesn't require you to store every field you want to return, because by default it stores every document you send to it in the _source field, and so can always return everything you sent as part of a search result.

Storing fields explicitly in Lucene is useful in only a few cases: when the _source field is disabled, or when you want to avoid parsing it, even though Elasticsearch does that parsing automatically. Keep in mind, though, that retrieving many stored fields from Lucene may require one disk seek per field, whereas retrieving only _source and parsing out the needed fields is a single disk seek and is faster in most cases.

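If a field's store attribute is set to no, you can still fetch the document through _source and parse that field's content out of it, provided that _source is set to "enabled": true.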
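
As a rough illustration of the point above, the sketch below shows a legacy-style mapping and a helper that reads a non-stored field back out of _source. The type name "article", the field names, the sample hit, and the helper field_from_source are all made up for this example; only the "store" and "_source.enabled" settings come from the notes.

```python
import json

# Hypothetical mapping for a type named "article" (legacy syntax, matching
# the store yes/no wording used in these notes).
mapping = {
    "article": {
        "_source": {"enabled": True},
        "properties": {
            "title": {"type": "string", "store": "yes"},
            "body": {"type": "string", "store": "no"},
        },
    }
}

# A made-up search hit, shaped like the JSON a search returns: the original
# document comes back under "_source".
hit = json.loads('{"_id": "1", "_source": {"title": "Hello", "body": "Some text"}}')


def field_from_source(hit, field):
    """Return a field parsed out of _source, or None if _source is absent."""
    source = hit.get("_source")
    if source is None:  # e.g. _source was disabled in the mapping
        return None
    return source.get(field)


# "body" is not stored, yet it is still recoverable from _source.
print(field_from_source(hit, "body"))
```

If _source were disabled, the hit would carry no "_source" key and the helper would return None, which is the case where storing the field explicitly becomes necessary.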

Aggregation

http://chrissimpson.co.uk/elasticsearch-aggregations-overview.html

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations.html

http://stackoverflow.com/questions/21018493/how-to-access-aggregations-result-with-elasticsearch-java-api-in-searchresponse

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#search-aggregations-bucket-terms-aggregation-order

Top Hits Aggregation

https://www.elastic.co/guide/en/elasticsearch/reference/1.6/search-aggregations-metrics-top-hits-aggregation.html

Shards and replicas

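A shard is actually a Lucene index.

Index requests are handled by the primary shard and then replicated; a replica does not accept them directly. Search requests can be served by either a primary or a replica.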

http://blog.trifork.com/2014/01/07/elasticsearch-how-many-shards/

Aggregation

Aggregation results are not exact

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_document_counts_are_approximate
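
To see why document counts in a terms aggregation can be approximate, here is a toy model, not Elasticsearch's actual implementation. It assumes each shard reports only its own top terms and that the coordinating node simply sums what it receives. The function names and the sample data are invented for this example.

```python
from collections import Counter


def shard_top_terms(terms, size):
    """Top `size` (term, count) pairs as one shard would report them."""
    return Counter(terms).most_common(size)


def merged_counts(shards, shard_size):
    """Sum only the counts that each shard actually reported."""
    total = Counter()
    for terms in shards:
        for term, count in shard_top_terms(terms, shard_size):
            total[term] += count
    return total


# Two toy shards; "b" and "c" rank differently on each shard.
shards = [
    ["a"] * 5 + ["b"] * 4 + ["c"] * 3,
    ["a"] * 5 + ["c"] * 4 + ["b"] * 1,
]

exact = Counter(term for shard in shards for term in shard)
approx = merged_counts(shards, shard_size=2)

for term in sorted(exact):
    print(term, "exact:", exact[term], "merged:", approx[term])
```

In this model "c" occurs 7 times overall, but shard 0 never reports it, so the merged count is only 4. Asking each shard for more terms (the shard_size option of the terms aggregation) narrows the gap at the cost of more data being transferred.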

Mapping

http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/mapping-intro.html

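Every document in an index has a type, and every type has its own mapping, or schema definition. A mapping defines the fields within a type, the data type of each field, and how Elasticsearch handles each field. A mapping is also used to configure metadata associated with the type.

Elasticsearch supports the following simple field data types: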

String: string
Whole number: byte, short, integer, long
Floating-point: float, double
Boolean: boolean
Date: date

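When you index a document that contains a new field, Elasticsearch guesses the field's data type from the basic JSON type of its value. The correspondence is as follows: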

JSON type	Field type
Boolean: `true` or `false`	`boolean`
Whole number: `123`	`long`
Floating point: `123.45`	`double`
String, valid date: `2014-09-15`	`date`
String: `foo bar`	`string`

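Note:

This means that if you index a number in quotes ("123"), the field is mapped as string, not long. However, if the field already exists and is mapped as long, Elasticsearch tries to convert the string to a long, and throws an exception if the conversion fails (for example, because the value contains letters).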
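
The guessing rules in the table and the note can be modelled with a few lines of Python. This is only a sketch of the behaviour described here, not Elasticsearch's own code: the function names are invented, and date detection is assumed to accept only the `YYYY-MM-DD` form shown in the table.

```python
from datetime import datetime


def guess_field_type(value):
    """Guess a field type from a parsed JSON value, following the table."""
    # bool must be tested before int: in Python, True is also an int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        try:
            datetime.strptime(value, "%Y-%m-%d")  # assumed date format
            return "date"
        except ValueError:
            return "string"
    raise TypeError("unsupported JSON value: %r" % (value,))


def coerce_to_long(value):
    """Model indexing into an existing long field: convert or raise."""
    return int(value)  # raises ValueError for strings such as "12a"


print(guess_field_type(123))     # an unquoted number
print(guess_field_type("123"))   # a quoted number is just a string
print(coerce_to_long("123"))     # conversion succeeds for a numeric string
```

Calling coerce_to_long("12a") raises ValueError, which stands in for the exception the note mentions.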

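Customizing field mappings

The most important attribute of a field is type. For fields that are not of type string, you rarely need to specify anything other than type.

Fields of type string are treated as full text by default: the value is passed through an analyzer before it is indexed, and a full-text query on the field passes the query string through an analyzer before searching.

The two most important attributes of a string field are index and analyzer.

The index attribute takes one of three values:

analyzed: analyze the string first, then index it.
not_analyzed: index the value exactly as given, so the field is searchable, but only by its whole value; it is not analyzed.
no: do not index the field at all, so it is not searchable.

For string fields the default is analyzed, so to index the raw value you must set index to not_analyzed.

Other types (such as long, double, date) also accept the index attribute, but the only valid values are no and not_analyzed, because their values are never analyzed.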
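
A minimal sketch of such a mapping, written as a Python dictionary that can be serialized to JSON, is given below. The type name "tweet", the field names, and the validate_index_setting helper are made up for illustration. The syntax is the legacy string mapping these notes describe; later Elasticsearch releases replaced string with the text and keyword types.

```python
import json

# Hypothetical type "tweet" with invented field names, in the legacy
# string-mapping syntax described in these notes.
mapping = {
    "tweet": {
        "properties": {
            "text": {"type": "string", "index": "analyzed", "analyzer": "english"},
            "tag": {"type": "string", "index": "not_analyzed"},
            "raw_payload": {"type": "string", "index": "no"},
            "retweets": {"type": "long"},
        }
    }
}

# Values of "index" allowed per field type, as listed in the notes.
STRING_INDEX_VALUES = {"analyzed", "not_analyzed", "no"}
OTHER_INDEX_VALUES = {"not_analyzed", "no"}


def validate_index_setting(field_type, index_value):
    """Return True if index_value is allowed for the given field type."""
    if field_type == "string":
        return index_value in STRING_INDEX_VALUES
    return index_value in OTHER_INDEX_VALUES


# Check every field that sets "index" explicitly, then print the JSON body.
for name, spec in mapping["tweet"]["properties"].items():
    if "index" in spec:
        assert validate_index_setting(spec["type"], spec["index"]), name

print(json.dumps(mapping, indent=2))
```

Here "tag" is searchable only by its exact value, "raw_payload" is not searchable at all, and "retweets" needs nothing besides its type.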
