How Elasticsearch (ES) Works

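I: When a request reaches the ES cluster, one node is chosen as the coordinating node, and it routes the request to the target shard. Writes always go to the primary shard first. For reads, if shard copies are selected round-robin and the shard has one primary and one replica, 4 incoming requests are split so that 2 hit the primary shard and 2 hit the replica shard.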
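
The round-robin split described above can be illustrated with a small simulation. This is only a sketch of the selection idea, not ES's actual routing code; the function name `route_round_robin` and the labels "primary" and "replica" are made up for the example.

```python
from collections import Counter
from itertools import cycle


def route_round_robin(shard_copies, num_requests):
    """Assign each request to the next shard copy in turn (hypothetical helper)."""
    copies = cycle(shard_copies)
    return [next(copies) for _ in range(num_requests)]


# One primary and one replica of the same shard, 4 read requests.
assignments = route_round_robin(["primary", "replica"], 4)
counts = Counter(assignments)
print(assignments)
print(counts)
```

With two copies and four requests, each copy serves two of them.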

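II: When an index is split across several shards and the data is badly skewed, search scores can become inaccurate, because IDF is computed locally on each shard. For example, if shard1 holds many documents containing a term, the IDF of that term on shard1 is low and scores there drop across the board; if the same term is rare on shard2, scores there come out high. Documents that are not especially relevant can then end up with a high score, and the ordering is wrong. The fix is to keep data as evenly distributed across shards as possible in production, so that shard-local scoring does not distort the overall ranking.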
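
A worked example may make the skew concrete. The sketch below assumes the classic Lucene TF/IDF formula, idf = 1 + ln(numDocs / (docFreq + 1)); the shard sizes and document frequencies are invented for illustration.

```python
import math


def classic_idf(num_docs, doc_freq):
    """IDF as defined by Lucene's classic TF/IDF similarity (assumed here)."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical skew: shard1 is large and the term is common there,
# shard2 is small and the term is rare there.
idf_shard1 = classic_idf(num_docs=10_000, doc_freq=2_000)
idf_shard2 = classic_idf(num_docs=100, doc_freq=1)

# What the IDF would be if statistics were computed over both shards together.
idf_global = classic_idf(num_docs=10_100, doc_freq=2_001)

print(round(idf_shard1, 3), round(idf_shard2, 3), round(idf_global, 3))
```

The same term gets a much larger IDF on shard2 than on shard1, so a hit on shard2 can outrank a more relevant hit on shard1. Apart from balancing the data, ES also offers the `search_type=dfs_query_then_fetch` search parameter, which collects term statistics from all shards before scoring, at the cost of an extra round trip.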

III: ES combines per-field scores using one of the following strategies:

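1) most_fields -> by default (a bool query with should clauses), the more fields of a document the full-text keywords match, the higher the score. Rule: (sum of the scores of the matching match clauses) * (number of clauses the document actually matched) / (number of match clauses in the query). The ratio matched / total is the coordination factor used by classic TF/IDF scoring in older ES versions. For example: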

GET /index3/type3/_search

{

  "query": {

    "bool": {

      "should": [

        {

          "match": {

            "title": "spark" // matches in title

          }

        },

        {

          "match": {

            "content": "java" // also matches in content

          }

        }

      ]

    }

  }

}
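
The rule stated for should clauses can be written out as a few lines of Python. This is a sketch of the formula as given in the text (sum of matching clause scores times matched / total), not of ES internals; the clause scores 0.5 and 0.25 are made-up values.

```python
def should_score(clause_scores):
    """Combine should-clause scores; None marks a clause that did not match."""
    matched = [s for s in clause_scores if s is not None]
    if not matched:
        return 0.0
    return sum(matched) * len(matched) / len(clause_scores)


# Document matching both the title and content clauses.
score_both = should_score([0.5, 0.25])
# Document matching only the title clause.
score_one = should_score([0.5, None])

print(score_both, score_one)
```

A document that matches both clauses keeps the full sum, while a document that matches only one of two clauses has its score halved.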

2) best_fields -> with dis_max, the score of a document is determined by its highest-scoring match clause; by default the other clauses do not contribute to the score.

GET /index3/type3/_search

{

  "query": {

    "dis_max": {

      "queries": [

        {

          "match": {

            "title": "spark"

          }

        },

        {

          "match": {

            "content": "java"

          }

        }

      ]

    }

  }

}
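
As a rough sketch of how dis_max combines clause scores: the best clause wins, and the optional `tie_breaker` parameter of dis_max (0.0 by default) lets the other matching clauses contribute a fraction of their score. The clause scores below are invented.

```python
def dis_max_score(clause_scores, tie_breaker=0.0):
    """Best clause score plus tie_breaker times the sum of the other matches."""
    matched = [s for s in clause_scores if s is not None]
    if not matched:
        return 0.0
    best = max(matched)
    return best + tie_breaker * (sum(matched) - best)


# title clause scores 0.5, content clause scores 0.25 (hypothetical values).
score_default = dis_max_score([0.5, 0.25])
score_with_tie = dis_max_score([0.5, 0.25], tie_breaker=0.5)

print(score_default, score_with_tie)
```

With the default tie_breaker only the best field counts, which is the behaviour described above.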

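3) Besides most_fields and best_fields, cross_fields is also used quite often. cross_fields is not the default (multi_match defaults to best_fields), so it has to be set through "type". In effect it treats the fields listed in "fields": ["name","content"] as one combined field at query time, so the full-text search behaves as if it ran against a single full field, which gives more accurate results when the terms are spread over several fields.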

Search request:

GET /index2/type2/_search

{

  "query": {

    "multi_match": {

      "query": "happening like",

      // the terms in query are matched against both content and name; because the two fields have different mappings, their scores differ and the ordering may vary

      "fields": ["name","content"],

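      // best_fields: each document's score equals that of its highest-scoring match field, so the scores of the remaining documents may be less accurate; most_fields: combines the scores of every field into an overall document score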

      "type":"cross_fields",

      "operator":"and"

    }

  }

}
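
The difference between cross_fields and the field-centric types with "operator": "and" can be sketched as follows. This is a simplified model assuming whitespace tokenization; the helper names and the sample document are hypothetical, and real analysis and scoring in ES are more involved.

```python
def tokens(text):
    """Very rough stand-in for an analyzer: lowercase and split on whitespace."""
    return set(str(text).lower().split())


def cross_fields_and(doc, fields, query):
    """Term-centric: every term must appear in at least one of the fields."""
    per_field = [tokens(doc.get(f, "")) for f in fields]
    return all(any(term in field for field in per_field)
               for term in query.lower().split())


def per_field_and(doc, fields, query):
    """Field-centric: some single field must contain every term."""
    per_field = [tokens(doc.get(f, "")) for f in fields]
    return any(all(term in field for term in query.lower().split())
               for field in per_field)


# Hypothetical document whose query terms are split across the two fields.
split_doc = {"name": "happening write", "content": "like hello"}
fields = ["name", "content"]

cross_ok = cross_fields_and(split_doc, fields, "happening like")
field_ok = per_field_and(split_doc, fields, "happening like")
print(cross_ok, field_ok)
```

A document with "happening" in name and "like" in content satisfies the cross_fields query, but would not satisfy a field-centric query that requires both terms in the same field.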

Result:

{

  "took": 36,

  "timed_out": false,

  "_shards": {

    "total": 5,

    "successful": 5,

    "failed": 0

  },

  "hits": {

    "total": 3,

    "max_score": 0.84968257,

    "hits": [

      {

        "_index": "index2",

        "_type": "type2",

        "_id": "2",

        "_score": 0.84968257,

        "_source": {

          "num": 10,

          "title": "他的名字",

          "name": "yes happening like write",

          "content": "happening like"

        }

      },

      {

        "_index": "index2",

        "_type": "type2",

        "_id": "4",

        "_score": 0.8164005,

        "_source": {

          "num": 1000,

          "title": "我的名字",

          "name": "happening like write",

          "content": "happening hello like yeas and he happening like had read a lot about happening hello like"

        }

      },

      {

        "_index": "index2",

        "_type": "type2",

        "_id": "3",

        "_score": 0.5063205,

        "_source": {

          "num": 105,

          "title": "这是谁的名字",

          "name": "happening like write",

          "content": " national  treasure because  of its rare number and cute appearance. Many foreign people are so crazy about  pandas and they can’t watching these  lovely creatures all the time. Though some action"

        }

      }

    ]

  }

}

IV: Two ways to improve full-text search results

1) Use boost to raise the score of a clause

GET index3/type3/_search

{

  "query": {

    "bool": {

      "should": [

        {

          "match": {

            "content": {

              "query": "from",

              "boost": 5 // multiplies the score of this clause by 5

            }

          }

        },{

          "match": {

            "content": {

              "query": "foot" // without the boost above, the document matching foot would score higher

            }

          }

        }

      ]

    }

  }

}

Result:

{

  "took": 3,

  "timed_out": false,

  "_shards": {

    "total": 5,

    "successful": 5,

    "failed": 0

  },

  "hits": {

    "total": 3,

    "max_score": 1.3150566,

    "hits": [

      {

        "_index": "index3",

        "_type": "type3",

        "_id": "1",

        "_score": 1.3150566,

        "_source": {

          "date": "2019-01-02",

          "name": "the little",

          "content": "Half the hello book ideas in his talk were plagiarized from an article I wrote last month.",

          "no": "123"

        }

      },

      {

        "_index": "index3",

        "_type": "type3",

        "_id": "5",

        "_score": 1.3114156,

        "_source": {

          "date": "2019-05-01",

          "name": "http litty",

          "content": "There are hello moments in life when you miss book someone so much that you just want to pick them from your dreams",

          "no": "564",

          "description": "描述"

        }

      },

      {

        "_index": "index3",

        "_type": "type3",

        "_id": "3",

        "_score": 0.28582606,

        "_source": {

          "date": "2019-07-01",

          "name": "very tag",

          "content": "Some of our hello  comrades love book to write long articles with no substance, very much like the foot bindings of a slattern, long as well as smelly",

          "no": "123"

        }

      }

    ]

  }

}
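
The effect of the boost can be checked against the scores in the result above. The sketch assumes that boost simply multiplies the clause score, so dividing the boosted scores by 5 recovers what the `from` matches would have scored without it; the variable names are made up for the example.

```python
BOOST = 5

# Scores copied from the result above, keyed by _id.
boosted_from = {"1": 1.3150566, "5": 1.3114156}  # documents matching "from"
foot_only = {"3": 0.28582606}                    # document matching "foot"

# Assumption: boost is a plain multiplier, so undo it by dividing.
unboosted_from = {doc_id: score / BOOST for doc_id, score in boosted_from.items()}


def ranking(scores):
    """Return document ids ordered by descending score."""
    return [doc_id for doc_id, _ in
            sorted(scores.items(), key=lambda kv: kv[1], reverse=True)]


rank_without_boost = ranking({**unboosted_from, **foot_only})
rank_with_boost = ranking({**boosted_from, **foot_only})
print(rank_without_boost, rank_with_boost)
```

Without the boost the document matching foot would come first; with it, the two documents matching from move to the top, which agrees with the comment in the request.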

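2) Use the positive and negative clauses of a boosting query to demote documents: documents matching positive are returned, and those that also match negative have their score multiplied by negative_boost (0.1 in the request below).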

GET index3/type3/_search

{

  "query": {

    "boosting": {

      // matched normally

      "positive": {

        "match": {

          "content": "from"

        }

      },

      // documents that also match the clause below have their score multiplied by negative_boost

      "negative": {

        "match": {

            "content": {

              "query": "Half"

            }

          }

      },

      "negative_boost": 0.1

    }

  }

}

Result:

{

  "took": 2,

  "timed_out": false,

  "_shards": {

    "total": 5,

    "successful": 5,

    "failed": 0

  },

  "hits": {

    "total": 2,

    "max_score": 0.26228312,

    "hits": [

      {

        "_index": "index3",

        "_type": "type3",

        "_id": "5",

        "_score": 0.26228312,

        "_source": {

          "date": "2019-05-01",

          "name": "http litty",

          "content": "There are hello moments in life when you miss book someone so much that you just want to pick them from your dreams",

          "no": "564",

          "description": "描述"

        }

      },

      {

        "_index": "index3",

        "_type": "type3",

        "_id": "1",

        "_score": 0.026301134,

        "_source": {

          "date": "2019-01-02",

          "name": "the little",

          "content": "Half the hello book ideas in his talk were plagiarized from an article I wrote last month.",

          "no": "123"

        }

      }

    ]

  }

}
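
The numbers in this result are consistent with a simple model of the boosting query: a document keeps its positive score unless it also matches the negative clause, in which case the score is multiplied by negative_boost. The sketch below assumes the plain score for `from` equals the boosted score from the earlier boost example divided by 5; the function name is hypothetical.

```python
import math

NEGATIVE_BOOST = 0.1


def boosting_score(positive_score, matches_negative, negative_boost):
    """Demote a document that also matches the negative clause."""
    if matches_negative:
        return positive_score * negative_boost
    return positive_score


# Plain "from" scores, derived from the earlier boosted result (assumption).
from_score_doc1 = 1.3150566 / 5
from_score_doc5 = 1.3114156 / 5

# Document 1 contains "Half", document 5 does not.
doc1 = boosting_score(from_score_doc1, True, NEGATIVE_BOOST)
doc5 = boosting_score(from_score_doc5, False, NEGATIVE_BOOST)
print(doc1, doc5)
```

Document 1 would have ranked slightly above document 5 on the positive clause alone; after the demotion it drops to roughly a tenth of its score and falls to second place, as in the result above.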
