Elasticsearch queries: controlling relevance scores for keyword fields
I. The data
The purchase index records what each user has bought:
PUT purchase
{
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "name": {
        "type": "text"
      },
      "goods": {
        "properties": {
          "id": {
            "type": "keyword"
          },
          "name": {
            "type": "text"
          }
        }
      }
    }
  }
}
Index three documents:
PUT purchase/_doc/1
{
  "id": 1,
  "name": "sam",
  "goods": [
    {"id": "g1", "name": "ipad"},
    {"id": "g2", "name": "iphone"}
  ]
}
PUT purchase/_doc/2
{
  "id": 2,
  "name": "coco",
  "goods": [
    {"id": "g1", "name": "ipad"},
    {"id": "g2", "name": "iphone"},
    {"id": "g3", "name": "ipod"}
  ]
}
PUT purchase/_doc/3
{
  "id": 3,
  "name": "jim",
  "goods": [
    {"id": "g1", "name": "ipad"},
    {"id": "g2", "name": "iphone"},
    {"id": "g3", "name": "ipod"},
    {"id": "g4", "name": "TV"}
  ]
}
Check the indexed data:
POST purchase/_search
{
  "query": {
    "match_all": {}
  }
}
{
  "took" : 331,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1,
    "hits" : [
      {
        "_index" : "purchase",
        "_id" : "1",
        "_score" : 1,
        "_source" : {
          "id" : 1,
          "name" : "sam",
          "goods" : [
            { "id" : "g1", "name" : "ipad" },
            { "id" : "g2", "name" : "iphone" }
          ]
        }
      },
      {
        "_index" : "purchase",
        "_id" : "2",
        "_score" : 1,
        "_source" : {
          "id" : 2,
          "name" : "coco",
          "goods" : [
            { "id" : "g1", "name" : "ipad" },
            { "id" : "g2", "name" : "iphone" },
            { "id" : "g3", "name" : "ipod" }
          ]
        }
      },
      {
        "_index" : "purchase",
        "_id" : "3",
        "_score" : 1,
        "_source" : {
          "id" : 3,
          "name" : "jim",
          "goods" : [
            { "id" : "g1", "name" : "ipad" },
            { "id" : "g2", "name" : "iphone" },
            { "id" : "g3", "name" : "ipod" },
            { "id" : "g4", "name" : "TV" }
          ]
        }
      }
    ]
  }
}
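II. The query requirement
We need to find the customers who have bought certain goods. The ids of the goods to filter on usually come from the UI's business logic, and since goods.id is a keyword field that is not analyzed, the natural choice is a term-level query: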
POST purchase/_search
{
  "query": {
    "terms": {
      "goods.id": ["g2", "g3", "g4"]
    }
  }
}
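In the result, all three hits score 1. Normally, though, a customer who bought more of the requested goods is more valuable, so the more ids a document matches, the higher its score should be.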
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1,
    "hits" : [
      {
        "_index" : "purchase",
        "_id" : "1",
        "_score" : 1,
        "_source" : {
          "id" : 1,
          "name" : "sam",
          "goods" : [
            { "id" : "g1", "name" : "ipad" },
            { "id" : "g2", "name" : "iphone" }
          ]
        }
      },
      {
        "_index" : "purchase",
        "_id" : "2",
        "_score" : 1,
        "_source" : {
          "id" : 2,
          "name" : "coco",
          "goods" : [
            { "id" : "g1", "name" : "ipad" },
            { "id" : "g2", "name" : "iphone" },
            { "id" : "g3", "name" : "ipod" }
          ]
        }
      },
      {
        "_index" : "purchase",
        "_id" : "3",
        "_score" : 1,
        "_source" : {
          "id" : 3,
          "name" : "jim",
          "goods" : [
            { "id" : "g1", "name" : "ipad" },
            { "id" : "g2", "name" : "iphone" },
            { "id" : "g3", "name" : "ipod" },
            { "id" : "g4", "name" : "TV" }
          ]
        }
      }
    ]
  }
}
III. Analyzing the terms query
Let's use _explain to see how the terms query is scored:
POST purchase/_explain/3
{
  "query": {
    "terms": {
      "goods.id": ["g2", "g3", "g4"]
    }
  }
}
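The explanation shows that Elasticsearch ends up rewriting the terms query into a ConstantScore query, which gives every matching document a score of 1 by default: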
{
  "_index" : "purchase",
  "_id" : "3",
  "matched" : true,
  "explanation" : {
    "value" : 1.0,
    "description" : "ConstantScore(goods.id:g2 goods.id:g3 goods.id:g4)",
    "details" : [ ]
  }
}
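The terms query accepts very few parameters, and the only one that affects scoring is boost. However, it applies to the terms query as a whole, not to the individual terms inside it: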
POST purchase/_explain/3
{
  "query": {
    "terms": {
      "goods.id": ["g2", "g3", "g4"],
      "boost": 2
    }
  }
}
{
  "_index" : "purchase",
  "_id" : "3",
  "matched" : true,
  "explanation" : {
    "value" : 2.0,
    "description" : "ConstantScore(goods.id:g2 goods.id:g3 goods.id:g4)^2.0",
    "details" : [ ]
  }
}
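To make the limitation concrete, here is a minimal Python sketch that models this constant-score behaviour on the three sample documents. PURCHASES and terms_constant_score are hypothetical names for illustration, not an Elasticsearch API: any document matching at least one id gets exactly the boost, regardless of how many ids it contains.

```python
# Hypothetical in-memory copy of the sample data: goods ids bought by each customer.
PURCHASES = {
    "1": {"g1", "g2"},              # sam
    "2": {"g1", "g2", "g3"},        # coco
    "3": {"g1", "g2", "g3", "g4"},  # jim
}


def terms_constant_score(goods, query_ids, boost=1.0):
    """Model a terms query executed as ConstantScore: any match scores exactly `boost`."""
    if goods & set(query_ids):
        return boost
    return None  # no id matched: the document is not a hit


query = ["g2", "g3", "g4"]
for doc_id, goods in sorted(PURCHASES.items()):
    matched = len(goods & set(query))
    # The score ignores `matched` entirely; only `boost` changes it.
    print(doc_id, matched, terms_constant_score(goods, query), terms_constant_score(goods, query, boost=2))
```

Every customer scores 1.0 (or 2 with boost=2), even though they matched 1, 2 and 3 of the requested ids respectively.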
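IV. Building scored sub-queries
match is a full-text query, but on goods.id it can do the same job as terms. Because goods.id is a keyword field, match would by default keep "g2 g3 g4" as a single token, so we specify a search-time analyzer that splits the input into separate ids, each of which becomes its own query. Note that the standard analyzer also lowercases tokens and splits on characters such as hyphens, so this only works as-is for simple ids like these: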
POST purchase/_search
{
  "query": {
    "match": {
      "goods.id": {
        "query": "g2 g3 g4",
        "analyzer": "standard"
      }
    }
  }
}
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 2.178501,
    "hits" : [
      {
        "_index" : "purchase",
        "_id" : "3",
        "_score" : 2.178501,
        "_source" : {
          "id" : 3,
          "name" : "jim",
          "goods" : [
            { "id" : "g1", "name" : "ipad" },
            { "id" : "g2", "name" : "iphone" },
            { "id" : "g3", "name" : "ipod" },
            { "id" : "g4", "name" : "TV" }
          ]
        }
      },
      {
        "_index" : "purchase",
        "_id" : "2",
        "_score" : 0.8298607,
        "_source" : {
          "id" : 2,
          "name" : "coco",
          "goods" : [
            { "id" : "g1", "name" : "ipad" },
            { "id" : "g2", "name" : "iphone" },
            { "id" : "g3", "name" : "ipod" }
          ]
        }
      },
      {
        "_index" : "purchase",
        "_id" : "1",
        "_score" : 0.18360566,
        "_source" : {
          "id" : 1,
          "name" : "sam",
          "goods" : [
            { "id" : "g1", "name" : "ipad" },
            { "id" : "g2", "name" : "iphone" }
          ]
        }
      }
    ]
  }
}
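The role of the search-time analyzer can be illustrated with a rough Python approximation. keyword_analyze and standard_analyze_approx are hypothetical helpers; the second only approximates the standard analyzer (split on non-word characters, then lowercase) and does not implement its exact Unicode segmentation rules. To see the real tokens, send the text to Elasticsearch's _analyze API with "analyzer": "standard".

```python
import re


def keyword_analyze(text):
    """A keyword field (no normalizer) keeps the whole input as one token."""
    return [text]


def standard_analyze_approx(text):
    """Rough stand-in for the standard analyzer: split on non-word characters, lowercase.

    The real standard tokenizer follows Unicode text segmentation rules, so treat
    this only as an approximation for simple ASCII ids.
    """
    return [token.lower() for token in re.findall(r"\w+", text)]


print(keyword_analyze("g2 g3 g4"))          # ['g2 g3 g4']: one token, matches no indexed id
print(standard_analyze_approx("g2 g3 g4"))  # ['g2', 'g3', 'g4']: three term queries
print(standard_analyze_approx("G-2"))       # ['g', '2']: would not match an id stored as "G-2"
```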
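Document 3's explanation shows that Elasticsearch first scores each term separately and then sums the three scores to get the final score. It also shows that Elasticsearch internally turns the match query into three sub-queries. The three ids score differently because of IDF: g2 appears in all three documents, g3 in two and g4 only in document 3, so the rarer the id, the higher its weight: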
POST purchase/_explain/3
{
  "query": {
    "match": {
      "goods.id": {
        "query": "g2 g3 g4",
        "analyzer": "standard"
      }
    }
  }
}
{
  "_index" : "purchase",
  "_id" : "3",
  "matched" : true,
  "explanation" : {
    "value" : 2.178501,
    "description" : "sum of:",
    "details" : [
      {
        "value" : 0.18360566,
        "description" : "weight(goods.id:g2 in 2) [PerFieldSimilarity], result of:",
        "details" : []
      },
      {
        "value" : 0.646255,
        "description" : "weight(goods.id:g3 in 2) [PerFieldSimilarity], result of:",
        "details" : []
      },
      {
        "value" : 1.3486402,
        "description" : "weight(goods.id:g4 in 2) [PerFieldSimilarity], result of:",
        "details" : []
      }
    ]
  }
}
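The per-term values can be reproduced with a small BM25 calculation in Python. This is a sketch under stated assumptions, not Elasticsearch's code: it assumes the default BM25 parameters k1 = 1.2 and b = 0.75, and it assumes that because keyword fields have norms disabled, every document is scored as if its field length were 1, with avgdl = 9 indexed values / 3 documents = 3. Under those assumptions the numbers match the explain output above to about six decimal places; DOCS, idf, term_score and match_score are hypothetical names.

```python
import math

K1 = 1.2  # assumed Elasticsearch BM25 default
B = 0.75  # assumed Elasticsearch BM25 default

# goods.id values per document, copied from the sample data.
DOCS = {
    "1": ["g1", "g2"],
    "2": ["g1", "g2", "g3"],
    "3": ["g1", "g2", "g3", "g4"],
}


def idf(term):
    """BM25 IDF: rarer ids (smaller doc_freq) get a larger weight."""
    n_docs = len(DOCS)
    doc_freq = sum(term in ids for ids in DOCS.values())
    return math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def term_score(term, freq=1.0):
    # Assumption: no norms on keyword fields, so the field length is treated as 1.
    avgdl = sum(len(ids) for ids in DOCS.values()) / len(DOCS)  # 9 / 3 = 3.0
    tf = freq / (freq + K1 * (1 - B + B * 1 / avgdl))
    # (K1 + 1) = 2.2; assumed to be the "boost 2.2" factor the explain details report.
    return (K1 + 1) * idf(term) * tf


def match_score(doc_id, query_terms):
    """Sum of the scores of the query terms that the document contains."""
    return sum(term_score(t) for t in query_terms if t in DOCS[doc_id])


for doc_id in ("3", "2", "1"):
    print(doc_id, round(match_score(doc_id, ["g2", "g3", "g4"]), 6))
```

Since the tf factor is the same for every document here, the only thing that differs between g2, g3 and g4 is the IDF, which is exactly the effect the goods.id use case does not want.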
We can also build the sub-queries ourselves before sending the request, using the should clause of a bool query:
POST purchase/_search
{
  "query": {
    "bool": {
      "should": [
        {"term": {"goods.id": "g2"}},
        {"term": {"goods.id": "g3"}},
        {"term": {"goods.id": "g4"}}
      ],
      "minimum_should_match": 1
    }
  }
}
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 2.178501,
    "hits" : [
      {
        "_index" : "purchase",
        "_id" : "3",
        "_score" : 2.178501,
        "_source" : {
          "id" : 3,
          "name" : "jim",
          "goods" : [
            { "id" : "g1", "name" : "ipad" },
            { "id" : "g2", "name" : "iphone" },
            { "id" : "g3", "name" : "ipod" },
            { "id" : "g4", "name" : "TV" }
          ]
        }
      },
      {
        "_index" : "purchase",
        "_id" : "2",
        "_score" : 0.8298607,
        "_source" : {
          "id" : 2,
          "name" : "coco",
          "goods" : [
            { "id" : "g1", "name" : "ipad" },
            { "id" : "g2", "name" : "iphone" },
            { "id" : "g3", "name" : "ipod" }
          ]
        }
      },
      {
        "_index" : "purchase",
        "_id" : "1",
        "_score" : 0.18360566,
        "_source" : {
          "id" : 1,
          "name" : "sam",
          "goods" : [
            { "id" : "g1", "name" : "ipad" },
            { "id" : "g2", "name" : "iphone" }
          ]
        }
      }
    ]
  }
}
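Because the ids arrive from the UI as a list, it is convenient to generate the should clauses in code. The Python sketch below (should_terms_query is a hypothetical helper) produces the same body as the bool request above; sending it to POST purchase/_search with an HTTP or Elasticsearch client is left out.

```python
import json


def should_terms_query(field, values):
    """Hypothetical helper: one scored term clause per value inside bool/should."""
    return {
        "query": {
            "bool": {
                "should": [{"term": {field: value}} for value in values],
                "minimum_should_match": 1,
            }
        }
    }


# Ids as they might arrive from the UI.
body = should_terms_query("goods.id", ["g2", "g3", "g4"])
print(json.dumps(body, indent=2))
```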
In the bool query, document 3's explanation likewise shows a separate score for each term, with the final score being the sum of the three:
POST purchase/_explain/3
{
  "query": {
    "bool": {
      "should": [
        {"term": {"goods.id": "g2"}},
        {"term": {"goods.id": "g3"}},
        {"term": {"goods.id": "g4"}}
      ],
      "minimum_should_match": 1
    }
  }
}
{
  "_index" : "purchase",
  "_id" : "3",
  "matched" : true,
  "explanation" : {
    "value" : 2.178501,
    "description" : "sum of:",
    "details" : [
      {
        "value" : 0.18360566,
        "description" : "weight(goods.id:g2 in 2) [PerFieldSimilarity], result of:",
        "details" : []
      },
      {
        "value" : 0.646255,
        "description" : "weight(goods.id:g3 in 2) [PerFieldSimilarity], result of:",
        "details" : []
      },
      {
        "value" : 1.3486402,
        "description" : "weight(goods.id:g4 in 2) [PerFieldSimilarity], result of:",
        "details" : []
      }
    ]
  }
}
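V. Controlling the scores of the sub-queries
Whether Elasticsearch builds the sub-queries automatically or we build them by hand, it computes a relevance score for each of them. That is fine for ordinary keyword search over meaningful text, where rarer words reasonably count for more.
Our goods.id values, however, carry no meaning of their own, so every value should contribute the same score. To get that, we build the query by hand with bool + constant_score + term: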
POST purchase/_search
{
  "query": {
    "bool": {
      "should": [
        {"constant_score": {"filter": {"term": {"goods.id": "g2"}}}},
        {"constant_score": {"filter": {"term": {"goods.id": "g3"}}}},
        {"constant_score": {"filter": {"term": {"goods.id": "g4"}}}}
      ],
      "minimum_should_match": 1
    }
  }
}
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 3.0,
    "hits" : [
      {
        "_index" : "purchase",
        "_id" : "3",
        "_score" : 3.0,
        "_source" : {
          "id" : 3,
          "name" : "jim",
          "goods" : [
            { "id" : "g1", "name" : "ipad" },
            { "id" : "g2", "name" : "iphone" },
            { "id" : "g3", "name" : "ipod" },
            { "id" : "g4", "name" : "TV" }
          ]
        }
      },
      {
        "_index" : "purchase",
        "_id" : "2",
        "_score" : 2.0,
        "_source" : {
          "id" : 2,
          "name" : "coco",
          "goods" : [
            { "id" : "g1", "name" : "ipad" },
            { "id" : "g2", "name" : "iphone" },
            { "id" : "g3", "name" : "ipod" }
          ]
        }
      },
      {
        "_index" : "purchase",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "id" : 1,
          "name" : "sam",
          "goods" : [
            { "id" : "g1", "name" : "ipad" },
            { "id" : "g2", "name" : "iphone" }
          ]
        }
      }
    ]
  }
}
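The constant_score variant can be generated the same way. In this Python sketch, constant_score_clause and constant_score_should_query are hypothetical helpers; the optional boost maps to constant_score's own boost parameter, whose default is 1.0. Leaving boost out when it is 1.0 keeps the generated body identical to the hand-written request above.

```python
import json


def constant_score_clause(field, value, boost=1.0):
    """Wrap a term filter in constant_score so a match adds a fixed score."""
    clause = {"filter": {"term": {field: value}}}
    if boost != 1.0:
        clause["boost"] = boost  # constant_score's boost; 1.0 is the default
    return {"constant_score": clause}


def constant_score_should_query(field, values, boost=1.0):
    """Hypothetical helper: bool/should of constant_score clauses, one per value."""
    return {
        "query": {
            "bool": {
                "should": [constant_score_clause(field, v, boost) for v in values],
                "minimum_should_match": 1,
            }
        }
    }


body = constant_score_should_query("goods.id", ["g2", "g3", "g4"])
print(json.dumps(body, indent=2))
```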
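Document 3's explanation shows that every matching clause contributes a fixed score of 1, and the final score is the sum over the matching clauses: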
POST purchase/_explain/3
{
  "query": {
    "bool": {
      "should": [
        {"constant_score": {"filter": {"term": {"goods.id": "g2"}}}},
        {"constant_score": {"filter": {"term": {"goods.id": "g3"}}}},
        {"constant_score": {"filter": {"term": {"goods.id": "g4"}}}}
      ],
      "minimum_should_match": 1
    }
  }
}
{
  "_index" : "purchase",
  "_id" : "3",
  "matched" : true,
  "explanation" : {
    "value" : 3.0,
    "description" : "sum of:",
    "details" : [
      {
        "value" : 1.0,
        "description" : "ConstantScore(goods.id:g2)",
        "details" : [ ]
      },
      {
        "value" : 1.0,
        "description" : "ConstantScore(goods.id:g3)",
        "details" : [ ]
      },
      {
        "value" : 1.0,
        "description" : "ConstantScore(goods.id:g4)",
        "details" : [ ]
      }
    ]
  }
}
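The result can be checked with a small Python model of the final query: each matching should clause adds a fixed 1.0, so a customer's score is simply the number of requested ids they bought. PURCHASES and summed_constant_score are hypothetical names; the model reproduces the 3.0 / 2.0 / 1.0 scores and the hit order returned by the constant_score search.

```python
# Hypothetical in-memory copy of the sample data: goods ids bought by each customer.
PURCHASES = {
    "1": {"g1", "g2"},              # sam
    "2": {"g1", "g2", "g3"},        # coco
    "3": {"g1", "g2", "g3", "g4"},  # jim
}


def summed_constant_score(goods, query_ids, boost=1.0):
    """Each matching constant_score clause adds `boost`; None means no clause matched."""
    hits = sum(1 for value in query_ids if value in goods)
    return hits * boost if hits else None


query = ["g2", "g3", "g4"]
scores = {doc_id: summed_constant_score(goods, query) for doc_id, goods in PURCHASES.items()}
# Highest score first, mirroring the hit order of the _search response.
ranking = sorted((d for d, s in scores.items() if s is not None), key=lambda d: scores[d], reverse=True)
print(scores)   # {'1': 1.0, '2': 2.0, '3': 3.0}
print(ranking)  # ['3', '2', '1']
```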