ES忽略TF-IDF评分——使用constant

Ignoring TF/IDF

Sometimes we just don’t care about TF/IDF. All we want to know is that a certain word appears in a field. Perhaps we are searching for a vacation home and we want to find houses that have as many of these features as possible:

WiFi
Garden
Pool

The vacation home documents look something like this:

{ "description": "A delightful four-bedroomed house with ... " }

We could use a simple match query:

GET /_search

{

  "query": {

    "match": {

      "description": "wifi garden pool"

    }

  }

}

However, this isn’t really full-text search. In this case, TF/IDF just gets in the way. We don’t care whether wifi is a common term, or how often it appears in the document. All we care about is that it does appear. In fact, we just want to rank houses by the number of features they have—the more, the better. If a feature is present, it should score 1, and if it isn’t, 0.

constant_score Query

Enter the constant_score query. This query can wrap either a query or a filter, and assigns a score of1 to any documents that match, regardless of TF/IDF:

GET /_search

{

  "query": {

    "bool": {

      "should": [

        { "constant_score": {

          "query": { "match": { "description": "wifi" }}

        }},

        { "constant_score": {

          "query": { "match": { "description": "garden" }}

        }},

        { "constant_score": {

          "query": { "match": { "description": "pool" }}

        }}

      ]

    }

  }

}

Perhaps not all features are equally important—some have more value to the user than others. If the most important feature is the pool, we could boost that clause to make it count for more:

GET /_search

{

  "query": {

    "bool": {

      "should": [

        { "constant_score": {

          "query": { "match": { "description": "wifi" }}

        }},

        { "constant_score": {

          "query": { "match": { "description": "garden" }}

        }},

        { "constant_score": {

          "boost":   2 

          "query": { "match": { "description": "pool" }}

        }}

      ]

    }

  }

}

A matching pool clause would add a score of 2, while the other clauses would add a score of only 1 each.

The final score for each result is not simply the sum of the scores of all matching clauses. The coordination factor and query normalization factor are still taken into account.

We could improve our vacation home documents by adding a not_analyzed features field to our vacation homes:

{ "features": [ "wifi", "pool", "garden" ] } 这样改写有什么好处？省索引空间吗？

参考：https://www.elastic.co/guide/en/elasticsearch/guide/current/ignoring-tfidf.html#ignoring-tfidf

ES忽略TF-IDF评分——使用constant_score的更多相关文章

Elasticsearch由浅入深（十）搜索引擎：相关度评分 TF&IDF算法、doc value正排索引、解密query、fetch phrase原理、Bouncing Results问题、基于scoll技术滚动搜索大量数据
相关度评分 TF&IDF算法 Elasticsearch的相关度评分(relevance score)算法采用的是term frequency/inverse document frequen ...
Elasticsearch学习之相关度评分TF&IDF
relevance score算法,简单来说,就是计算出,一个索引中的文本,与搜索文本,他们之间的关联匹配程度 Elasticsearch使用的是 term frequency/inverse doc ...
25.TF&IDF算法以及向量空间模型算法
主要知识点: boolean model IF/IDF vector space model 一.boolean model 在es做各种搜索进行打分排序时,会先用boolean mo ...
TF/IDF计算方法
FROM:http://blog.csdn.net/pennyliang/article/details/1231028 我们已经谈过了如何自动下载网页.如何建立索引.如何衡量网页的质量(Page R ...
信息检索中的TF/IDF概念与算法的解释
https://blog.csdn.net/class_brick/article/details/79135909 概念 TF-IDF(term frequency–inverse document ...
55.TF/IDF算法
主要知识点: TF/IDF算法介绍查看es计算_source的过程及各词条的分数查看一个document是如何被匹配到的一.算法介绍 relevance score算法,简单来说 ...
TF/IDF（term frequency/inverse document frequency)
TF/IDF(term frequency/inverse document frequency) 的概念被公认为信息检索中最重要的发明. 一. TF/IDF描述单个term与特定document的相 ...
基于TF/IDF的聚类算法原理
一.TF/IDF描述单个term与特定document的相关性TF(Term Frequency): 表示一个term与某个document的相关性. 公式为这个term在document中出 ...
使用solr的函数查询,并获取tf*idf值
1. 使用函数df(field,keyword) 和idf(field,keyword). http://118.85.207.11:11100/solr/mobile/select?q={!func ...

随机推荐

fiddler不能监听 localhost和 127.0.0.1的问题 .
localhost/127.0.0.1的请求不会通过任何代理发送,fiddler也就无法截获. 解决方案用 http://localhost. (locahost紧跟一个点号) 用 http://1 ...
多媒体层预览（Media Layer OverView）
音频模块位于多媒体层里.多媒体层包含了图形.音频.视频三种技术.这三种技术会给你带来声觉.视觉上的良好体验. 来看看ios的结构体系以及媒体层上的内容: ...
七款Debug工具推荐：iOS开发必备的调试利器
历时数周或数月开发出来了应用或游戏.可为什么体验不流畅?怎么能查出当中的纰漏?这些须要调试诊断工具从旁协助.调试是开发过程中不可缺少的重要一环.本文会列举几个比較有效的调试诊断工具,能够帮助你寻根究底 ...
【转载】关于 .Net 逆向的那些工具：反编译篇
在项目开发过程中,估计也有人和我遇到过同样的经历:生产环境出现了重大Bug亟需解决,而偏偏就在这时仓库中的代码却不是最新的.在这种情况下,我们不能直接在当前的代码中修改这个Bug然后发布,这会导致更严 ...
hdu 2814 Interesting Fibonacci
pid=2814">点击此处就可以传送 hdu 2814 题目大意:就是给你两个函数,一个是F(n) = F(n-1) + F(n-2), F(0) = 0, F(1) = 1; 还有 ...
sudo apt-get update 没有公钥，无法验证下列签名
在更新系统源后,输入sudo apt-get update之后出现提示: W: GPG 错误:http://archive.ubuntukylin.com:10006 xenial InRelease ...
StringUtils方法
org.apache.commons.lang.StringUtils中方法的操作对象是java.lang.String类型的对象,是JDK提供的String类型操作方法的补充,并且是null安全的( ...
Spring学习四----------Bean的配置之Bean的配置项及作用域
© 版权声明:本文为博主原创文章,转载请注明出处 Bean的作用域(每个作用域都是在同一个Bean容器中) 1.singleton:单例,指一个Bean容器中只存在一份(默认) 2.prototype ...
FileOutPutStream 的写操作
package xinhuiji_day07; import java.io.File;import java.io.FileNotFoundException;import java.io.File ...
【Python基础】之异常
一.常见异常 try: open('abc.txt','r') except FileNotFoundError: print('异常啦!') 输出结果: ======= 异常啦! 我们通过 open ...

ES忽略TF-IDF评分——使用constant_score

Ignoring TF/IDF

constant_score Query

ES忽略TF-IDF评分——使用constant_score的更多相关文章

随机推荐

热门专题