Ignoring TF/IDF

Sometimes we just don’t care about TF/IDF. All we want to know is that a certain word appears in a field. Perhaps we are searching for a vacation home and we want to find houses that have as many of these features as possible:

  • WiFi
  • Garden
  • Pool

The vacation home documents look something like this:

{ "description": "A delightful four-bedroomed house with ... " }

We could use a simple match query:

GET /_search
{
"query": {
"match": {
"description": "wifi garden pool"
}
}
}

However, this isn’t really full-text search. In this case, TF/IDF just gets in the way. We don’t care whether wifi is a common term, or how often it appears in the document. All we care about is that it does appear. In fact, we just want to rank houses by the number of features they have—the more, the better. If a feature is present, it should score 1, and if it isn’t, 0.

constant_score Query

Enter the constant_score query. This query can wrap either a query or a filter, and assigns a score of1 to any documents that match, regardless of TF/IDF:

GET /_search
{
"query": {
"bool": {
"should": [
{ "constant_score": {
"query": { "match": { "description": "wifi" }}
}},
{ "constant_score": {
"query": { "match": { "description": "garden" }}
}},
{ "constant_score": {
"query": { "match": { "description": "pool" }}
}}
]
}
}
}

Perhaps not all features are equally important—some have more value to the user than others. If the most important feature is the pool, we could boost that clause to make it count for more:

GET /_search
{
"query": {
"bool": {
"should": [
{ "constant_score": {
"query": { "match": { "description": "wifi" }}
}},
{ "constant_score": {
"query": { "match": { "description": "garden" }}
}},
{ "constant_score": {
"boost": 2
"query": { "match": { "description": "pool" }}
}}
]
}
}
}

A matching pool clause would add a score of 2, while the other clauses would add a score of only 1 each.

The final score for each result is not simply the sum of the scores of all matching clauses. The coordination factor and query normalization factor are still taken into account.

We could improve our vacation home documents by adding a not_analyzed features field to our vacation homes:

{ "features": [ "wifi", "pool", "garden" ] } 这样改写有什么好处?省索引空间吗?

参考:https://www.elastic.co/guide/en/elasticsearch/guide/current/ignoring-tfidf.html#ignoring-tfidf

ES忽略TF-IDF评分——使用constant_score的更多相关文章

  1. Elasticsearch由浅入深(十)搜索引擎:相关度评分 TF&IDF算法、doc value正排索引、解密query、fetch phrase原理、Bouncing Results问题、基于scoll技术滚动搜索大量数据

    相关度评分 TF&IDF算法 Elasticsearch的相关度评分(relevance score)算法采用的是term frequency/inverse document frequen ...

  2. Elasticsearch学习之相关度评分TF&IDF

    relevance score算法,简单来说,就是计算出,一个索引中的文本,与搜索文本,他们之间的关联匹配程度 Elasticsearch使用的是 term frequency/inverse doc ...

  3. 25.TF&IDF算法以及向量空间模型算法

    主要知识点: boolean model IF/IDF vector space model     一.boolean model     在es做各种搜索进行打分排序时,会先用boolean mo ...

  4. TF/IDF计算方法

    FROM:http://blog.csdn.net/pennyliang/article/details/1231028 我们已经谈过了如何自动下载网页.如何建立索引.如何衡量网页的质量(Page R ...

  5. 信息检索中的TF/IDF概念与算法的解释

    https://blog.csdn.net/class_brick/article/details/79135909 概念 TF-IDF(term frequency–inverse document ...

  6. 55.TF/IDF算法

    主要知识点: TF/IDF算法介绍 查看es计算_source的过程及各词条的分数 查看一个document是如何被匹配到的         一.算法介绍 relevance score算法,简单来说 ...

  7. TF/IDF(term frequency/inverse document frequency)

    TF/IDF(term frequency/inverse document frequency) 的概念被公认为信息检索中最重要的发明. 一. TF/IDF描述单个term与特定document的相 ...

  8. 基于TF/IDF的聚类算法原理

        一.TF/IDF描述单个term与特定document的相关性TF(Term Frequency): 表示一个term与某个document的相关性. 公式为这个term在document中出 ...

  9. 使用solr的函数查询,并获取tf*idf值

    1. 使用函数df(field,keyword) 和idf(field,keyword). http://118.85.207.11:11100/solr/mobile/select?q={!func ...

随机推荐

  1. 【Excle数据透视】如何用含有单元格的数据来创建数据透视

    取消合并单元格,填充相同内容项,然后创建数据透视表. 如下图:需要使用数据创建数据透视表 步骤一 开始→格式刷,然后对单元格区域G2:G15使用格式刷功能,保留合并单元格格式 步骤二 选中A2:A18 ...

  2. HDU1323_Perfection【水题】

    Perfection Time Limit: 2000/1000 MS (Java/Others)    Memory Limit: 65536/32768 K (Java/Others) Total ...

  3. shell循环,判断介绍,以及实例

    shell的循环主要有3种,for,while,until shell的分支判断主要有2种,if,case 一,for循环 #!/bin/bash for file in $(ls /tmp/test ...

  4. js向后台传递对象

    js: }; $.ajax({ url: "/.../...", type: "POST", async: false, data: JSON.stringif ...

  5. 【Java】事件驱动模型和观察者模式

    你有一件事情,做这件事情的过程包含了许多职责单一的子过程.这样的情况及其常见.当这些子过程有如下特点时,我们应该考虑设计一种合适的框架,让框架来完成一些业务无关的事情,从而使得各个子过程的开发可以专注 ...

  6. WinForm程序打包工具InnoSetup使用说明图文教程

    WinForm程序打包工具InnoSetup使用说明图文教程 WinForm程序开发测试好了,如果将Debug/Release里面的文件发给客户使用,会让客户觉得你不够专业,但是使用VS自带的打包工具 ...

  7. 3_Jsp标签_简单标签_防盗链和转义标签的实现

    一概念 1防盗链 在HTTP协议中,有一个表头字段叫referer,采用URL的格式来表示从哪儿链接到当前的网页或文件,通过referer,网站可以检测目标网页访问的来源网页.有了referer跟踪来 ...

  8. CI学习总结

    1.CI自定义配置文件: 如:config/test.php <?php $config['test']['good'] = array('aa','bb'); 在控制器中这样调用: <? ...

  9. Android 异常解决方法【汇总】

    (1)异常:Android中引入第三方Jar包的方法(Java.lang.NoClassDefFoundError解决办法) 1.在工程下新建lib文件夹,将需要的第三方包拷贝进来.2.将引用的第三方 ...

  10. python 基础 7.4 os 模块

    #/usr/bin/python #coding=utf8 #@Time   :2017/11/11 3:15 #@Auther :liuzhenchuan #@File   :os 模块.py im ...