ES忽略TF-IDF评分——使用constant

Ignoring TF/IDF

Sometimes we just don’t care about TF/IDF. All we want to know is that a certain word appears in a field. Perhaps we are searching for a vacation home and we want to find houses that have as many of these features as possible:

WiFi
Garden
Pool

The vacation home documents look something like this:

{ "description": "A delightful four-bedroomed house with ... " }

We could use a simple match query:

GET /_search

{

  "query": {

    "match": {

      "description": "wifi garden pool"

    }

  }

}

However, this isn’t really full-text search. In this case, TF/IDF just gets in the way. We don’t care whether wifi is a common term, or how often it appears in the document. All we care about is that it does appear. In fact, we just want to rank houses by the number of features they have—the more, the better. If a feature is present, it should score 1, and if it isn’t, 0.

constant_score Query

Enter the constant_score query. This query can wrap either a query or a filter, and assigns a score of1 to any documents that match, regardless of TF/IDF:

GET /_search

{

  "query": {

    "bool": {

      "should": [

        { "constant_score": {

          "query": { "match": { "description": "wifi" }}

        }},

        { "constant_score": {

          "query": { "match": { "description": "garden" }}

        }},

        { "constant_score": {

          "query": { "match": { "description": "pool" }}

        }}

      ]

    }

  }

}

Perhaps not all features are equally important—some have more value to the user than others. If the most important feature is the pool, we could boost that clause to make it count for more:

GET /_search

{

  "query": {

    "bool": {

      "should": [

        { "constant_score": {

          "query": { "match": { "description": "wifi" }}

        }},

        { "constant_score": {

          "query": { "match": { "description": "garden" }}

        }},

        { "constant_score": {

          "boost":   2 

          "query": { "match": { "description": "pool" }}

        }}

      ]

    }

  }

}

A matching pool clause would add a score of 2, while the other clauses would add a score of only 1 each.

The final score for each result is not simply the sum of the scores of all matching clauses. The coordination factor and query normalization factor are still taken into account.

We could improve our vacation home documents by adding a not_analyzed features field to our vacation homes:

{ "features": [ "wifi", "pool", "garden" ] } 这样改写有什么好处？省索引空间吗？

参考：https://www.elastic.co/guide/en/elasticsearch/guide/current/ignoring-tfidf.html#ignoring-tfidf

ES忽略TF-IDF评分——使用constant_score的更多相关文章

Elasticsearch由浅入深（十）搜索引擎：相关度评分 TF&IDF算法、doc value正排索引、解密query、fetch phrase原理、Bouncing Results问题、基于scoll技术滚动搜索大量数据
相关度评分 TF&IDF算法 Elasticsearch的相关度评分(relevance score)算法采用的是term frequency/inverse document frequen ...
Elasticsearch学习之相关度评分TF&IDF
relevance score算法,简单来说,就是计算出,一个索引中的文本,与搜索文本,他们之间的关联匹配程度 Elasticsearch使用的是 term frequency/inverse doc ...
25.TF&IDF算法以及向量空间模型算法
主要知识点: boolean model IF/IDF vector space model 一.boolean model 在es做各种搜索进行打分排序时,会先用boolean mo ...
TF/IDF计算方法
FROM:http://blog.csdn.net/pennyliang/article/details/1231028 我们已经谈过了如何自动下载网页.如何建立索引.如何衡量网页的质量(Page R ...
信息检索中的TF/IDF概念与算法的解释
https://blog.csdn.net/class_brick/article/details/79135909 概念 TF-IDF(term frequency–inverse document ...
55.TF/IDF算法
主要知识点: TF/IDF算法介绍查看es计算_source的过程及各词条的分数查看一个document是如何被匹配到的一.算法介绍 relevance score算法,简单来说 ...
TF/IDF（term frequency/inverse document frequency)
TF/IDF(term frequency/inverse document frequency) 的概念被公认为信息检索中最重要的发明. 一. TF/IDF描述单个term与特定document的相 ...
基于TF/IDF的聚类算法原理
一.TF/IDF描述单个term与特定document的相关性TF(Term Frequency): 表示一个term与某个document的相关性. 公式为这个term在document中出 ...
使用solr的函数查询,并获取tf*idf值
1. 使用函数df(field,keyword) 和idf(field,keyword). http://118.85.207.11:11100/solr/mobile/select?q={!func ...

随机推荐

vue v-for与v-if组合使用
当它们处于同一节点,v-for 的优先级比 v-if 更高,这意味着 v-if 将分别重复运行于每个 v-for 循环中.当你想为仅有的_一些_项渲染节点时,这种优先级的机制会十分有用,如下: < ...
网络电台（WIZ550io）
网络电台是用WIZ550io(内嵌MAC地址)和ATMEGA1284(Flash 128K,EEPROM4K)制作的.用户可注冊多达80个无线电广播. 无线电广播的注冊可在内嵌网页中进行. 网络电台的 ...
ExtJs4学习（一）：正确认识ExtJs4
认识ExtJs 1.Javat能用ExtJs吗? 它是展现层的技术,与JS,HTML,CSS有关.至于server端是.Net,还是PHP等无关. 2.ExtJs适合什么样的项目? 依照官方的说法,E ...
时钟展频技术能有效降低EMI，深入讲解展频发生器！
原文地址:https://baijiahao.baidu.com/s?id=1608649367453023659&wfr=spider&for=pc 相关文章: 1.http://b ...
XMLHttpRequest cannot load ''. No 'Access-Control-Allow-Origin' header is present on the requested resource. Origin ' ' is therefore not allowed access.
ajax跨域禁止访问! 利用Access-Control-Allow-Origin响应头解决跨域请求
LeetCode222——Count Complete Tree Nodes
Given a complete binary tree, count the number of nodes. Definition of a complete binary tree from W ...
eclipse maven安装配置
下载在Apache下载Maven,下载地址:http://maven.apache.org/download.html,在win7下载文件为:apache-maven-3.1.0-bin.zip. ...
Jquery获取iframe中的元素
iframe与父页面之间相互获取元素的方法: 1.从父页面中获取iframe页面中的元素: 用法: $(window.frames["iframe_include_adverse" ...
maven scope runtime
https://blog.csdn.net/ningbohezhijunbl/article/details/25818069 There are 6 scopes available: compil ...
数据结构---python---表
一.list的基本实现技术在数据结构中,如果用python实现线性表,无疑要提到list,list是一种元素个数可变的线性表(而tuple是不变的表,不支持改变其内部状态的任何操作,其他与list性 ...

ES忽略TF-IDF评分——使用constant_score

Ignoring TF/IDF

constant_score Query

ES忽略TF-IDF评分——使用constant_score的更多相关文章

随机推荐

热门专题