ES忽略TF-IDF评分——使用constant_score
Ignoring TF/IDF
Sometimes we just don’t care about TF/IDF. All we want to know is that a certain word appears in a field. Perhaps we are searching for a vacation home and we want to find houses that have as many of these features as possible:
- WiFi
- Garden
- Pool
The vacation home documents look something like this:
{ "description": "A delightful four-bedroomed house with ... " }
We could use a simple match query:
GET /_search
{
"query": {
"match": {
"description": "wifi garden pool"
}
}
}
However, this isn’t really full-text search. In this case, TF/IDF just gets in the way. We don’t care whether wifi is a common term, or how often it appears in the document. All we care about is that it does appear. In fact, we just want to rank houses by the number of features they have—the more, the better. If a feature is present, it should score 1, and if it isn’t, 0.
constant_score Query
Enter the constant_score query. This query can wrap either a query or a filter, and assigns a score of1 to any documents that match, regardless of TF/IDF:
GET /_search
{
"query": {
"bool": {
"should": [
{ "constant_score": {
"query": { "match": { "description": "wifi" }}
}},
{ "constant_score": {
"query": { "match": { "description": "garden" }}
}},
{ "constant_score": {
"query": { "match": { "description": "pool" }}
}}
]
}
}
}
Perhaps not all features are equally important—some have more value to the user than others. If the most important feature is the pool, we could boost that clause to make it count for more:
GET /_search
{
"query": {
"bool": {
"should": [
{ "constant_score": {
"query": { "match": { "description": "wifi" }}
}},
{ "constant_score": {
"query": { "match": { "description": "garden" }}
}},
{ "constant_score": {
"boost": 2
"query": { "match": { "description": "pool" }}
}}
]
}
}
}
|
|
A matching |
The final score for each result is not simply the sum of the scores of all matching clauses. The coordination factor and query normalization factor are still taken into account.
We could improve our vacation home documents by adding a not_analyzed features field to our vacation homes:
{ "features": [ "wifi", "pool", "garden" ] } 这样改写有什么好处?省索引空间吗?
参考:https://www.elastic.co/guide/en/elasticsearch/guide/current/ignoring-tfidf.html#ignoring-tfidf
ES忽略TF-IDF评分——使用constant_score的更多相关文章
- Elasticsearch由浅入深(十)搜索引擎:相关度评分 TF&IDF算法、doc value正排索引、解密query、fetch phrase原理、Bouncing Results问题、基于scoll技术滚动搜索大量数据
相关度评分 TF&IDF算法 Elasticsearch的相关度评分(relevance score)算法采用的是term frequency/inverse document frequen ...
- Elasticsearch学习之相关度评分TF&IDF
relevance score算法,简单来说,就是计算出,一个索引中的文本,与搜索文本,他们之间的关联匹配程度 Elasticsearch使用的是 term frequency/inverse doc ...
- 25.TF&IDF算法以及向量空间模型算法
主要知识点: boolean model IF/IDF vector space model 一.boolean model 在es做各种搜索进行打分排序时,会先用boolean mo ...
- TF/IDF计算方法
FROM:http://blog.csdn.net/pennyliang/article/details/1231028 我们已经谈过了如何自动下载网页.如何建立索引.如何衡量网页的质量(Page R ...
- 信息检索中的TF/IDF概念与算法的解释
https://blog.csdn.net/class_brick/article/details/79135909 概念 TF-IDF(term frequency–inverse document ...
- 55.TF/IDF算法
主要知识点: TF/IDF算法介绍 查看es计算_source的过程及各词条的分数 查看一个document是如何被匹配到的 一.算法介绍 relevance score算法,简单来说 ...
- TF/IDF(term frequency/inverse document frequency)
TF/IDF(term frequency/inverse document frequency) 的概念被公认为信息检索中最重要的发明. 一. TF/IDF描述单个term与特定document的相 ...
- 基于TF/IDF的聚类算法原理
一.TF/IDF描述单个term与特定document的相关性TF(Term Frequency): 表示一个term与某个document的相关性. 公式为这个term在document中出 ...
- 使用solr的函数查询,并获取tf*idf值
1. 使用函数df(field,keyword) 和idf(field,keyword). http://118.85.207.11:11100/solr/mobile/select?q={!func ...
随机推荐
- 导出数据生成Excel(MVC)
/// <summary> /// 生成Excel /// </summary> /// <returns></returns> public File ...
- js学习总结--DOM2兼容处理重复问题
在解决this问题之后,只需要在每次往自定义属性和事件池当中添加事件的时候进行以下判断就好了,具体代码如下: /* bind:处理DOM2级事件绑定的兼容性问题(绑定方法) @parameter: c ...
- HDU 3461 Code Lock(并查集的应用+高速幂)
* 65536kb,仅仅能开到1.76*10^7大小的数组. 而题目的N取到了10^7.我開始做的时候没注意,用了按秩合并,uset+rank达到了2*10^7所以MLE,所以貌似不能用按秩合并. 事 ...
- js 中的 prototype 和 constructor
var a=function(){ this.msg="aa"; } a.prototype.say=function(){ alert('this is say');} 1.只有 ...
- 串口设置-stty--设置终端线
stty - chang and print terminal line settings SYNOPSIS stty [-F DEVICE | --file=DEVICE] [SETTING]... ...
- Mysql 索引增加与删除
[1]索引 索引,通俗理解,即目录. 之前说过,计算机是对现实世界的模拟.目录应用在数据库领域,即所谓的索引. 目录的作用显而易见,所以建立索引可以大大提高检索的速度. 但是,会降低更新表的速度,如对 ...
- 在做RTSP摄像机H5无插件直播中遇到的对接海康摄像机发送OPTIONS心跳时遇到的坑
我们在实现一套EasyNVR无插件直播方案时,选择了采用厂家无关化的通用协议RTSP/Onvif接入摄像机IPC/NVR设备,总所周知,Onvif是摄像机的发现与控制管理协议,Onvif用到的流媒体协 ...
- 九度OJ 1145:Candy Sharing Game(分享蜡烛游戏) (模拟)
时间限制:1 秒 内存限制:32 兆 特殊判题:否 提交:248 解决:194 题目描述: A number of students sit in a circle facing their teac ...
- php soap使用示例
soap_client.php <?php try { $client = new SoapClient( null, array('location' =>"http://lo ...
- Python爬虫-- BeautifulSoup库
BeautifulSoup库 beautifulsoup就是一个非常强大的工具,爬虫利器.一个灵活又方便的网页解析库,处理高效,支持多种解析器.利用它就不用编写正则表达式也能方便的实现网页信息的抓取 ...