ES忽略TF-IDF评分——使用constant_score
Ignoring TF/IDF
Sometimes we just don’t care about TF/IDF. All we want to know is that a certain word appears in a field. Perhaps we are searching for a vacation home and we want to find houses that have as many of these features as possible:
- WiFi
- Garden
- Pool
The vacation home documents look something like this:
{ "description": "A delightful four-bedroomed house with ... " }
We could use a simple match query:
GET /_search
{
"query": {
"match": {
"description": "wifi garden pool"
}
}
}
However, this isn’t really full-text search. In this case, TF/IDF just gets in the way. We don’t care whether wifi is a common term, or how often it appears in the document. All we care about is that it does appear. In fact, we just want to rank houses by the number of features they have—the more, the better. If a feature is present, it should score 1, and if it isn’t, 0.
constant_score Query
Enter the constant_score query. This query can wrap either a query or a filter, and assigns a score of1 to any documents that match, regardless of TF/IDF:
GET /_search
{
"query": {
"bool": {
"should": [
{ "constant_score": {
"query": { "match": { "description": "wifi" }}
}},
{ "constant_score": {
"query": { "match": { "description": "garden" }}
}},
{ "constant_score": {
"query": { "match": { "description": "pool" }}
}}
]
}
}
}
Perhaps not all features are equally important—some have more value to the user than others. If the most important feature is the pool, we could boost that clause to make it count for more:
GET /_search
{
"query": {
"bool": {
"should": [
{ "constant_score": {
"query": { "match": { "description": "wifi" }}
}},
{ "constant_score": {
"query": { "match": { "description": "garden" }}
}},
{ "constant_score": {
"boost": 2
"query": { "match": { "description": "pool" }}
}}
]
}
}
}
|
|
A matching |
The final score for each result is not simply the sum of the scores of all matching clauses. The coordination factor and query normalization factor are still taken into account.
We could improve our vacation home documents by adding a not_analyzed features field to our vacation homes:
{ "features": [ "wifi", "pool", "garden" ] } 这样改写有什么好处?省索引空间吗?
参考:https://www.elastic.co/guide/en/elasticsearch/guide/current/ignoring-tfidf.html#ignoring-tfidf
ES忽略TF-IDF评分——使用constant_score的更多相关文章
- Elasticsearch由浅入深(十)搜索引擎:相关度评分 TF&IDF算法、doc value正排索引、解密query、fetch phrase原理、Bouncing Results问题、基于scoll技术滚动搜索大量数据
相关度评分 TF&IDF算法 Elasticsearch的相关度评分(relevance score)算法采用的是term frequency/inverse document frequen ...
- Elasticsearch学习之相关度评分TF&IDF
relevance score算法,简单来说,就是计算出,一个索引中的文本,与搜索文本,他们之间的关联匹配程度 Elasticsearch使用的是 term frequency/inverse doc ...
- 25.TF&IDF算法以及向量空间模型算法
主要知识点: boolean model IF/IDF vector space model 一.boolean model 在es做各种搜索进行打分排序时,会先用boolean mo ...
- TF/IDF计算方法
FROM:http://blog.csdn.net/pennyliang/article/details/1231028 我们已经谈过了如何自动下载网页.如何建立索引.如何衡量网页的质量(Page R ...
- 信息检索中的TF/IDF概念与算法的解释
https://blog.csdn.net/class_brick/article/details/79135909 概念 TF-IDF(term frequency–inverse document ...
- 55.TF/IDF算法
主要知识点: TF/IDF算法介绍 查看es计算_source的过程及各词条的分数 查看一个document是如何被匹配到的 一.算法介绍 relevance score算法,简单来说 ...
- TF/IDF(term frequency/inverse document frequency)
TF/IDF(term frequency/inverse document frequency) 的概念被公认为信息检索中最重要的发明. 一. TF/IDF描述单个term与特定document的相 ...
- 基于TF/IDF的聚类算法原理
一.TF/IDF描述单个term与特定document的相关性TF(Term Frequency): 表示一个term与某个document的相关性. 公式为这个term在document中出 ...
- 使用solr的函数查询,并获取tf*idf值
1. 使用函数df(field,keyword) 和idf(field,keyword). http://118.85.207.11:11100/solr/mobile/select?q={!func ...
随机推荐
- project修改时间日历
视图→甘特图 格式→时间表→右键时间表 详细的日程表,然后双击时间即可
- 使用rabbitmq rpc 模式
服务器端 安装 ubuntu 16.04 server 安装 rabbitmq-server 设置 apt 源 curl -s https://packagecloud ...
- 日文符号“・”插入sql-server2005乱码问题
错误:日文符号"・"插入sql-server2005符号.出现乱码 原因:DB字段设为varchar.DB文字编码为"Chinese_PRC_CI_AS" 相应 ...
- 自定义序列化技术3 (.net 序列化技术) C++ 调用C# DLL
打开SerializableAttribute利用里面的函数进行编辑. // sparse.cpp : 定义控制台应用程序的入口点. // #include "stdafx.h" ...
- 重读金典------高质量C编程指南(林锐)-------第六章 函数设计
函数设计最重要的无外乎两个方面,一个是函数的接口设计一个是内部实现的一些规则. 在C语言中,函数的参数和返回值的传递方式分为两种: 值传递与指针传递.而C++中,多了一个引用传递. 引用传递有些像指针 ...
- Mjpg_Streamer 的移植
1. 移植mjpg-streamer a.1 移植libjpeg tar zxf libjpeg-turbo-1.2.1.tar.gz cd libjpeg-turbo-1.2.1 ./configu ...
- NodeJS待重头收拾旧山河(重拾)
介绍 Node.js®是一个基于Chrome V8 JavaScript引擎构建的JavaScript运行时. Node.js使用事件驱动的非阻塞I / O模型,使其轻便且高效. Node.js的包生 ...
- Spring学习三----------注入方式
© 版权声明:本文为博主原创文章,转载请注明出处 Spring注入方式 本篇博客只讲最常用的两种注入方式:设值注入和构造器注入.代码为完整代码,复制即可使用. 1.目录结构 2.pom.xml < ...
- hdu1316
链接:pid=1316" target="_blank">点击打开链接 题意:问区间[a,b]中有多少斐波那契数 代码: #include <iostream ...
- 微信小程序设置控件权重
项目中最常用的两种布局方式,水平布局和垂直布局,在微信小程序中实现起来也比较简单. 1.横向水平布局: 实现水平布局,需要四个view容器组件,其中一个是父容器.如下: & ...