ES搜索排序，文档相关度评分介绍—

Field-length norm

How long is the field? The shorter the field, the higher the weight. If a term appears in a short field, such as a title field, it is more likely that the content of that field is about the term than if the same term appears in a much bigger body field. The field length norm is calculated as follows:

norm(d) = 1 / √numTerms

The field-length norm (norm) is the inverse square root of the number of terms in the field.

While the field-length norm is important for full-text search, many other fields don’t need norms. Norms consume approximately 1 byte per string field per document in the index, whether or not a document contains the field. Exact-value not_analyzed string fields have norms disabled by default, but you can use the field mapping to disable norms on analyzed fields as well:

PUT /my_index

{

  "mappings": {

    "doc": {

      "properties": {

        "text": {

          "type": "string",

          "norms": { "enabled": false } 

        }

      }

    }

  }

}

This field will not take the field-length norm into account. A long field and a short field will be scored as if they were the same length.

For use cases such as logging, norms are not useful. All you care about is whether a field contains a particular error code or a particular browser identifier. The length of the field does not affect the outcome. Disabling norms can save a significant amount of memory.

Putting it together

These three factors—term frequency, inverse document frequency, and field-length norm—are calculated and stored at index time. Together, they are used to calculate the weight of a single term in a particular document.

When we refer to documents in the preceding formulae, we are actually talking about a field within a document. Each field has its own inverted index and thus, for TF/IDF purposes, the value of the field is the value of the document.

When we run a simple term query with explain set to true (see Understanding the Score), you will see that the only factors involved in calculating the score are the ones explained in the preceding sections:

PUT /my_index/doc/1

{ "text" : "quick brown fox" }

GET /my_index/doc/_search?explain

{

  "query": {

    "term": {

      "text": "fox"

    }

  }

}

The (abbreviated) explanation from the preceding request is as follows:

weight(text:fox in 0) [PerFieldSimilarity]:  0.15342641

result of:

    fieldWeight in 0                         0.15342641

    product of:

        tf(freq=1.0), with freq of 1:        1.0

        idf(docFreq=1, maxDocs=1):           0.30685282

        fieldNorm(doc=0):                    0.5

	The final `score` for term `fox` in field `text` in the document with internal Lucene doc ID `0`.
	The term `fox` appears once in the `text` field in this document.
	The inverse document frequency of `fox` in the `text` field in all documents in this index.
	The field-length normalization factor for this field.

Of course, queries usually consist of more than one term, so we need a way of combining the weights of multiple terms. For this, we turn to the vector space model.

ES搜索排序，文档相关度评分介绍——Field-length norm的更多相关文章

ES搜索排序，文档相关度评分介绍——Vector Space Model
Vector Space Model The vector space model provides a way of comparing a multiterm query against a do ...
ES搜索排序，文档相关度评分介绍——TF-IDF—term frequency, inverse document frequency, and field-length norm—are calculated and stored at index time.
Theory Behind Relevance Scoring Lucene (and thus Elasticsearch) uses the Boolean model to find match ...
ES 文档与索引介绍
在之前的文章中,介绍了 ES 整体的架构和内容,这篇主要针对 ES 最小的存储单位 - 文档以及由文档组成的索引进行详细介绍. 会涉及到如下的内容: 文档的 CURD 操作. Dynamic Mapp ...
ES-PHP向ES批量添加文档报No alive nodes found in your cluster
ES-PHP向ES批量添加文档报No alive nodes found in your cluster 2016年12月14日 12:31:40 阅读数:2668 参考文章phpcurl 请求Chu ...
atitit.vod search doc.doc 点播系统搜索功能设计文档
atitit.vod search doc.doc 点播系统搜索功能设计文档按键的enter事件1 Left rig事件1 Up down事件2 key_events.key_search = fu ...
【ElasticSearch】：索引Index、文档Document、字段Field
因为从ElasticSearch6.X开始,官方准备废弃Type了.对应数据库,对ElasticSearch的理解如下: ElasticSearch 索引Index 文档Document 字段Fiel ...
es之对文档进行更新操作
5.7.1:更新整个文档 ES中并不存在所谓的更新操作,而是用新文档替换旧文档: 在内部,Elasticsearch已经标记旧文档为删除并添加了一个完整的新文档并建立索引.旧版本文档不会立即消失 ,但 ...
es搜索排序不正确
沿用该文章里的数据https://www.cnblogs.com/MRLL/p/12691763.html 查询时发现,一模一样的name,但是相关度不一样 GET /z_test/doc/_sear ...
MongoDB中的映射，限制记录和记录拼排序文档的插入查询更新删除操作
映射在 MongoDB 中,映射(Projection)指的是只选择文档中的必要数据,而非全部数据.如果文档有 5 个字段,而你只需要显示 3 个,则只需选择 3 个字段即可. find() 方法 ...

随机推荐

win10 只要打开文件对话框就卡死解决方法
我电脑的问题是:win10系统,只要打开文件对话框就卡死,假死,cpu100% 一直没有解决,但是只要把缩略图关了,就ok. 但是又想要留着缩略图,还是得显示,于是乎一直在找解决办法. 此方法好像可 ...
Hessian原理与程序设计
Hessian是比較经常使用的binary-rpc.性能较高,适合互联网应用.主要使用在普通的webservice 方法调用.交互数据较小的场景中.hessian的数据交互基于http协议,通常he ...
ie下div模拟的表格，表头表体无法对齐
现象:ie下,如果某个区域滚动显示表格内容(div模拟的table),表体出现滚动条的时候,会跟表头无法对齐. 解决方法:1.首先需要知道两个高度:表体最大高度height1.目前表体要显示的内容高度 ...
Neural Turing Machines-NTM系列（一）简述
Neural Turing Machines-NTM系列(一)简述 NTM是一种使用Neural Network为基础来实现传统图灵机的理论计算模型.利用该模型.能够通过训练的方式让系统"学 ...
Oracle 优化器
http://blog.csdn.net/it_man/article/details/8185370一.优化器基本知识 Oracle在执行一个SQL之前,首先要分析一下语句的执行计划,然后再按执 ...
swiper-demo1
<!DOCTYPE html> <html> <head> <meta charset="UTF-8"> <title> ...
SQL Server常用系统表
1.查询当前数据库中的用户表 select *from sysobjects where xtype='U'; 2.获取SQL Server允许同时用户连接的最大数 SELECT @@MAX_CONN ...
【转载】ASP.NET之旅--深入浅出解读IIS架构
在学习Asp.net时,发现大多数作者都是站在一个比较高的层次上讲解Asp.Net. 他们耐心. 细致地告诉你如何一步步拖放控件. 设置控件属性.编写 CodeBehind代码,以实现某个特定的功能. ...
怎样给filter加入自己定义接口
.在Cfilter类的定义中实现Interface接口的函数的定义: //-----------------------Interface methods----------------------- ...
Android 自己定义ImageView实现圆角/圆形附加OnTouchListener具体凝视以及Button圆角
转载请注明出处:王亟亟的大牛之路平时要用一些非方方正正的button之类的小伙伴们是怎样实现的?RadioButton? ImageButton? 还是其它? 今天亟亟上的是ImageView来实现 ...

ES搜索排序，文档相关度评分介绍——Field-length norm

Field-length norm

Putting it together

ES搜索排序，文档相关度评分介绍——Field-length norm的更多相关文章

随机推荐

热门专题